How to perform handwriting recognition on OCR-completed files

  • Last Post 27 September 2018
  • Topic Is Solved
lx11020219 posted this 13 September 2018

When I run handwriting recognition on a file that has already been OCRed, the handwriting is not recognized correctly.
I confirmed that handwriting recognition works correctly on files that have not been OCRed.
How can I perform handwriting recognition on files that have already been OCRed?

I tried setting the "SourceContentReuseMode" property of "ObjectsExtractionParams" to "CRM_DoNotReuse", but it had no effect.

My source code is based on the answers to my previous question:

https://forum.ocrsdk.com/thread/how-to-recognizing-handprinted-texts/

Helen Osetrova posted this 14 September 2018

Hello!

Could you tell us how you use the ObjectsExtractionParams object? Do you pass it as a parameter of the IFRDocument::Recognize() method?

Could you please also post your source code and the image to be processed?

lx11020219 posted this 18 September 2018

Hello.

I have attached the source code and the corresponding file; please take a look.

Code


// This code throws an error
// (HRESULT Exception: 0x80040154 (REGDB_E_CLASSNOTREG))
// ObjectsExtractionParams objExParam = new ObjectsExtractionParams();
// objExParam.SourceContentReuseMode = SourceContentReuseModeEnum.CRM_DoNotReuse;
// document.Recognize(null, objExParam);
// document.Synthesize(null);

// This code has no effect: the handwriting is still not recognized
DocumentProcessingParams dpp = engine.CreateDocumentProcessingParams();
dpp.PageProcessingParams.ObjectsExtractionParams.SourceContentReuseMode = SourceContentReuseModeEnum.CRM_DoNotReuse;
document.Recognize(null, dpp.PageProcessingParams.ObjectsExtractionParams);
document.Synthesize(null);

Attached Files

Helen Osetrova posted this 18 September 2018

Hello,

Thank you for the provided information!

Please note that before recognition you must either perform layout analysis or build the page layout yourself. Without this step, FineReader Engine will not be able to find any blocks on the page. For more information about the processing stages, see the Developer's Help → Guided Tour → Advanced Techniques → Tuning Parameters of Page Preprocessing, Analysis, Recognition, and Synthesis article.

Because automatic layout analysis is not supported for handprinted text, please use the approach described in the topic https://forum.ocrsdk.com/thread/how-to-recognizing-handprinted-texts/ to add the necessary blocks to the page layout manually. After adding the blocks, call the IFRPage::RecognizeBlocks() method to recognize them.

The code snippet below demonstrates adding and recognizing the top-left handwritten block of your sample file:

// Get the layout of the first page
FREngine.FRPages pages = document.Pages;
FREngine.FRPage page = pages.Item(0);
FREngine.Layout layout = page.Layout;

// Set the block region
FREngine.Region region = engineLoader.Engine.CreateRegion();
region.AddRect(321, 458, 1040, 524);

// Create a new block
FREngine.IBlock newBlock = layout.Blocks.AddNew(FREngine.BlockTypeEnum.BT_Text, region, 0);
FREngine.TextBlock textBlock = newBlock.GetAsTextBlock();

// Specify the text parameters
textBlock.RecognizerParams.TextTypes = (int)FREngine.TextTypeEnum.TT_Handprinted;
textBlock.RecognizerParams.SetPredefinedTextLanguage("Digits");

// Specify the type of marking around the letters
textBlock.RecognizerParams.FieldMarkingType = FREngine.FieldMarkingTypeEnum.FMT_SimpleText;
textBlock.RecognizerParams.WritingStyle = FREngine.WritingStyleEnum.WS_Japanese;

// Replace the page in the document with the modified one
page.Layout = layout;
document.Pages.DeleteAt(0);
document.AddPage(page);

// Tune the ObjectsExtractionParams object
FREngine.DocumentProcessingParams dpp = engineLoader.Engine.CreateDocumentProcessingParams();
dpp.PageProcessingParams.ObjectsExtractionParams.SourceContentReuseMode = FREngine.SourceContentReuseModeEnum.CRM_DoNotReuse;

// Recognize blocks 
document.Pages[0].RecognizeBlocks(null, null, dpp.PageProcessingParams.ObjectsExtractionParams);
document.Synthesize(null);               

// Save the result
document.Export(@"D:\Temp\reuse_recognized.pdf", FREngine.FileExportFormatEnum.FEF_PDF, null);

Attached is the result produced by this example.

I hope this helps!

Attached Files

lx11020219 posted this 19 September 2018

Thank you for investigating this and for the sample code!
However, I owe you an apology.
It turned out that the coordinate data was misaligned:
the image embedded in the PDF before processing and the image embedded in the PDF after processing have different resolutions.

Before processing: 793 × 1121 px
After processing: 2478 × 3503 px

It seems the handwriting could not be extracted because, with the coordinate values taken from the image before processing, the block falls in the page margin of the processed image.
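
A mismatch like this can be corrected by scaling each coordinate by the ratio of the two image sizes. The sketch below is a hypothetical C# helper, not part of the FineReader Engine API; `PixelRect`, `CoordinateScaler`, and `Scale` are illustrative names. It assumes the rectangle is given as left, top, right, bottom in pixels of the original image, which is the order the `region.AddRect(...)` call earlier in this thread appears to use. Note that 2478 / 793 and 3503 / 1121 are both about 3.125, which equals 300 / 96, so the engine may have resampled a roughly 96 dpi image to 300 dpi; that is an inference from the numbers, not something confirmed in the thread.

```csharp
using System;

// Hypothetical helper types; not part of the FineReader Engine API.
// A rectangle in pixel coordinates: left, top, right, bottom.
public readonly struct PixelRect
{
    public int Left { get; }
    public int Top { get; }
    public int Right { get; }
    public int Bottom { get; }

    public PixelRect(int left, int top, int right, int bottom)
    {
        Left = left;
        Top = top;
        Right = right;
        Bottom = bottom;
    }

    public override string ToString() => $"({Left}, {Top}, {Right}, {Bottom})";
}

public static class CoordinateScaler
{
    // Maps a rectangle measured on a srcWidth x srcHeight image onto the pixel
    // grid of a dstWidth x dstHeight copy of the same page. X and Y are scaled
    // independently and each edge is rounded to the nearest pixel.
    public static PixelRect Scale(PixelRect r, int srcWidth, int srcHeight, int dstWidth, int dstHeight)
    {
        if (srcWidth <= 0 || srcHeight <= 0 || dstWidth <= 0 || dstHeight <= 0)
            throw new ArgumentException("Image sizes must be positive.");

        double sx = (double)dstWidth / srcWidth;
        double sy = (double)dstHeight / srcHeight;

        return new PixelRect(
            (int)Math.Round(r.Left * sx),
            (int)Math.Round(r.Top * sy),
            (int)Math.Round(r.Right * sx),
            (int)Math.Round(r.Bottom * sy));
    }
}
```

For example, a hypothetical block measured at (100, 150, 330, 170) on the 793 × 1121 image scales to (312, 469, 1031, 531) on the 2478 × 3503 image. Measuring directly on the binarized image, as suggested later in the thread, avoids the rounding this introduces.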

I have corrected the coordinate values and attached three files (before and after processing).
Thank you also for your detailed advice on layout analysis and other points; I will refer to it.

Attached Files

Helen Osetrova posted this 21 September 2018

Hi! 

For recognition, FineReader Engine uses the binarized copy of the initial image. This is a special format suitable for OCR. For documents scanned at lower resolutions (less than 120 dpi) and documents with small fonts (less than 10 pt), the images may be digitally enlarged to achieve better OCR quality. (See the source image recommendations on the related page.)
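
To check whether a page falls under the 120 dpi threshold mentioned above, you can estimate its effective resolution from its pixel width and the physical paper width. The sketch below is hypothetical (`ResolutionCheck`, `EstimateDpi`, and `MayBeEnlarged` are illustrative names, not FineReader Engine API) and assumes the sample is an A4 page, 210 mm wide; the thread does not state the paper size. Whether the engine actually enlarges a given image is decided internally, so a result below 120 dpi only indicates that enlargement is possible.

```csharp
using System;

// Hypothetical helper; not part of the FineReader Engine API.
public static class ResolutionCheck
{
    // Resolution below which, according to this thread, images may be digitally enlarged.
    public const double LowResolutionDpi = 120.0;

    private const double MillimetresPerInch = 25.4;

    // Estimates dots per inch from the image width in pixels and the
    // physical width of the paper in millimetres.
    public static double EstimateDpi(int widthPx, double paperWidthMm)
    {
        if (widthPx <= 0)
            throw new ArgumentOutOfRangeException(nameof(widthPx), "Width must be positive.");
        if (paperWidthMm <= 0)
            throw new ArgumentOutOfRangeException(nameof(paperWidthMm), "Paper width must be positive.");

        double widthInches = paperWidthMm / MillimetresPerInch;
        return widthPx / widthInches;
    }

    // True when the estimated resolution is below the threshold, i.e. the
    // engine may (but is not guaranteed to) enlarge the image before OCR.
    public static bool MayBeEnlarged(double dpi) => dpi < LowResolutionDpi;
}
```

With the sizes reported in this thread and an assumed A4 width, `EstimateDpi(793, 210)` gives about 96 dpi, below the threshold, and `EstimateDpi(2478, 210)` gives about 300 dpi.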

In this case, the coordinates of the block region should be taken from the binarized image. To obtain this image, call the SaveToFile() method of the ImageDocument object. See the Developer's Help → API Reference → ImageDocument Object article for a description of the internal image format of FineReader Engine. The Developer's Help → Guided Tour → Advanced Techniques → Working with Images section may also be useful.

I hope this information helps! The binarized copy of your sample document is attached to this post.

Attached Files

lx11020219 posted this 27 September 2018

Hi!

Thank you for the additional information.
We will try adjusting the coordinates based on it.
