OCR English and Japanese on one page

  • Last Post 12 December 2012
gursharan posted this 27 November 2012

The problem is that we cannot recognize the English and Japanese text in one execution cycle. As the snippet below shows, we first have to run recognition for the English text and then run it again for the Japanese text. Is it possible to have only one OCR read cycle for both types of text?


            // English call
            PageProcessingParams pageprocessingParamsEng = engineLoader.Engine.CreatePageProcessingParams();
            pageprocessingParamsEng.RecognizerParams.SetPredefinedTextLanguage("English");

            SynthesisParamsForDocument synthesisParamsForDocument = engineLoader.Engine.CreateSynthesisParamsForDocument();
            synthesisParamsForDocument.CleanRecognizedTextFontNames();
            synthesisParamsForDocument.AddRecognizedTextFontName("MS UI Gothic");
            document.Process(pageprocessingParamsEng, null, synthesisParamsForDocument);

            for (int i = 0; i < document.Pages.Count; i++)
            {
                calculateStatisticsForLayout(document.Pages[i].Layout);
            }

            // Japanese call
            PageProcessingParams pageprocessingParams = engineLoader.Engine.CreatePageProcessingParams();
            pageprocessingParams.RecognizerParams.SetPredefinedTextLanguage("Japanese");

            document.Process(pageprocessingParams, null, synthesisParamsForDocument);

            ///

            for (int i = 0; i < document.Pages.Count; i++)
            {
                calculateStatisticsForLayout(document.Pages[i].Layout);
            }

Dmitry Me posted this 27 November 2012

What happens if you call pageprocessingParams.RecognizerParams.SetPredefinedTextLanguage("English,Japanese"); ?

gursharan posted this 27 November 2012

Thank you for the response, but it still does not pick up the English text. If I remove the Japanese font here, then the English text is read but the Japanese text shows up as junk. Please help.

            SynthesisParamsForDocument synthesisParamsForDocument = engineLoader.Engine.CreateSynthesisParamsForDocument();
            synthesisParamsForDocument.CleanRecognizedTextFontNames();
            synthesisParamsForDocument.AddRecognizedTextFontName("MS UI Gothic");
            document.Process(pageprocessingParamsEng, null, synthesisParamsForDocument);

Anastasia Galimova posted this 12 December 2012

Thank you for your question! Could you please attach a sample image for which the issue occurs and tell us the build number of your ABBYY FineReader Engine package, so that we can reproduce the issue?

To determine the build number please see http://knowledgebase.ocrsdk.com/article/1116
