Fraction characters incorrectly recognized

itma posted this 15 December 2018

I am a new user of FineReader, using it to create searchable PDFs of my stamp catalogues. A typical search could be "7½p Henry Hudson", the 7½p meaning seven and a half pence. The ½ character is generally recognized as "y2", which I suppose looks something like ½.

The catalogues have small print and I was prompted to scan them at 600 dpi, but even going to 1200 dpi didn't correct the problem.

Does anyone have a suggestion how to correct this?

Nadezhda A. Solovyeva posted this 17 December 2018

Hi Itma,

By default, the "½" symbol is not in the FineReader Engine recognized character set for English. Here is how to create your own recognition language with all the Unicode fraction symbols:

    // Create a new text language and copy all settings from the predefined "English" language
    FREngine.ILanguageDatabase languageDB = engineLoader.Engine.CreateLanguageDatabase();
    FREngine.ITextLanguage englishAndFractional = languageDB.CreateTextLanguage();
    englishAndFractional.CopyFrom(engineLoader.Engine.PredefinedLanguages.Find("English").TextLanguage);
    // Add a second base language whose alphabet is the digits plus the Unicode fraction characters
    FREngine.IBaseLanguage fractionalBL = englishAndFractional.BaseLanguages.AddNew();
    fractionalBL.LetterSet[FREngine.BaseLanguageLetterSetEnum.BLLS_Alphabet] = "0123456789¼½¾⅓⅔⅛⅜⅝⅞";

After that, set

FREngine.DocumentProcessingParams.PageProcessingParams.RecognizerParams.TextLanguage = englishAndFractional;
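
For readers wondering where these two pieces fit, here is a minimal sketch of how the custom language might be passed through to recognition and PDF export. It is an illustration only, not tested against any particular Engine version. The class name FractionOcrSketch, the method name MakeSearchablePdf and the two path parameters are hypothetical. The sketch assumes the engine is already loaded and licensed, that englishAndFractional has been built as above, and that the default export settings are acceptable (whether the resulting PDF is searchable depends on those settings).

```csharp
using FREngine; // interop assembly that ships with the FineReader Engine SDK (assumed to be referenced)

static class FractionOcrSketch // hypothetical helper, not part of the SDK
{
    // engine:   an already-initialized engine, e.g. engineLoader.Engine
    // language: the englishAndFractional text language created earlier
    public static void MakeSearchablePdf(IEngine engine, ITextLanguage language,
                                         string imagePath, string pdfPath)
    {
        // Processing parameters carry the recognition language down to the recognizer
        IDocumentProcessingParams processingParams = engine.CreateDocumentProcessingParams();
        processingParams.PageProcessingParams.RecognizerParams.TextLanguage = language;

        IFRDocument document = engine.CreateFRDocument();
        try
        {
            document.AddImageFile(imagePath, null, null);  // default image preparation
            document.Process(processingParams);            // analyze and recognize all pages
            document.Export(pdfPath, FileExportFormatEnum.FEF_PDF, null); // default PDF export settings
        }
        finally
        {
            document.Close(); // release the document's resources even if processing fails
        }
    }
}
```

This needs the Engine SDK and a developer license, so it does not apply to the stand-alone FineReader application.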

itma posted this 17 December 2018

Hello, Nadezhda:

Many thanks for your response.

Unfortunately, I haven't done any serious programming for a number of years and am really just a "user" rather than a "developer" of FineReader. What do I have to do with this code? Can I do this without an SDK?

itma posted this 17 December 2018

Nadezhda:

It also occurs to me that I have posted this on the wrong forum. I am using the stand-alone FineReader program rather than Cloud.

Frank.

Nadezhda A. Solovyeva posted this 5 weeks ago

Hi Itma,

You may simply use "English; Simple math formulas;" languages for OCR in FineReader Desktop.

itma posted this 5 weeks ago

I had not realized that the Windows and Mac versions are very different animals. The Mac version that I have does not support Unicode, and thus cannot output fraction characters.

Sorry for any inconvenience I may have caused. 

Frank.

itma posted this 4 weeks ago

Thanks for your suggestion, Nadezhda. It almost worked. I downloaded FineReader 14 for Windows and tried it. The fractions came up perfectly. The value for 1d (one old penny), however, was consistently recognized as Id. Oddly enough though, Britain has been known to use a seriffed capital I for the value 1 on its stamps. I’ll try setting up a custom language.

But a thought has just come to me. I realize now that I never checked this in my previous tests as I was fixated on values which included a fraction.

Frank.

itma posted this 4 weeks ago

Yes, I’ve just checked and “1d” is also recognized as “Id” in the Mac version, which does not have a Language utility. It must be picking up the word “Id” from the dictionary. I’ll check with ABBYY’s support to see if it is possible to disable the word “Id”.

So the suggestion to use “Simple math formulas” is a good one, thanks.
