I am trying to correct recognized words according to the original text, but some symbols are lost after the corrected word is inserted.

int startIndex = word.getFirstSymbolPosition();
paragraph.Remove(startIndex, startIndex + word.getText().length());
paragraph.Insert(startIndex, correctedWord, null);

Recognized text: 'Cote d'Ivoire'. Corrected word: 'Cote' -> 'Côte'. Text after correction: 'Côt d'Ivoire'. The symbol 'e' (position 3) was lost.

Another example, with 2 symbols lost: 'I'Insdustrie' -> 'l’Insdustrie'. Text after correction: 'l’Insdustr'.
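For comparison, here is a minimal sketch of the result I expect, written against a plain Java StringBuilder rather than the FineReader Engine paragraph object. The class name WordReplaceSketch and the method replaceWord are hypothetical and only illustrate the remove-then-insert arithmetic; they are not part of the Engine API.

```java
final class WordReplaceSketch {

    // Hypothetical helper: removes oldLength characters starting at startIndex
    // and inserts correctedWord at the same position.
    static String replaceWord(String paragraphText, int startIndex,
                              int oldLength, String correctedWord) {
        StringBuilder sb = new StringBuilder(paragraphText);
        // StringBuilder.delete removes the half-open range [start, end).
        sb.delete(startIndex, startIndex + oldLength);
        sb.insert(startIndex, correctedWord);
        return sb.toString();
    }

    public static void main(String[] args) {
        // "C\u00f4te" is 'Cote' with o-circumflex, written as an escape so that
        // the source compiles regardless of file encoding.
        String fixed = replaceWord("Cote d'Ivoire", 0, "Cote".length(), "C\u00f4te");
        System.out.println(fixed);
    }
}
```

With plain strings no symbol is dropped, so the loss seems to come from how the paragraph handles the inserted symbols rather than from the index arithmetic itself.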

Additional information: I configured a compound language 'English' + UserDictionary, with the alphabet extended with all unique symbols from the original text. I also changed the suffixes and prefixes:

language.setLetterSet(BaseLanguageLetterSetEnum.BLLS_Alphabet, newAlphabet);
language.setLetterSet(BaseLanguageLetterSetEnum.BLLS_Prefixes, "");
language.setLetterSet(BaseLanguageLetterSetEnum.BLLS_Suffixes, "");
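The value of newAlphabet is built roughly as in the sketch below. This is an illustration only: the class AlphabetSketch and the method extendAlphabet are hypothetical names, skipping whitespace is an assumption, and the code works on Java char values, so it does not handle symbols outside the Basic Multilingual Plane.

```java
final class AlphabetSketch {

    // Hypothetical helper: appends every symbol of originalText that is not
    // already present to the end of baseAlphabet, keeping first-seen order.
    static String extendAlphabet(String baseAlphabet, String originalText) {
        StringBuilder result = new StringBuilder(baseAlphabet);
        for (char ch : originalText.toCharArray()) {
            // Assumption: whitespace is not treated as an alphabet symbol.
            if (Character.isWhitespace(ch)) {
                continue;
            }
            if (result.indexOf(String.valueOf(ch)) < 0) {
                result.append(ch);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String base = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
        // "\u00f4" is o-circumflex; the apostrophe is also appended as a new symbol.
        System.out.println(extendAlphabet(base, "C\u00f4te d'Ivoire"));
    }
}
```

Because new symbols are appended after the base letters, everything after 'z' in the resulting string is an extension symbol.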

asked 16 Oct '15, 16:32

Andrei
111

edited 16 Oct '15, 18:01

Hello!

I've tried to reproduce this behavior in FineReader Engine 11 with our Demo images, but the words are inserted correctly. Could you please send the input document, the version of your FREngine distribution package, and a code snippet that reproduces the issue to SDK_Support@abbyy.com?

(19 Oct '15, 15:30) Oksana Serdyuk ♦♦

This happens with accented symbols (umlauts and the like) when the language is based on the English alphabet. Not all symbols can be used to extend the alphabet of a language based on English. For example, the symbol 'ô' cannot be inserted in a compound language based on English.

Workaround:

1. For symbols that can be added to the new alphabet:

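// Part of the alphabet up to and including 'z', i.e. the symbols before the extension.
String beforeZSymbols = alphabet.substring(0, alphabet.indexOf("z") + 1);
// Snapshot the characters first, because newWord is modified inside the loop.
char[] chars = newWord.toCharArray();
for (char ch : chars) {
    // Append one space for every symbol that comes from the extended alphabet.
    if (beforeZSymbols.indexOf(ch) < 0) {
        newWord += " ";
    }
}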
2. For symbols that cannot be added to the new alphabet:

     ....    
     newWord += "  "; // add new double space   
     ....
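The first case can be wrapped in a self-contained method, sketched below. The class PaddingSketch and the method padForExtendedSymbols are hypothetical names, and the sketch assumes that the alphabet string lists the base letters up to 'z' first and the extension symbols after them. The second case (double space) is not covered, because the way to detect such symbols is not shown above.

```java
final class PaddingSketch {

    // Hypothetical helper: appends one trailing space to the word for every
    // symbol that is not in the part of the alphabet ending with 'z'.
    static String padForExtendedSymbols(String newWord, String alphabet) {
        // If 'z' is missing, indexOf returns -1 and the prefix is empty,
        // so every symbol is treated as an extension symbol.
        String beforeZSymbols = alphabet.substring(0, alphabet.indexOf('z') + 1);
        StringBuilder padded = new StringBuilder(newWord);
        for (char ch : newWord.toCharArray()) {
            if (beforeZSymbols.indexOf(ch) < 0) {
                padded.append(' ');
            }
        }
        return padded.toString();
    }

    public static void main(String[] args) {
        // Base letters first, then the extension symbol o-circumflex ("\u00f4").
        String alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\u00f4";
        String padded = padForExtendedSymbols("C\u00f4te", alphabet);
        System.out.println("[" + padded + "]");
    }
}
```

The padded word is then what gets passed to the Insert call, so that the extra trailing spaces absorb the symbols that would otherwise be cut off.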
    
answered 19 Oct '15, 16:19

Andrei
111

Asked: 16 Oct '15, 16:32

Seen: 690 times

Last updated: 19 Oct '15, 16:19

© 2016 ABBYY. All rights Reserved. www.ABBYY.com | Privacy Policy | Legal