Strange behavior of paragraph text modification

  • Last Post 19 October 2015
Andrei posted this 16 October 2015

I am trying to correct recognized words according to the original text.

int startIndex = word.getFirstSymbolPosition();
paragraph.Remove(startIndex, startIndex + word.getText().length());
paragraph.Insert(startIndex, correctedWord, null);

Recognized text: 'Cote d'Ivoire'. Corrected word: 'Cote' -> 'Côte'. Text after correction: 'Côt d'Ivoire'. The symbol 'e' (position 3) was lost.

Another example, with 2 symbols lost: 'I'Insdustrie' -> 'l’Insdustrie'. Text after correction: 'l’Insdustr'.
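For comparison, the result expected from the remove-then-insert pair can be modeled on a plain string. This is only a sketch: `WordReplaceModel` and `replaceWord` are hypothetical names, and `StringBuilder` stands in for the paragraph, so it says nothing about how the FineReader Engine `Remove` and `Insert` calls interpret their arguments.

```java
class WordReplaceModel {
    /**
     * Replaces oldLength characters starting at startIndex with the corrected word.
     * Plain-string model only; it does not call the FineReader Engine API.
     */
    public static String replaceWord(String text, int startIndex, int oldLength, String corrected) {
        StringBuilder sb = new StringBuilder(text);
        // StringBuilder.delete takes an exclusive end index, not a count.
        sb.delete(startIndex, startIndex + oldLength);
        sb.insert(startIndex, corrected);
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00F4 is 'o' with a circumflex, as in the corrected word from the post.
        String recognized = "Cote d'Ivoire";
        String fixed = replaceWord(recognized, 0, "Cote".length(), "C\u00F4te");
        // Both words have four characters, so the overall length must not change.
        System.out.println(fixed.length() == recognized.length());
    }
}
```

In this model no symbol is lost, which is the behavior the correction code above is expected to have.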

Additional information: the configured compound language is 'English' + UserDictionary, with the alphabet extended by all unique symbols of the original text. The suffixes and prefixes were changed as well:

language.setLetterSet(BaseLanguageLetterSetEnum.BLLS_Alphabet, newAlphabet);
language.setLetterSet(BaseLanguageLetterSetEnum.BLLS_Prefixes, "");
language.setLetterSet(BaseLanguageLetterSetEnum.BLLS_Suffixes, "");
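The post does not show how `newAlphabet` was built. One possible way to extend a base alphabet with the unique symbols of the original text is sketched below. `AlphabetSketch` and `extendAlphabet` are hypothetical names, skipping whitespace is an assumption, and the code works on `char` values, so symbols outside the Basic Multilingual Plane are not handled.

```java
class AlphabetSketch {
    /**
     * Returns baseAlphabet followed by every non-whitespace character of
     * originalText that it does not already contain, in order of first appearance.
     */
    public static String extendAlphabet(String baseAlphabet, String originalText) {
        StringBuilder result = new StringBuilder(baseAlphabet);
        for (char ch : originalText.toCharArray()) {
            if (Character.isWhitespace(ch)) {
                continue; // assumption: whitespace is not part of the alphabet
            }
            if (result.indexOf(String.valueOf(ch)) < 0) {
                result.append(ch); // new symbol, appended after the base letters
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String extended = extendAlphabet("abcdefghijklmnopqrstuvwxyz", "C\u00F4te d'Ivoire");
        // Added symbols: 'C', o-circumflex, the apostrophe and 'I'.
        System.out.println(extended.length()); // 30
    }
}
```

Appending the new symbols after the base letters keeps the original alphabet intact as a prefix of the result.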

Oksana Serdyuk posted this 19 October 2015

Hello!

I've tried to reproduce this behavior in FineReader Engine 11 with our Demo images, but the words are inserted correctly. Could you please send the input document, the version of your FREngine distribution package, and a code snippet that helps to reproduce the issue to SDK_Suppport@abbyy.com?

Andrei posted this 19 October 2015

This happens with accented symbols (umlauts and the like) when the compound language is based on the English alphabet. Not all symbols can be used to extend an alphabet based on the English language. For example, the symbol 'ô' cannot be added to a compound language based on English.

Workaround:

  1. For symbols that can be added to a new alphabet

// Base part of the alphabet: everything up to and including 'z'.
String beforeZSymbols = alphabet.substring(0, alphabet.indexOf("z") + 1);
char[] chars = newWord.toCharArray(); // snapshot, so appending to newWord below is safe
for (char ch : chars) {
    if (beforeZSymbols.indexOf(ch) < 0) {
        newWord += " "; // add one space per symbol outside the base alphabet
    }
}
  2. For symbols that cannot be added to a new alphabet

     ....    
     newWord += "  "; // add new double space   
     ....
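The first step of the workaround can be wrapped in a self-contained helper, sketched below. `PaddingSketch` and `padForExtendedSymbols` are hypothetical names, and the alphabet in `main` is made up for illustration. The second step is not reproduced because its condition is elided in the post, and the thread does not explain why the padding compensates for the lost symbols.

```java
class PaddingSketch {
    /**
     * Appends one trailing space for every character of word that is not in the
     * base part of the alphabet (everything up to and including 'z').
     */
    public static String padForExtendedSymbols(String word, String alphabet) {
        // If 'z' is missing, indexOf returns -1 and the base part is empty,
        // so every character counts as an extended symbol.
        String base = alphabet.substring(0, alphabet.indexOf('z') + 1);
        StringBuilder padded = new StringBuilder(word);
        for (char ch : word.toCharArray()) {
            if (base.indexOf(ch) < 0) {
                padded.append(' ');
            }
        }
        return padded.toString();
    }

    public static void main(String[] args) {
        // Hypothetical alphabet: lowercase Latin letters plus e-acute (\u00E9).
        String alphabet = "abcdefghijklmnopqrstuvwxyz\u00E9";
        String padded = padForExtendedSymbols("caf\u00E9", alphabet);
        System.out.println(padded.length()); // 5: four letters plus one padding space
    }
}
```

Note that with this rule any character missing from the base part, including uppercase letters or apostrophes if they come after 'z' in the alphabet string, also adds a space.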
    
