I am trying to correct recognized words according to the original text, using the following code:

int startIndex = word.getFirstSymbolPosition();
paragraph.Remove(startIndex, startIndex + word.getText().length());
paragraph.Insert(startIndex, correctedWord, null);

Recognized text: 'Cote d'Ivoire'. Corrected word: 'Cote' -> 'Côte'. Text after correction: 'Côt d'Ivoire'. The symbol 'e' (position 3) was lost.

Another example, with 2 symbols lost: 'I'Insdustrie' -> 'l’Insdustrie'. Text after correction: 'l’Insdustr'.
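For comparison, the result I expect can be sketched with plain Java strings. This is only an illustration of the intended remove-then-insert behavior: it makes no FineReader Engine calls, and `replaceWord` is a hypothetical helper, not part of the SDK.

```java
public class ReplaceSketch {
    // Hypothetical helper: removes oldWord starting at startIndex
    // and inserts correctedWord at the same position.
    static String replaceWord(String paragraph, int startIndex,
                              String oldWord, String correctedWord) {
        StringBuilder sb = new StringBuilder(paragraph);
        // StringBuilder.delete takes an exclusive end index.
        sb.delete(startIndex, startIndex + oldWord.length());
        sb.insert(startIndex, correctedWord);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Expected: the trailing 'e' is kept.
        System.out.println(replaceWord("Cote d'Ivoire", 0, "Cote", "Côte"));
        // Expected: no symbols are lost at the end of the word.
        System.out.println(replaceWord("I'Insdustrie", 0, "I'Insdustrie", "l’Insdustrie"));
    }
}
```

With plain strings nothing is lost, so the truncation seems to come from how the paragraph handles the inserted symbols rather than from the index arithmetic.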

Additional information: I configured a compound language based on 'English' plus a UserDictionary, with the alphabet extended with all unique symbols from the original text. I also changed the suffixes and prefixes:

language.setLetterSet(BaseLanguageLetterSetEnum.BLLS_Alphabet, newAlphabet);
language.setLetterSet(BaseLanguageLetterSetEnum.BLLS_Prefixes, "");
language.setLetterSet(BaseLanguageLetterSetEnum.BLLS_Suffixes, "");

asked 16 Oct '15, 16:32 by Andrei (edited 16 Oct '15, 18:01)


I've tried to reproduce this behavior in FineReader Engine 11 with our Demo images, but the words are inserted correctly. Could you please send the input document, the version of your FREngine distribution package, and a code snippet that reproduces the issue to SDK_Support@abbyy.com?

(19 Oct '15, 15:30) Oksana Serdyuk ♦♦

This happens with accented symbols (umlauts, circumflexes and the like) when the alphabet is based on English. Not all symbols can be used to extend the alphabet of a compound language based on English. For example, the symbol 'ô' cannot be inserted.

Workaround: pad the corrected word with spaces before inserting it.

1. For symbols that can be added to the new alphabet, add one space for each symbol that is not part of the base alphabet (everything up to and including 'z'):

String beforeZSymbols = alphabet.substring(0, alphabet.indexOf("z") + 1);
char[] chars = newWord.toCharArray();
for (char ch : chars) {
     if (!beforeZSymbols.contains(ch + "")) {
         newWord += " "; // add one space
     }
}

2. For symbols that cannot be added to the new alphabet, add a double space instead:

     newWord += "  "; // add a double space
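The first case can be wrapped in a small standalone helper. This is a minimal sketch under assumptions: `padForExtendedSymbols` is a hypothetical name, and the alphabet string is assumed to list the base English letters first (ending with 'z') and the extended symbols after them. It does not cover the second case, because telling which symbols cannot be added depends on the Engine.

```java
public class PaddingSketch {
    // Hypothetical helper: appends one space for every symbol of the word
    // that is not in the base part of the alphabet (up to and including 'z').
    static String padForExtendedSymbols(String word, String alphabet) {
        // If 'z' is missing, indexOf returns -1 and the base part is empty,
        // so every symbol is treated as extended.
        String baseSymbols = alphabet.substring(0, alphabet.indexOf('z') + 1);
        StringBuilder padded = new StringBuilder(word);
        for (char ch : word.toCharArray()) {
            if (baseSymbols.indexOf(ch) < 0) {
                padded.append(' '); // compensate for one extended symbol
            }
        }
        return padded.toString();
    }

    public static void main(String[] args) {
        // Assumed alphabet layout: base letters first, extended symbols last.
        String alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzô";
        // Brackets make the trailing padding visible.
        System.out.println("[" + padForExtendedSymbols("Côte", alphabet) + "]");
        System.out.println("[" + padForExtendedSymbols("Cote", alphabet) + "]");
    }
}
```

Using a StringBuilder instead of appending to the word inside the loop keeps the input unchanged and makes the amount of padding easy to check.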

answered 19 Oct '15, 16:19 by Andrei

