We just noticed that when exporting to a UTF-8 text file, FineReader Engine adds a BOM (Byte Order Mark) character at the beginning of the file:
page.Export(tempTxtFile.getAbsolutePath(), FileExportFormatEnum.FEF_TextUnicodeDefaults, exportParams);
This BOM character (EF BB BF) indicates the Unicode representation of the text.
But with UTF-8 it is optional and not recommended (ref. Unicode Standard 5.0). This matters especially for Java, which assumes that UTF-8 files don't have a BOM and does not skip it when decoding. When the file is read, the BOM comes through as an extra leading character (U+FEFF) that often shows up as ? in Java, which is really annoying.
Currently we have a workaround (http://stackoverflow.com/questions/4897876/reading-utf-8-bom-marker), but it would be nice to consider removing the BOM in the future or making it optional ;)
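
For anyone hitting the same thing, here is a minimal sketch of this kind of workaround in plain Java. It is not necessarily the exact code from the linked answer, and the class and method names (BomStripper, stripBom, readUtf8WithoutBom) are made up for illustration. It assumes the exported file is small enough to read into memory and simply drops a leading U+FEFF after decoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of FineReader Engine or the JDK.
class BomStripper {
    // EF BB BF decodes to this single character in UTF-8.
    private static final char BOM = '\uFEFF';

    // Returns the text without a leading BOM; other text is returned unchanged.
    static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    // Reads the whole file as UTF-8 and removes the BOM if one is present.
    static String readUtf8WithoutBom(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return stripBom(text);
    }

    public static void main(String[] args) throws IOException {
        // Simulate an exported file: BOM bytes followed by "OCR".
        Path tmp = Files.createTempFile("bom-demo", ".txt");
        Files.write(tmp, new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'O', 'C', 'R'});
        System.out.println(readUtf8WithoutBom(tmp).length()); // prints 3
        Files.delete(tmp);
    }
}
```

In our case this would be called on the exported file, e.g. readUtf8WithoutBom(tempTxtFile.toPath()), right after page.Export(...).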