[FR Engine 11 SDK] UTF-8 exported files have a BOM character at the beginning of the text file.

  • Last Post 01 July 2015
maol posted this 20 April 2015

Hi,

We just noticed that when exporting to a UTF-8 text file, FineReader Engine adds a BOM (Byte Order Mark) at the beginning of the file.

page.Export(tempTxtFile.getAbsolutePath(), FileExportFormatEnum.FEF_TextUnicodeDefaults, exportParams);

The BOM (bytes EF BB BF in UTF-8) marks the text as Unicode-encoded.

But for UTF-8 it is optional and not recommended (ref. Unicode Standard 5.0). This matters especially in Java, which assumes that UTF-8 files do not have a BOM: when reading the file, the BOM is not stripped and ends up as a stray leading character, often displayed as ?, which is really annoying.
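To illustrate the Java behaviour described above, here is a minimal sketch. The class name `BomDecodeDemo` and the sample bytes are made up for illustration; only standard JDK calls are used. It decodes a byte sequence that starts with EF BB BF and shows that the decoded string keeps the BOM as the character U+FEFF.

```java
import java.nio.charset.StandardCharsets;

public class BomDecodeDemo {
    // Decode raw bytes as UTF-8. The JDK decoder does not remove a leading BOM.
    static String decode(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical file content: BOM (EF BB BF) followed by "Hi".
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'H', 'i'};
        String text = decode(withBom);

        System.out.println(text.length());               // 3, not 2
        System.out.println(text.charAt(0) == '\uFEFF');  // true
    }
}
```

When such a string is printed to a console or written with an encoding that cannot represent U+FEFF, the extra character typically shows up as ?.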

More information here: http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html

Currently we have a workaround (http://stackoverflow.com/questions/4897876/reading-utf-8-bom-marker), but it would be nice to consider removing the BOM in the future or making it optional ;)
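For readers hitting the same issue, one simple form such a workaround can take is sketched below. It is not necessarily what the linked answer does, and the names `BomStripper`, `stripBom` and `readUtf8WithoutBom` are hypothetical. It assumes the exported file is small enough to read into memory in one go.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BomStripper {
    // The UTF-8 bytes EF BB BF decode to this single character.
    private static final char BOM = '\uFEFF';

    // Return the text without a leading BOM; text without a BOM is returned unchanged.
    static String stripBom(String text) {
        return (!text.isEmpty() && text.charAt(0) == BOM) ? text.substring(1) : text;
    }

    // Read a whole UTF-8 file into memory and drop the BOM if one is present.
    static String readUtf8WithoutBom(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return stripBom(text);
    }
}
```

A call such as `BomStripper.readUtf8WithoutBom(tempTxtFile.toPath())` after the export would then return the recognized text without the leading marker. For large files, a streaming approach that checks the first bytes of the stream would be more appropriate.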

Julia Anikushina posted this 01 July 2015

Sorry for the delayed response.

We have passed your suggestion to our analysts and created a request to make the BOM character optional. Unfortunately, so far we do not have information on when this feature will be available, but we hope that it will be implemented in a future version.
