When I use exportFormat = txtUnstructured, special characters do not come out correctly.

In your German Picture_samples (file: NewImage.JPG) there is the phrase "Höhlenzeichnungen sind die ältesten Dokumente", which is correct in every PDF output format, but in txt, rtf or txtUnstructured it appears as: "Höhlenzeichnungen sind die ältesten Dokumente,"

Also, the txtUnstructured output still contains lots of consecutive line breaks?!

asked 22 Apr '15, 20:10

Christian S ...

We have not managed to reproduce the issue you describe. The NewImage.JPG file is recognized with quite high accuracy and is saved correctly in all supported export formats. Could you please send your results, together with the recognition settings you used, to Cloudocrsdk@abbyy.com? This information should help us investigate the issue.

As for the differences between the txt export formats: the txt format simulates the original layout of the source document by inserting spaces and empty lines, while the txtUnstructured format saves the OCR results in the order in which they were recognized, i.e. block by block. Pictures are not saved in either format, so you will see empty space where a picture was.
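
If the remaining empty lines get in the way, they can be collapsed after download. The following is a client-side sketch in Python, not a feature of the service; the function name `collapse_blank_lines` and its `keep` parameter are made up for this example.

```python
import re


def collapse_blank_lines(text: str, keep: int = 1) -> str:
    """Reduce every run of blank lines to at most `keep` blank lines."""
    # Normalise Windows (\r\n) and bare \r line endings to \n first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A "blank" line may still hold spaces or tabs; match runs longer than `keep`.
    pattern = r"\n(?:[ \t]*\n){%d,}" % (keep + 1)
    return re.sub(pattern, "\n" * (keep + 1), text)


sample = "Block one\r\n\r\n\r\n\r\nBlock two\n \n\t\n\nBlock three\n"
print(collapse_blank_lines(sample))
```

With `keep=0` the blank lines are removed entirely, which may suit txtUnstructured output where the gaps carry no layout meaning.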

answered 24 Apr '15, 15:01

Oksana Serdyuk ♦♦

© 2016 ABBYY. All rights Reserved. www.ABBYY.com | Privacy Policy | Legal