I'm using processFields in the Cloud OCR API to extract text from a PDF document.

The output XML contains extra spaces, like this:

<value>MLTMRC5 7E2 9H2 6 4O</value>

while the text field in the PDF looks like this:

[Image: the text field as it appears in the PDF]

How can I prevent these spaces from being inserted?

Asked 15 Jul '15, 10:47 by danyolgiax


Please try describing this text field in your XML recognition settings file with the following elements:

<oneTextLine>true</oneTextLine>
<oneWordPerLine>true</oneWordPerLine>

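The oneTextLine element specifies whether the field contains only one line of text, and the oneWordPerLine element specifies whether each text line in the field contains only one word. With both set to true, the engine should treat the whole field as a single word and therefore should not insert spaces into it.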
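If you generate the settings file programmatically, the Python sketch below shows one way to build such a field description with the standard xml.etree.ElementTree module. Only the text, oneTextLine and oneWordPerLine element names come from this answer; the id and coordinate attributes (left, top, right, bottom), the field id "code" and the numbers are hypothetical placeholders, so please check them against the settings schema before use.

```python
import xml.etree.ElementTree as ET


def build_text_field(field_id, left, top, right, bottom):
    """Build a <text> field description with single-line, single-word hints.

    Assumption: the id/left/top/right/bottom attribute names are illustrative
    and should be verified against the processFields settings schema.
    """
    field = ET.Element("text", {
        "id": field_id,
        "left": str(left),
        "top": str(top),
        "right": str(right),
        "bottom": str(bottom),
    })
    # The whole field is a single line of text...
    ET.SubElement(field, "oneTextLine").text = "true"
    # ...and that line holds a single word, so no word breaks are expected.
    ET.SubElement(field, "oneWordPerLine").text = "true"
    return field


# Hypothetical field id and coordinates, for illustration only.
field = build_text_field("code", 100, 200, 600, 240)
print(ET.tostring(field, encoding="unicode"))
```

The resulting element would still need to be placed inside the rest of the settings document, whose exact layout is not shown here.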

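The letterSet and regExp elements of the text field description may also be useful in your case: letterSet restricts the set of characters the field may contain, and regExp specifies a pattern the recognized text should match. More details are available in the API documentation.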
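If spaces still slip through, a client-side cleanup can serve as a fallback. The Python sketch below is a workaround, not part of the API: it parses the output XML, removes all whitespace from each <value> element, and checks the result against an uppercase-letters-and-digits pattern, which loosely mirrors what a letterSet restriction would enforce. The <document>/<field> wrapper in the sample and the ALLOWED pattern are assumptions for illustration (the pattern is plain Python regex, not the API's regExp syntax); tag matching ignores namespaces, so it should also work if the real output uses one.

```python
import re
import xml.etree.ElementTree as ET

# Assumed character set for the field: uppercase Latin letters and digits.
ALLOWED = re.compile(r"[A-Z0-9]+")


def local_name(tag):
    # Strip a "{namespace}" prefix if the result XML uses one.
    return tag.rsplit("}", 1)[-1]


def clean_values(result_xml):
    """Return the text of every <value> element with all whitespace removed.

    Raises ValueError if a cleaned value contains characters outside ALLOWED.
    """
    root = ET.fromstring(result_xml)
    cleaned = []
    for elem in root.iter():
        if local_name(elem.tag) == "value" and elem.text:
            # Join the pieces the engine split apart with spaces.
            compact = "".join(elem.text.split())
            if not ALLOWED.fullmatch(compact):
                raise ValueError(f"unexpected characters in {compact!r}")
            cleaned.append(compact)
    return cleaned


# Hypothetical output wrapper around the value from the question.
sample = """<document>
  <field id="code"><value>MLTMRC5 7E2 9H2 6 4O</value></field>
</document>"""
print(clean_values(sample))  # ['MLTMRC57E29H264O']
```

Removing whitespace this way is only safe when the field is known to hold a single word; the settings-based approach above is preferable because it fixes the recognition itself.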

Hope this helps!

Answered 15 Jul '15, 18:36 by Oksana Serdyuk (edited 15 Jul '15, 18:36)


© 2016 ABBYY. All rights Reserved. www.ABBYY.com | Privacy Policy | Legal