I'm running processImage on a PDF file, using the following URL:

http://cloud.ocrsdk.com/processImage?correctOrientation=true&language=English&exportFormat=xml&profile=textExtraction

The PDF is rotated, so I am passing correctOrientation=true (which seems to work just fine). I have:

AsyncProcessTask processImage outPutFormat = xml

A snippet of the output file is attached below as an image (it sure would be nice if one could attach an XML file!). The app runs on an Android device, so I'm using the XmlPullParser class. I get the following error from the parser:

AsyncProcessTask.parseOCRResults exception = org.xmlpull.v1.XmlPullParserException: Unexpected token (position:TEXT ?@1:2 in java.io.FileReader@4210f5d0)

I then loaded the full XML file into XMLPad, chose XML->Validate, and got the following errors. What am I missing? Is the namespace incorrect? It's set to xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml"

Thanks for any help.

[image: output XML snippet] [image: XMLPad validation errors]

asked 12 Aug '15, 03:44

nwixsom
113

Thank you for this information. We've reproduced the issue and now we are consulting with our developers in order to clarify the situation.

(12 Aug '15, 14:34) Oksana Serdyuk ♦♦

I would like to inform you that we have just updated the version of the XML schema, so the issue should now be resolved.

answered 14 Aug '15, 10:56

Oksana Serdyuk ♦♦
1.4k16

I just ran it again and got the same error. The XML file that results from processImage contains a rotation attribute on the page element. I looked at your schema (http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml) and it does not define a rotation attribute, which is causing my parser to fail.

<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml" version="1.0" producer="ABBYY FineReader Engine 11" languages="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
<page width="7200" height="10800" resolution="300" originalCoords="1" rotation="RotatedCounterclockwise">
</page>
</document>
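
For reference, a non-validating parser does not consult the schema at all, so an attribute that the schema does not list should not by itself produce a parse error. The sketch below is only an illustration of that point: it uses the standard javax.xml.parsers DOM API rather than XmlPullParser, and the class and method names (PageRotation, readRotation) are made up for this example. It reads the rotation attribute from the first page element of a document shaped like the snippet above.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class PageRotation {
    // Hypothetical helper: returns the rotation attribute of the first <page>
    // element, or null if there is no page element or no such attribute.
    public static String readRotation(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // the document declares a default namespace
        Document doc = factory.newDocumentBuilder().parse(in);

        // "*" matches any namespace, so the schema URL does not need to be hard-coded.
        NodeList pages = doc.getElementsByTagNameNS("*", "page");
        if (pages.getLength() == 0) {
            return null;
        }
        Element page = (Element) pages.item(0);
        return page.hasAttribute("rotation") ? page.getAttribute("rotation") : null;
    }
}
```

If a parser configured like this reads the file without complaint, the rotation attribute is probably not what triggers the "Unexpected token" error.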

answered 18 Aug '15, 23:25

nwixsom
113

I've just processed your Restrooms1.pdf file using the same recognition settings and gotten a correct XML file. WMHelp XMLPad validates it successfully:

[image: XMLPad validation result]

Would you mind processing the file once again?

answered 19 Aug '15, 14:44

Oksana Serdyuk ♦♦
1.4k16

Thanks, I found that the problem was in how I was using XmlPullParser. If I pass it a FileReader, it generates that error (and I have no idea why). If I use a FileInputStream instead, I'm able to parse the file. Thanks again for all your help.
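
One plausible explanation, not confirmed anywhere in this thread, is a byte-order mark (BOM) at the start of the XML file. A Reader that decodes the file as UTF-8 hands the BOM to the parser as a literal U+FEFF character before the first "<", which would fit an "Unexpected token" at position 1:2. When the parser is given the raw InputStream, it can examine the leading bytes itself. The sketch below uses only JDK classes to show the effect; BomDemo, skipBom and utf8Reader are hypothetical names introduced for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Hypothetical helper: returns a Reader positioned after a leading U+FEFF, if any.
    public static Reader skipBom(Reader in) throws IOException {
        PushbackReader reader = new PushbackReader(in, 1);
        int first = reader.read();
        if (first != -1 && first != 0xFEFF) {
            reader.unread(first); // not a BOM, so put the character back
        }
        return reader;
    }

    // Decodes bytes as UTF-8, much as a Reader over a UTF-8 file would.
    public static Reader utf8Reader(byte[] bytes) {
        return new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumed input: a tiny XML document preceded by a UTF-8 BOM (bytes EF BB BF).
        byte[] xml = "\uFEFF<document/>".getBytes(StandardCharsets.UTF_8);

        int raw = utf8Reader(xml).read();              // what a plain Reader delivers first
        int cleaned = skipBom(utf8Reader(xml)).read(); // first character after skipping the BOM

        System.out.println("raw first char is BOM: " + (raw == 0xFEFF));
        System.out.println("cleaned first char is '<': " + (cleaned == '<'));
    }
}
```

With XmlPullParser, the stream-based call is setInput(InputStream, String); passing null as the encoding asks the parser to detect it. Whether a particular parser implementation skips a BOM on that path is an assumption here, so it is worth checking the first bytes of the file before relying on it.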

answered 19 Aug '15, 20:09

nwixsom
113

More ways to Parse XML

Dell

answered 02 May '16, 08:46

Dell Mercant
1

Asked: 12 Aug '15, 03:44

Seen: 2,071 times

Last updated: 02 May '16, 08:46

© 2016 ABBYY. All rights Reserved. www.ABBYY.com | Privacy Policy | Legal