I'm running processImage on a PDF file, using the following URL: http://cloud.ocrsdk.com/processImage?correctOrientation=true&language=English&exportFormat=xml&profile=textExtraction

The PDF is rotated, so I am passing correctOrientation (which seems to work just fine). I have AsyncProcessTask processImage outPutFormat = xml

A snippet of the output file is attached below as an image (sure would be nice if one could attach an XML file!). The code runs on an Android device, so I'm using the XmlPullParser class. I get the following error from the parser:

AsyncProcessTask.parseOCRResults exception = org.xmlpull.v1.XmlPullParserException: Unexpected token (position:TEXT ?@1:2 in java.io.FileReader@4210f5d0)

I then loaded the full XML file into XMLPad, chose XML->Validate, and got the following errors. What am I missing? Is the namespace incorrect? It's set to xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml"

Thanks for any help.
Xml parse error of processImage results file
- Last Post 12 August 2015
Thank you for this information. We've reproduced the issue and now we are consulting with our developers in order to clarify the situation.
I would like to inform you that we have just updated the version of the XML schema, so the issue should now be solved.
I just tried to run it again and get the same error. The XML file that results from processImage contains the tag rotation within the page tag. I looked at your schema (http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml) and it does not contain a tag called rotation, which is causing my parser to fail.

<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml" version="1.0" producer="ABBYY FineReader Engine 11" languages="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
<page width="7200" height="10800" resolution="300" originalCoords="1" rotation="RotatedCounterclockwise">
I've just processed your Restrooms1.pdf file using the same recognition settings and gotten a correct XML file. WMHelp XMLPad validates it successfully:
Would you mind processing the file once again?
Thanks, I found the problem was in the XmlPullParser. If I pass a FileReader to the CTOR it generates that error (and I have no idea why). If I use a FileInputStream instead I'm able to parse the file. Thanks again for all your help.
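The thread never establishes why the FileReader version fails. One plausible cause, and it is only an assumption here, is a UTF-8 byte order mark at the start of the result file: a Reader hands the parser already-decoded characters, so the BOM arrives as a stray U+FEFF before the first tag, which would fit an "Unexpected token" reported at position 1:2. A minimal sketch of a workaround under that assumption, using only java.io (the class name BomSkippingReader and its methods are hypothetical helpers, not part of XmlPullParser or the OCR SDK):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; assumes the file is UTF-8 encoded.
final class BomSkippingReader {
    private BomSkippingReader() {}

    /** Decodes the stream as UTF-8 and drops a leading U+FEFF (BOM) if present. */
    static Reader open(InputStream in) throws IOException {
        PushbackReader reader = new PushbackReader(
                new InputStreamReader(in, StandardCharsets.UTF_8), 1);
        int first = reader.read();
        if (first != -1 && first != '\uFEFF') {
            reader.unread(first); // not a BOM, so put the character back
        }
        return reader;
    }

    /** Reads the whole reader into a String; only used to inspect the result. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}
```

On Android the returned Reader could be passed to XmlPullParser.setInput(Reader). The fix described above, handing the parser a FileInputStream, corresponds to XmlPullParser.setInput(InputStream, String) with a null encoding, which leaves encoding detection to the parser and is probably the simpler choice.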