[Cloud] ABBYY OCR using Python

  • Last Post 25 November 2016
  • Topic Is Solved
MarGul posted this 22 November 2016

Hi there.

I'm using the wrapper from this thread: http://forum.ocrsdk.com/questions/1458/python-wrapper-to-abbyy-cloudocr. I chose it because your Python code on GitHub is written for Python 2.7, and I'm using Python 3.5.

Anyway, I just want to extract the text and barcodes and get them back as XML. No matter which profile I use, I keep getting back a full XML document with a lot of detail I don't need.

Do I have to parse through all the block, par, line and charParams elements?

Isn't there a simpler XML format, something like <text>The OCR-read text</text> <barcode>value of a barcode</barcode>?

I thought that switching my profile to documentArchiving or textExtraction would give me something like that.

I don't care about the structure of the document. I just want ALL the text it can find and potentially any barcodes.

Thanks, Marcus
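To answer the parsing question concretely, here is a minimal sketch using only the Python 3 standard library (no f-strings, so it also runs on 3.5). It assumes the element names mentioned above (block, line, charParams) and assumes, without confirmation from this thread, that barcode blocks are marked with blockType="Barcode". Namespace prefixes are stripped so the code does not depend on which schema URI the file declares. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def extract_text_and_barcodes(xml_data):
    """Return (text_lines, barcode_values) from an ABBYY-style XML result.

    Assumptions: each block has a blockType attribute, barcode blocks use
    blockType="Barcode", and characters sit in charParams elements nested
    at any depth under a line element. Blocks are assumed not to nest.
    """
    root = ET.fromstring(xml_data)
    text_lines, barcodes = [], []
    for block in root.iter():
        if local_name(block.tag) != "block":
            continue
        lines = []
        for line in block.iter():
            if local_name(line.tag) != "line":
                continue
            # Join every character of this line; spaces are kept as-is.
            chars = [cp.text or "" for cp in line.iter()
                     if local_name(cp.tag) == "charParams"]
            lines.append("".join(chars))
        if block.get("blockType") == "Barcode":
            barcodes.append("".join(lines))
        else:
            text_lines.extend(lines)
    return text_lines, barcodes
```

Because line and charParams are matched at any depth, the code does not care whether they sit under par, formatting, or some other wrapper element.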

Oksana Serdyuk posted this 24 November 2016

As I have already answered by email, ABBYY Cloud OCR SDK supports only this XML export format, and at the moment there are no plans to add another XML export variant. You can create a file in the format you need from our XML output.
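Creating your own file from the XML output could look like the sketch below, which serializes already-extracted text lines and barcode values with the standard library. The <result>, <text> and <barcode> element names are a hypothetical format modelled on the one asked for in the question; the Cloud OCR SDK does not produce them itself.

```python
import xml.etree.ElementTree as ET


def build_simple_xml(text_lines, barcodes):
    """Serialize text and barcode values into a minimal custom XML format.

    The element names used here are a hypothetical format for illustration.
    ElementTree escapes characters such as '&' and '<' automatically.
    """
    root = ET.Element("result")
    # All recognized text goes into one element, one line per '\n'.
    ET.SubElement(root, "text").text = "\n".join(text_lines)
    # One element per barcode value.
    for value in barcodes:
        ET.SubElement(root, "barcode").text = value
    return ET.tostring(root, encoding="unicode")


print(build_simple_xml(["Hi", "there"], ["12"]))
```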

MarGul posted this 25 November 2016

Thanks for the answer, Oksana. I have now converted your XML output into my own format. My main concern was that you might change the current XML format, because if you did (for example, by removing the <par> tags), my parser would break.
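One way to limit that risk is to make a format change fail loudly instead of silently producing empty output. The sketch below checks that the element names a parser relies on are still present; the REQUIRED_ELEMENTS set is an assumption about what such a parser needs, and par is deliberately left out because a parser that matches line elements at any depth does not depend on it.

```python
import xml.etree.ElementTree as ET

# Element names the parser is assumed to rely on (hypothetical choice).
REQUIRED_ELEMENTS = {"page", "block", "line", "charParams"}


def check_expected_structure(xml_data, required=REQUIRED_ELEMENTS):
    """Raise ValueError if any element the parser depends on is missing."""
    root = ET.fromstring(xml_data)
    # Compare local names only, ignoring any '{namespace}' prefix.
    found = {el.tag.rsplit("}", 1)[-1] for el in root.iter()}
    missing = set(required) - found
    if missing:
        raise ValueError("ABBYY XML no longer contains: "
                         + ", ".join(sorted(missing)))
```

Note that a page with no recognized text would also trigger this check, so it may be better treated as a warning than as a hard error.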
