Hi All, 

I am new to ABBYY and Python.

I need to convert an image PDF to text. I have tried a few samples.

Sample code below:

from ABBYY import CloudOCR
ocr_engine = CloudOCR(application_id='b9fc6c0*******', password='h********')
pdf = open("filename.pdf", 'rb')
file = {pdf.name: pdf}
result = ocr_engine.process_and_download(file, exportFormat='txt,pdfTextAndImages', language='ChinesePRC')
print(result)

I get the following output:

{'txt': <_io.BytesIO object at 0x03AEF270>, 'pdfTextAndImages': <_io.BytesIO object at 0x031ADCF0>}

 

Can someone please tell me how to unwrap this, i.e. how to get the contents of these BytesIO objects in a form that is readable in Python?

My use case is:

I have an image PDF (scanned), which I need to convert to text and do some string operations.
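As a rough sketch of that use case, once the text buffer has been decoded to a str, normal string methods apply, and the PDF buffer can be written to disk as bytes. The helper names save_buffer and lines_containing, the file name searchable.pdf, and the sample data are all hypothetical. The UTF-8 decoding is an assumption about the OCR output.

```python
import io
import tempfile
from pathlib import Path


def save_buffer(buffer, path):
    """Write the full contents of an in-memory buffer to a file on disk."""
    target = Path(path)
    target.write_bytes(buffer.getvalue())
    return target


def lines_containing(text, keyword):
    """Return the stripped lines of text that contain keyword."""
    return [line.strip() for line in text.splitlines() if keyword in line]


# Stand-in data; real buffers would come from the OCR result dictionary.
result = {
    "txt": io.BytesIO("Invoice No: 1001\nTotal: 250.00\n".encode("utf-8")),
    "pdfTextAndImages": io.BytesIO(b"%PDF-1.4 placeholder bytes"),
}

# Decode the text export (encoding is assumed), then use plain str operations.
text = result["txt"].getvalue().decode("utf-8-sig")
matches = lines_containing(text, "Total")
print(matches)

# Save the PDF export to a temporary directory for this demonstration.
with tempfile.TemporaryDirectory() as tmp_dir:
    pdf_path = save_buffer(result["pdfTextAndImages"], Path(tmp_dir) / "searchable.pdf")
    saved_size = pdf_path.stat().st_size
    print(saved_size)
```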

I am currently using the trial version. We already have a corporate licence, which I will switch to once I get a positive result.

 

Thanks a lot in advance :)