How to use API?

  • Last Post 26 December 2019
Gosforth posted this 18 December 2019

How to use this API?

https://www.ocrsdk.com/documentation/api-reference/process-image-method-v2/

[POST] https://<PROCESSING_LOCATION_ID>.ocrsdk.com/v2/processImage

I cannot make sense of this tutorial. Suppose I want to connect to http://cloud-eu.ocrsdk.com/

What will the URL be? Where do I enter the ApplicationId and Password?

Should the image be sent in the body (POST)?

 

EDIT:

After many attempts I found somewhere that the URL is:

https://nnnnnn:xxxx@cloud-eu.ocrsdk.com/v2/processImage/?exportFormat=txt
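
The `nnnnnn:xxxx@` prefix in that URL is HTTP Basic authentication written inline. Below is a minimal sketch of the same request built explicitly, assuming the Application ID is the Basic-auth user name, the password is the Basic-auth password, and the image bytes travel as the raw POST body. The `application/octet-stream` content type is also an assumption, and `build_process_image_request` is a hypothetical helper name. It only builds the request and sends nothing:

```python
import base64
import urllib.parse
import urllib.request


def build_process_image_request(host, application_id, password,
                                image_bytes, export_format="txt"):
    """Build (but do not send) a POST request for /v2/processImage."""
    query = urllib.parse.urlencode({"exportFormat": export_format})
    url = f"https://{host}/v2/processImage?{query}"

    # HTTP Basic auth: base64("user:password") in the Authorization header.
    credentials = f"{application_id}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")

    return urllib.request.Request(
        url,
        data=image_bytes,  # assumption: raw image bytes as the request body
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/octet-stream",  # assumption
        },
    )


# Placeholder credentials and fake image data, for illustration only.
request = build_process_image_request(
    "cloud-eu.ocrsdk.com", "nnnnnn", "xxxx", b"fake image bytes")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen(request)` would actually send it; the response body is the task JSON discussed in the rest of this thread.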

As a result I get JSON, but how do I get the actual result?!

I do not get any URL as described here:

https://www.ocrsdk.com/documentation/specifications/status-codes-v2/

Just a taskId. What should I do with that?

Why are such simple things so complicated?!

Koen de Leijer posted this 19 December 2019

Hi

I was waiting for an answer to your previous question

- https://forum.ocrsdk.com/thread/executing-in-python-stalls/

until I saw these two new ones.

- https://forum.ocrsdk.com/thread/python-subprocess-run-what-i-should-get-as-a-result/

- https://forum.ocrsdk.com/thread/how-to-use-api/

 

It should not be that hard with Python after all. Please follow these steps:

https://www.ocrsdk.com/documentation/quick-start-guide/python-ocr-sdk/

Then tell me what output you get after step "5"


Or have a look at another sample from ABBYY

- https://pypi.org/project/ABBYY/

- https://github.com/samueltc/ABBYY


Best regards

Koen de Leijer

 

Gosforth posted this 19 December 2019

Your answer is not to the point. I am asking about the HTTP API.

I started with Python, but since that code does not handle errors I decided to use the HTTP protocol directly. Forgive me, but your API tutorial is a complete mess. As an API user I always start with authorization (there is not even a word about it in the main tutorial). Then I'd like to know what each method returns and how to read it. There is nothing about this. In the next-level tutorial I am told that the 'processImage' method returns a URL. Not true: there is no URL in the resulting JSON.

 

BTW, each post starts with 'Error creating post'. I have to click at least twice to post (sometimes a page reload is needed).

 

Gosforth posted this 19 December 2019

I send the image in a POST to the URL:

https://nnnnnn:xxxx@cloud-eu.ocrsdk.com/v2/processImage/?exportFormat=txt

As a result I get this answer:

{"taskId":"1cxxxxc0-cbfe-4b64-ae57-7a46f6682f1","registrationTime":"2019-12-19T16:20:06Z","statusChangeTime":"2019-12-19T16:20:06Z","status":"Queued","filesCount":1,"requestStatusDelay":10000}
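
The JSON above describes a task, not the recognized text. A small sketch of reading it, using only the field names visible in that response. Treating `requestStatusDelay` as the number of milliseconds to wait before asking for the status is an assumption, and `summarize_task` is a hypothetical helper:

```python
import json

# The response from the post above, copied verbatim.
response_text = (
    '{"taskId":"1cxxxxc0-cbfe-4b64-ae57-7a46f6682f1",'
    '"registrationTime":"2019-12-19T16:20:06Z",'
    '"statusChangeTime":"2019-12-19T16:20:06Z",'
    '"status":"Queued","filesCount":1,"requestStatusDelay":10000}'
)


def summarize_task(text):
    """Pick out the fields needed to follow up on a task."""
    task = json.loads(text)
    return {
        "task_id": task["taskId"],
        "status": task["status"],
        # assumption: the delay is given in milliseconds
        "wait_seconds": task.get("requestStatusDelay", 0) / 1000,
    }


summary = summarize_task(response_text)
print(summary)
```

The `taskId` is the handle for every later call about this image, so it is the one value worth storing.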

Then I call the method:

https://nnnnnn:xxxx@cloud-eu.ocrsdk.com/v2/getTaskStatus/?taskId=1cxxxxc0-cbfe-4b64-ae57-7a46f6682f1

BUT I get an error:

{"taskId":"1cxxxxc0-cbfe-4b64-ae57-7a46f6682f1","registrationTime":"2019-12-19T16:19:00Z","statusChangeTime":"2019-12-19T16:19:01Z","status":"ProcessingFailed","error":"Internal error","filesCount":1,"requestStatusDelay":0}

?
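
The two calls above form a submit-then-poll pattern: keep calling `getTaskStatus` until the task leaves the queue. A minimal sketch of that loop follows. The `Queued` and `ProcessingFailed` statuses and the `error` field come from the responses above; the `Submitted`, `InProgress` and `Completed` status names and the `resultUrls` field are assumptions to check against the status-codes page. `fetch_status` is a hypothetical callable that performs the `getTaskStatus` request and returns the decoded JSON; it is injected so the loop can be tried without network access:

```python
import time

# Assumption: these statuses mean "not finished yet, ask again later".
PENDING_STATUSES = ("Submitted", "Queued", "InProgress")


class OcrTaskFailed(Exception):
    """Raised when the service reports a status that will not complete."""


def wait_for_result_urls(task, fetch_status, sleep=time.sleep, max_polls=60):
    """Poll until the task completes and return its download URLs."""
    polls = 0
    while True:
        status = task["status"]
        if status == "Completed":
            # assumption: completed tasks list their downloads here
            return task.get("resultUrls", [])
        if status not in PENDING_STATUSES:
            message = task.get("error", "no error message")
            raise OcrTaskFailed(f"{status}: {message}")
        if polls >= max_polls:
            raise TimeoutError(f"task {task['taskId']} still {status}")
        # Wait as long as the service asked (assumed to be milliseconds).
        sleep(task.get("requestStatusDelay", 1000) / 1000)
        task = fetch_status(task["taskId"])
        polls += 1


# Dry run with canned replies instead of real HTTP calls.
replies = iter([
    {"taskId": "t1", "status": "InProgress", "requestStatusDelay": 0},
    {"taskId": "t1", "status": "Completed",
     "resultUrls": ["https://example.invalid/result.txt"]},
])
urls = wait_for_result_urls(
    {"taskId": "t1", "status": "Queued", "requestStatusDelay": 0},
    fetch_status=lambda task_id: next(replies),
    sleep=lambda seconds: None,
)
print(urls)
```

With the `ProcessingFailed` response shown above, this sketch raises `OcrTaskFailed` carrying the "Internal error" text instead of polling again, on the assumption that a failed task does not recover by itself.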

Koen de Leijer posted this 26 December 2019

Due to circumstances I was not able to respond earlier.
Keep in mind that I am not affiliated with ABBYY; I am a volunteer helping on this forum.
Here is a working example (relying on https://pypi.org/project/ABBYY/ ):

A wrapper that needs your own parameters from ABBYY (see the ..._AS_PROVIDED placeholders):

from ABBYY import CloudOCR


class ABBYYWrapper(object):

    def __init__(self, pdf_):
        self._pdf = pdf_
        self._language = 'en'
        self._exportFormat = 'pdfSearchable'
        self._cloudurl = 'ABBYY_CLOUD_URL_AS_PROVIDED'
        self._cloudapplicationid = 'ABBYY_CLOUD_ID_AS_PROVIDED'
        self._cloudpassword = 'ABBYY_CLOUD_PASSWORD_AS_PROVIDED'

    def process_and_download(self):
        """
        Performs the OCR on the PDF via the Cloud
        """

        # set file-pointer to first byte of the file
        self._pdf.seek(0)

        # create a dictionary holding the PDF
        post_file = {'ocred_pdf': self._pdf.read()}

        # get handle to ABBYYs CloudOCR
        ocr_engine = CloudOCR(
            application_id=self._cloudapplicationid,
            password=self._cloudpassword)

        # override URL of ABBYYs CloudOCR
        ocr_engine.base_url = self._cloudurl

        # process the PDF and download/return the result
        result = ocr_engine.process_and_download(
            file=post_file,
            exportFormat=self._exportFormat,
            language=self._language)
        return result

The wrapper can be called like:

from io import BytesIO


from .abbyy import ABBYYWrapper


def perform_ocr(file_obj):
    """Performs OCR on a PDF with ABBYY.

    :param file_obj: a file object open for reading.
    :return: a file object open for reading, containing the OCRed PDF.
    """

    # ABBYYWrapper.__init__ accepts only the PDF file object
    ocr_engine = ABBYYWrapper(file_obj)
    ocr_result = ocr_engine.process_and_download()

    if not ocr_result:
        raise Exception("No stream found by OCR engine")
    elif len(ocr_result) > 1:
        raise Exception("Multiple streams found by OCR engine")
    # exactly one stream is left: return it
    return next(iter(ocr_result.values()))


def get_ocred_pdf(file_obj):
    # read the OCRed PDF into memory and hand it back as a fresh stream
    with perform_ocr(file_obj) as f:
        ocr_data = f.read()
    return BytesIO(ocr_data)

In my case it converts a scanned PDF to a searchable PDF.
