Determine PDF Page Count

  • Last Post 09 January 2013
etipaced posted this 07 November 2012

I'm sending PDFs to the OCR service in order to extract the plain text to store in my db. I also want to store the page count of the PDF file. Is there a way to determine this number using the cloud-based SDK?

Vasily Panferov posted this 07 November 2012

You can derive that information from the server's XML response containing the task details:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<response xmlns="@link" xmlns:xsi="@link" xsi:schemaLocation="@link" version="1.0">
    <task id="22345200-abe8-4f60-90c8-0d43c5f6c0f6"
        error="{An error message.}"
        resultUrl="http://<domain>/<blob ID>"
        description="My first OCR task"/>
    <task …/>
</response>

The credits attribute of each task element contains the task cost in internal units. To get the number of pages, divide this number by 5. This works for any document submitted for the first time. If you send the same document more than once, however, the repeat recognition is free, so the credits value is 0 for the whole document and the page count cannot be derived from it.
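The calculation above can be sketched in Python roughly as follows. This is only an illustration under a few assumptions: that each task element carries a credits attribute as described, that the rate stays at 5 credits per page, and that a value of 0 means a free repeat submission. The function name page_counts, the constant CREDITS_PER_PAGE, and the sample task ids are made up for the example and are not part of the SDK.

```python
import xml.etree.ElementTree as ET

# Rate quoted in the reply above; treat it as an assumption that may change.
CREDITS_PER_PAGE = 5


def page_counts(xml_text):
    """Map each task id to an estimated page count.

    The value is None when the count cannot be derived, i.e. when the
    credits attribute is missing or is 0 (free repeat recognition).
    """
    root = ET.fromstring(xml_text)
    counts = {}
    for elem in root.iter():
        # Drop any namespace prefix: '{ns}task' becomes 'task'.
        if elem.tag.rsplit('}', 1)[-1] != 'task':
            continue
        credits = elem.get('credits')
        if credits is None or int(credits) == 0:
            counts[elem.get('id')] = None
        else:
            counts[elem.get('id')] = int(credits) // CREDITS_PER_PAGE
    return counts


# Hypothetical response used only to exercise the function.
SAMPLE = """<response>
    <task id="first-upload" credits="15"/>
    <task id="repeat-upload" credits="0"/>
</response>"""

if __name__ == "__main__":
    print(page_counts(SAMPLE))
```

Returning None rather than 0 for a zero-credit task keeps "page count unknown" distinct from an actual count, so a value stored earlier in the database is not overwritten by a repeat submission.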

This is basically a workaround. The more natural approach would be to return the page count in a separate attribute. We'll consider adding this feature in the future.

etipaced posted this 07 November 2012

Thank you for the quick reply, Vasily. This works fine for now. However, it is a little disappointing that I can't re-process a file to get its page count because the credits show up as 0. I understand why this is so, but I'm definitely casting my vote for the page count value to be returned in the XML response. Thank you again.

shailesh posted this 08 January 2013

How do I search for OCR and non-OCR files in my folder?

shailesh posted this 08 January 2013

Please help.

Andrey Isaev posted this 09 January 2013

You should start a separate topic for a new question.