I'm sending PDFs to the OCR service to extract their plain text, which I store in my db. I'd also like to store the page count of each PDF file. Is there a way to get this number from the cloud-based SDK?

asked 07 Nov '12, 03:48

etipaced


You can derive this from the server's XML response containing the task details:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<response xmlns="@link" xmlns:xsi="@link" xsi:schemaLocation="@link" version="1.0">
    <task id="22345200-abe8-4f60-90c8-0d43c5f6c0f6"
        registrationTime="2001-01-01T13:18:22Z"
        statusChangeTime="2001-01-01T13:18:22Z"
        status="InProgress"
        error="{An error message.}"
        filesCount="10"
        credits="10"
        estimatedProcessingTime="3600"
        resultUrl="http://<domain>/<blob ID>"
        description="My first OCR task"/>
    <task …/>
</response>

The credits attribute contains the task cost in internal units. To get the number of pages, divide this number by 5. This works for any document you submit for the first time. However, if you send the same document more than once, its recognition is free, so you'll get 0 credits for the whole document.
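A minimal sketch of this workaround in Python, assuming you already have the task-details XML as a string (fetching it from the service is not shown). It uses only the standard-library `xml.etree.ElementTree`; the helper names `page_counts` and `local_name`, the sample namespace URI, and the task ids are hypothetical. Namespaces are stripped when matching `<task>` because the response above elides the real namespace URI.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# From the answer above: each recognized page costs 5 internal credit units.
CREDITS_PER_PAGE = 5


def local_name(tag: str) -> str:
    """Drop a namespace prefix, e.g. '{some-uri}task' -> 'task'."""
    return tag.rsplit("}", 1)[-1]


def page_counts(response_xml: str) -> Dict[str, Optional[int]]:
    """Map each task id in a task-details response to an estimated page count.

    The estimate is credits // CREDITS_PER_PAGE. A value of None means the
    task reported 0 credits (e.g. a re-submitted document that was
    recognized for free), so the page count cannot be inferred.
    """
    root = ET.fromstring(response_xml)
    counts: Dict[str, Optional[int]] = {}
    for elem in root.iter():
        if local_name(elem.tag) != "task":
            continue
        credits = int(elem.get("credits", "0"))
        counts[elem.get("id", "")] = (
            credits // CREDITS_PER_PAGE if credits > 0 else None
        )
    return counts


if __name__ == "__main__":
    # Hypothetical response; the namespace URI and task ids are made up.
    sample = (
        '<response xmlns="http://example.com/ocr">'
        '<task id="new-doc" status="Completed" credits="10"/>'
        '<task id="resent-doc" status="Completed" credits="0"/>'
        "</response>"
    )
    print(page_counts(sample))  # {'new-doc': 2, 'resent-doc': None}
```

Returning None instead of 0 for a zero-credit task keeps a re-submitted document from being stored with a wrong page count. The integer division assumes credits is an exact multiple of 5, as the answer implies.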

This is essentially a workaround. A more natural approach would be to expose the page count in a separate attribute; we'll consider adding this feature in the future.

answered 07 Nov '12, 07:54

Vasily Panferov ♦♦

Thank you for the quick reply, Vasily. This works fine for now. However, it is a little disappointing that I can't re-process a file to get its page count because the credits show up as 0. I understand why this is so, but I'm definitely casting my vote for the page count value to be returned in the XML response. Thank you again.

answered 07 Nov '12, 20:16

etipaced

edited 07 Nov '12, 20:17

How can I find which files in my folder have been OCRed and which have not?

answered 08 Jan '13, 17:46

shailesh

Please help.

(08 Jan '13, 17:48) shailesh

You should start a separate topic for a new question.

(09 Jan '13, 00:26) Andrey Isaev ♦♦

Asked: 07 Nov '12, 03:48

Seen: 2,079 times

Last updated: 09 Jan '13, 00:26

© 2016 ABBYY. All rights reserved. www.ABBYY.com