Hello,

I am wondering why PDF files produced by ABBYY OCR SDK contain metadata stating that they were produced by "ABBYY FineReader Engine 11".

As I understand it, ABBYY FineReader Engine and ABBYY OCR SDK are different technologies, so I would expect their output to be marked differently.

When I open a PDF file in Adobe Acrobat and press Ctrl+D, the window that opens shows "Application: ABBYY FineReader Engine 11". Is that correct?

I expected to see something like "ABBYY OCR SDK Cloud".

We plan to sort our PDFs by their metadata, so I would like to know whether we can tell which PDFs were processed by FineReader Engine, which by OCR SDK, and so on.

Thanks, Vitalie

asked 04 Jan '16, 15:58

Vitalie

edited 05 Jan '16, 13:24


Hello Vitalie,

We wish you a Happy New Year, and we apologize for the delayed response caused by the Russian national winter holidays.

It helps to distinguish between two terms: OCR technology and product. OCR technology is a very sophisticated artificial intelligence algorithm that consists of multiple processing steps:

  • Open & Load
  • Image Preprocessing
  • Document/Layout Analysis
  • Character Recognition
  • Verification
  • Export

Within ABBYY, some people develop only the OCR technology, while product developers create our products and interfaces so that users can access this core OCR technology.

ABBYY Cloud OCR SDK is based on ABBYY FineReader Engine 11 (our “big” SDK) and uses the same OCR technology generation as FineReader Engine 11. That is why the metadata of an output PDF file says it was produced by FineReader Engine 11. For your information, you can also see the differences between Cloud OCR SDK and FineReader Engine here.

As for sorting output PDFs by metadata, you can try the description parameter of the processImage, processDocument and processBusinessCard methods. It accepts up to 255 characters, and this string is returned along with the task status information. So you can write any identifier into that field for outputs processed by Cloud OCR SDK and get it back with the task status.
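A minimal Python sketch of this approach follows. It assumes the REST endpoint `https://cloud.ocrsdk.com/processImage`, the query parameter names `language`, `exportFormat` and `description`, and a task-status XML of the form `<response><task id="..." description="..."/></response>`; please check all of these against the current API reference. The helpers `build_process_image_url`, `description_from_task_status` and `group_tasks_by_description` are hypothetical names. The sketch only builds request URLs and parses status XML; it does not send requests or handle authentication.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed base URL of the Cloud OCR SDK REST API; verify for your account.
BASE_URL = "https://cloud.ocrsdk.com"
# Limit stated for the description parameter.
MAX_DESCRIPTION_LEN = 255


def build_process_image_url(description, language="English",
                            export_format="pdfSearchable"):
    """Build a processImage request URL carrying a caller-defined tag.

    The parameter names (language, exportFormat, description) are
    assumptions based on the Cloud OCR SDK REST API.
    """
    if len(description) > MAX_DESCRIPTION_LEN:
        raise ValueError("description must be at most 255 characters")
    # urlencode percent-escapes characters such as '=' and ';' in the tag.
    query = urllib.parse.urlencode({
        "language": language,
        "exportFormat": export_format,
        "description": description,
    })
    return f"{BASE_URL}/processImage?{query}"


def description_from_task_status(xml_text):
    """Return the description attribute of the <task> element, or None.

    Assumes a response shaped like <response><task ... description="..."/></response>.
    """
    task = ET.fromstring(xml_text).find("task")
    return None if task is None else task.get("description")


def group_tasks_by_description(status_xml_list):
    """Map each description tag to the list of task IDs that carry it."""
    groups = {}
    for xml_text in status_xml_list:
        task = ET.fromstring(xml_text).find("task")
        if task is None:
            continue  # skip responses without a <task> element
        groups.setdefault(task.get("description"), []).append(task.get("id"))
    return groups
```

For example, tagging every Cloud OCR SDK job with `source=cloud-ocr-sdk` and grouping the returned statuses by description gives a mapping from that tag to task IDs, which you can then use to sort the downloaded PDFs. The answer above only says the string is returned with the task status, so this sketch assumes it is not written into the PDF itself and must be recorded on your side (for example, in a file name or database) when each result is downloaded.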


answered 11 Jan '16, 15:38

Oksana Serdyuk

Thank you very much, Oksana. Now I understand the difference. Happy New Year 2016 to you and to ABBYY!

(11 Jan '16, 16:21) Vitalie

© 2016 ABBYY. All rights reserved. www.ABBYY.com