I am wondering why PDF files produced by ABBYY OCR SDK carry metadata saying that the file was produced by "ABBYY FineReader Engine 11".
As I understand it, ABBYY FineReader Engine and ABBYY OCR SDK are different technologies, so I would expect their output to be marked differently.
If I open a PDF file in Adobe Acrobat and press Ctrl+D, the window that opens shows "Application: ABBYY FineReader Engine 11". Is that correct?
I expected to find something like "ABBYY OCR SDK Cloud" or similar.
We plan to sort our PDFs by metadata, so I am wondering whether we can tell which PDFs were processed by FineReader Engine, which by OCR SDK, and so on.
We wish you a belated Happy New Year, and we apologize for the delayed response, which was due to the Russian national winter holidays.
We should distinguish two terms: OCR technology and product. OCR technology is a very sophisticated artificial intelligence algorithm that consists of multiple processing steps.
Some people at ABBYY develop only the core OCR technology, while product developers build the products and interfaces through which our users access that technology.
ABBYY Cloud OCR SDK is based on ABBYY FineReader Engine 11 (our “big” SDK) and uses the same generation of OCR technology. For this reason, the metadata of the output PDF file says that it was produced by FineReader Engine 11. For reference, you can also see the differences between Cloud OCR SDK and FineReader Engine here.
As for sorting output PDFs by metadata, you can use the description parameter of the processImage, processDocument, and processBusinessCard methods. It accepts a string of up to 255 characters, which is returned along with the task status information. So you can write any identifier for the outputs processed by Cloud OCR SDK into that field and get it back with the task status.
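As a rough illustration of the description approach, here is a minimal Python sketch. It only builds the request URL and reads the description back from a task status response; it does not send anything. The base URL, the helper names (build_process_url, read_description), the exportFormat value, and the sample XML layout (a task element with a description attribute) are assumptions for illustration, so please check them against the Cloud OCR SDK API reference before relying on them.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: the service base URL; verify it in the API reference.
BASE_URL = "https://cloud.ocrsdk.com"

# The description parameter accepts up to 255 characters.
MAX_DESCRIPTION_LENGTH = 255


def build_process_url(method, description, **params):
    """Build a request URL for processImage, processDocument, etc.

    Hypothetical helper: it attaches our own identifier as the
    description parameter so it comes back with the task status.
    """
    if len(description) > MAX_DESCRIPTION_LENGTH:
        raise ValueError("description must be at most 255 characters")
    query = urlencode({**params, "description": description})
    return f"{BASE_URL}/{method}?{query}"


def read_description(status_xml):
    """Return the description found in a task status response, or None.

    Assumption: the status XML contains a task element that carries
    the description as an attribute.
    """
    root = ET.fromstring(status_xml)
    for element in root.iter():
        # Ignore a possible XML namespace prefix on the tag name.
        if element.tag.split("}")[-1] == "task":
            return element.get("description")
    return None


# Hand-written sample (not a real server response) used to show the round trip.
SAMPLE_STATUS_XML = (
    '<response><task id="hypothetical-task-id" status="Completed" '
    'description="source=cloud-ocr-sdk;batch=42"/></response>'
)

if __name__ == "__main__":
    print(build_process_url("processImage", "source=cloud-ocr-sdk;batch=42",
                            exportFormat="pdfSearchable"))
    print(read_description(SAMPLE_STATUS_XML))
```

You would then record the returned description next to each downloaded PDF (for example in your own index or in the file name) and sort on that value rather than on the PDF's "Application" field.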
answered 11 Jan '16, 15:38
Oksana Serdyuk ♦♦