11 January 2016
- Last edited 11 January 2016
We still wish you a Happy New Year, and we apologize for the delayed response caused by the Russian national winter holidays.
We should distinguish between two terms: OCR technology and OCR product. OCR technology is a very sophisticated artificial intelligence algorithm that consists of several processing steps:
- Open & Load
- Image Preprocessing
- Document/Layout Analysis
- Character Recognition
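The steps above run strictly in sequence, and each one works on the output of the previous one. The sketch below illustrates only that ordering. The stage functions are hypothetical placeholders that record their own names. They are not ABBYY's internal implementation, which is far more involved at every step.

```python
from typing import Callable, List

# A stage takes the running trace and returns an updated trace.
Stage = Callable[[List[str]], List[str]]

def make_stage(name: str) -> Stage:
    """Return a hypothetical placeholder stage that only records its name.
    A real OCR engine would transform the image or recognition results here."""
    def stage(trace: List[str]) -> List[str]:
        return trace + [name]
    return stage

# The four processing steps, in the order they are applied.
PIPELINE: List[Stage] = [
    make_stage(name)
    for name in (
        "Open & Load",
        "Image Preprocessing",
        "Document/Layout Analysis",
        "Character Recognition",
    )
]

def run_pipeline(pipeline: List[Stage]) -> List[str]:
    trace: List[str] = []
    for stage in pipeline:
        trace = stage(trace)  # each stage consumes the previous stage's output
    return trace

print(run_pipeline(PIPELINE))
```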
At ABBYY, some people develop only the core OCR technology, while product developers build our products and interfaces so that users can work with that core technology.
ABBYY Cloud OCR SDK is based on ABBYY FineReader Engine 11 (our “big” SDK) and uses the same generation of OCR technology as FineReader Engine 11. That is why the metadata of an output PDF file states that it was produced by FineReader Engine 11. For your information, you can also see the differences between Cloud OCR SDK and FineReader Engine here.
As for sorting output PDFs by metadata, you can try using the description parameter of the processImage, processDocument, and processBusinessCard methods. It accepts up to 255 characters, and the string is returned along with the task status information. This means you can write any identifier for your outputs processed by Cloud OCR SDK into that field and get it back along with the task status.
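A minimal sketch of that approach, using only the Python standard library and no network calls. It assumes a REST-style service at `https://cloud.ocrsdk.com` where the method name is the URL path and parameters go in the query string. The `language` and `exportFormat` values, and the shape of the task-status XML (a `<task>` element carrying `id` and `description` attributes), are also assumptions to check against the Cloud OCR SDK documentation. Only the `description` parameter and its 255-character limit come from the text above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed service base URL; verify against your account documentation.
BASE_URL = "https://cloud.ocrsdk.com"
MAX_DESCRIPTION_LEN = 255  # limit stated for the description parameter
METHODS = ("processImage", "processDocument", "processBusinessCard")

def build_task_url(method: str, description: str, **params: str) -> str:
    """Build a request URL that tags the task with a custom identifier."""
    if method not in METHODS:
        raise ValueError(f"unsupported method: {method}")
    if len(description) > MAX_DESCRIPTION_LEN:
        raise ValueError("description must be at most 255 characters")
    query = dict(params, description=description)
    return f"{BASE_URL}/{method}?{urlencode(query)}"

def descriptions_from_status(xml_text: str) -> dict:
    """Map task id -> description from a task-status XML response.
    Assumes each <task> element has 'id' and 'description' attributes."""
    root = ET.fromstring(xml_text)
    return {task.get("id"): task.get("description", "") for task in root.iter("task")}

url = build_task_url("processImage", "invoice-2016-001",
                     language="English", exportFormat="pdfSearchable")
print(url)

# Hypothetical status response, used only to show the round trip.
sample = '<response><task id="abc" status="Completed" description="invoice-2016-001"/></response>'
print(descriptions_from_status(sample))  # {'abc': 'invoice-2016-001'}
```

Once the status comes back, the identifier in `description` can be used to sort or route the corresponding output PDF without relying on the PDF's own metadata.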