I'm new to ABBYY and would like to present my use case and the solution I have in mind, to see whether you experts think I'm heading in the right direction.
My company receives a lot of documents in a dozen different languages. All the documents are available as ".tif" images through a service, page by page (a single document can have hundreds of pages).
Once the pages are OCRed, a manual process is in place to correct the result (fixing badly recognized characters, removing unrecognized drawings, and so on).
1 : FineReader Engine 12 for Linux.
The company is currently moving to AWS cloud storage, so picking FineReader Engine 12 for Linux makes sense for its cloud compatibility.
2 : Java REST API using ABBYY BatchProcessor, with parallel processing through a pool of engines.
From reading the documentation, for processing a huge number of one-page documents (the real documents have several pages, but the input arrives as page-by-page .tif images), the best architecture seems to be the BatchProcessor with a pool of engines for parallel processing. The business logic (recreating a single document once all its pages are processed) can be moved into a separate project.
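To make the idea concrete, here is a rough sketch of the pooling and reassembly logic as I picture it, in plain Java. It does not use the ABBYY API at all: `PageRecognizer` is a hypothetical stand-in for one engine instance, and `recognizeDocument` is a made-up name. The real engine pool and BatchProcessor have their own interfaces, so treat this only as an illustration of "N engines, many pages, results put back in page order".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PagePoolSketch {

    /** Hypothetical stand-in for one OCR engine instance (NOT an ABBYY type). */
    interface PageRecognizer {
        String recognize(String pagePath) throws Exception;
    }

    /**
     * Recognizes every page in parallel using a fixed pool of engines and
     * returns the page texts in the same order as the input paths.
     */
    static List<String> recognizeDocument(List<String> pagePaths,
                                          List<PageRecognizer> engines) throws Exception {
        // Engines are borrowed from this queue and handed back after each page.
        BlockingQueue<PageRecognizer> pool =
                new ArrayBlockingQueue<>(engines.size(), false, engines);
        // One worker thread per engine, so a worker never waits long for an engine.
        ExecutorService workers = Executors.newFixedThreadPool(engines.size());
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String path : pagePaths) {
                pending.add(workers.submit(() -> {
                    PageRecognizer engine = pool.take();   // borrow an engine
                    try {
                        return engine.recognize(path);
                    } finally {
                        pool.offer(engine);                // always return it
                    }
                }));
            }
            // Futures are read in submission order, which restores page order
            // even if pages finish out of order.
            List<String> texts = new ArrayList<>();
            for (Future<String> result : pending) {
                texts.add(result.get());
            }
            return texts;
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Fake engine for demonstration only: it just echoes the file name.
        PageRecognizer fake = path -> "text of " + path;
        List<String> pages = List.of("doc1_p1.tif", "doc1_p2.tif", "doc1_p3.tif");
        System.out.println(recognizeDocument(pages, List.of(fake, fake)));
    }
}
```

The part I would keep in the separate project is the last loop: collecting per-page results and stitching them back into one document, independent of which OCR engine produced them.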
If anyone has gone through something similar, I'd love to hear your input: problems you faced, limitations, etc.