Inconsistent processing time

  • Last Post 06 December 2016
Choon_L posted this 30 November 2016

1) We noticed that one of the documents we uploaded for OCR took about 9 minutes to process (excluding upload and download times). When we re-ran the test with the same document, the result came back in 6 seconds. Are OCR results cached on the server side? If so, does Abbyy deduct pages from my account for such requests? If there is no caching, why is there such a large discrepancy in response times for the same document? We even observed the server status to be a constant 103% during this test, so why is there so much variation if there are no cached results?

2) Let's say I upload 500 documents to Abbyy Cloud that are very similar in structure to the test document that took about 9 minutes. What would be a ballpark estimate of how long Abbyy would take to process such a load? Let's assume that no one else is submitting a lot of tasks, the server load is constant, and we send these 500 requests in parallel. In such a scenario, given that this sample document took 9 minutes at 105% server load, could you please give us a rough estimate of the overall processing time to expect? Please take into account your infrastructure and your VM spin-up times on Azure during auto-scaling.

3) Would it be possible for Abbyy to allocate more resources for faster, dedicated processing of our documents if we give you 24/48 hours' notice that we expect more documents? Could we perhaps negotiate on-demand performance lines under a workable pricing model that is mutually beneficial?

IvanPopov posted this 06 December 2016

  1. Indeed, recognition results are cached after image recognition and, like the image itself, stored for 24 hours. If re-recognition is performed with the same settings, the cached result is returned. If different settings are used, the image is re-recognized for free as long as the image is still stored on the server. Please note that if you delete the recognized image (using the deleteTask method) or if it is deleted automatically after 24 hours, the next recognition of this image will be charged according to the Price List. Please find more information here:
  2. Overall, the service is built so that it scales up automatically when the number of incoming tasks increases. You can find more details on scalability on the ABBYY Technology Portal: In general, the average recognition time for one A4 page in English is considered to be around 7 seconds. One A4 page here means a simple one-page scanned office document that meets our quality recommendations. For multipage documents, the processing time is essentially the total time needed to process all of the pages. As with all our products, the processing time strongly depends on the document itself (image quality, page size, amount of text, etc.). If you could send a few examples of the images that you are recognizing to, we might be able to provide more specific advice.
  3. As mentioned in, the service is able to scale up automatically and should be able to handle loads similar to what you described (500 multipage documents). If you expect document batches much larger than that, it is possible to request that the service be scaled up ahead of time; please contact ABBYY sales for details and pricing.
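
As a rough illustration of the arithmetic in point 2, the sketch below multiplies a page count by the quoted 7-second average, then estimates a batch by dividing it into "waves" of documents processed at once. The function names and the parallel_workers figure are hypothetical: the thread does not state how many documents the service processes concurrently, and VM spin-up time is ignored, so treat the output as a back-of-the-envelope number rather than a service guarantee.

```python
import math

# Average quoted in the reply for a simple A4 page in English.
AVG_SECONDS_PER_PAGE = 7.0


def document_seconds(pages: int, seconds_per_page: float = AVG_SECONDS_PER_PAGE) -> float:
    """Estimated time for one document: the sum of its per-page times."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * seconds_per_page


def batch_seconds(documents: int, seconds_per_document: float, parallel_workers: int) -> float:
    """Estimated wall-clock time for a batch of equally sized documents.

    parallel_workers is an assumed figure (not stated in the thread);
    scale-up delays are not modelled.
    """
    if parallel_workers < 1:
        raise ValueError("parallel_workers must be at least 1")
    waves = math.ceil(documents / parallel_workers)
    return waves * seconds_per_document


if __name__ == "__main__":
    # A 10-page document at the quoted average.
    print(document_seconds(10))
    # 500 documents that each take 9 minutes (540 s), with 50 assumed workers.
    print(batch_seconds(500, 540.0, 50))
```

With real documents, the per-page time should be measured on a sample first, since the reply notes that image quality, page size, and amount of text all change it.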