I was thinking of using the service to generate records from received invoices, which carry the same information in several possible formats, including free-text documents written by consultants. Since they all share the fields required by law, I planned to use regExp queries, but I had failed to notice that field extraction works well only when the field position is accurately specified.

Are you going to work toward identifying fields by expected characteristics, such as nearby titles or expected type and size, or should I plan on mapping the entire page to a database of words and their positions and managing the search myself?


asked 06 Jun '12, 18:04

IgetOra



The current Cloud OCR SDK API handles text recognition only, whether for the full text or for a single field. It does not find zones on an image.

We have a data capture SDK (ABBYY FlexiCapture Engine) that has the required ability. It is not available in the Cloud yet, but we are thinking about it. That will take some time.

Right now I see two possible ways of doing what you want:

  1. Do full-text recognition and then apply a regExp search to the recognized text.
  2. If the data you work with is structured or semi-structured, you can pre-sort it and then apply known field layouts. Pre-sorting can be done with full-text recognition followed by a keyword search. To save time and effort, only part of a document needs to be OCRed (the first page of a multi-page document, or a zone of a single-page document).
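As a rough illustration of both options, here is a minimal Python sketch that works on text already returned by full-text recognition. It does not call the Cloud OCR SDK itself. The field patterns, the layout names, and the keyword lists are all hypothetical and would need tuning to real invoices.

```python
import re

# Hypothetical patterns for option 1; adjust them to the wording of your invoices.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9][A-Z0-9\-/]*)",
        re.IGNORECASE,
    ),
    "date": re.compile(
        r"\bdate\s*[:\-]?\s*(\d{1,2}[./-]\d{1,2}[./-]\d{2,4})",
        re.IGNORECASE,
    ),
    "total": re.compile(
        r"\btotal\s*(?:due|amount)?\s*[:\-]?\s*(?:EUR|USD|€|\$)?\s*"
        r"(\d+(?:[.,]\d{3})*(?:[.,]\d{2}))",
        re.IGNORECASE,
    ),
}

# Hypothetical keyword lists for option 2 (pre-sorting by document type).
LAYOUT_KEYWORDS = {
    "consultant_freetext": ["consulting", "hours worked"],
    "supplier_form": ["purchase order", "vat id"],
}


def extract_fields(text):
    """Option 1: search the recognized text with one regular expression per field."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


def guess_layout(text):
    """Option 2: pick the layout whose keywords occur most often, or None."""
    lowered = text.lower()
    scores = {
        layout: sum(keyword in lowered for keyword in keywords)
        for layout, keywords in LAYOUT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


sample = (
    "Invoice No: INV-2012-044\n"
    "Date: 06/06/2012\n"
    "Consulting services, 12 hours worked\n"
    "Total due: EUR 1.250,00\n"
)
fields = extract_fields(sample)
layout = guess_layout(sample)
```

Once `guess_layout` has identified the document type, the known field zones for that type could be sent for field-level recognition instead of relying on regular expressions alone.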

Best regards, Dmitry. ABBYY, Lead Product Analyst, SDK products.


answered 07 Jun '12, 13:03

Chudik79


Actually, ABBYY has been working in that direction for a long time. We have a product called FlexiCapture and an SDK called FlexiCapture Engine. Both solve the task you have just described: they help extract particular data from semi-structured documents. Using FlexiLayout Studio you can define the fields you want to extract and the rules for locating them on the image. It goes beyond regular expressions: it can define complicated dependencies with voting among different layout hypotheses, and even field cross-checking and database look-ups for values.

Unfortunately, this is not yet available in the Cloud, since it requires special training in FlexiLayout programming.

So please contact your nearest ABBYY representative to talk about the FlexiCapture product or Engine.
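For anyone who takes the do-it-yourself route raised in the question (mapping words to positions and searching near a title), a minimal Python sketch follows. It is not FlexiLayout and not part of any ABBYY API. It assumes the recognition output has already been parsed into words with bounding boxes; the `Word` class, the `value_right_of` helper, and the `max_gap` threshold are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def value_right_of(words, label, max_gap=200.0):
    """Return the nearest word to the right of `label` on the same line, or None."""
    anchors = [wd for wd in words if wd.text.strip(":").lower() == label.lower()]
    if not anchors:
        return None
    anchor = anchors[0]
    midline = anchor.y + anchor.h / 2
    right_edge = anchor.x + anchor.w
    candidates = [
        wd
        for wd in words
        if wd is not anchor
        and wd.y <= midline <= wd.y + wd.h  # overlaps the label's midline
        and 0 <= wd.x - right_edge <= max_gap  # starts to the right, within reach
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda wd: wd.x).text


page = [
    Word("Date:", 50, 100, 50, 12),
    Word("06/06/2012", 120, 100, 90, 12),
    Word("Total:", 50, 400, 60, 12),
    Word("1250.00", 130, 401, 70, 12),
]
total = value_right_of(page, "total")
date = value_right_of(page, "date")
```

A single rule like this is brittle; in practice several rules (value to the right, value below, expected type) would have to be combined and their results compared, which is the kind of work a dedicated data capture product takes on.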


answered 07 Jun '12, 14:41


Andrey Isaev ♦♦



© 2016 ABBYY. All rights Reserved. www.ABBYY.com | Privacy Policy | Legal