I receive packages of pages, say 50 per package. They are all forms.

From each package of 50, I need to identify the form I'm interested in (i.e. classify the page images). These are all template forms.

Once I've got the right page, I then extract a text snippet.

I can see how the second part happens, but not the first. Any pointers on adding a form or document classification stage to a processing pipeline?

asked 10 Apr '12, 09:12

dynap12

edited 09 Jun '12, 11:46

Vasily Panferov ♦♦


Is there any update on template support being integrated into Cloud OCR SDK?


answered 14 Jun '12, 22:03

Mithun

What you describe is a step beyond OCR and closer to data capture scenarios. We are planning to implement layout training and document classification features in ABBYY Cloud OCR SDK, but I can't say anything about the timing right now. Meanwhile, I have two suggestions for solving your task:

  1. Do a full OCR of your document (or a part of it), look for document-specific text (form ID code, form title, a specific question, etc.), and select the template type accordingly. That's one of the high-level approaches used by the technologies in the product mentioned in the next list item.

  2. Alternatively, have a look at ABBYY FlexiCapture Engine. It's a non-cloud data capture SDK designed to solve the task you describe.

I believe both approaches would do the job for you. I suggest you start with the first approach, as it would be easier to implement; if you find that you need more data capture functionality, go for FlexiCapture Engine.
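To illustrate the first approach, here is a minimal Python sketch. It assumes you already have the recognized text of each page as a plain string (how you obtain it from the OCR service is not shown). The marker phrases, the form type names, the 0.85 threshold, and the helper names `classify_page` and `find_pages` are all hypothetical and only for illustration; this is not an ABBYY API. Fuzzy matching via the standard `difflib` module is used so that small OCR errors (for example `F0RM` instead of `FORM`) do not break the match.

```python
import re
from difflib import SequenceMatcher

# Hypothetical markers: form type -> phrases expected to appear on that form.
TEMPLATE_MARKERS = {
    "tax_form": ["form 1040", "individual income tax return"],
    "claim_form": ["claim number", "insurance claim form"],
}


def normalize(text):
    """Lowercase and collapse whitespace so OCR line breaks do not matter."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def phrase_score(phrase, text):
    """Best similarity (0..1) between the phrase and any word window of the
    same word count in the text."""
    words = text.split()
    n = len(phrase.split())
    if len(words) < n:
        return SequenceMatcher(None, phrase, text).ratio()
    best = 0.0
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n])
        best = max(best, SequenceMatcher(None, phrase, window).ratio())
    return best


def classify_page(ocr_text, markers=TEMPLATE_MARKERS, threshold=0.85):
    """Return the best-matching form type, or None if nothing is close enough."""
    text = normalize(ocr_text)
    best_type, best_score = None, 0.0
    for form_type, phrases in markers.items():
        # Average the per-phrase scores so every marker has to match reasonably.
        score = sum(phrase_score(normalize(p), text) for p in phrases) / len(phrases)
        if score > best_score:
            best_type, best_score = form_type, score
    return best_type if best_score >= threshold else None


def find_pages(pages, wanted, **kwargs):
    """Indices of the pages in a package that classify as the wanted form type."""
    return [i for i, text in enumerate(pages) if classify_page(text, **kwargs) == wanted]
```

You would call `find_pages(package_texts, "claim_form")` on the list of per-page OCR results and then run your snippet extraction only on the returned pages. To save processing time, it may be enough to OCR just the region where the form title or ID is printed rather than the whole page.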


answered 10 Apr '12, 10:23

Nikolay_Kh ♦♦

edited 11 Apr '12, 10:55

It has been more than two years now, but there is no sign of such an addition to the cloud API. This is quite disappointing.

(29 Nov '14, 16:44) Alexey Zimarev
Asked: 10 Apr '12, 09:12

Seen: 2,202 times

Last updated: 29 Nov '14, 16:44

© 2016 ABBYY. All rights Reserved. www.ABBYY.com | Privacy Policy | Legal