Any way to improve recognition of known phrases in a PDF?

  • Last Post 21 July 2017
Amy Anuszewski posted this 19 July 2017

I am trying to process PDF forms in which the text of interest is in a tabular image. The left column will always be exactly the same: phrases such as Sitting and reading, Watching TV, Sitting in a public place, etc.

The right column is the one of interest. It will always contain options that look like Slight chance of dozing (1), Would never doze (0), Moderate chance of dozing (3) or High chance of dozing (4).

Each image is a digitally generated PDF that I have no control over. I am just trying to extract the numeric contents from the right side of the table.

Using the Cloud SDK, I'm having varying levels of success. It doesn't always put the tabular items in the correct order, and it doesn't always include the phrases from the right column at all.

Is there anything I can do to improve the results?  I'm attaching a sample file.  


Diana Khammatova posted this 21 July 2017

So that we can investigate the issue, please send the following additional information to CloudOCRSDK@abbyy.com:

  • the name of your application
  • the processing settings that were used for recognition
  • the image you are recognizing (as an attachment)
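
In the meantime, because the right column only ever contains a small, fixed set of phrases, one possible workaround is to post-process the recognized text and match each line against those known phrases. The following is only a minimal sketch, not a Cloud OCR SDK feature: it assumes the recognition result is already available as plain text lines, it uses the phrases and scores exactly as listed in the question, and the function name `extract_score` and the 0.8 similarity cutoff are hypothetical choices for illustration.

```python
import difflib
import re

# Known right-column options and their scores, copied from the question.
OPTIONS = {
    "would never doze": 0,
    "slight chance of dozing": 1,
    "moderate chance of dozing": 3,
    "high chance of dozing": 4,
}


def extract_score(line, cutoff=0.8):
    """Return the score found in one recognized line, or None.

    Hypothetical helper: the name and the cutoff are assumptions.
    """
    # 1. If OCR preserved a known "(digit)" marker, trust it.
    marker = re.search(r"\((\d)\)", line)
    if marker and int(marker.group(1)) in OPTIONS.values():
        return int(marker.group(1))

    # 2. Otherwise compare the tail of the line with each known phrase,
    #    so that left-column text on the same line is ignored.
    words = re.sub(r"[^A-Za-z0-9 ]", " ", line).lower().split()
    best_score, best_ratio = None, cutoff
    for phrase, score in OPTIONS.items():
        tail = " ".join(words[-len(phrase.split()):])
        ratio = difflib.SequenceMatcher(None, tail, phrase).ratio()
        if ratio >= best_ratio:
            best_score, best_ratio = score, ratio
    return best_score
```

Lines for which this returns None (for example, rows where the right column was not recognized at all) can be flagged for manual review rather than guessed.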

