I am trying to process PDF forms in which the text of interest sits in a table rendered as an image. The left column is always exactly the same, with phrases such as Sitting and reading, Watching TV, Sitting in a public place, etc.
The right column is the column of interest. It always contains options that look like Slight chance of dozing (1), Would never doze (0), Moderate chance of dozing (3) or High chance of dozing (4).
Each image comes from a digitally generated PDF that I have no control over. I am just trying to extract the numeric values from the right side of the table.
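To make the goal concrete, here is a minimal sketch of the extraction I am after, assuming the OCR output is available as plain text. The function name `extract_scores` and the regular expression are illustrative only and are not part of any SDK; the phrase wording is taken from the options listed above.

```python
import re

# Matches a response phrase followed by its score in parentheses,
# e.g. "Slight chance of dozing (1)" or "Would never doze (0)".
# Assumption: every option ends in "chance of dozing" or "never doze".
SCORE_RE = re.compile(r"(?:chance of dozing|never doze)\s*\((\d)\)", re.IGNORECASE)


def extract_scores(text: str) -> list[int]:
    """Return the parenthesised scores in the order they appear in the text."""
    return [int(digit) for digit in SCORE_RE.findall(text)]
```

This only works if the right-column phrases are present in the OCR text and appear in row order.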
Using the cloud SDK, I'm having varying levels of success. It doesn't always put the table items in the correct order, and it sometimes omits the phrases from the right column altogether.
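One workaround I have considered for the ordering problem is to rebuild the rows myself from word positions. The following is only a sketch under the assumption that each recognised word comes with a position; the `Word` type, its `x`/`y` fields, and the `y_tol` tolerance are hypothetical and do not correspond to any particular SDK's response format.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # left edge of the word (hypothetical field)
    y: float  # vertical centre of the word (hypothetical field)


def group_rows(words: list[Word], y_tol: float = 5.0) -> list[str]:
    """Group words into table rows by vertical position, then read each row left to right."""
    rows: list[list[Word]] = []
    for word in sorted(words, key=lambda w: w.y):
        # A word joins the current row if it is vertically close to that row's first word.
        if rows and abs(rows[-1][0].y - word.y) <= y_tol:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [" ".join(w.text for w in sorted(row, key=lambda w: w.x)) for row in rows]
```

This would not help with the second problem, where the right-column phrases are missing from the output entirely.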
Is there anything I can do to improve the results? I'm attaching a sample file.