I have documents whose headings and margin notes affect how cleanly they are OCR'd. Even when the text is recognized correctly, these elements sit so close to the main body of text that they get "run into it". The best way I have found to get clean OCR is to "mask out" the text I don't care about (I have an automatic process that can do this); the masked image is the same size as the original, unmasked image. Obviously, if I request a PDF document I get the masked image with the text underlay. What I want is the original image with the text underlay.

Is there an easy way to swap the masked image for the original image in the output, or to use the XML output to add a text underlay onto the original image? All the box positions will be the same.

I should say that what I am after is the "pdfSearchable" style of PDF (but with the original image).
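The second idea (reusing the box positions from the XML output) can be sketched roughly as follows. This is only an illustration under assumptions: the `page` and `line` element names, the `l`/`t`/`r`/`b` pixel attributes, the `resolution` attribute, and the sample data are a simplified, hypothetical layout, so the real export schema would need to be checked. The function `text_boxes_in_points` is likewise a made-up helper. It converts top-left pixel boxes into PDF points with a bottom-left origin, which is the coordinate system a PDF text layer uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified OCR XML (assumed layout, not the real export schema):
# a page with pixel size and resolution in dpi, and one <line> per text line
# carrying its left/top/right/bottom edges in pixels.
SAMPLE_XML = """
<page width="2480" height="3508" resolution="300">
  <line l="300" t="600" r="1500" b="660">Main body text</line>
  <line l="300" t="700" r="1200" b="760">Second line</line>
</page>
"""


def text_boxes_in_points(xml_text):
    """Return (page_width, page_height, boxes) in PDF points.

    Pixel coordinates have their origin at the top-left corner;
    PDF coordinates have their origin at the bottom-left corner.
    """
    page = ET.fromstring(xml_text)
    scale = 72.0 / float(page.get("resolution"))  # 72 points per inch
    page_w = float(page.get("width")) * scale
    page_h = float(page.get("height")) * scale

    boxes = []
    for line in page.iter("line"):
        l, t, r, b = (float(line.get(k)) for k in ("l", "t", "r", "b"))
        boxes.append({
            "text": (line.text or "").strip(),
            "x": l * scale,            # left edge, unchanged direction
            "y": page_h - b * scale,   # bottom edge, flipped vertically
            "width": (r - l) * scale,
            "height": (b - t) * scale,
        })
    return page_w, page_h, boxes


if __name__ == "__main__":
    width, height, boxes = text_boxes_in_points(SAMPLE_XML)
    print(width, height)
    for box in boxes:
        print(box)
```

Because the masked and original images are the same size, the same boxes apply to both. A separate PDF library (not shown, and an assumption here) could then place the original image as the page background and draw each string at its box using the invisible text rendering mode (mode 3 in the PDF specification).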



asked 07 Sep '13, 07:57

robin


closed 09 Sep '13, 15:35


Anastasia Ga... ♦♦

The question has been closed for the following reason "Question is not relevant" by Anastasia Galimova 09 Sep '13, 15:35

Unfortunately, this goes beyond the functionality of ABBYY products.


answered 09 Sep '13, 15:34


Anastasia Ga... ♦♦
