I have documents whose headings and margin notes affect how cleanly they are OCR'd. Even when that text is recognised correctly, it sits so close to the main body of text that it gets "run into" it. The best way I have found to get clean OCR is to "mask out" the text I don't care about before recognition (I have an automatic process that can do this); the masked image is the same size as the original, unmasked image. Naturally, if I request a PDF document I get the masked image with the text underlay. What I want is the original image with the text underlay.
Is there an easy way to swap the masked image in the resulting PDF for the original image, or an easy way to use the XML output to add a text underlay to the original image? All the box positions will be the same.
I should add that what I am after is the "pdfSearchable" style of PDF, but with the original image.
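To make the second option (reusing the XML box positions) concrete, here is a minimal sketch of the coordinate step only. It assumes a simplified, hypothetical XML layout: a `page` element with `width`, `height` and `resolution` attributes, and `line` elements with pixel coordinates `l`, `t`, `r`, `b` measured from the top-left corner. The real XML output will almost certainly use a different schema, and `boxes_in_pdf_points` is a name made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified OCR XML (NOT the real schema): one <line> per
# text line, pixel coordinates measured from the top-left of the image.
SAMPLE_XML = """
<page width="2480" height="3508" resolution="300">
  <line l="300" t="600" r="1500" b="660">Main body text</line>
</page>
"""


def boxes_in_pdf_points(xml_text):
    """Yield (text, x, y, width, height) in PDF points, origin bottom-left."""
    page = ET.fromstring(xml_text.strip())
    dpi = float(page.get("resolution"))
    page_height_px = float(page.get("height"))

    def to_points(pixels):
        # 72 PDF points per inch, `dpi` pixels per inch.
        return pixels * 72.0 / dpi

    for line in page.iter("line"):
        l, t, r, b = (float(line.get(k)) for k in ("l", "t", "r", "b"))
        yield (
            line.text or "",
            to_points(l),
            to_points(page_height_px - b),  # flip y: PDF origin is bottom-left
            to_points(r - l),
            to_points(b - t),
        )


for box in boxes_in_pdf_points(SAMPLE_XML):
    print(box)
```

Because the masked and original images are the same size, boxes computed this way should line up on either image. A PDF library could then place the original image on the page and draw each line as invisible text at these positions, which is roughly what a searchable PDF does.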