I'm extracting text from an image (a PDF document), and to extract as much text as possible I enable the
My problem is illustrated in this image (I've added the red marks to illustrate the problems). Commas are often confused with dots, colons or semicolons:
I need EnableTextExtractionMode enabled to get all the text in the image; without it, text is sometimes mistaken for pictures. I can disable EnableTextExtractionMode and also disable DetectPictures, but that does not capture as much of the text as having EnableTextExtractionMode enabled.
The problem with EnableTextExtractionMode is that it also enables the ProhibitModelAnalysis flag, and as far as I can tell that is what is causing my problems.
To confirm that the ProhibitModelAnalysis flag was the cause, I ran the image through with EnableTextExtractionMode = false and ProhibitModelAnalysis = true, and that did indeed cause the same problem.
So my question is this: Is there any way to get the additional text extraction provided by EnableTextExtractionMode without the reduced recognition quality from