mpraining: Some junk characters appear in the output, e.g. "ï»¿PINOT NOIR", which is the first line of the result for the attached image. Another example is "Joan dâ€™Anguera". We need the text with these junk characters removed. Is there an option to avoid them?
junk chars are detected
- Last Post 07 February 2014
We could not reproduce the issue on our side. We recommend processing your image with the URL "http://cloud.ocrsdk.com/processImage?language=english,french&profile=textextraction&exportFormat=txt". With these settings the result is:
PINOT NOIR BURGUNDY A1020 Roblet-Monnot “Vieilles Vignes" 2010 72 Al 021 Paul Pernot et ses Fils 2008 122 Pommard-Noizons A1022 Domaine Antonin Guyon 2009 Clos de la Chaume Gaufriot, Beaune A1023 Domaine Ardhuy 2009 Gevrey-Chambertin U5 172 C10-24 Domaine de Lambrays Grand Cru 2009 Clos des Lambrays, Morey 260 C1025 Camille Giroud Grand Cru 2008 Chapelle-Chambertin 430 an 18% gratuity is included on all checks
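The junk characters quoted in the question match what UTF-8 bytes look like when read as Windows-1252: "ï»¿" is the UTF-8 byte order mark and "â€™" is a right single quotation mark. That the client decoded the result with the wrong encoding is an assumption, since the thread does not confirm the cause. A minimal Python sketch under that assumption follows; the function names are hypothetical and not part of the OCR SDK.

```python
def decode_ocr_text(raw: bytes) -> str:
    """Decode downloaded result bytes as UTF-8, dropping a leading BOM if present."""
    return raw.decode("utf-8-sig")


def repair_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-Windows-1252 mix-up in text that was already decoded.

    If the text does not fit that pattern, it is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8-sig")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    print(repair_mojibake("ï»¿PINOT NOIR"))
    print(repair_mojibake("Joan dâ€™Anguera"))
```

Decoding the raw bytes correctly in the first place (decode_ocr_text) is the cleaner fix; repair_mojibake is only a fallback for text that has already been stored in the garbled form.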
Hello Anastasia, Thanks for your feedback. It is working better now, but there is still one thing I do not understand. Please check the following entry from my result:
A1022 Domaine Antonin Guyon 2009 Clos de la Chaume Gaufriot, Beaune A1023 Domaine Ardhuy 2009 Gevrey-Chambertin 145 172
Here we actually expect something like this:
A1022 Domaine Antonin Guyon 2009 Clos de la Chaume Gaufriot, Beaune 145 A1023 Domaine Ardhuy 2009 Gevrey-Chambertin 172
The result does not match this. Can you please check why this is happening? Otherwise my algorithm for detecting this line will fail because of the OCR mistake. I also checked the XML format, and it is not suitable for us. I just expect the contents in the same order as in the image. Please check and help me.
The automatic analysis recognizes this picture as several separate areas, which is why the text order is not left to right and top to bottom. Unfortunately, it is currently impossible to export the text in that order automatically. The only way to get this order is to sort the words by their coordinates on your side.
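Sorting by coordinates could look roughly like the Python sketch below. It assumes you have already extracted each word with its bounding box (for example from an export that includes coordinates). The Word class, the reading_order function, and the tolerance parameter are hypothetical names for illustration, not part of the OCR SDK.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    left: int
    top: int
    right: int
    bottom: int


def reading_order(words, tolerance=0.5):
    """Group words into visual rows, then return one string per row.

    Rows are ordered top to bottom and the words inside each row left to right.
    A word joins the current row when its vertical center lies within
    tolerance * (word height) of the row's average vertical center.
    """
    rows = []  # each row is a list of Word objects
    for word in sorted(words, key=lambda w: (w.top + w.bottom) / 2):
        center = (word.top + word.bottom) / 2
        height = word.bottom - word.top
        if rows:
            last = rows[-1]
            row_center = sum((w.top + w.bottom) / 2 for w in last) / len(last)
            if abs(center - row_center) <= tolerance * height:
                last.append(word)
                continue
        rows.append([word])
    return [" ".join(w.text for w in sorted(row, key=lambda w: w.left))
            for row in rows]


if __name__ == "__main__":
    # Made-up coordinates: the prices sit in a separate column on the right.
    sample = [
        Word("172", 500, 161, 540, 181),
        Word("A1022", 10, 100, 70, 120),
        Word("Guyon", 80, 100, 140, 120),
        Word("145", 500, 102, 540, 122),
        Word("A1023", 10, 160, 70, 180),
        Word("Ardhuy", 80, 160, 150, 180),
    ]
    for line in reading_order(sample):
        print(line)
```

The tolerance value is a guess and would need tuning for skewed scans or for entries that wrap onto a second line, where the price may not be vertically aligned with the first line of the entry.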