XLSX broken structure

  • Last Post 29 January 2015
imutyshev posted this 21 January 2015

Hey guys

We upload our PDFs to your cloud OCR and get XLSX files back. When we read such a document with our code, some cells with important data are either empty or contain only a few characters or digits (for instance, instead of "5648945 Аппарат магнитный SD12223 34-a 88" we get just "88").

Now to the weird part. If we manually open the same document with M$ Excel or OpenOffice, we see that the data is not lost; I mean, "5648945 Аппарат магнитный SD12223 34-a 88" is there. After that, we click "Save" and close the document.

And now to the weirdest part. If we read THIS VERY SAME XLSX with our code, we get the full and correct value "5648945 Аппарат магнитный SD12223 34-a 88", NOT the "88" we got before opening and saving it with Excel or OpenOffice. YOU GET IT? IT IS THE SAME FILE!

I wonder how exactly you generate XLSX files. What tool do you use? PLEASE HELP US! We tried to read your XLSX with OpenXML by M$, EPPlus, ExcelReader and our own libraries, and the result is always the same.
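
One way to narrow this down is to look at the string table inside the file directly, before and after re-saving it. An XLSX file is a ZIP archive of XML parts, and a shared string can be stored either as a single <t> element or as several rich-text runs (<r><t>...</t></r>). A reader that picks up only one run of such an entry would return a fragment such as "88"; whether that is what happens with these files is only a guess. The Python sketch below assumes the string table sits at the conventional path xl/sharedStrings.xml (strictly speaking, the location is defined by the workbook relationships), and read_shared_strings is a hypothetical helper name, not part of any library. Cells that use inline strings instead of the shared table are not covered.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of SpreadsheetML elements, in ElementTree's "{uri}tag" form.
MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

# Assumption: the conventional part name; a generator may place it elsewhere.
SHARED_STRINGS_PART = "xl/sharedStrings.xml"


def read_shared_strings(xlsx):
    """Return a list of (text, run_count) for every shared-string entry.

    `xlsx` is a path or a binary file-like object. run_count is 0 for a
    plain <si><t>..</t></si> entry and N for an entry made of N rich-text
    runs, whose texts are joined in document order.
    """
    with zipfile.ZipFile(xlsx) as zf:
        if SHARED_STRINGS_PART not in zf.namelist():
            return []
        root = ET.fromstring(zf.read(SHARED_STRINGS_PART))

    entries = []
    for si in root.findall(MAIN + "si"):
        parts = []
        run_count = 0
        for child in si:
            if child.tag == MAIN + "t":
                # Plain string item.
                parts.append(child.text or "")
            elif child.tag == MAIN + "r":
                # Rich-text run: its text is in the nested <t>.
                run_count += 1
                for t in child.findall(MAIN + "t"):
                    parts.append(t.text or "")
            # Other children (e.g. phonetic runs) are skipped on purpose.
        entries.append(("".join(parts), run_count))
    return entries
```

Running this on the file as received and on the copy re-saved by Excel, then comparing the run counts, would show whether the two files store the same text in different ways.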

Julia Anikushina posted this 29 January 2015

So that we can better assist you, please send the image you process with Cloud OCR SDK, the result file, and an XLSX file with the correct XML structure to CloudOCRSDK@abbyy.com. Looking forward to hearing from you!