Best arguments for this table document?

ppalacios posted this 4 weeks ago

Hi!

I'm trying to convert a document with two different tables. If the table is like the first table in the attached file, I get a correct XLSX file. But if the table is like the second table, I can't get an XLSX output with the "Detailed information" column followed by the "Additional information" column. My goal is to extract those pairs of values from the source document.

 

Any kind of help is welcome.


Oksana Serdyuk posted this 3 weeks ago

Hi,

I have just processed your image using the DocumentConversion_Accuracy predefined profile and exported the result to XLSX format. Considering the quality of the source image, the output is good enough. You can find my output at the following link: https://share.abbyy.com/index.php/s/pchMR2FfkFAgLCy. Sheet1 contains the first table and Sheet2 contains the second table. Could you please specify what exactly you don't like about the second table?

ppalacios posted this 3 weeks ago

 Thanks @Oksana.

I'm testing the output for the whole document and I still have the same problem. As the document is confidential, I can't attach it, but the output is not what I expect when there is a series of cells inside a cell. In this case, the number codes (Additional information column) and their descriptions (Detailed information column) appear in the same column: all the number codes first, then all the descriptions.
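If the engine settings can't be made to keep the two columns apart, one possible workaround is to re-pair the values after export. The following is a minimal Python sketch, assuming the merged column holds exactly N codes followed by N descriptions in matching order, as described above. The helper name `repair_merged_column` is hypothetical and is not part of ABBYY FineReader Engine; reading the cells out of the XLSX file is left to whatever spreadsheet library you already use.

```python
from typing import List, Optional, Sequence, Tuple


def repair_merged_column(cells: Sequence[Optional[str]]) -> List[Tuple[str, str]]:
    """Split a column of N codes followed by N descriptions into (code, description) pairs.

    Hypothetical helper: assumes the exported column lists every code first
    and every description afterwards, in the same order.
    """
    # Skip empty or missing cells the export may leave between values (assumption).
    values = [c.strip() for c in cells if c and c.strip()]
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of non-empty cells")
    half = len(values) // 2
    codes, descriptions = values[:half], values[half:]
    # zip aligns the i-th code with the i-th description.
    return list(zip(codes, descriptions))


if __name__ == "__main__":
    column = ["100", "200", "First description", "Second description"]
    for code, text in repair_merged_column(column):
        print(code, text)
```

This only works if the order inside the merged column is reliable; if the engine interleaves or drops cells, the pairing will silently be wrong, so it is worth spot-checking against the source.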

I've also tested with other profiles, DocumentArchiving_Accuracy and TextExtraction_Accuracy, but the output is always the same. I ran the following command, following the advice for the -lpp argument ("It must be used before all other keys"):

abbyyocr11 -lpp DocumentConversion_Accuracy -if Doc_Errorcode.pdf -f XLSX -tet UTF8 -ido -recc -of Doc_Eerrorcode.xlsx
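When scripting runs with different profiles, a small wrapper can enforce the ordering rule quoted above. This is a minimal Python sketch; the function `build_cli_args` is hypothetical, and it only reuses the keys from the command above without assuming anything about what `-tet`, `-ido` and `-recc` do.

```python
import shlex
from typing import List


def build_cli_args(profile: str, input_path: str, output_path: str) -> List[str]:
    """Build the abbyyocr11 argument list used in the command above (hypothetical wrapper).

    -lpp is placed right after the executable name because the CLI help
    says it must be used before all other keys.
    """
    return [
        "abbyyocr11",
        "-lpp", profile,      # profile key first
        "-if", input_path,    # input file
        "-f", "XLSX",         # export format
        "-tet", "UTF8",
        "-ido",
        "-recc",
        "-of", output_path,   # output file
    ]


if __name__ == "__main__":
    args = build_cli_args("DocumentConversion_Accuracy", "Doc_Errorcode.pdf", "Doc_Eerrorcode.xlsx")
    print(shlex.join(args))
    # To run it (requires the ABBYY CLI sample on PATH):
    # import subprocess; subprocess.run(args, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues if file names contain spaces; `shlex.join` is used here only to print a copy-pasteable command line.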

Could you suggest another parameter that would give me two columns in the output even when one cell contains several cells?

 

Thanks

Kseniya Leontyeva posted this 2 weeks ago

Hi,

You can change not only the profile but also other settings, such as PageAnalysisParams or TableAnalysisProfile.

For example, please try the following options:

  • PageAnalysisParams::AggressiveTableDetection = true
  • ObjectsExtractionParams::SourceContentReuseMode = CRM_ContentOnly, if your document already contains a text layer.
  • Change the resolution with the PrepareImageMode OverwriteResolution, XResolutionToOverwrite and YResolutionToOverwrite properties.

You can read more about these and other useful settings in Help → Guided Tour → Advanced Techniques → Tuning Parameters of Page Preprocessing, Analysis, Recognition, and Synthesis.

As far as we understand, you're using the command-line interface sample. However, some options aren't available as CLI keys. If you don't want to change the code significantly, you can use a user profile, which lets you adjust recognition settings through a configuration file. You can read more about it in Help → Guided Tour → Advanced Techniques → Working with Profiles.
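A user profile could collect the settings suggested above in one file. The Python sketch below generates such a file with the standard `configparser` module, under several assumptions that should be checked against the "Working with Profiles" Help topic: that the profile is an INI-style file with one section per parameter object and `Property = Value` lines, that booleans are written as `true`, and that PrepareImageMode settings can be set this way at all. The 300 dpi value is purely illustrative, and `render_profile` is a hypothetical helper.

```python
import configparser
import io
from typing import Dict

# Settings taken from the suggestions in this thread. The INI-style layout
# (section = parameter object, key = property) is an assumption, not a
# confirmed FRE format.
SETTINGS: Dict[str, Dict[str, str]] = {
    "PageAnalysisParams": {"AggressiveTableDetection": "true"},
    # Only useful if the source document already contains a text layer.
    "ObjectsExtractionParams": {"SourceContentReuseMode": "CRM_ContentOnly"},
    "PrepareImageMode": {
        "OverwriteResolution": "true",
        "XResolutionToOverwrite": "300",  # illustrative value only
        "YResolutionToOverwrite": "300",  # illustrative value only
    },
}


def render_profile(settings: Dict[str, Dict[str, str]]) -> str:
    """Render the settings as INI text (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep property names as written; the default lowercases them
    parser.read_dict(settings)
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Print the profile; save it to a file of your choice to use it.
    print(render_profile(SETTINGS))
```

Generating the file from a dictionary makes it easy to try several combinations (for example, with and without AggressiveTableDetection) and compare the resulting XLSX files.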

Unfortunately, it's hard to give more specific recommendations without an example image. If you like, you can send an example image to SDK_Support@abbyy.com and we'll be glad to assist you.

 

ppalacios posted this 2 weeks ago

Hi Kseniya.

Thanks for your recommendation. I'll try it as soon as possible.

Anyway, I attached example input documents to my first message, and they don't work as I expected with the DocumentConversion_Accuracy predefined profile. Maybe you can try them with your recommendations.

Let me know if you do.

Thanks
