Best arguments for this table document?

  • Last Post 05 March 2018
ppalacios posted this 30 January 2018

Hi!

I'm trying to convert a document that contains two different tables. When a table looks like the first table in the attached file, I get a correct XLSX. But when a table looks like the second one, I can't get an XLSX output file in which the "Detailed information" column is followed by the "Additional information" column. My goal is to extract that pair of values from the source document.

 

Any kind of help is welcome.

Oksana Serdyuk posted this 06 February 2018

Hi,

I have just processed your image using the DocumentConversion_Accuracy predefined profile and exported the result to the XLSX format. The output is good enough if we take into account the quality of the source image. Please find my output at the following link: https://share.abbyy.com/index.php/s/pchMR2FfkFAgLCy. Sheet1 contains the first table and Sheet2 the second. Could you please specify what exactly you do not like in the second table?

ppalacios posted this 06 February 2018

Thanks @Oksana.

I'm testing the output for the whole document and I still have the same problem. Since the document is confidential I can't attach it, but the output is not what I expect when a cell contains a series of cells. In this case, the number code (Additional Information column) and its description (Detailed information column) appear in the same column: all the number codes first, then all the descriptions.

I've tested other profiles, DocumentArchiving_Accuracy and TextExtraction_Accuracy, but the output is always the same. I ran this command, following the advice for the -lpp argument: "It must be used before all other keys"

abbyyocr11 -lpp DocumentConversion_Accuracy -if Doc_Errorcode.pdf -f XLSX -tet UTF8 -ido -recc -of Doc_Eerrorcode.xlsx

Could you tell me which other parameter would give me two columns in the output even when a cell contains more cells?

 

Thanks

Kseniya Leontyeva posted this 13 February 2018

Hi,

You can change not only the profile but also other settings, such as PageAnalysisParams or TableAnalysisProfile.

For example, please try the following options:

  • PageAnalysisParams::AggressiveTableDetection = true
  • ObjectsExtractionParams::SourceContentReuseMode = CRM_ContentOnly, if your document already contains a text layer.
  • Changing the resolution with the PrepareImageMode OverwriteResolution, XResolutionToOverwrite and YResolutionToOverwrite properties.

You could read more about it and other useful settings in Help → Guided Tour → Advanced Techniques → Tuning Parameters of Page Preprocessing, Analysis, Recognition, and Synthesis. 

As far as we understand, you're using the command line interface sample. However, some options aren't implemented as CLI keys. If you don't want to change the code significantly, you may use a user profile. It lets you adjust recognition settings through a config file. You can read more about it in Help → Guided Tour → Advanced Techniques → Working with Profiles.
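As a rough illustration of the config-file approach, here is a minimal Python sketch that writes the three settings listed above into an INI-style text file. It assumes that a user profile uses one section per parameter object and one `Name = Value` line per property; the exact section names, the `TRUE` spelling, and the resolution value of 300 are assumptions, not confirmed in this thread, so please check them against the Working with Profiles help topic.

```python
import configparser
import io


def build_user_profile(settings):
    """Render {section: {property: value}} as INI-style text."""
    parser = configparser.ConfigParser()
    # Keep property names case-sensitive (configparser lowercases them by default).
    parser.optionxform = str
    for section, options in settings.items():
        parser[section] = options
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Section and property names mirror the settings suggested in this thread.
# The layout of the file and the value 300 are assumptions for illustration.
settings = {
    "PageAnalysisParams": {"AggressiveTableDetection": "TRUE"},
    "ObjectsExtractionParams": {"SourceContentReuseMode": "CRM_ContentOnly"},
    "PrepareImageMode": {
        "OverwriteResolution": "TRUE",
        "XResolutionToOverwrite": "300",
        "YResolutionToOverwrite": "300",
    },
}

profile_text = build_user_profile(settings)
print(profile_text)
```

The resulting text can be saved to a file and loaded as a user profile in place of, or in addition to, the predefined profile.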

Unfortunately, it's hard to give more specific recommendations without an example image. If you want, you may send the example image to SDK_Support@abbyy.com and we'll be glad to assist you.

 

ppalacios posted this 13 February 2018

Hi Kseniya.

Thanks for your recommendation. I'll try it asap.

Anyway, in my first message I attached example input documents, and they don't work as I expected with the DocumentConversion_Accuracy predefined profile. Maybe you can try your recommendations on them.

Let me know if you do.

Thanks

Kseniya Leontyeva posted this 05 March 2018

Hi, 

We looked through your document. Unfortunately, recognition of a table inside a table is currently not supported by the library. As a workaround, you may try to interpret the inner table as part of the outer one. To do this, you may need to change the layout. It could be done as follows:

  • Through the Visual Components, if you're using the Windows version of FRE.
  • Through the API for working with a layout. You can find more information about this approach in Help → Guided Tour → Advanced Techniques → Working with Layout and Blocks. This approach can be used on any OS.
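If changing the layout is not practical, a different route is to fix the exported data afterwards. This is not an FRE feature, just a post-processing sketch in Python. It assumes the exported column contains all the number codes first and then the same number of descriptions, as described earlier in this thread. The function name and the sample values are hypothetical.

```python
def pair_codes_with_descriptions(cells):
    """Split a column of N codes followed by N descriptions into (code, description) pairs."""
    # Drop empty cells and surrounding whitespace before splitting.
    values = [c.strip() for c in cells if c and c.strip()]
    if len(values) % 2 != 0:
        raise ValueError("expected the same number of codes and descriptions")
    half = len(values) // 2
    return list(zip(values[:half], values[half:]))


# Hypothetical sample data: three codes, then their three descriptions.
column = ["101", "102", "103", "Paper jam", "Low toner", "Cover open"]
pairs = pair_codes_with_descriptions(column)
for code, description in pairs:
    print(code, description)
```

This only works when every code has exactly one description and the order is preserved, so it is worth checking a few rows by hand.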
