Tables only from PDF

Rama Reddy posted this 2 weeks ago

 Hi,

 

I have a PDF with two tables and some text. I want to extract only the tables and skip the text using the ABBYY FineReader Java API. How can I do that? Can you suggest some Java code?

Helen Osetrova posted this 2 weeks ago

Hello!

 

Please create a PageAnalysisParams object and set its properties: IPageAnalysisParams::DetectText = false and IPageAnalysisParams::DetectTables = true. You can learn more about the PageAnalysisParams object in Developer’s Help → API Reference → Parameter Objects → Preprocessing, Analysis, Recognition, and Synthesis Parameters → PageAnalysisParams.

 

You can also create a user profile with the required settings, save it as an .ini file, and then load it using the IEngine::LoadProfile method:

private IEngine engine = null;
...
engine = Engine.GetEngineObject( SamplesConfig.GetDllFolder(), SamplesConfig.GetDeveloperSN() );
...
engine.LoadProfile( "../profile.ini" );

 

“profile.ini” should contain the following lines:

[PageAnalysisParams]
DetectText = false
DetectTables = true

 

You can also specify the recognition language and many other options in the profile. You can learn more about using profiles in Developer’s Help → Guided Tour → Advanced Techniques → Working with Profiles.
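If the profile needs to be generated by the application rather than written by hand, a minimal sketch using only the standard Java library might look like the following. The class `ProfileWriter` and its methods are hypothetical helpers, not part of the FineReader Engine API; the section name and keys are the ones quoted in this thread, and the default output file name is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the FineReader Engine API.
class ProfileWriter {

    // Turns {section -> {key -> value}} into INI lines, keeping insertion order.
    static List<String> toIniLines(Map<String, Map<String, String>> sections) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> section : sections.entrySet()) {
            if (!lines.isEmpty()) {
                lines.add(""); // blank line between sections
            }
            lines.add("[" + section.getKey() + "]");
            for (Map.Entry<String, String> property : section.getValue().entrySet()) {
                lines.add(property.getKey() + " = " + property.getValue());
            }
        }
        return lines;
    }

    // The table-only settings quoted in this thread.
    static Map<String, Map<String, String>> tablesOnlyProfile() {
        Map<String, String> analysis = new LinkedHashMap<>();
        analysis.put("DetectText", "false");
        analysis.put("DetectTables", "true");
        Map<String, Map<String, String>> sections = new LinkedHashMap<>();
        sections.put("PageAnalysisParams", analysis);
        return sections;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default file name; pass another path as the first argument.
        Path target = Paths.get(args.length > 0 ? args[0] : "profile.ini");
        Files.write(target, toIniLines(tablesOnlyProfile()), StandardCharsets.UTF_8);
        System.out.println("Wrote " + target.toAbsolutePath());
    }
}
```

The resulting file can then be passed to engine.LoadProfile() as shown earlier in this reply.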

 

You can find comprehensive Java samples for different scenarios in the %ABBYY FineReader Engine folder%/Samples/Java directory.

 

Rama Reddy posted this 5 days ago

How do I use that profile.ini file to extract only tables from a PDF?

Helen Osetrova posted this 4 days ago

Hello,

 

Please note that profile usage is described in detail in Developer’s Help → Guided Tour → Advanced Techniques → Working with Profiles.

 

When new objects are created, their properties are usually set to reasonable defaults. These defaults are not optimal for every usage scenario, so you may need to change them in some cases. This can be done either via the API or with a profile. A profile contains a list of new default values for object properties. The LoadProfile() method of the Engine object loads a user profile file (profile.ini). After the file is loaded, newly created objects will have the default values specified in it.

 

So, to use the profile for processing, you should call LoadProfile( String FileName ). Its only parameter, FileName, contains the path to the profile file. You can specify either a full path or a path relative to the current directory.
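Because a relative path depends on the directory the application was started from, it can help to resolve the path and check that the file exists before handing it to LoadProfile. The sketch below uses only the standard Java library; `ProfilePath` and `resolve` are hypothetical names, not part of the FineReader Engine API, and the default file name is an assumption.

```java
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the FineReader Engine API.
class ProfilePath {

    // Resolves a full or relative path against the current working directory
    // and fails early if no regular file exists at that location.
    static String resolve(String fileName) throws FileNotFoundException {
        Path absolute = Paths.get(fileName).toAbsolutePath().normalize();
        if (!Files.isRegularFile(absolute)) {
            throw new FileNotFoundException("Profile not found: " + absolute);
        }
        return absolute.toString();
    }

    public static void main(String[] args) throws FileNotFoundException {
        String profilePath = resolve(args.length > 0 ? args[0] : "profile.ini");
        System.out.println("Profile to load: " + profilePath);
        // With an initialized engine, as in this thread:
        // engine.LoadProfile( profilePath );
    }
}
```

Failing early with the absolute path in the message makes a wrong working directory easier to spot than an error raised later during processing.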

 

Please find below a Java code snippet based on the Java sample included in the FineReader Engine distribution pack:

 

    private void processImage() {

        String imagePath = SamplesConfig.GetSamplesFolder() + "\\SampleImages\\Demo.tif";
        String profilePath = SamplesConfig.GetSamplesFolder() + "\\SampleImages\\profile.ini";  // place profile.ini in this directory

        try {

            // Load Engine
            engine = Engine.GetEngineObject( SamplesConfig.GetDllFolder(), SamplesConfig.GetDeveloperSN() ); // you should specify a valid Developer Serial Number in SamplesConfig.java        

            // Load profile.ini
            engine.LoadProfile( profilePath ); // LoadProfile is called on the engine instance, before the document is created

            // Create document
            IFRDocument document = engine.CreateFRDocument();            

            try {

                // Add image file to document
                displayMessage( "Loading image...");
                document.AddImageFile( imagePath, null, null );

                // Process document
                displayMessage( "Process...");
                document.Process();
            
                // Save results
                displayMessage( "Saving results...");

                // Save results to DOCX with default parameters
                String docxExportPath = SamplesConfig.GetSamplesFolder() + "\\SampleImages\\Demo.docx";
                document.Export( docxExportPath, FileExportFormatEnum.FEF_DOCX, null );

            } finally {

                // Close document
                document.Close();

                displayMessage("Done ...");

                // Unload Engine
                engine = null;
                Engine.DeinitializeEngine();

           }

        } catch( Exception ex ) {

            displayMessage( ex.getMessage() );

        }
    }

 

Rama Reddy posted this 2 days ago

Can we do this using Layout and Blocks?

If so, how would that be implemented?

Rama Reddy posted this yesterday

How can I extract this table properly? The process we discussed above extracts the table, but it splits 'Firefox 1.0' into two cells and outputs them as two columns. How can I avoid that and get the correct table?

Oksana Serdyuk posted this yesterday

Hi Rama,

Please try adding the following lines to the profile.ini file:

[RTFExportParams]
KeepLines = true
PageSynthesisMode = PSM_RTFEditableCopy
