Get paragraph text

  • Last Post 10 July 2017
imenmelki posted this 03 July 2017

Please help.

I would like to get the text of each block and save it in a separate .txt file. However, the resulting text contains encoding errors: for example, "é" is replaced by "?".

 

Here is my code (C++):

    // Extract the recognized text for each block separately
    for (int j = 0; j < nb_blocks; j++)
    {
        CSafePtr<IBlock> block;
        CheckResult(blocks->Item(j, &block));

        BSTR b_name;
        block->get_Name(&b_name);

        wchar_t resultFilePath[200];
        swprintf(resultFilePath, 200, L"%S_%S.txt", epath, b_name);
        displayMessage(resultFilePath);

        CSafePtr<ITextBlock> textBlock;
        block->GetAsTextBlock(&textBlock);

        CSafePtr<IText> text;
        CheckResult(textBlock->get_Text(&text));

        CSafePtr<IParagraphs> pars;
        CheckResult(text->get_Paragraphs(&pars));

        CSafePtr<IParagraph> par;
        int nb_par;
        CheckResult(pars->get_Count(&nb_par));

        // Get block text
        wchar_t re_text[1024];

        char* fileName = new char[wcslen(resultFilePath) + 1];
        assert(wcstombs(fileName, resultFilePath, wcslen(resultFilePath) + 1) != -1);
        FILE* f = fopen(fileName, "w+");
        for (int p = 0; p < nb_par; p++)
        {
            CheckResult(pars->Item(p, &par));
            BSTR parText;
            CheckResult(par->get_Text(&parText));
            fwprintf(f, L"%ls\n", parText);
            displayMessage(parText);
        }

        fclose(f);
    }

 

What am I doing wrong? 

Diana Khammatova posted this 10 July 2017

Could you please send your input image to SDK_Support@abbyy.com? We will check this on our side.
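
Independently of the image, one possible cause (not confirmed in this thread) is the output step rather than the recognition step. fwprintf on a stream opened with fopen( fileName, "w+" ) converts each wide character to the current C locale's narrow encoding, so "é" can be lost or written in an encoding the text viewer does not expect. A minimal sketch of one workaround is to encode the text as UTF-8 yourself and write the raw bytes. The helper name toUtf8 below is hypothetical and is not part of the ABBYY API; it assumes the paragraph text has been copied into a std::wstring.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of any SDK): encode a wide string as UTF-8.
// Works with 16-bit wchar_t (UTF-16, as on Windows) and with 32-bit wchar_t.
std::string toUtf8(const std::wstring& ws)
{
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(ws[i]);

        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size()) {
            std::uint32_t lo = static_cast<std::uint32_t>(ws[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }

        // Emit 1 to 4 bytes depending on the size of the code point.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

In the loop above, this would mean opening the file in binary mode ("wb") and writing the bytes of toUtf8( parText ) with fwrite instead of calling fwprintf, then opening the result in a viewer set to UTF-8. It may also be worth checking the swprintf call: in the Microsoft CRT, %S in a wide format string expects a narrow char* argument, while b_name is a BSTR (wide), so %ls may be the intended specifier there.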
