Arabic text recognition

  • Last Post 10 May 2017
  • Topic Is Solved
Leif posted this 13 April 2017

We have a developer trial licence for the ABBYY engine, and according to the licence manager, it supports Arabic. I tried our development product with some Arabic text and it did not recognise it. Obviously our product works fine with Latin text. Do we have to specifically configure the ABBYY OCR engine to recognise Arabic, or does it do it by default? If so, how please? We are using the C++ interface, creating an instance of FREngine::IEngine etc.

Thank you. Leif

Leif posted this 13 April 2017

I think I have solved this one myself:


        FREngine::IRecognizerParamsPtr recognizerParamsPtr = enginePtr->CreateRecognizerParams();
        recognizerParamsPtr->SetPredefinedTextLanguage("Arabic");

It appears to work. :)

Leif posted this 21 April 2017

Incidentally, the text recognition accuracy depends on the font. I find that it does not work with Iranian and Saudi ID cards, but it did with a sample in a different Arabic font.

I also find that for text on a non-uniform background, we need these settings:


void SetParameters(FREngine::IObjectsExtractionParamsPtr objectsExtractionParamsPtr)
{
    objectsExtractionParamsPtr->DetectTextOnPictures = VARIANT_TRUE;
    objectsExtractionParamsPtr->RemoveGarbage = VARIANT_TRUE;
    objectsExtractionParamsPtr->EnableAggressiveTextExtraction = VARIANT_TRUE;
}

Please let me know if you find any 'tricks'. And I hope you are not one of our competitors. ;)

Oksana Serdyuk posted this 24 April 2017

You can also try to use the TextExtraction_Accuracy predefined profile. Please read more about how to work with profiles in the Developer's Help→Guided Tour→Advanced Techniques→Working with Profiles and Specifications→Predefined Profiles Specification.

Sharath Kumar Chakali posted this 09 May 2017

We are facing the same issue: the ABBYY OCR engine is not recognizing Arabic. We even tried the code below, but with no luck.

    engine.LoadPredefinedProfile("DocumentConversion_Accuracy");
    engine.CreateRecognizerParams().SetPredefinedTextLanguage("Arabic");

Please advise.

Leif posted this 09 May 2017

Sharath,

Check your ABBYY licence manager to see that you are licensed to use Arabic. The licence manager is in the ABBYY installation folder.

Sharath Kumar Chakali posted this 09 May 2017

Thanks for the quick response.

Yes, the license we received is provisioned for "Arabic".

Leif posted this 09 May 2017

Make sure you use the recogniser params! For example:

        FREngine::IPageAnalysisParamsPtr pageAnalysisParamsPtr = enginePtr->CreatePageAnalysisParams();
        SetParameters(pageAnalysisParamsPtr);

        FREngine::IRecognizerParamsPtr recognizerParamsPtr = enginePtr->CreateRecognizerParams();
        if (extractTextParameters._languages.length() > 0)
        {
            recognizerParamsPtr->SetPredefinedTextLanguage(extractTextParameters._languages.c_str());
        }

        FREngine::IRegionPtr regionPtr = enginePtr->CreateRegion();
        RECT& rect = regions.back().Rect;
        regionPtr->AddRect(rect.left, rect.top, rect.right, rect.bottom);
        documentPtr->Pages->Item(0)->AnalyzeRegion(regionPtr, pageAnalysisParamsPtr, NULL, recognizerParamsPtr);

Alternatively:

    FREngine::IPageProcessingParamsPtr pageProcessingParamsPtr = enginePtr->CreatePageProcessingParams();

    if (extractTextParameters._languages.length() > 0)
    {
        pageProcessingParamsPtr->RecognizerParams->SetPredefinedTextLanguage(extractTextParameters._languages.c_str());
    }

    // Set other parameters here ...

    // First page
    documentPtr->Pages->Item(0)->PreprocessAnalyzeRecognize(pageProcessingParamsPtr);
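
To illustrate why the parameters object has to be passed along, here is a minimal, self-contained sketch. MockEngine, MockRecognizerParams, ProcessWith, ConfigureAndDiscard and ConfigureAndPass are hypothetical stand-ins written for this example, not the FREngine API, and the "English" default is an assumption made only for the sketch. It shows that setting a language on a freshly created parameters object has no effect unless that same object reaches the processing call.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for the engine types; these are NOT the FREngine API.
struct MockRecognizerParams {
    std::string language = "English";  // default assumed for this sketch only
    void SetPredefinedTextLanguage(const std::string& name) { language = name; }
};

struct MockEngine {
    // Each call returns a fresh, independent parameters object.
    std::shared_ptr<MockRecognizerParams> CreateRecognizerParams() const {
        return std::make_shared<MockRecognizerParams>();
    }
};

// Stand-in for a processing call: reports the language actually used.
// A null pointer models "no parameters supplied", so the defaults apply.
std::string ProcessWith(const std::shared_ptr<MockRecognizerParams>& params) {
    static const MockRecognizerParams defaults{};
    return params ? params->language : defaults.language;
}

// Pattern 1: configure a temporary and drop it; processing never sees it.
std::string ConfigureAndDiscard(const MockEngine& engine) {
    engine.CreateRecognizerParams()->SetPredefinedTextLanguage("Arabic");
    return ProcessWith(nullptr);
}

// Pattern 2: keep the object and hand it to the processing call.
std::string ConfigureAndPass(const MockEngine& engine) {
    auto params = engine.CreateRecognizerParams();
    assert(params != nullptr);
    params->SetPredefinedTextLanguage("Arabic");
    return ProcessWith(params);
}
```

In this sketch, ConfigureAndDiscard ends up with the default language, while ConfigureAndPass ends up with "Arabic". The real calls above follow the second pattern.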

Sharath Kumar Chakali posted this 10 May 2017

Thank you, Leif. It worked with the code below. OCR is now working for Arabic, but the results are not accurate.

Do let me know if I am missing something here.

    // Set language
    IPageProcessingParams oIPagePreprocessingParams = engine.CreatePageProcessingParams();
    if (engine.getPredefinedLanguages().getCount() > 0)
    {
        oIPagePreprocessingParams.getRecognizerParams().SetPredefinedTextLanguage(predefinedTextLanguage);
    }
    IDocumentProcessingParams oIDocumentProcessingParams = engine.CreateDocumentProcessingParams();
    oIDocumentProcessingParams.setPageProcessingParams(oIPagePreprocessingParams);

    // Process document
    logger.info("Process...");
    document.Process(oIDocumentProcessingParams);
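
Note that the count check in the snippet above only confirms that the engine reports at least one predefined language; it does not confirm that the requested one is present. A minimal sketch of a name-based check follows, in plain standard C++. IsLanguageAvailable is a hypothetical helper, and the list of names is assumed to have been collected from the engine by some other means; nothing here calls the FREngine API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns true only when `wanted` appears in
// `available`. The comparison is exact and case-sensitive.
bool IsLanguageAvailable(const std::vector<std::string>& available,
                         const std::string& wanted) {
    // Precondition: an empty name is a caller error, not a lookup miss.
    assert(!wanted.empty());
    return std::find(available.begin(), available.end(), wanted) != available.end();
}
```

A guard like this could replace the count check, so that the language is only set when its name was actually found in the list.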

 

 

Leif posted this 10 May 2017

Sharath

You might be doing fine. Arabic character recognition is very variable, as ABBYY acknowledge. It works great with some fonts, not so well with others, and it is affected by the background too.

Oksana Serdyuk posted this 10 May 2017

Hi Sharath,

I would recommend that you contact your regional Technical Support and discuss your situation in more detail by email. All ABBYY contacts are available here. Please attach a sample image that reproduces the issue to your message, and specify the build number of FineReader Engine.
