Difference between PDF and PowerPoint OCR

  • Last Post 04 April 2019
  • Topic Is Solved
moran.yuva posted this 02 April 2019

Hello,

We ran the following test:

1. Convert a PowerPoint slide to an image.

2. Run OCR on the image.

When the OCR output is a PDF, the results are reasonable; when the output is a PowerPoint file, the results are much worse. Why is there a difference, and how can we get good results in PowerPoint?

Moran

Nadezhda A. Solovyeva posted this 03 April 2019

Dear Moran,

I assume you exported your PowerPoint images at 72 or 96 dpi. Please check with PowerPoint experts how to save the images at a higher resolution; the recommended value is 300 dpi.
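To see why the export resolution matters, here is a minimal sketch of the arithmetic. It assumes a 10 x 7.5 inch slide (the classic 4:3 PowerPoint page size) and 12 pt body text; both values and the function names are illustrative, not taken from the thread.

```python
def slide_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a slide exported at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def text_height_px(font_pt: float, dpi: int) -> float:
    """Nominal height in pixels of text set at font_pt points (1 pt = 1/72 inch)."""
    return font_pt * dpi / 72


# Assumed slide size (10 x 7.5 in) and font size (12 pt), for illustration only.
for dpi in (72, 96, 300):
    w, h = slide_pixels(10, 7.5, dpi)
    print(f"{dpi} dpi: {w} x {h} px, 12 pt text ~ {text_height_px(12, dpi):.0f} px")
```

At 300 dpi each character gets roughly three times as many pixels in each direction as at 96 dpi, which gives the recognizer much more detail to work with.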

OCR quality also depends on fonts and formatting. If your input is a PDF with a text layer, this does not matter. For a raster image, however, it is very important: with decorative fonts and irregular formatting, overall OCR quality is considerably worse.

moran.yuva posted this 04 April 2019

Hi Nadezhda,

My question is about something else.

We take the same image and run OCR on it twice: once with PDF output and once with PowerPoint output. The text in the PDF is much better than the text in the PowerPoint file.

Moran

Nadezhda A. Solovyeva posted this 04 April 2019

Dear Moran,

Although we do our best to restore document formatting, ABBYY OCR technologies are still not as good as a human layout designer. This does not matter for PDF export, because in a searchable PDF we do no layout reconstruction at all: we keep the page image "as is" and simply add a text layer. Since this option is available only for PDF, for every other format ABBYY OCR has to rebuild the layout itself, as well as it can.
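The "image as is plus text layer" idea can be illustrated with a generic PDF technique. This is a sketch, not ABBYY's implementation. A searchable page draws the scanned image unchanged and places the OCR text on top in invisible rendering mode (`3 Tr` in PDF content-stream syntax), so what the reader sees never depends on layout reconstruction. The code below only builds one page's content stream. The resource names `/Im0` and `/F1`, the `Word` type, and the function names are hypothetical. Writing a complete PDF file (objects, fonts, xref table) is out of scope.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float     # left edge in PDF points (1/72 in), origin at bottom-left
    y: float     # baseline in PDF points
    size: float  # font size in points


def escape_pdf_string(s: str) -> str:
    # Backslashes and parentheses must be escaped inside a PDF literal string.
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def searchable_page_stream(page_w: float, page_h: float, words: list[Word]) -> str:
    """Content stream: the page image drawn unchanged, plus invisible OCR text on top."""
    ops = [
        # Scale the image XObject (assumed to be registered as /Im0) to fill the page.
        f"q {page_w} 0 0 {page_h} 0 0 cm /Im0 Do Q",
        "BT",
        "3 Tr",  # text rendering mode 3: neither fill nor stroke, i.e. invisible
    ]
    for w in words:
        ops.append(f"/F1 {w.size} Tf")          # /F1 is an assumed font resource name
        ops.append(f"1 0 0 1 {w.x} {w.y} Tm")   # absolute position via the text matrix
        ops.append(f"({escape_pdf_string(w.text)}) Tj")
    ops.append("ET")
    return "\n".join(ops)


stream = searchable_page_stream(720, 540, [Word("Quarterly", 72, 480, 24), Word("Results", 200, 480, 24)])
print(stream)
```

Because the visible content is the original image, a viewer shows exactly the scanned slide while search and copy use the hidden text. Formats such as PowerPoint have no equivalent of this overlay, so the recognized text has to be laid out as real, visible objects.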
