We have a large number of files. Some of them are scanned images saved as PDF, and some are full or partial text PDFs.

Is there a way to check these files so that we only process the scanned images and skip those that are already full/partial text PDFs?

We are using C# .NET 4.

Thanks

asked 24 Apr '12, 10:21

Jacob
111

edited 09 Jun '12, 11:35

Vasily Panferov ♦♦
5422516


ABBYY Cloud OCR SDK currently doesn't provide an API for the task you describe. You can try using the Adobe Reader COM API to save the PDF as text, or look for another solution. Have a look at this for example. Please let me know if you have any more questions.


answered 24 Apr '12, 13:05

Nikolay_Kh ♦♦
1817

You can use an OCR program and see how well it works. Hopefully you will get a good result. Good luck.


answered 26 Apr '12, 17:40

mattopson
91

edited 27 Apr '12, 15:19

Nikolay_Kh ♦♦
1817

Hello Mat, please avoid discussing non-ABBYY OCR software unless you provide a solution for the described task. The link you provided doesn't clearly state how Jacob could look for a text layer in his PDF files.

You can refer to our FAQ page for details: http://forum.ocrsdk.com/faq/

(27 Apr '12, 15:22) Nikolay_Kh ♦♦

Use iTextSharp to pre-process/check your PDF. We do this before we send anything to OCR with our own servers, because it saves a lot of time and reduces our queue.

(I am looking at this service as a replacement for our standard installation, but that is what we do right now.)
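For illustration only, here is a minimal sketch of what such a pre-check might look like with iTextSharp 5.x on .NET 4. It is not the code we actually run. The class name `PdfTextLayerCheck`, its method names, and the 20-character threshold are hypothetical choices for this example; `PdfReader`, `NumberOfPages`, and `PdfTextExtractor.GetTextFromPage` come from iTextSharp. A file is treated as "needs OCR" only if no page yields a meaningful amount of extractable text, so partial text PDFs are skipped as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfTextLayerCheck
{
    // Hypothetical threshold: a page counts as "text" once it yields at least
    // this many non-whitespace characters. Tune it against your own files.
    public const int MinCharsPerPage = 20;

    // Pure decision helper: true if any page has enough extractable text.
    public static bool HasTextLayer(IEnumerable<string> pageTexts, int minCharsPerPage)
    {
        return pageTexts.Any(t =>
            t != null && t.Count(c => !char.IsWhiteSpace(c)) >= minCharsPerPage);
    }

    // Reads every page of the PDF and returns the text iTextSharp can extract.
    public static List<string> ExtractPageTexts(string path)
    {
        var texts = new List<string>();
        var reader = new PdfReader(path);
        try
        {
            // iTextSharp page numbers are 1-based.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                texts.Add(PdfTextExtractor.GetTextFromPage(reader, page));
            }
        }
        finally
        {
            reader.Close();
        }
        return texts;
    }

    // True when no page has a usable text layer, i.e. the file looks like a pure scan.
    public static bool NeedsOcr(string path)
    {
        return !HasTextLayer(ExtractPageTexts(path), MinCharsPerPage);
    }
}
```

Calling `PdfTextLayerCheck.NeedsOcr(path)` for each file and queuing only those that return `true` would keep text PDFs out of the OCR queue. One caveat: a scan that has already been through OCR usually carries an invisible text layer, so this check would report it as a text PDF, which is probably what you want here.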

-AJ

answered 15 May '12, 01:53

AJW
111

