Following scenario: I have an image-only PDF file (a scanned book) with 500 pages and 500 Alto-XML files with OCR-Text for each corresponding page of that PDF File. That OCR-XML files were exported from the original searchable OCR-PDF-file. I don't have that source OCR-PDF file, it comes from a German library (StaBi Berlin). Unfortunately, they don't offer to download the OCR-PDF file directly. You can just download an image-only PDF file of a book and the corresponding OCR-XML-files from separately. Or all OCR-Text in one txt-file. (If you don't believe me, see for yourself: See here You can change the language to english on the bottom right corner)
So now I am looking for a way to import those 500 XML-Files back to each corresponding page of that image-only PDF so that I get a searchable OCR-PDF file in the end. Is there a way to do it with Finereader (or, if not, maybe with assistant tools?)
How to import ready OCR text (XML or TXT) in a PDF?
- 120 Views
- Last Post 27 October 2019
2240 questions, 6786 answers.