OCRmyPDF

13 Sep 2014

Just came across this (haven't tried it yet, but it seemed worth sharing).

OCRmyPDF combines several Linux tools to generate PDF/A files (a PDF
format intended for long-term archiving) from PDF files that contain
only bitmap images of text, e.g., scanned document pages. The tool runs
OCR on the page images, so the generated PDF/A files are searchable.
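
For anyone who wants to call it from a script, here is a rough, untested
sketch in Python. It assumes the tool is installed on the PATH as
"ocrmypdf" and takes the input PDF followed by the output PDF; the
executable name may differ depending on the version you install, so
check the project README. The helper names build_ocr_command and ocr_pdf
are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src, dst, executable="ocrmypdf"):
    """Return the argument list for one OCR run (input PDF, then output PDF)."""
    # Assumption: the tool takes the input file followed by the output file.
    return [executable, str(Path(src)), str(Path(dst))]


def ocr_pdf(src, dst, executable="ocrmypdf"):
    """Run the OCR tool if it is installed; return True on success."""
    if shutil.which(executable) is None:
        # Executable not found on PATH, so there is nothing to run.
        return False
    result = subprocess.run(build_ocr_command(src, dst, executable))
    return result.returncode == 0
```

Calling ocr_pdf("scan.pdf", "scan_searchable.pdf") would then write the
searchable copy next to the original, or return False if the tool is
missing or exits with an error.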

Original article in German describing the tool:
http://www.heise.de/open/artikel/Toolbox-Texterkennung-mit-OCRmyPDF-2356670....

Google translation:
http://www.google.com/translate?hl=en&ie=UTF8&sl=auto&tl=en&u=http%3A%2F%2Fwww.heise.de%2Fopen%2Fartikel%2FToolbox-Texterkennung-mit-OCRmyPDF-2356670.html

Project homepage on GitHub:
https://github.com/fritz-hh/OCRmyPDF

Cheers, Peter
-- 
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cms.waikato.ac.nz/~fracpete/          Ph. +64 (7) 858-5174
