
Just came across this (haven't tried it, but it seemed worth sharing).

OCRmyPDF combines several Linux tools to generate PDF/A files (a PDF format intended for long-term archiving) from PDF files that contain only bitmap images of text, e.g., scanned document pages. The generated PDF/A files are then searchable.

Original article in German describing the tool:
http://www.heise.de/open/artikel/Toolbox-Texterkennung-mit-OCRmyPDF-2356670....

Google translation:
http://www.google.com/translate?hl=en&ie=UTF8&sl=auto&tl=en&u=http%3A%2F%2Fwww.heise.de%2Fopen%2Fartikel%2FToolbox-Texterkennung-mit-OCRmyPDF-2356670.html

Project homepage on GitHub:
https://github.com/fritz-hh/OCRmyPDF

Cheers, Peter

--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cms.waikato.ac.nz/~fracpete/
Ph. +64 (7) 858-5174