Changes

PDF: The Portable Document Format

1,067 bytes added, 17:26, 3 October 2019
The following lines were added (+) and removed (-):
=== OCR Scanned Images for your PDF Pages ===
====tesseract====
Tesseract is an optical character recognition utility that runs on Linux and Microsoft Windows as well as other operating systems.

Tesseract up to and including version 2 could only accept TIFF images of simple one-column text as input. Since version 3.00, Tesseract has supported output text formatting and accepts a number of image formats besides TIFF.

Tesseract is suitable for use as a backend and can handle more complicated OCR tasks, including layout analysis, when driven by a frontend such as OCRopus or gImageReader.

Before using Tesseract it is very important to preprocess all the images so that Tesseract can read them as accurately as possible:
*make sure the text x-height is at least 20 pixels
*reduce or eliminate rotation or skew of the text
*use high contrast between text and background
*eliminate any border or dark boxes around the text

see: [[Tesseract]] for usage and examples of this powerful OCR tool, which beats many expensive commercial software products, including Adobe. It is pretty impressive!
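As a rough illustration of the preprocessing checklist, the sketch below works out how much a scan would need to be enlarged to reach the 20-pixel x-height, and builds a Tesseract command line that writes a searchable PDF. The function names and the file names <code>page-001.png</code> / <code>page-001</code> are hypothetical, and the <code>pdf</code> output config is assumed to be available in the installed Tesseract (it is not present in very old releases). Treat it as a starting point, not a definitive workflow.

```python
import shutil
import subprocess

MIN_X_HEIGHT_PX = 20  # minimum x-height from the checklist above


def upscale_factor(measured_x_height_px, minimum_px=MIN_X_HEIGHT_PX):
    """Return the factor by which to enlarge a scan so its x-height reaches minimum_px.

    Returns 1.0 when the text is already tall enough (never downscales).
    """
    if measured_x_height_px <= 0:
        raise ValueError("x-height must be positive")
    return max(1.0, minimum_px / measured_x_height_px)


def tesseract_command(image_path, output_base, language="eng"):
    """Build the argument list: tesseract IMAGE OUTPUTBASE -l LANG pdf.

    The trailing 'pdf' config asks Tesseract to write OUTPUTBASE.pdf
    (assumption: the installed version ships this config).
    """
    return ["tesseract", image_path, output_base, "-l", language, "pdf"]


def run_ocr(image_path, output_base, language="eng"):
    """Run Tesseract if it is installed; raise a clear error otherwise."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(tesseract_command(image_path, output_base, language), check=True)


if __name__ == "__main__":
    # Hypothetical scan whose lowercase letters measure 10 pixels tall.
    print(upscale_factor(10))
    print(" ".join(tesseract_command("page-001.png", "page-001")))
```

The enlargement itself, along with deskewing, contrast adjustment, and border removal, would be done in an image tool of your choice before calling <code>run_ocr</code>; those steps are left out here because they depend on the scans.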
Administrator