PDF: The Portable Document Format

5,278 bytes added, 23:12, 17 November 2021
/* Linux PDF Tools: pdfcrack */
=== The GUI Way: Using Gimp and LibreOffice Draw ===
Creating PDF documents from scanned images and other data sources is fast, simple, and can be done entirely without dropping to the console.

This method is for people who wish to scan documents to images, make any modifications to the images, order the images, and generate a custom multiple-page PDF document.

Learn how to [[Create PDF Documents with Gimp and LibreOffice Draw]].

=== The GUI Way: Using Simple Scan and PDF Chain  ===
If all you want to do is scan some documents page by page and then combine them into a single ordered PDF, without making any edits or doing any OCR, compression, or other modification, you can accomplish this quickly and easily using two programs:
* Simple Scan
* PDF Chain
With Simple Scan you can scan each page and save each page as a PDF.  You can even skip PDF Chain and scan a number of pages to save as a single PDF.  However, if you need to re-order pages, you can load each PDF you saved into PDF Chain and do some order changes, annotation, or other basic PDF-related modification.

You can compress an existing PDF (like one made with GIMP) into a smaller file size (ref: [https://www.shellhacks.com/linux-compress-pdf-reduce-pdf-size/ Compress PDF File In Linux])
 ps2pdf big.pdf smaller.pdf
I highly recommend this particular method if you are comfortable with the Linux shell.  I found it to yield the best results with the least amount of labor.

It also works with png files
 convert *.png document.pdf
The PDF contracts.pdf is black and white and contains multiple pages. We can generate a tiff image for each page and add parameters to limit quality loss.
 convert -colorspace rgb -density 300 contracts.pdf -monochrome  contracts-%03d.tiff
<big>See also: [[Create PDF Documents with ImageMagick and Ghostscript]]</big>

=== Linux PDF Tools: tiff2pdf and tiffcp ===
The tiff2pdf utility can convert a single tiff file into a pdf document.  
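The single-file case can be sketched as a small shell helper. This is only an illustration: the function name tiff_to_pdf and the DRY_RUN variable are assumptions made for this example and are not part of tiff2pdf itself. The only real command used is tiff2pdf with its -o option, in the same form as elsewhere on this page.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): convert one TIFF file
# into a PDF that sits next to it, e.g. page.tiff -> page.pdf.
tiff_to_pdf() {
    in="$1"
    # Swap the file extension for .pdf to build the output name.
    out="${in%.*}.pdf"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # DRY_RUN is an illustrative switch: print the command only.
        echo "tiff2pdf -o $out $in"
    else
        tiff2pdf -o "$out" "$in"
    fi
}
```

Calling tiff_to_pdf page.tiff should leave page.pdf beside the original. Setting DRY_RUN=1 first prints the command without running it, which is handy for checking file names before converting a batch.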
For multiple pages it will be necessary to create a multi-page tiff file.  Yes, a single tiff file can contain multiple pages.

A 12-page black and white document was scanned into jpeg images.  Although jpeg was not the best choice for black and white documents, this is how it was presented, and thus it needed to be converted to a pdf.  ImageMagick convert produced a large pdf of over 6 MB that was not optimized for black and white.  This is not referring to compression, as applying jpeg compression or changing the dpi is not the correct way to optimize black and white scanned images.

Our fat pdf that was created from jpeg and not optimized for black and white is called: document.pdf  It will be deconstructed back into images, except this time into tiff images optimized for black and white.  A larger multi-page tiff file will then be created from the multiple tiff images.  The single multi-page tiff file will then be converted back into a much smaller optimized pdf document.
 convert -colorspace rgb -density 300 document.pdf -monochrome document-%03d.tiff
 tiffcp document-???.tiff multipage.tiff
 tiff2pdf -o documentfinal.pdf multipage.tiff
While the original document.pdf is over 6 MB, the documentfinal.pdf is less than 1 MB.

=== Linux PDF Tools: pdfcrack ===
Use pdfcrack to unlock a password-protected PDF file when you do NOT know the password.  PDFCrack is a GNU/Linux tool for recovering passwords and content from PDF files. It is small and command-line driven, without external dependencies.
 pdfcrack -f 2020CrackMe.pdf
If you see the error
 The specific version is not supported (Standard - 6)
then your version of pdfcrack does not support the 256-bit encryption used by the file.

''Other resources: look into John the Ripper to brute-force crack a protected PDF.  John the Ripper is a fast password cracker.  
Its primary purpose is to detect weak Unix passwords.''

=== OCR Scanned Images for your PDF Pages ===
Tesseract is an optical character recognition utility that works on Linux and Microsoft Windows as well as other operating systems.

Tesseract up to and including version 2 could only accept TIFF images of simple one-column text as input. Since version 3.00, Tesseract has supported output text formatting and, besides TIFF, accepts a number of other image formats.

Tesseract is suitable for use as a backend and can be used for more complicated OCR tasks, including layout analysis, by using a frontend such as OCRopus or gImageReader.

Before using Tesseract it is very important to properly process all the images so that Tesseract can read them as reliably as possible:
* text x-height is at least 20 pixels
* reduce or eliminate rotation or skew of the text
* high contrast is recommended
* eliminate any border or dark boxes around text
See [[Tesseract]] for usage and examples of this powerful OCR tool that beats many expensive commercial software products, including Adobe.  It is pretty impressive!
* [https://www.moreno.marzolla.name/software/scan-to-pdf/ Creating multi-page PDF documents from scanned images in Linux] - discusses tiff2pdf

== Related Pages ==
* [[GIMP]]
* [[Ubuntu How Do I: A Linux Q&A]]
* [[PDF: The Portable Document Format]]
* [[Create PDF Documents with Gimp and LibreOffice Draw]]
* [[Create PDF Documents with ImageMagick and Ghostscript]]