PDF: The Portable Document Format
Revision as of 20:01, 2 October 2019
The Portable Document Format (PDF) is the file format created by Adobe Systems in 1993 for document exchange. PDF is used for representing two-dimensional documents in a device-independent and display resolution-independent fixed-layout document format. Each PDF file encapsulates a complete description of a 2-D document (and, with Acrobat 3-D, embedded 3-D documents) that includes the text, fonts, images, and 2-D vector graphics that compose the document.
PDF is an open standard, published by ISO as ISO 32000-1 in 2008. Adobe is an evil company.
Contents
- 1 PDF Types
- 2 PDF Document Viewers
- 3 PDF Authoring
- 4 PDF Utilities
- 4.1 The GUI Way: Using Gimp and LibreOffice Draw
- 4.2 The GUI Way: Using Simple Scan and PDF Chain
- 4.3 Linux PDF Tools: tiff2ps and ps2pdf
- 4.4 Linux PDF Tools: imagemagick
- 4.5 Linux PDF Tools: qpdf PDF transformation software
- 4.6 Print to PDF in Windows
- 4.7 Print to PDF in Linux
- 4.8 Convert Images to PDF in Windows
- 4.9 Convert PDF to Images in Windows
- 5 References
- 6 Related Pages
PDF Types
PDF files fall into two main types: scanned and native. A native PDF file is superior to a scanned PDF file in capabilities, flexibility, and efficiency, because a native PDF contains true text while a scanned PDF contains only images of text.
PDF Types:
- Native
- Scanned
Native PDF
A native PDF file will contain literal text as part of the structure, including information about the text. This is not to say that there are no images. It is to say that the text itself is actual text and not just part of an image. A native PDF has an internal structure that can be read and interpreted. Only a native PDF can utilize all of the capabilities that the format lends to the reader software.
Scanned PDF
PDF files created by scanning hard-copy documents containing primarily text do not have the same structure as a PDF file of the same document created directly. The scanned document internally contains a picture of the document, with no information about the text. As far as a user can see it is just another PDF file, with a name and extension indistinguishable from any other; a good scan may look exactly the same as a native PDF file, although a visually poor-quality file, often with skewed pages, gives away its nature. However, the file size will be different, and it will not be possible to search for text. For a scan of adequate quality it is possible with suitable software to regenerate the text of the document with Optical character recognition (OCR), and embed it in the file so as to make it searchable, subject to the accuracy of the OCR.
Conversion
Converting a scanned PDF into a native PDF requires Optical character recognition (OCR) software. OCR analyzes the image of each character and matches it to the corresponding text character. The level of accuracy depends on the quality of the scan and the font used. OCR works primarily on typeset characters, not handwritten text.
PDF Document Viewers
Evince PDF
Windows, FreeBSD, Linux
Evince is a document viewer for multiple document formats. The goal of evince is to replace the multiple document viewers that exist on the GNOME Desktop with a single simple application.
Evince currently supports PDF, PostScript, djvu, tiff, dvi, XPS, SyncTex with gedit, comic books (cbr, cbz, cb7 and cbt), and many more.
Review: Evince opens PDF files into a well laid out reader. The DRM flag is ignored, making Evince far more useful than Sumatra PDF or Adobe Reader. Loading speed was similar to Sumatra. One notable glitch occurs when text is selected: the text becomes distorted, which can somewhat hinder text selection. It has been reported that the Windows version will only open PDF files. In our test on Microsoft Windows we confirmed Evince was unable to open .epub, an eBook format.
The fact that Evince PDF is not handicapped by DRM restrictions makes it far more useful as a PDF reader when compared to Sumatra PDF. For this reason Evince is our choice for a Windows PDF reader.
An annoying flaw in Evince costs it half a star. On some PDF documents, when print is selected, the printer outputs only blank paper. Certain PDF files will not print correctly using Evince. This is a recurring problem. Ultimately this is a serious issue with Evince and results in the software being inadequate.
PDFlite
PDFlite can be used to read any PDF file. Simple design. View PDF documents with all common features such as search, print, zoom. Use the PDFlite printer so you can convert any document to PDF file.
PUP alert: Malware in installer. Even if you uncheck the toolbar and other software, it still installs PUP in the background! Avoid unless you want to take the time to build it yourself from the source code they provide.
Sumatra PDF
Microsoft Windows Only
A minimalistic PDF reader. Sumatra PDF has a minimalistic design, and its simplicity is attained at the expense of many other features. As is characteristic of many portable applications, Sumatra takes up little disk space: it has a 1 MB setup file (compared to Adobe Reader's 27.5 MB setup file), and it starts up rapidly. It was designed for portable use in the sense that it is just one file with no external dependencies, so you can easily run it from an external USB drive[1]. This would classify it as a portable application.
One interesting feature of Sumatra PDF is that it remembers exactly the last opened page for each pdf file. This helps it be a very useful pdf e-book reader.
Review: Sumatra PDF contains anti-features. It enforces DRM restrictions. As stated on a Sourceforge review, "it supports DRM of "protected" PDF files, and the author stubbornly refuses to make it optional. So you can't print PDFs for offline reading, and you can't copy text to the clipboard for pasting into Google translate, saving to your notes, quoting in a paper, etc."
The Sumatra PDF software developers are crybabies. Read their little rant, "PDFLite is a SumatraPDF ripoff". The title should be "Sumatra PDF developers do not understand Open Source".
GhostScript
Windows, FreeBSD, Linux
Command Line. Ghostscript is a suite of software. You can view, convert, and manipulate PDF files. Ghostscript is an interpreter for PostScript and Portable Document Format (PDF) files. Ghostscript can be picky and inconsistent about the PDF files it will open.
Example: view a PDF on Windows XP
gswin32c.exe -dSAFER -dBATCH "C:\Program Files\GPLGS\test3.pdf"
The example will open the pdf document in a GUI window for viewing.
PDF Authoring
PDF Utilities
The GUI Way: Using Gimp and LibreOffice Draw
Creating PDF documents from scanned images and other data sources this way is fast and simple, and can all be accomplished without dropping to the console.
This method is for people who wish to scan documents to images, make any modifications to the images, order the images, and generate a custom multi-page PDF document.
Learn how to Create PDF Documents with Gimp and LibreOffice Draw.
The GUI Way: Using Simple Scan and PDF Chain
If all you are looking to do is scan some documents page by page, then combine them as a single ordered PDF without the need to make any edits or do any fancy OCR, compression, or other modification related activity, you can accomplish this quite quickly and easily using two programs:
- Simple Scan
- PDF Chain
With Simple Scan you can scan each page, and save each page as a PDF. You can even skip using PDF Chain and scan a number of pages to save as a PDF. However, if you need to re-order you can load each PDF you save into PDF chain and do some order changes, annotation, or other basic PDF related modification.
Linux PDF Tools: tiff2ps and ps2pdf
On Linux the tiff2ps command is part of libtiff-tools. The command line tools in libtiff-tools include tiffcp, tiff2ps, tiffdump and tiffsplit. Windows executables for libtiff-tools can be found at stillhq.com, e.g. http://www.stillhq.com/libtiff/win32/3.5.4/tiffcp.exe and http://www.stillhq.com/libtiff/win32/3.5.4/tiff2ps.exe
The Linux ps2pdf command is part of Ghostscript. Those command line tools are ps2pdf, gs or gswin32 (Win32 version). Ghostscript for Windows is gs651w32.exe
Netpbm for Windows is netpbm-9.19-bin.zip and requires Cygwin.
To make a PDF from TIFF images, first convert TIFF to PS (in Linux):
tiff2ps *.tiff > tiffs.ps
Then convert PS to PDF:
ps2pdf tiffs.ps
You can compress an existing PDF (like one made with Gimp) into a smaller file size (ref: Compress PDF File In Linux)
ps2pdf big.pdf smaller.pdf
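ps2pdf is a wrapper script around Ghostscript's pdfwrite device. For more control over the trade-off between file size and image quality, gs can be called directly with a -dPDFSETTINGS preset. The following is only a minimal sketch, assuming Ghostscript is installed as gs; the function name compress_pdf and the default preset of ebook are illustrative choices, not part of Ghostscript.

```shell
# compress_pdf IN.pdf OUT.pdf [PRESET]
# Rewrite IN.pdf through Ghostscript's pdfwrite device using a quality preset.
# compress_pdf is a hypothetical helper name; PRESET defaults to "ebook" (an assumption).
compress_pdf() {
    preset=${3:-ebook}
    # Only accept the preset names Ghostscript documents for -dPDFSETTINGS.
    case $preset in
        screen|ebook|printer|prepress|default) ;;
        *) echo "unknown preset: $preset" >&2; return 2 ;;
    esac
    # Fail clearly if Ghostscript is not installed.
    command -v gs >/dev/null 2>&1 || { echo "gs not found" >&2; return 127; }
    gs -sDEVICE=pdfwrite -dPDFSETTINGS="/$preset" \
       -dNOPAUSE -dBATCH -dQUIET \
       -sOutputFile="$2" "$1"
}
```

Usage would look like compress_pdf big.pdf smaller.pdf screen. The screen preset generally gives the smallest files at the lowest image resolution, while prepress keeps the most detail.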
Linux PDF Tools: imagemagick
From the imagemagick package, use the convert command to perform tasks such as taking a folder of jpg images and creating a single PDF document. If the images are numbered with leading zeros (01 02 03 04 05), the page order will match the numbering.
convert *.jpg document.pdf
It also works with png files
convert *.png document.pdf
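The page order comes from the order in which the shell expands *.jpg, so names such as page1.jpg, page2.jpg, page10.jpg would put page 10 before page 2. If the files were saved without leading zeros, they can be renamed first. The following is a minimal sketch, assuming the files are named page<N>.jpg; that naming scheme, the three digit width, and the function name pad_pages are all assumptions made for illustration.

```shell
# pad_pages DIR
# Rename page<N>.jpg files in DIR to page<NNN>.jpg (zero-padded to 3 digits).
# The "page" prefix and the width of 3 are illustrative assumptions.
pad_pages() (
    cd "$1" || exit 1
    for f in page*.jpg; do
        [ -e "$f" ] || continue                      # glob matched nothing
        n=${f#page}                                  # "page7.jpg" -> "7.jpg"
        n=${n%.jpg}                                  # "7.jpg"     -> "7"
        case $n in ''|*[!0-9]*) continue ;; esac     # skip non-numeric names
        # Strip existing leading zeros so printf does not read the number as octal.
        while [ "${#n}" -gt 1 ] && [ "${n#0}" != "$n" ]; do n=${n#0}; done
        new=$(printf 'page%03d.jpg' "$n")
        [ "$f" = "$new" ] && continue                # already padded
        [ -e "$new" ] && continue                    # never overwrite a file
        mv -- "$f" "$new"
    done
)
```

After running pad_pages on the image folder, convert page*.jpg document.pdf picks the pages up in numeric order.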
Suppose the PDF contracts.pdf is black and white and contains multiple pages. We can generate a tiff image for each page and add parameters so there isn't a bunch of quality loss.
convert -colorspace rgb -density 300 contracts.pdf -monochrome contracts-%03d.tiff
You can install imagemagick with apt
sudo apt install imagemagick
See also: Create PDF Documents with ImageMagick and Ghostscript
Linux PDF Tools: qpdf PDF transformation software
The qpdf program is used to convert one PDF file to another equivalent PDF file. It is capable of performing a variety of transformations such as linearization (also known as web optimization or fast web viewing), encryption, and decryption of PDF files. It also has many options for inspecting or checking PDF files, some of which are useful primarily to PDF developers.
For example, I have a password protected PDF and I know the password. I simply wish to remove the password protection:
qpdf --password=password --decrypt /home/nicole/Documents/resume.pdf /home/nicole/Documents/resume2.pdf
Replace "password" with the actual password of the document. qpdf was installed by default on my Linux Mint 18 system. If it is not installed on yours:
sudo apt install qpdf
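The linearization mentioned above works the same way as decryption: qpdf reads one file and writes a new one. The following is a minimal sketch, assuming qpdf is on the PATH; the function name linearize_pdf is made up for illustration. It refuses to use the same path for input and output, then runs qpdf's own --check on the result.

```shell
# linearize_pdf IN.pdf OUT.pdf
# Write a linearized ("fast web view") copy of IN.pdf to OUT.pdf, then check it.
# linearize_pdf is a hypothetical helper name, not part of qpdf.
linearize_pdf() {
    if [ "$#" -ne 2 ]; then
        echo "usage: linearize_pdf IN.pdf OUT.pdf" >&2
        return 2
    fi
    # Input and output must be different paths in this sketch.
    if [ "$1" = "$2" ]; then
        echo "input and output must be different files" >&2
        return 2
    fi
    command -v qpdf >/dev/null 2>&1 || { echo "qpdf not found" >&2; return 127; }
    qpdf --linearize "$1" "$2" &&
    qpdf --check "$2" >/dev/null      # non-zero exit if qpdf reports problems
}
```

Usage would look like linearize_pdf resume.pdf resume-web.pdf.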
Print to PDF in Windows
CutePDF Writer
There is a free version and a more feature rich pay version on their web site, http://www.cutepdf.com/Products/CutePDF/writer.asp
Print to PDF in Linux
One simple option that works in Debian-based distributions such as the popular Ubuntu Linux is to use cups-pdf.
See: Install and Use cups-pdf in Ubuntu for a detailed guide.
Convert Images to PDF in Windows
Free Image to PDF Converter. Supported formats are BMP, DIB, GIF, JPEG, JPG, JPE, JFIF, PNG, TIFF, TIF. Multiple files to a multi-page PDF. The tool combines multiple directories and images into one PDF.
Installer: PDFdu_Image_To_PDF_setup.exe
Developer Web Site: http://pdfdu.com/app/image-to-pdf-converter.aspx
Convert PDF to Images in Windows
Windows Print Driver: PDF to TIFF
The Virtual Image Printer driver by tariel will allow you to convert a PDF to multiple page image files in several image formats. This is not all the Virtual Image Printer does, and it is not exclusively for converting PDF to images. However, it is very handy for performing this task under the Windows XP operating system.
GhostScript
The installer "gs915w32.exe" is the Win32 installer as of Dec 2014 for Microsoft Windows 32-bit Operating Systems such as Windows XP. Using GhostScript, a PDF can be converted to PNG, for example:
gswin32c.exe -dNOPAUSE -dBATCH -sDEVICE=pnggray -sOutputFile="test.png" "test.pdf"
GhostScript requires a proper PDF. Some PDF files are broken, in that they will open in some viewers, but are not completely compliant with the standard. In short, GhostScript is picky.
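The command above names a single output file. For a multi-page PDF, Ghostscript accepts a printf-style page counter such as %03d in -sOutputFile, giving one image per page. The following is a minimal sketch for a Linux shell, where the binary is gs rather than gswin32c.exe; the function name pdf_to_pngs and the default of 150 DPI are assumptions made for illustration.

```shell
# pdf_to_pngs IN.pdf PREFIX [DPI]
# Render each page of IN.pdf to PREFIX-001.png, PREFIX-002.png, ...
# pdf_to_pngs is a hypothetical helper name; 150 DPI is an arbitrary default.
pdf_to_pngs() {
    dpi=${3:-150}
    # Reject anything that is not a plain whole number.
    case $dpi in
        ''|*[!0-9]*) echo "DPI must be a whole number" >&2; return 2 ;;
    esac
    command -v gs >/dev/null 2>&1 || { echo "gs not found" >&2; return 127; }
    gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=pnggray -r"$dpi" \
       -sOutputFile="$2-%03d.png" "$1"
}
```

Usage would look like pdf_to_pngs test.pdf test 300, which should leave test-001.png, test-002.png and so on in the current directory.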
References
Related Pages
- GIMP
- Ubuntu How Do I: A Linux Q&A
- PDF: The Portable Document Format
- Create PDF Documents with Gimp and LibreOffice Draw