PDF: The Portable Document Format

From Free Knowledge Base- The DUCK Project: information for everyone

Revision as of 10:48, 26 June 2015

The Portable Document Format (PDF) is the file format created by Adobe Systems in 1993 for document exchange. PDF is used for representing two-dimensional documents in a device-independent and display resolution-independent fixed-layout document format. Each PDF file encapsulates a complete description of a 2-D document (and, with Acrobat 3-D, embedded 3-D documents) that includes the text, fonts, images, and 2-D vector graphics that compose the document.

PDF is an open standard; it was published as ISO 32000-1 in 2008. Adobe is an evil company.

PDF Types

PDF files fall into two main types: native and scanned. A native PDF file is superior to a scanned PDF file in capabilities, flexibility, and efficiency, because it contains true text, whereas a scanned PDF contains only images of text.

PDF Types:

  • Native
  • Scanned

Native PDF

A native PDF file contains literal text as part of its structure, including information about the text. This is not to say that there are no images. It is to say that the text itself is actual text and not just part of an image. A native PDF has an internal structure that can be read and interpreted. Only a native PDF can utilize all of the capabilities that the format lends to the reader software.

Scanned PDF

PDF files created by scanning hard-copy documents that contain primarily text do not have the same structure as a PDF file of the same document created directly. Internally, the scanned document contains a picture of each page, with no information about the text. To the user it is just another PDF file, with a name and extension indistinguishable from any other. A good scan may look exactly the same as a native PDF file, although a visually poor-quality file, often with skewed pages, gives away its nature. However, the file size will differ (a scan is typically larger), and it will not be possible to search for text. For a scan of adequate quality, suitable software can regenerate the text of the document with optical character recognition (OCR) and embed it in the file to make it searchable, subject to the accuracy of the OCR.
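
As a rough illustration of the difference, a native PDF normally declares font resources, while an image-only scan usually does not. The following is a minimal sketch, assuming a shell with a grep that supports -a (GNU or BSD grep). The function name pdf_kind is hypothetical, and the check is only a heuristic: a file that stores its page dictionaries in compressed object streams, or a scan that already carries an OCR text layer, may be misreported.

```shell
# Heuristic only: report "native" if the raw file bytes mention a /Font
# resource, otherwise report "scanned".
# pdf_kind is a hypothetical helper name, not part of any PDF tool.
pdf_kind() {
    # -a treats the binary file as text; -q suppresses normal output.
    # LC_ALL=C avoids multibyte decoding problems on binary data.
    if LC_ALL=C grep -a -q '/Font' "$1"; then
        echo native
    else
        echo scanned
    fi
}
```

Usage: pdf_kind document.pdf prints one word. Extracting text with a real PDF library is more reliable than this byte search.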

PDF Document Viewers

Evince PDF

Rating: 3.5 stars

Windows, FreeBSD, Linux

Evince is a document viewer for multiple document formats. The goal of Evince is to replace the multiple document viewers that exist on the GNOME Desktop with a single simple application.

Evince currently supports PDF, PostScript, DjVu, TIFF, DVI, XPS, SyncTeX with gedit, comic books (cbr, cbz, cb7, and cbt), and many more.

Review: Evince opens PDF files in a well laid out reader. The DRM flag is ignored, making Evince far more useful than Sumatra PDF or Adobe Reader. Loading speed was similar to Sumatra. One notable glitch: when text is selected, it becomes distorted, which can somewhat hinder text selection. It has been reported that the Windows version will only open PDF files. In our test on Microsoft Windows we confirmed that Evince was unable to open .epub, an eBook format.

The fact that Evince PDF is not handicapped by DRM restrictions makes it far more useful as a PDF reader when compared to Sumatra PDF. For this reason Evince is our choice for a Windows PDF reader.

An annoying flaw in Evince costs it half a star. On some PDF documents, when print is selected, the printer outputs only blank paper. Certain PDF files will not print correctly using Evince. This is a recurring problem. Ultimately this is a serious issue with Evince and results in the software being inadequate.

PDFlite

Rating: 0.5 stars

PDFlite can be used to read any PDF file. It has a simple design and lets you view PDF documents with all the common features, such as search, print, and zoom. Use the PDFlite printer to convert any document to a PDF file.

PUP alert: Malware in installer. Even if you uncheck the toolbar and other software, it still installs a PUP (potentially unwanted program) in the background! Avoid unless you want to take the time to build it yourself from the source code they provide.

Sumatra PDF

Rating: 2 stars

Microsoft Windows Only

A minimalistic PDF reader. Sumatra PDF's simplicity is attained at the expense of many other features. As is characteristic of many portable applications, Sumatra takes up little disk space: it has a 1 MB setup file (compared to Adobe Reader's 27.5 MB setup file), and it starts up rapidly. It was designed for portable use in the sense that it is a single file with no external dependencies, so you can easily run it from an external USB drive[1]. This classifies it as a portable application.

One interesting feature of Sumatra PDF is that it remembers exactly the last opened page for each PDF file. This makes it a very useful PDF e-book reader.

Review: Sumatra PDF contains anti-features. It enforces DRM restrictions. As stated in a SourceForge review, "it supports DRM of "protected" PDF files, and the author stubbornly refuses to make it optional. So you can't print PDFs for offline reading, and you can't copy text to the clipboard for pasting into Google translate, saving to your notes, quoting in a paper, etc."

The Sumatra PDF software developers are crybabies. Read their little rant, "PDFLite is a SumatraPDF ripoff". A better title would be "Sumatra PDF developers do not understand Open Source".

GhostScript

Rating: 4 stars

Windows, FreeBSD, Linux

Command line. Ghostscript is a suite of software built around an interpreter for PostScript and Portable Document Format (PDF) files. You can use it to view, convert, and manipulate PDF files. Ghostscript can be picky and inconsistent about the PDF files it will open.

Example: view a PDF on Windows XP

gswin32c.exe -dSAFER -dBATCH "C:\Program Files\GPLGS\test3.pdf"

The example opens the PDF document in a GUI window for viewing.

PDF Authoring

PDF Utilities

Linux PDF Tools: tiff2ps and ps2pdf

On Linux, the tiff2ps command is part of libtiff-tools. The command-line tools in libtiff-tools include tiffcp, tiff2ps, tiffdump, and tiffsplit. Windows executables for libtiff-tools can be found at stillhq.com, e.g. http://www.stillhq.com/libtiff/win32/3.5.4/tiffcp.exe and http://www.stillhq.com/libtiff/win32/3.5.4/tiff2ps.exe

The Linux ps2pdf command is part of Ghostscript. The relevant command-line tools are ps2pdf and gs (gswin32 in the Win32 version). The Ghostscript installer for Windows is gs651w32.exe.

Netpbm for Windows is netpbm-9.19-bin.zip and requires Cygwin.

To make a PDF from TIFF files, first convert the TIFF files to PostScript (on Linux):

tiff2ps *.tiff > tiffs.ps

Then convert the PostScript file to PDF:

ps2pdf tiffs.ps
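
The two steps above can be combined into one shell function. This is a minimal sketch, assuming tiff2ps and ps2pdf are installed and on the PATH. The function name tiffs_to_pdf and the default output name are hypothetical choices for illustration.

```shell
# Convert every .tiff file in the current directory into one PDF.
# tiffs_to_pdf is a hypothetical helper name; tiffs.pdf is an arbitrary default.
tiffs_to_pdf() {
    out="${1:-tiffs.pdf}"
    ps="${out%.pdf}.ps"
    # Step 1: TIFF -> PostScript (tiff2ps writes to standard output).
    tiff2ps *.tiff > "$ps" || return 1
    # Step 2: PostScript -> PDF, naming the output file explicitly.
    ps2pdf "$ps" "$out" || return 1
    # Remove the intermediate PostScript file.
    rm -f "$ps"
}
```

Usage: run tiffs_to_pdf scans.pdf in the directory that holds the TIFF files. The function stops at the first failing step, so a failed tiff2ps run does not produce a PDF.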

Print to PDF in Windows

CutePDF Writer

There is a free version and a more feature-rich paid version on their web site, http://www.cutepdf.com/Products/CutePDF/writer.asp

Print to PDF in Linux

One simple option that works in Debian-based distributions, such as the popular Ubuntu Linux, is to use cups-pdf.

See: Install and Use cups-pdf in Ubuntu for a detailed guide.
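
Once cups-pdf is set up, printing to PDF from the command line might look like the sketch below. It assumes the virtual printer queue is named PDF, which should be checked with lpstat -p on the actual system. The function name print_to_pdf is hypothetical.

```shell
# Send a file to the cups-pdf virtual printer.
# print_to_pdf is a hypothetical helper; the queue name "PDF" is an assumption
# and can be overridden with a second argument.
print_to_pdf() {
    file="$1"
    queue="${2:-PDF}"
    # lp -d selects the destination print queue.
    lp -d "$queue" "$file"
}
```

Usage: print_to_pdf notes.txt sends the file to the assumed PDF queue. Where the resulting PDF is written depends on the cups-pdf configuration.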

Convert Images to PDF in Windows

Free Image to PDF Converter. Supported formats are BMP, DIB, GIF, JPEG, JPG, JPE, JFIF, PNG, TIFF, and TIF. Multiple files can be converted to a multi-page PDF. The tool combines multiple directories and images into one PDF.

Installer: PDFdu_Image_To_PDF_setup.exe
Developer Web Site: http://pdfdu.com/app/image-to-pdf-converter.aspx

Convert PDF to Images in Windows

Windows Print Driver: PDF to TIFF

The Virtual Image Printer driver by tariel will allow you to convert a PDF to multiple page image files in several image formats. This is not all that the Virtual Image Printer does, and it is not exclusively for converting PDF to images. However, it is very handy for performing this task under the Windows XP operating system.

GhostScript

The installer "gs915w32.exe" is the Win32 installer as of Dec 2014 for Microsoft Windows 32-bit operating systems such as Windows XP. Using GhostScript, a PDF can be converted to PNG, for example:

 gswin32c.exe -dNOPAUSE -dBATCH -sDEVICE=pnggray -sOutputFile="test.png" "test.pdf"

GhostScript requires a proper PDF. Some PDF files are broken, in that they will open in some viewers, but are not completely compliant with the standard. In short, GhostScript is picky.
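
The conversion command above names a single output file, so a multi-page PDF needs a page-number pattern in the output name to get one image per page. The following is a minimal sketch for a Unix-like shell, assuming the Ghostscript executable is called gs (on Windows the console executable is gswin32c.exe). The function name pdf_to_pngs and the 150 dpi resolution are arbitrary choices for illustration.

```shell
# Render each page of a PDF to its own grayscale PNG with Ghostscript.
# pdf_to_pngs is a hypothetical helper; 150 dpi is an arbitrary choice.
pdf_to_pngs() {
    in="$1"
    prefix="${2:-page}"
    # %03d in -sOutputFile is replaced by the page number (001, 002, ...).
    gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=pnggray -r150 \
       -sOutputFile="${prefix}-%03d.png" "$in"
}
```

Usage: pdf_to_pngs test.pdf test writes test-001.png, test-002.png, and so on. In a Windows batch file the percent sign in the pattern must be doubled.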

References