Create PDF Documents with ImageMagick and Ghostscript

From Free Knowledge Base- The DUCK Project: information for everyone

ImageMagick provides the 'convert' command, which can combine scanned images into a single PDF document.

Ghostscript provides the 'gs' command, which can recompress that PDF to a smaller file size, trading image quality for size depending on your needs.

You can install ImageMagick with apt on Debian or Ubuntu:

sudo apt install imagemagick

From the imagemagick package, use the convert command to combine a folder of jpg images into a single PDF document. The shell expands *.jpg in alphabetical order, so if the files are numbered with leading zeros (01, 02, 03, 04, 05) the page order will match the numbering.

convert *.jpg document.pdf
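If a scanner produced names without leading zeros (1.jpg, 2.jpg, 10.jpg), the glob would put 10.jpg before 2.jpg. The following is a minimal sketch of one way to pad such names before converting. The function name pad_names and the three-digit width are assumptions made for illustration, and it only touches files whose names are purely numeric.

```shell
#!/bin/sh
# pad_names: hypothetical helper, not part of ImageMagick.
# Renames purely numeric jpg files (1.jpg, 2.jpg, 10.jpg) in the current
# directory to three-digit names (001.jpg, 002.jpg, 010.jpg) so that the
# order produced by *.jpg matches the page order.
pad_names() {
    for f in *.jpg; do
        [ -e "$f" ] || continue              # no matches: the glob stays literal
        base=${f%.jpg}
        case $base in
            ''|*[!0-9]*) continue ;;         # skip names that are not all digits
        esac
        # Strip leading zeros so printf does not read the number as octal.
        n=$(printf '%s' "$base" | sed 's/^0*//')
        new=$(printf '%03d.jpg' "${n:-0}")
        [ -e "$new" ] && continue            # already padded, or would overwrite
        mv -- "$f" "$new"
    done
}
```

Run pad_names inside the image folder first, then run the convert command as shown.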

It also works with png files:

convert *.png document.pdf

To compress the resulting PDF with Ghostscript:

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer -sOutputFile=output.pdf input.pdf

Note the -dPDFSETTINGS parameter, which takes a predefined value preceded by a slash. These values are:

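  1. /screen - the lowest resolution (images downsampled to about 72 dpi) and the smallest file; scanned text is usually hard to read
  2. /ebook - low resolution (about 150 dpi); the lowest setting you should consider, and still noticeably degraded
  3. /printer - the one used in the example above (about 300 dpi); provides good compression and acceptable quality
  4. /prepress - very high quality (about 300 dpi, with color preserved) and the largest files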

For most applications the /printer option will provide the desired result.
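To see what each preset does to a particular scan, it can help to run all four and compare the file sizes. The sketch below wraps the gs command from above in two shell functions; the names compress_pdf and compare_presets, and the output naming scheme, are assumptions made for illustration. The -dQUIET switch only suppresses Ghostscript's startup messages.

```shell
#!/bin/sh
# compress_pdf PRESET INPUT OUTPUT
# Hypothetical wrapper around the gs command shown above.
# PRESET is given without the slash: screen, ebook, printer or prepress.
compress_pdf() {
    gs -dNOPAUSE -dBATCH -dQUIET -sDEVICE=pdfwrite \
       -dCompatibilityLevel=1.4 "-dPDFSETTINGS=/$1" \
       "-sOutputFile=$3" "$2"
}

# compare_presets INPUT
# Writes one compressed copy per preset next to the input
# (document-screen.pdf, document-ebook.pdf, ...) and prints each size.
compare_presets() {
    for preset in screen ebook printer prepress; do
        out="${1%.pdf}-$preset.pdf"
        compress_pdf "$preset" "$1" "$out" || return 1
        printf '%s\t%s bytes\n' "$preset" "$(wc -c < "$out")"
    done
}
```

For example, compare_presets document.pdf leaves four files to open side by side, which makes it easier to judge whether /ebook is readable enough for a given scan before settling on /printer.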

You can then use a tool such as Tesseract to add a searchable text layer over the scanned text for the purpose of indexing. Tesseract does not read a PDF document directly, however; it works on image files, so run it on the original page images rather than on the PDF.
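One possible workflow, sketched below, is to let Tesseract write a searchable one-page PDF for each image and then merge those pages with gs. It assumes Tesseract is installed (the package is tesseract-ocr on Debian and Ubuntu) and that the pages are png files named in page order. The function name ocr_pages, the ocr working folder and the searchable.pdf output name are hypothetical.

```shell
#!/bin/sh
# ocr_pages: hypothetical helper that builds searchable.pdf from the
# png files in the current directory.
ocr_pages() {
    mkdir -p ocr || return 1
    for img in *.png; do
        [ -e "$img" ] || continue            # no matches: the glob stays literal
        # "tesseract IMAGE OUTPUTBASE pdf" writes OUTPUTBASE.pdf containing
        # the page image with a hidden text layer.
        tesseract "$img" "ocr/${img%.png}" pdf || return 1
    done
    # Merge the single-page PDFs in glob (alphabetical) order.
    gs -dNOPAUSE -dBATCH -dQUIET -sDEVICE=pdfwrite \
       -sOutputFile=searchable.pdf ocr/*.pdf
}
```

The merged searchable.pdf can then be passed through the compression command above like any other PDF.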