Create PDF Documents with ImageMagick and Ghostscript

From Free Knowledge Base- The DUCK Project: information for everyone

Latest revision as of 11:15, 3 October 2019

ImageMagick provides the 'convert' command, which can combine scanned images into a single PDF document.

Ghostscript provides the 'gs' command, which can compress a PDF document to a smaller file size, trading size against quality depending on your needs.

You can install ImageMagick with apt:

sudo apt install imagemagick
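
After installing, it can help to check which ImageMagick command is actually on the PATH. Newer ImageMagick 7 installs provide 'magick', while ImageMagick 6 packages provide 'convert'. The helper below is a minimal sketch; the function name im_command is made up for illustration.

```shell
# Print the ImageMagick command available on this system, preferring
# "magick" (ImageMagick 7) and falling back to "convert" (ImageMagick 6).
# im_command is a hypothetical helper name, not part of ImageMagick.
im_command() {
  if command -v magick >/dev/null 2>&1; then
    echo magick
  elif command -v convert >/dev/null 2>&1; then
    echo convert
  else
    echo "ImageMagick not found; try: sudo apt install imagemagick" >&2
    return 1
  fi
}

# Example: "$(im_command)" *.jpg document.pdf
```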

From the ImageMagick package, use the convert command to take a folder of jpg images and create a single PDF document. The shell expands *.jpg in sorted name order, so number the files with leading zeros (01 02 03 04 05) and the page order will match the file order.

convert *.jpg document.pdf

It also works with png files:

convert *.png document.pdf
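
The effect of leading zeros on page order can be checked without running convert at all, because the ordering comes from how the shell expands the wildcard. The sketch below uses throwaway directories and made-up file names; glob_order is a hypothetical helper, and LC_ALL=C is set only to make the sort order predictable.

```shell
# List the .jpg files in a directory in the order the shell's glob expands them.
# LC_ALL=C pins the sort to plain byte order so the result is predictable.
glob_order() (
  cd "$1" || exit 1
  LC_ALL=C
  export LC_ALL
  echo *.jpg
)

# Without leading zeros, page 10 lands before page 2.
demo=$(mktemp -d)
touch "$demo/1.jpg" "$demo/2.jpg" "$demo/10.jpg"
glob_order "$demo"      # 1.jpg 10.jpg 2.jpg

# With leading zeros, the name order matches the page order.
padded=$(mktemp -d)
touch "$padded/01.jpg" "$padded/02.jpg" "$padded/10.jpg"
glob_order "$padded"    # 01.jpg 02.jpg 10.jpg
```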

To compress a PDF with Ghostscript:

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer -sOutputFile=output.pdf input.pdf

Note the -dPDFSETTINGS parameter, which takes a predefined value preceded by a slash. The common values are:

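  1. /screen - the lowest resolution (images downsampled to about 72 dpi); smallest files, but scans look noticeably degraded
  2. /ebook - low resolution (about 150 dpi); the lowest setting you should consider, and still fairly poor for scans
  3. /printer - the one used in the example above (about 300 dpi); provides good compression and acceptable quality
  4. /prepress - very high quality (about 300 dpi with color preserved); typically the largest files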

For most applications the /printer option will provide the desired result.
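
To judge the presets on your own scans, one option is to write a compressed copy for each preset and compare the file sizes. The following is a minimal sketch, not a definitive script: the function names compress_pdf and compare_presets and the output naming scheme (input-screen.pdf and so on) are assumptions made for illustration. The gs options are the same as in the command above, plus -q to quiet Ghostscript's startup messages.

```shell
# Run Ghostscript's pdfwrite device with one -dPDFSETTINGS preset.
# Usage: compress_pdf <preset> <input.pdf> <output.pdf>
compress_pdf() {
  gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
     -dPDFSETTINGS="/$1" -sOutputFile="$3" "$2"
}

# Write one compressed copy per preset and print each file's size in bytes.
# Output names such as input-screen.pdf are a naming choice for this sketch.
compare_presets() {
  for preset in screen ebook printer prepress; do
    out="${1%.pdf}-$preset.pdf"
    compress_pdf "$preset" "$1" "$out" || return 1
    printf '%s\t%s bytes\n' "$out" "$(wc -c < "$out")"
  done
}

# Example: compare_presets input.pdf
```

Opening the four outputs side by side shows whether /printer really is the right trade-off for a given set of scans.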

A PDF created this way contains only page images, so its text cannot be searched or indexed. A tool such as Tesseract can add a searchable text layer over the image text, but it does not work directly on a PDF document itself; it has to be run on the source images.
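
Since Tesseract reads images rather than PDFs, one possible approach is to run it on each scanned image with its pdf output option and then merge the resulting one-page PDFs with Ghostscript. This is a rough sketch under those assumptions: the function name ocr_images_to_pdf is hypothetical, tesseract must be installed separately, and it is worth confirming that the merged file is still searchable.

```shell
# OCR each scanned image into its own searchable one-page PDF, then merge.
# Usage: ocr_images_to_pdf <output.pdf> <image>...
# ocr_images_to_pdf is a hypothetical helper name used for this sketch.
ocr_images_to_pdf() {
  out=$1
  shift
  tmp=$(mktemp -d) || return 1
  n=0
  for img in "$@"; do
    n=$((n + 1))
    # "pdf" asks tesseract to write <outputbase>.pdf with a text layer.
    # Zero-padded names keep the glob below in page order.
    tesseract "$img" "$(printf '%s/page-%04d' "$tmp" "$n")" pdf || return 1
  done
  # Merge the per-page PDFs into one document with Ghostscript.
  gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="$out" "$tmp"/page-*.pdf
}

# Example: ocr_images_to_pdf searchable.pdf *.jpg
```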