Create PDF Documents with ImageMagick and Ghostscript
ImageMagick provides a command called 'convert', which can combine scanned images into a single PDF document.
Ghostscript provides the 'gs' command, which can rewrite that PDF with compression settings that shrink its file size to suit your needs.
You can install both tools with apt:
sudo apt install imagemagick ghostscript
The convert command from the ImageMagick package can, for example, take a folder of JPG images and combine them into a single PDF document. The pages follow the order in which the shell expands the *.jpg wildcard, which is alphabetical, so number the files with leading zeros (01, 02, 03 ... 10) to keep the page order correct. Without the padding, 10.jpg would sort before 2.jpg.
convert *.jpg document.pdf
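If the scanner produced unpadded names such as 1.jpg, 2.jpg ... 10.jpg, a small shell function can add the leading zeros before running convert. This is a minimal sketch that assumes the names are digits only plus the extension; pad_names is a hypothetical helper, not part of ImageMagick.

```shell
# Minimal sketch: zero-pad purely numeric image names so that the shell's
# alphabetical *.jpg expansion matches the page order.
# pad_names is a hypothetical helper, not part of ImageMagick.
pad_names() {
    ext=$1    # extension without the dot, e.g. jpg
    width=$2  # digits to pad to, e.g. 2 for up to 99 pages
    for f in *."$ext"; do
        [ -e "$f" ] || continue                # the glob matched nothing
        base=${f%."$ext"}                      # strip the extension
        case $base in
            ''|*[!0-9]*) continue ;;           # only rename all-digit names
        esac
        # Drop existing leading zeros so printf does not read "08" as octal.
        num=$base
        while [ "${#num}" -gt 1 ] && [ "${num#0}" != "$num" ]; do
            num=${num#0}
        done
        new=$(printf "%0${width}d.%s" "$num" "$ext")
        # Rename only when the name changes and no existing file is overwritten.
        if [ "$f" != "$new" ] && [ ! -e "$new" ]; then
            mv -- "$f" "$new"
        fi
    done
}
```

For example, running 'pad_names jpg 2' in the scan folder renames 1.jpg to 01.jpg and 2.jpg to 02.jpg, leaves 10.jpg and any already padded names alone, and skips names that are not plain numbers.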
The convert command also works with PNG files:
convert *.png document.pdf
To compress the resulting PDF, pass it through Ghostscript:
gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer -sOutputFile=output.pdf input.pdf
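If you compress PDFs often, the command can be wrapped in a shell function that takes the input, the output and the preset as arguments. This is a minimal sketch; compress_pdf is a hypothetical name, and the gs options are exactly the ones shown above, with only the preset made configurable.

```shell
# Minimal sketch: the gs command above wrapped in a function.
# compress_pdf is a hypothetical name; the gs options are the same as above.
compress_pdf() {
    src=$1
    out=$2
    preset=${3:-/printer}    # default to the preset used in the example
    # Accept the presets described in this article, plus /default,
    # which Ghostscript also accepts.
    case $preset in
        /screen|/ebook|/printer|/prepress|/default) ;;
        *) echo "compress_pdf: unknown preset $preset" >&2; return 2 ;;
    esac
    gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
       -dPDFSETTINGS="$preset" -sOutputFile="$out" "$src"
}
```

For example, 'compress_pdf input.pdf output.pdf' behaves like the command above, while 'compress_pdf input.pdf output.pdf /ebook' picks a different preset. A mistyped preset is rejected before gs runs.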
Note the -dPDFSETTINGS parameter, which takes one of several predefined values written with a leading slash:
- /screen - lowest quality (images downsampled to 72 dpi) and the smallest files; scanned text usually looks rough
- /ebook - medium quality (150 dpi); the lowest you should consider, and still noticeably soft
- /printer - the setting used in the example above (300 dpi); good compression with acceptable quality
- /prepress - very high quality (300 dpi, intended for print production), with larger files
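Because the right preset depends on the scans, it can help to try each one and compare the output sizes before choosing. The sketch below runs the same gs command once per preset; compare_presets and the name-preset.pdf output naming are hypothetical choices for illustration.

```shell
# Minimal sketch: run the gs command once per preset and print the size
# of each result in bytes. compare_presets and the <name>-<preset>.pdf
# output names are hypothetical.
compare_presets() {
    src=$1
    name=${src%.pdf}
    for p in screen ebook printer prepress; do
        out="$name-$p.pdf"
        # Same options as the compression example; gs progress messages are hidden.
        gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
           -dPDFSETTINGS="/$p" -sOutputFile="$out" "$src" >/dev/null || return 1
        size=$(wc -c < "$out" | tr -d ' ')
        printf '%-9s %s bytes\n' "$p" "$size"
    done
}
```

Running 'compare_presets document.pdf' leaves document-screen.pdf, document-ebook.pdf and so on next to the original, so you can open each one and judge the quality against the size.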
For most applications the /printer option will provide the desired result.
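You can also use an OCR tool such as Tesseract to add a searchable text layer over the scanned text, so the document can be searched and indexed. Tesseract reads images rather than PDFs, so run it on the original scans; it can write a searchable PDF directly.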
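A minimal sketch of that step, assuming a Tesseract version with PDF output is installed (the Debian/Ubuntu package is tesseract-ocr). Tesseract accepts a text file listing one image per line and its "pdf" output option places each image on its own page with an invisible text layer. The function name ocr_to_pdf and the list file pages.txt are hypothetical.

```shell
# Minimal sketch, assuming a Tesseract version with PDF output
# (packaged as tesseract-ocr on Debian/Ubuntu). ocr_to_pdf and
# pages.txt are hypothetical names.
ocr_to_pdf() {
    ext=$1        # image extension, e.g. jpg or png
    outbase=$2    # output name without .pdf; Tesseract adds the extension
    # Write the page list in the same alphabetical order convert uses.
    for f in *."$ext"; do
        [ -e "$f" ] || continue
        printf '%s\n' "$f"
    done > pages.txt
    # The trailing "pdf" selects Tesseract's PDF output: each page is the
    # original image with an invisible, searchable text layer on top.
    tesseract pages.txt "$outbase" pdf
}
```

For example, 'ocr_to_pdf jpg document-searchable' writes document-searchable.pdf from the JPG scans in the current folder.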