With Wings As Eagles: Craig P. Steffen's Blog

copy that into your spell book

2007 August 06 23:49

Using software under Linux is quite often like owning a vintage car (I've done both). There are things that don't quite work the way you think they would, and so you have to sort out how to do something new.

There is most likely an open source package somewhere that will scan a stack of pages and assemble the resulting images into a PDF. The machine I have hooked up to my scanner runs Knoppix, though, so the scanning software on it is pretty basic. I scanned the pages of the document I wanted as .png files, and then moved them to my laptop to put them together.

The ImageMagick tool "convert" is quite capable of this. Here are the commands I used to combine the images into PDFs of four pages each:

convert -crop 3250x2550+257+0 kscan_0001.png kscan_0002.png kscan_0003.png kscan_0004.png -density 300 -adjoin mp1.pdf
convert -crop 3250x2550+257+0 kscan_0005.png kscan_0006.png kscan_0007.png kscan_0008.png -density 300 -adjoin mp2.pdf

The -crop option cuts some artifacts off the left side of the scans: the geometry 3250x2550+257+0 keeps a region 3250 pixels wide and 2550 pixels tall, starting 257 pixels from the left edge. The "-density 300" tells convert that the pages were scanned at 300 dpi, so that the images are embedded with the right resolution and page size. The reason I didn't just put all 14 scanned pages on one command line is that convert, given that many pages at once, seems to use so much RAM that it brings the machine to a halt. I also tried using convert to put all the PDF files together at the end of the process, but, apparently because of the odd dpi setting, it couldn't do it.
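Typing one convert line per batch gets tedious, so the batching can be scripted. This is a minimal sketch, not the exact commands above: a POSIX shell function, make_pdfs (a hypothetical name), that hands the scans to convert four at a time and numbers the outputs mp1.pdf, mp2.pdf, and so on, reusing the crop geometry and density from above. It assumes file names without spaces, like the kscan_NNNN.png names the scanner produced.

```shell
# Sketch only: make_pdfs is a hypothetical helper, not part of ImageMagick.
# Usage: make_pdfs IMAGE...
# Runs convert once per group of (up to) four images, writing mpN.pdf.
make_pdfs() {
  n=0
  while [ "$#" -gt 0 ]; do
    n=$((n + 1))
    # Move up to four file names from the front of "$@" into $batch.
    batch=""
    i=0
    while [ "$i" -lt 4 ] && [ "$#" -gt 0 ]; do
      batch="$batch $1"
      shift
      i=$((i + 1))
    done
    # $batch is left unquoted on purpose so it splits back into separate
    # names; this assumes names without spaces (e.g. kscan_0001.png).
    # Crop geometry and density are the values from the commands above.
    convert -crop 3250x2550+257+0 $batch -density 300 -adjoin "mp$n.pdf" || return
  done
}
```

Run from the directory holding the scans, make_pdfs kscan_*.png splits 14 pages into batches of 4, 4, 4 and 2, giving mp1.pdf through mp4.pdf. The zero-padded names keep the glob in page order.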

Fortunately, this was no problem for my google-fu. I found a page that talks about concatenating PDF files. The Ghostscript option it listed worked for me: it used only a little RAM for a 14-page document, and the output file was exactly what I wanted. Here is the final version of the command I used:

gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=out.pdf mp1.pdf mp2.pdf mp3.pdf mp4.pdf
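For reuse, the same Ghostscript command can be wrapped in a small shell function with each option commented. This is a sketch; merge_pdfs is a made-up name, and the flags are exactly the ones in the command above.

```shell
# Sketch only: merge_pdfs is a hypothetical wrapper around the gs command.
# Usage: merge_pdfs OUTPUT INPUT...
merge_pdfs() {
  out=$1
  shift
  # -q                  suppress Ghostscript's startup messages
  # -sPAPERSIZE=letter  default paper size
  # -dNOPAUSE           don't pause for a keypress after each page
  # -dBATCH             exit when done instead of waiting at the gs prompt
  # -sDEVICE=pdfwrite   write the result as a PDF
  gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
     -sOutputFile="$out" "$@"
}
```

Called as merge_pdfs out.pdf mp1.pdf mp2.pdf mp3.pdf mp4.pdf, it runs the command above. Pages come out in argument order, so be careful with a glob like mp*.pdf once there are ten or more batches: mp10.pdf sorts before mp2.pdf.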

There's something for your toolbox.