Although scanned images tend to be quite large, books mostly consist of text, so monochrome representations are still legible. Unfortunately, it can be tricky to get a small PDF from your input file. Here is one easy way to do it:
gs -sDEVICE=pnggray -dNOPAUSE -dBATCH -dQUIET -r120 -sOutputFile=data/m%d.png inputfile.ps
Yes, there is a pngmono device, but it has issues with rescaled text. The data folder now contains one PNG file per page. If you enter this directory, the following command (assuming you have ImageMagick installed) creates a monochrome, and therefore smaller, copy of each page:
for i in m*.png; do ID=$(echo $i | sed 's/^m//;s/\..*//'); convert $i -monochrome s$ID.png; echo $ID '\includegraphics{./s'$ID'.png}'; done | sort -n | cut -d" " -f 2
This command also prints the \includegraphics lines, sorted by page number, which you have to paste into the following LaTeX template in place of the five example lines:
\documentclass[a4paper,10pt]{scrartcl}
\usepackage[utf8]{inputenc}
\usepackage{graphics}
\usepackage{geometry}
\geometry{top=0cm,bottom=0cm,left=0cm,right=0cm,nohead,nofoot}
\begin{document}
\pdfcompresslevel=9
\begin{center}
\includegraphics{./s1.png}
\includegraphics{./s2.png}
\includegraphics{./s3.png}
\includegraphics{./s4.png}
\includegraphics{./s5.png}
\end{center}
\end{document}
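For a book with hundreds of pages, pasting the lines by hand gets tedious, so the template can also be generated. The following is only a sketch: the function name make_tex and the output name book.tex are my own choices, not part of the method above, and it assumes that every file matching s*.png in the directory is one of the monochrome pages named s<number>.png.

```shell
#!/bin/sh
# make_tex DIR: print the LaTeX template with one \includegraphics line
# for every s<N>.png found in DIR, ordered numerically by N.
# (Hypothetical helper; assumes all s*.png files are page images.)
make_tex() {
    dir=${1:-.}
    cat <<'EOF'
\documentclass[a4paper,10pt]{scrartcl}
\usepackage[utf8]{inputenc}
\usepackage{graphics}
\usepackage{geometry}
\geometry{top=0cm,bottom=0cm,left=0cm,right=0cm,nohead,nofoot}
\begin{document}
\pdfcompresslevel=9
\begin{center}
EOF
    # Extract the page numbers, sort them numerically, and emit one
    # \includegraphics line per page.
    for f in "$dir"/s*.png; do
        [ -e "$f" ] || continue      # no match: the glob is left unexpanded
        n=${f##*/s}                  # drop the directory and the leading "s"
        echo "${n%.png}"             # drop the extension, keep the number
    done | sort -n | while read -r n; do
        printf '\\includegraphics{./s%s.png}\n' "$n"
    done
    cat <<'EOF'
\end{center}
\end{document}
EOF
}
```

Because the generated paths are relative (./s1.png and so on), run it from inside the directory that holds the images, for example `make_tex . > book.tex`, and then compile book.tex there.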
Compiling this document with pdflatex gives you your final file. I know that this method is unusual (it chains Ghostscript, ImageMagick, and pdflatex), but on most machines these tools should be available without further installation. I just tried this on a 640-page file. Here are the resulting file sizes for comparison:
Method (Format) | File size |
---|---|
original file (PS) | 767 MB |
CUPS export (PDF) | 106 MB |
pdfsizeopt (PDF) | 72 MB |
this method (PDF) | 22 MB |