Although scanned images tend to be quite large, books mostly consist of text, so a monochrome (1-bit) representation is still legible. Unfortunately, it can be tricky to get a small PDF from your input file. Here is one easy way to do it:
mkdir -p data
gs -sDEVICE=pnggray -dNOPAUSE -dBATCH -dQUIET -r120 -sOutputFile=data/m%d.png inputfile.ps
Yes, there is pngmono, but this device has issues with rescaled text. Now the folder data contains one PNG file per page. If you enter this directory, the following command (assuming you have ImageMagick installed) creates a second set of smaller, monochrome PNG files:
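for i in m*.png; do
    # Extract the page number from the file name, e.g. m12.png -> 12
    ID=$(echo "$i" | sed 's/^m//;s/\..*//')
    # Convert the grayscale page to a monochrome image
    convert "$i" -monochrome "s$ID.png"
    # Print "<number> <LaTeX line>" so the output can be sorted numerically
    echo "$ID" '\includegraphics{./s'"$ID"'.png}'
done | sort -n | cut -d" " -f 2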
This command prints, in page order, the \includegraphics lines that you have to paste into the following LaTeX template (between \begin{center} and \end{center}):
\documentclass[a4paper,10pt]{scrartcl}
\usepackage[utf8]{inputenc}
\usepackage{graphics}
\usepackage{geometry}
\geometry{top=0cm,bottom=0cm,left=0cm,right=0cm,nohead,nofoot}
\begin{document}
\pdfcompresslevel=9
\begin{center}
\includegraphics{./s1.png}
\includegraphics{./s2.png}
\includegraphics{./s3.png}
\includegraphics{./s4.png}
\includegraphics{./s5.png}
\end{center}
\end{document}
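Pasting the lines by hand gets tedious for a few hundred pages. As a rough sketch, the whole template can also be written from the shell. The function name `make_tex` and the output name `book.tex` are made up for this example, and it assumes that the only files matching `s*.png` in the given directory are the monochrome pages created by the loop.

```sh
# make_tex (hypothetical helper): print the LaTeX template from this answer
# with one \includegraphics line per s<N>.png file found in directory $1,
# ordered by page number.
make_tex() {
    dir=${1:-.}
    cat <<'EOF'
\documentclass[a4paper,10pt]{scrartcl}
\usepackage[utf8]{inputenc}
\usepackage{graphics}
\usepackage{geometry}
\geometry{top=0cm,bottom=0cm,left=0cm,right=0cm,nohead,nofoot}
\begin{document}
\pdfcompresslevel=9
\begin{center}
EOF
    for f in "$dir"/s*.png; do
        [ -e "$f" ] || continue      # the glob matched nothing
        base=${f##*/}                # drop the directory part
        n=${base#s}                  # drop the leading "s"
        n=${n%.png}                  # drop the extension, leaving the page number
        printf '%s \\includegraphics{./s%s.png}\n' "$n" "$n"
    done | sort -n | cut -d' ' -f2   # numeric sort, then drop the sort key
    cat <<'EOF'
\end{center}
\end{document}
EOF
}
```

Run from inside the image directory, `make_tex . > book.tex` would produce a file ready for pdflatex. The numeric sort matters because the shell glob alone would list `s10.png` before `s2.png`.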
Compiling this document with pdflatex gives you your final file. I know that this method is unusual (it chains Ghostscript, ImageMagick, and pdflatex), but on most machines these tools should be available without further installation. I just tried this on a 640-page file. Here are the reference file sizes:
| Method (Format) | File size |
|---|---|
| original file (PS) | 767 MB |
| CUPS export (PDF) | 106 MB |
| pdfsizeopt (PDF) | 72 MB |
| this method (PDF) | 22 MB |
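
To put the table in perspective, each size can be expressed as a share of the original PostScript file. Here is a small sketch using awk, with the numbers copied from the table; the function name `size_share` is made up for this example.

```sh
# size_share (hypothetical helper): print SIZE as a percentage of ORIGINAL,
# rounded to one decimal place. Both arguments must use the same unit.
size_share() {
    awk -v size="$1" -v orig="$2" 'BEGIN { printf "%.1f%%\n", 100 * size / orig }'
}

size_share 106 767   # CUPS export
size_share 72 767    # pdfsizeopt
size_share 22 767    # this method
```

By this measure, the CUPS export is about 14% of the original, pdfsizeopt about 9%, and this method about 3%.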