Calculate PDF coverage

Sometimes it is useful to calculate the area covered by ink for a given document, e.g. when estimating total printing costs. I played around with standard tools and, to my knowledge, the following is the simplest approach, assuming that you are working on a Linux machine.

First of all, you need to render the PDF or PS file to a bitmap, as it may (and should) contain vector graphics. The most reliable tool for this is Ghostscript. In the same step, we can reduce the color space to grayscale.

gs -sDEVICE=pnggray -dNOPAUSE -dBATCH -dQUIET -r300 -sOutputFile=page%d.png INPUTFILE
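If you want to trigger the rendering from Python instead of the shell, the call above can be wrapped roughly as follows. This is only a sketch: the helper names `build_gs_command` and `render` are made up for illustration, and it assumes that `gs` is on your PATH. The `-dBATCH` switch is included so that Ghostscript exits after the last page instead of waiting at its prompt.

```python
import subprocess


def build_gs_command(input_file, resolution=300, pattern="page%d.png"):
    """Assemble the Ghostscript call as an argument list (hypothetical helper)."""
    return [
        "gs",
        "-sDEVICE=pnggray",          # render to 8-bit grayscale PNG
        "-dNOPAUSE",                 # do not pause after each page
        "-dBATCH",                   # exit once the last page is written
        "-dQUIET",                   # suppress the startup messages
        "-r%d" % resolution,         # resolution in dpi
        "-sOutputFile=" + pattern,   # %d is replaced by the page number
        input_file,
    ]


def render(input_file, resolution=300):
    """Run Ghostscript; raises CalledProcessError if gs reports a failure."""
    subprocess.run(build_gs_command(input_file, resolution), check=True)


# Show the command that would be run for a placeholder file name.
print(" ".join(build_gs_command("document.pdf", resolution=150)))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.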

This will produce a separate PNG file for each page. Afterwards, this short Python script does the rest: it averages the darkness of all pixels on each page and then averages over all pages.

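#!/usr/bin/env python3
# Requires Pillow (the maintained successor of PIL)
import sys

from PIL import Image

if len(sys.argv) == 1:
    print('Filename(s) missing.')
    sys.exit(4)

values = []
files = sys.argv[1:]
for f in files:
    # Make sure every pixel is a single gray value in 0..255
    im = Image.open(f).convert('L')
    s = 0.0
    for i in im.getdata():
        s += (255 - i)  # darkness of the pixel: 0 = white, 255 = black

    # Mean darkness of this page (0..255)
    s /= (im.size[0] * im.size[1])
    values.append(s)

# Average over all pages, scaled down by 3
# (divide by 2.55 instead to get percent coverage)
print(sum(values) / len(values) / 3)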

You have to specify the PNG files to read by passing them as command-line parameters.
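Looping over every pixel in Python gets slow at 300 dpi. A possible shortcut, sketched here under my own assumptions, is to work on the histogram of gray levels instead: for an 8-bit grayscale image, Pillow's `Image.histogram()` returns a list of 256 pixel counts. The function name `coverage_percent` is made up for this example, and it reports the result as a percentage of a fully black page, which is a different scale than the script above uses.

```python
def coverage_percent(histogram):
    """Ink coverage in percent from a 256-bin grayscale histogram.

    histogram[level] is the number of pixels with that gray level,
    where 0 is black and 255 is white.
    """
    if len(histogram) != 256:
        raise ValueError("expected 256 bins (8-bit grayscale)")
    total = sum(histogram)
    if total == 0:
        raise ValueError("histogram contains no pixels")
    # Weight every gray level by its darkness (255 - level).
    dark = sum((255 - level) * count for level, count in enumerate(histogram))
    return 100.0 * dark / (255 * total)


# Synthetic page: 900 white pixels and 100 black pixels.
demo = [0] * 256
demo[255] = 900
demo[0] = 100
print(coverage_percent(demo))

# With Pillow installed, a rendered page could be fed in like this:
#   from PIL import Image
#   coverage_percent(Image.open("page1.png").convert("L").histogram())
```

The work then scales with the 256 gray levels rather than with the number of pixels, since Pillow computes the histogram in compiled code.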

Technical remark: Yes, the Python bindings of cairo could render the PDF as well, but they are terribly slow compared to Ghostscript.
Another technical remark: You may adjust the resolution (the -r option) to your needs. However, low resolutions yield less accurate numbers. For the documents I tested, low resolutions overestimated the actual coverage.
