The truth is rarely pure and never simple

Find unused images in a LaTeX document

Do you know this problem? A paper draft has been edited by many different authors over a long period, and a bunch of old figure files that the manuscript no longer uses still lurk in the file system. Here’s a way to clean up that does not involve hacky parsing of the LaTeX source.

If you run pdflatex with the -recorder option, it writes a *.fls file that contains one line for each file it opened, marked INPUT or OUTPUT. Because this list records what pdflatex actually read, \input directives, \graphicspath settings and commented-out lines are all handled correctly. I use latexmk -pdf to run pdflatex, which enables the recorder by default in current versions. Afterwards, this command gives you a list of the files that are present in the current directory and its subdirectories but have not been read by pdflatex:

comm -23 <(realpath $(find . -type f) | sort) <(realpath $(grep INPUT *.fls | awk '{ print $2 }') | sort)

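Admittedly, the command uses a bash-only feature (process substitution), but the concept is the following. The first argument to comm is a sorted list of the full paths of all files in this directory and all subdirectories. The second argument takes the INPUT lines from *.fls and builds a sorted list of the absolute paths of the files named there. With -23, comm then prints only the entries of the first list that are not in the second. Here is an example, wrapped in realpath --relative-to=. so that relative paths are printed. Note that files pdflatex only writes (such as main.log) and files read by other tools (such as the .bib files, which the bibliography program reads) show up as well: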

$ realpath --relative-to=. $(comm -23 <(realpath $(find . -type f) | sort) <(realpath $(grep INPUT *.fls | awk '{ print $2 }') | sort))
Figures/energy_diagram-crop.pdf
Figures/energy_diagram.pdf
Figures/fromX.pdf
Figures/hist_long.pdf
Figures/intdiste2.pdf
Figures/intdistsn2.pdf
Figures/internal.pdf
Figures/main.bib
Figures/nreactpath.pdf
Figures/overmatrE2.pdf
Figures/overplotSN2.pdf
Figures/toY.pdf
Figures/TS_geometries_2.pdf
Figures/ts_validation.pdf
Figures/val_sn2_e2.pdf
literatur.bib
main.bib
main.blg
main.fdb_latexmk
main.fls
main.log
mainNotes.bib
main.pdf
special.bib
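
For shells without process substitution, the same idea can be sketched with two temporary files. This is only a minimal sketch under some assumptions: GNU coreutils (realpath, comm, sort) and mktemp are available, the function is called from the directory in which pdflatex was run (so that relative paths in the .fls file resolve), and no file name contains a newline. The names unused_files, to_abs and to_rel are made up for illustration. Unlike the one-liner, this version reads paths line by line, so file names with spaces should survive.

```shell
# to_abs: read one path per line on stdin, print its absolute path.
to_abs() {
    while IFS= read -r file; do
        realpath -- "$file"
    done
}

# to_rel: read one path per line on stdin, print it relative to the
# current directory (--relative-to is a GNU realpath option).
to_rel() {
    while IFS= read -r file; do
        realpath --relative-to=. -- "$file"
    done
}

# unused_files FLS [DIR]
# Print the files below DIR (default: .) that do not appear on any
# INPUT line of the recorder file FLS.
# The body runs in a subshell ( ... ) so its variables stay local.
unused_files() (
    fls=$1
    dir=${2:-.}
    all=$(mktemp) || exit 1
    used=$(mktemp) || exit 1

    # Every regular file below DIR, as sorted absolute paths.
    find "$dir" -type f | to_abs | sort -u > "$all"

    # Every file named on an INPUT line, as sorted absolute paths.
    sed -n 's/^INPUT //p' "$fls" | to_abs | sort -u > "$used"

    # Keep only the lines that occur in the first list but not the second.
    comm -23 "$all" "$used" | to_rel

    rm -f "$all" "$used"
)
```

With these definitions loaded, unused_files main.fls Figures would restrict the search to the Figures directory, which keeps build products such as main.log out of the list. Appending something like | grep -Ei '\.(pdf|png|jpe?g|eps)$' would narrow the result to typical image extensions; that extension list is an assumption and should be adapted to the project.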
