The truth is rarely pure and never simple

doxygen and sphinx: input and output encoding

When using doxygen together with sphinx via breathe, you may encounter this error:

Exception occurred:
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
ExpatError: not well-formed (invalid token): line 963, column 201

It took me some time to figure out what this message actually means. Although doxygen writes its XML output as UTF-8, it does not convert the content of the source code listings if you specified the wrong INPUT_ENCODING in the Doxyfile. Hence, the doxygen output contains byte sequences from another encoding, which make the whole XML file invalid. In the next processing step, expat complains with the error message shown above. In my case, INPUT_ENCODING = UTF_8 was the wrong decision (because the umlaut ö was saved as \xf6), as file shows:

$ file -i x.cpp
x.cpp: text/x-c; charset=iso-8859-1
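
To see why a single \xf6 is enough to break the XML, here is a minimal Python 3 sketch. It is not taken from doxygen or breathe; the helper names and the `<codeline>` wrapper element are made up for illustration. It only shows that the byte is valid ISO-8859-1 but not valid UTF-8, and that expat rejects a document declared as UTF-8 that contains it.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

# The umlaut as it is stored in an ISO-8859-1 source file: one byte, 0xf6.
latin1_bytes = "\u00f6".encode("iso-8859-1")
# The same character in UTF-8 takes two bytes: 0xc3 0xb6.
utf8_bytes = "\u00f6".encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def parses_as_xml(data: bytes) -> bool:
    """Return True if expat (via minidom) accepts the document."""
    try:
        xml.dom.minidom.parseString(data)
        return True
    except ExpatError:
        return False


# Hypothetical stand-in for a line of doxygen XML output.
header = b'<?xml version="1.0" encoding="UTF-8"?>'
broken = header + b"<codeline>" + latin1_bytes + b"</codeline>"
fixed = header + b"<codeline>" + utf8_bytes + b"</codeline>"

print(is_valid_utf8(latin1_bytes))  # the ISO-8859-1 byte alone
print(parses_as_xml(broken))        # document with the raw byte copied in
print(parses_as_xml(fixed))         # document with the properly encoded umlaut
```

The `broken` document mimics what happens when the source bytes are copied into the XML unchanged, while `fixed` corresponds to what you would get once the input encoding matches the file.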

So double-check your settings – and do not trust yourself 😉
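
One way to do that double-check before running doxygen is a small script that looks for source files which do not decode as UTF-8. The following is only a sketch under my own assumptions: the function name `find_non_utf8_files`, the suffix list, and the demo file names are hypothetical, and a file that passes is not guaranteed to be UTF-8 (plain ASCII, for instance, is valid in both encodings).

```python
import tempfile
from pathlib import Path


def find_non_utf8_files(root, suffixes=(".cpp", ".h")):
    """Return paths under root (with the given suffixes) that are not valid UTF-8."""
    suspicious = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            try:
                path.read_bytes().decode("utf-8")
            except UnicodeDecodeError:
                suspicious.append(path)
    return suspicious


# Self-contained demo: one file saved as UTF-8, one saved as ISO-8859-1.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "good.cpp").write_bytes("// sch\u00f6n\n".encode("utf-8"))
    Path(tmp, "bad.cpp").write_bytes("// sch\u00f6n\n".encode("iso-8859-1"))
    found = [p.name for p in find_non_utf8_files(tmp)]

print(found)
```

Any file reported this way either needs to be converted to UTF-8, or INPUT_ENCODING needs to name the encoding the files are really stored in.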
