Dynamic analysis with Clang
This document describes how to use Clang to perform analysis on Python and its libraries.
This document does not cover interpreting the findings. For a discussion of interpreting results, see Marshall Clow’s Testing libc++ with -fsanitize=undefined. The blog post is a detailed examination of issues uncovered by Clang in libc++.
The document focuses on Clang, although most techniques should generally apply to GCC’s sanitizers as well.
The instructions were tested on Linux, but they should work on macOS as well. Instructions for Windows are incomplete.
What is Clang?
Clang is the C, C++ and Objective-C front-end for the LLVM compiler. The front-end provides access to LLVM’s optimizer and code generator. The sanitizers (or checkers) are hooks into the code generation phase that instrument compiled code so suspicious behavior is flagged.
What are sanitizers?
Clang sanitizers are runtime checkers used to identify suspicious and undefined behavior. The checking occurs at runtime with actual runtime parameters, so false positives are kept to a minimum.
There are a number of sanitizers available, but two that should be used on a regular basis are the Address Sanitizer (or ASan) and the Undefined Behavior Sanitizer (or UBSan). ASan is invoked with the compiler option -fsanitize=address, and UBSan is invoked with -fsanitize=undefined. The flags are passed through CFLAGS and CXXFLAGS, and sometimes through CC and CXX (in addition to the compiler).
A complete list of sanitizers can be found at Controlling Code Generation.
Note
Because sanitizers operate at runtime on real program parameters, it’s important to provide a complete set of positive and negative self tests.
Clang and its sanitizers have strengths (and weaknesses). It’s just one tool in the war chest for uncovering bugs and improving code quality. Clang should be used to complement other methods, including code reviews, Valgrind, etc.
Clang/LLVM setup
Pre-built Clang builds are available for most platforms:
On macOS, Clang is the default compiler.
For mainstream Linux distros, you can install a clang package. In some cases, you also need to install llvm separately, otherwise some tools are not available.
On Windows, the installer for Visual Studio (not Code) includes the “C++ clang tools for windows” feature.
You can also build clang from source; refer to the clang documentation for details.
On occasion, the installer does not install all the components you need. For example, you might want to run scan-build or examine the results with scan-view. If this is your case, you can build Clang from source and copy tools from tools/clang/tools to a directory on your PATH.
Another reason to build from source is to get the latest version of Clang/LLVM, if your platform’s channels don’t provide it yet. Newer versions of Clang/LLVM introduce new sanitizer checks.
Python build setup
This portion of the document covers invoking Clang and LLVM with the options required so the sanitizers analyze Python under its test suite.
Set the compiler to Clang, in case it’s not the default:
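A minimal sketch, assuming clang and clang++ are installed and on your PATH (setting CXX as well is an assumption, added so that C++ test code uses the same toolchain):

```shell
# Assumes clang and clang++ are on PATH.
export CC="clang"
export CXX="clang++"
```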
If you want to use additional sanitizer options (found in Clang documentation), add them to the CFLAGS variable. For example, you may want the checked process to exit after the first failure:
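One way to do this, assuming a Clang version that accepts -fno-sanitize-recover=all:

```shell
# Make sanitizer reports fatal so the process stops at the first one.
# Appends to any CFLAGS already set in the environment.
export CFLAGS="${CFLAGS:+$CFLAGS }-fno-sanitize-recover=all"
```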
Then, run ./configure with the relevant flags:
ASan: --with-address-sanitizer --without-pymalloc
UBSan: --with-undefined-behavior-sanitizer
The --without-pymalloc option is not necessary (tests should pass without it), but disabling pymalloc helps ASan uncover more bugs (ASan does not track individual allocations done by pymalloc).
It is OK to specify both sanitizers.
After that, run make and make test as usual. Note that make itself may fail with a sanitizer failure, since the just-compiled Python runs during later stages of the build.
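Put together, the configure and build steps might look like the sketch below. It enables both sanitizers and is meant to be run from the root of a CPython source checkout. CONFIGURE_FLAGS is a hypothetical variable name used only for illustration.

```shell
# Both sanitizers are enabled here; drop a flag to use only one.
# CONFIGURE_FLAGS is a hypothetical helper variable, not something configure reads.
CONFIGURE_FLAGS="--with-address-sanitizer --with-undefined-behavior-sanitizer --without-pymalloc"

if [ -x ./configure ]; then
    # Left unquoted on purpose so the flags split into separate arguments.
    ./configure $CONFIGURE_FLAGS && make && make test
else
    echo "Not in a CPython checkout: ./configure not found" >&2
fi
```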
Build setup for enabling sanitizers for all code
Some parts of Python (for example, _testembed, _freeze_importlib, test_cppext) may not use the variables set by configure, and with the above settings they’ll be compiled without sanitization.
As a workaround, you can pass the sanitizer options by way of the compilers, CC (for C) and CXX (for C++). This is used below. Passing the options through LDFLAGS is also reported to work.
For ASan, use:
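A sketch of the idea, assuming clang and clang++ are on your PATH:

```shell
# The sanitizer flag travels with the compiler command itself,
# so build steps that ignore CFLAGS still pick it up.
export CC="clang -fsanitize=address"
export CXX="clang++ -fsanitize=address"
```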
And for UBSan:
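Again as a sketch, assuming clang and clang++ are on your PATH:

```shell
# Same approach as for ASan, with the UBSan flag instead.
export CC="clang -fsanitize=undefined"
export CXX="clang++ -fsanitize=undefined"
```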
It’s OK to specify both sanitizers.
After this, run ./configure, make and make test as usual.
Analyzing the output
Sanitizer failures will make the process fail and output a diagnostic, for example:
If you are using the address sanitizer, an additional tool is needed to get good traces. Usually, this happens automatically through the llvm-symbolizer tool. If this tool is not installed on your PATH, you can set ASAN_SYMBOLIZER_PATH to the location of the tool, or pipe test output through the asan_symbolize.py script from the Clang distribution. For example, from Issue 20953 during compile (formatting added for clarity):
Note
If asan_symbolize.py is not installed, build Clang from source, then look in the Clang/LLVM build directory for it and use it directly or copy it to a directory on PATH.
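The symbolizer setup described above could be sketched as follows. The install location is hypothetical and needs to be adjusted to wherever your LLVM tools live. SYMBOLIZER is an illustrative variable name.

```shell
# Hypothetical install location; adjust to your system.
SYMBOLIZER="/opt/llvm/bin/llvm-symbolizer"

if [ -x "$SYMBOLIZER" ]; then
    # Tell ASan where to find the symbolizer when it is not on PATH.
    export ASAN_SYMBOLIZER_PATH="$SYMBOLIZER"
else
    echo "llvm-symbolizer not found at $SYMBOLIZER" >&2
fi

# Alternative, assuming asan_symbolize.py is on PATH:
# pipe the test output through the script instead.
# make test 2>&1 | asan_symbolize.py
```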
Ignoring findings
Clang allows you to alter the behavior of sanitizer tools for certain source-level entities by providing a special ignorelist file at compile time. The ignorelist is needed because the sanitizer reports every instance of an issue, even if the issue is reported tens of thousands of times in unmanaged library code.
You specify the ignorelist with -fsanitize-ignorelist=XXX. For example:
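A sketch of such an invocation through the build variables, with UBSan chosen for illustration. IGNORELIST is a hypothetical variable name. An absolute path is used so the file is still found when the compiler runs from a subdirectory.

```shell
# The ignorelist file must exist by the time the compiler runs.
IGNORELIST="$PWD/my_ignorelist.txt"
export CFLAGS="-fsanitize=undefined -fsanitize-ignorelist=$IGNORELIST"
export CXXFLAGS="$CFLAGS"
```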
my_ignorelist.txt would then contain entries such as the following. The entry will ignore a bug in libc++’s ios formatting functions:
As an example with Python 3.4.0, audioop.c will produce a number of findings:
One of the functions of interest is audioop_getsample_impl (flagged at line 422), and the ignorelist entry would include:
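Assuming the fun: entry syntax from the sanitizer special case list format, the entry could be appended to the ignorelist file like this:

```shell
# Append a function-level entry to the ignorelist file.
printf 'fun:audioop_getsample_impl\n' >> my_ignorelist.txt
```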
Or, you could ignore the entire file with:
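Assuming the src: entry syntax from the same format, a file-level entry might look like the sketch below. The leading wildcard is a choice made here so the entry matches however the path to audioop.c is spelled on the compiler command line.

```shell
# Append a file-level entry; the wildcard covers any directory prefix.
printf 'src:*/audioop.c\n' >> my_ignorelist.txt
```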
Unfortunately, you won’t know what to add to the ignorelist until you run the sanitizer.
The documentation is available at Sanitizer special case list.