Note: This is the English translation of the article first published in German.
Intro
The ESE Congress is one of the leading events for embedded software engineering in Germany.
This year it was held digitally for the first time, so participation was also possible via video. Over five days, there were three keynotes and 96 technical presentations covering all areas of embedded software development.
Anton Kreuzkamp from KDAB talked about custom code refactoring with Clang tooling. Read on for our presentation of his contribution to the ESE conference proceedings.
Good static analysis can save a lot of effort and time. With customized static code analysis, the project code can be checked not only for general programming errors but also for project-specific conventions and best practices. The Clang Compiler Framework provides the ideal basis for this.
The programming language C++ manages the balancing act between maximum performance, which is essential in the embedded sector, on the one hand, and maximum code correctness through a high level of abstraction on the other. It achieves this by focusing on compile-time checks and opportunities for optimization. Where possible, calculations should be rewritten into more efficient low-level code by the compiler rather than by the developer. Likewise, errors should be ruled out during compilation instead of taking up valuable computing time with checks at runtime.
Clang has become very popular in recent years and has long since established itself as one of the most important C and C++ compilers. This success is due not least to Clang's architecture: Clang is not just another compiler, but a compiler framework. The essential parts of the compiler are implemented as a carefully designed library, which has enabled the diverse landscape of analysis and refactoring tools that has emerged around the framework, itself built on the LLVM project.
The command-line tool clang-tidy offers static code analysis and checks compliance with coding conventions, among other things, but can also refactor code automatically. The clang-format tool can automatically standardize the coding style. The Clazy tool, which was developed by the author's company, supplements the compiler with a variety of warnings around the Qt software framework and warns about common anti-patterns in its use. Many other useful tools exist in the Clang universe as well. Even integrated development environments such as Qt Creator or CLion rely on the Clang Compiler Framework for syntax highlighting, code navigation, auto-completion, and refactoring.
Anyone who knows the tools of the Clang world in their entirety is well positioned as a C or C++ developer. But if you want to get everything out of the technology, that is not the end of the story. The LibTooling library, on which most Clang tools are based, also allows you to create your own customized code analysis and refactoring tools with little effort.
I'll give you an example. A small but recurring piece of the puzzle in embedded software is raising real numbers to a power, mostly with static, natural exponents. Of course, the std::pow function would be used for this, had extensive profiling not shown that on the target architecture std::pow(x, 4) is many times slower than x*x*x*x and forms a bottleneck in particularly performance-critical code. The senior developer of the project therefore created a template function, usable as utils::pow<4>(x). Thanks to compiler optimizations, it is just as fast as the manual variant[1]. Nevertheless, the familiar std::pow variant has since crept back into the code in various places, and the several hundred thousand lines of existing code have never been ported consistently.
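The article does not show how utils::pow is implemented. The following is one possible sketch. It assumes a non-negative integer exponent passed as the template argument and C++17 (for if constexpr). Only the name utils::pow and the call syntax utils::pow<4>(x) come from the text.

```cpp
namespace utils {

// Hypothetical sketch of utils::pow<N>(x): computes x^N for a non-negative
// compile-time exponent N using exponentiation by squaring. Because N is a
// template argument, the recursion is resolved at compile time, and an
// optimizing compiler can reduce the call to a few multiplications.
template <unsigned N, typename T>
constexpr T pow(T x)
{
    if constexpr (N == 0) {
        return T(1);
    } else if constexpr (N % 2 == 0) {
        const T half = pow<N / 2>(x);  // x^(N/2), computed once
        return half * half;
    } else {
        return x * pow<N - 1>(x);      // odd exponent: peel off one factor
    }
}

} // namespace utils
```

Usage: double y = utils::pow<4>(x); Exponentiation by squaring is one possible design. It needs only about log2(N) multiplications. By default, the compiler may not regroup a floating-point product like x*x*x*x in this way by itself, unless reassociation is allowed (e.g. with -ffast-math).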
The first attempt to automate the refactoring is, of course, search and replace with a regular expression. std::pow\((.*), (\d+)\) already finds the simplest cases. But what about cases where the "std::" is omitted or where the second argument is more complicated than an integer literal?
[1] Note: On many common platforms, the same optimization can be achieved with the compiler flag -ffast-math. The compiler will then replace the std::pow call with appropriate CPU instructions on its own.
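The following self-contained snippet shows the limits of the regular-expression approach described above. It applies that expression, written as a C++ raw string literal, to a few hypothetical lines of code. The helpers regexFindsCall and regexRewrite are illustrative only.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The regular expression from the text: std::pow\((.*), (\d+)\)
const std::regex kStdPowRegex(R"(std::pow\((.*), (\d+)\))");

// Returns true if the regex finds a candidate call anywhere in `line`.
bool regexFindsCall(const std::string& line)
{
    return std::regex_search(line, kStdPowRegex);
}

// Rewrites every match to utils::pow<EXPONENT>(BASE) using the capture groups.
std::string regexRewrite(const std::string& line)
{
    return std::regex_replace(line, kStdPowRegex, "utils::pow<$2>($1)");
}

// Examples (hypothetical input lines):
//   regexRewrite("double y = std::pow(x, 4);")          -> "double y = utils::pow<4>(x);"
//   regexFindsCall("double y = pow(x, 4);")              -> false (no "std::")
//   regexFindsCall("double y = std::pow(x, EXPONENT);")  -> false (no digit literal)
//   regexRewrite("std::pow(x, 2) + std::pow(y, 3)")      -> "utils::pow<3>(x, 2) + std::pow(y)"
```

The last example shows a subtler problem. Because .* is greedy, two calls on the same line are captured as a single match, and the rewritten line no longer compiles.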
Installing LLVM and Clang
If you cannot install Clang and LLVM via your trusted package manager, you can get the framework from GitHub. The prerequisites for building it are Git, CMake, Ninja and an existing C++ compiler.
The first steps
As the basis for our own Clang tool, we use a code example from the Clang documentation [1], reduced here to the essentials.
With this, we already have a first executable program. CMake makes it easy to create the necessary build scripts. All we have to do is find the Clang package and link our program against the imported targets clang-cpp, LLVMCore and LLVMSupport.
Using our development environment or the command line, we can now compile our program and run it against our code.
Before testing the newly created tool, we recommend installing it in the same directory as the clang compiler (e.g. /usr/bin). The reason is that Clang-based tools need certain built-in headers, which they look up relative to their own installation path, e.g. in ../lib/clang/10.0.1/include (the version number varies). Otherwise, headers such as stddef.h would be missing when the code is analyzed. If you get errors when running the program, you have most likely fallen into this trap.
So far, our tool only checks the syntax of the C++ file and reports errors if, for example, non-existent functions are called. Next, we want to find the code passages that cause our problem.
Find relevant code points with AST matchers
The AST, the Abstract Syntax Tree, is a data structure consisting of a multitude of interlinked classes that represents the structure of the code being analyzed. For example, an IfStmt links to an Expr object that represents the condition of the if statement and to Stmt objects that represent the "then" and "else" branches.
An AST matcher can be thought of as a regular expression on the AST: a data structure that describes a particular pattern and finds it in the AST. In Clang's LibTooling, AST matchers are written in a special syntax. For each kind of language construct, i.e. each node type in the AST, there is a function that returns a matcher of the corresponding type. These functions in turn take other matchers as parameters, which impose additional conditions on the code. Multiple parameters are combined with a logical AND. For instance, the matcher functionDecl(hasName("draw"), returns(voidType())) matches function declarations that are called "draw" and have void as the return type.
This fits, for example, the following two declarations:
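The following are two declarations of this kind. They are hypothetical examples, not necessarily those of the original article. A third declaration is included that would not fit because of its return type.

```cpp
#include <type_traits>
#include <utility>

// Fits: a free function named "draw" with void as the return type.
void draw();

// Fits as well: hasName("draw") without a qualifier also matches
// qualified names such as Canvas::draw.
struct Canvas {
    void draw(int layer) const;
};

// Does not fit: the name matches, but the return type is int.
int draw(double scale);

// Compile-time check of the return types the matcher looks at.
static_assert(std::is_void<decltype(draw())>::value, "draw() returns void");
static_assert(std::is_void<decltype(std::declval<const Canvas&>().draw(0))>::value,
              "Canvas::draw returns void");
static_assert(std::is_same<decltype(draw(1.0)), int>::value, "draw(double) returns int");
```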
In order to access the individual parts of the matched code later, sub-matchers can be given names with a call to bind(). The name can then be used to reference the AST node matched by that sub-matcher. For example, if we want to find function calls whose second argument is an integer literal and want to access this literal later, we can prepare this with the matcher callExpr(hasArgument(1, integerLiteral().bind("exponent"))).
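The binding mechanism can also be modelled without Clang. The following self-contained toy uses hypothetical types and functions throughout, with bindAs playing the role of Clang's .bind(). In this toy, node matchers combine their sub-matchers with AND, and a successful bindAs records the matched node under its name so that it can be looked up after the match. The corresponding Clang expression is given in a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy expression node (not Clang's AST): for a "CallExpr", the children
// are the call arguments; `value` is only meaningful for "IntegerLiteral".
struct Node {
    std::string kind;
    long long value = 0;
    std::vector<const Node*> children;
};

using Bindings = std::map<std::string, const Node*>;
using Matcher = std::function<bool(const Node&, Bindings&)>;

Matcher integerLiteral()
{
    return [](const Node& n, Bindings&) { return n.kind == "IntegerLiteral"; };
}

Matcher argumentCountIs(std::size_t count)
{
    return [count](const Node& n, Bindings&) { return n.children.size() == count; };
}

// Matches if the argument at `index` exists and satisfies `inner`.
Matcher hasArgument(std::size_t index, Matcher inner)
{
    return [index, inner](const Node& n, Bindings& b) {
        return index < n.children.size() && inner(*n.children[index], b);
    };
}

// On success, records the matched node under `id`. Unlike Clang, this toy
// does not roll back bindings made by partially successful matches.
Matcher bindAs(Matcher inner, std::string id)
{
    return [inner, id](const Node& n, Bindings& b) {
        if (!inner(n, b))
            return false;
        b[id] = &n;
        return true;
    };
}

// All sub-matchers must match (logical AND).
template <typename... Inner>
Matcher callExpr(Inner... inner)
{
    std::vector<Matcher> all{inner...};
    return [all](const Node& n, Bindings& b) {
        if (n.kind != "CallExpr")
            return false;
        for (const Matcher& m : all)
            if (!m(n, b))
                return false;
        return true;
    };
}

// Clang (roughly): callExpr(argumentCountIs(2), hasArgument(1, integerLiteral().bind("exponent")))
const Matcher callWithLiteralExponent =
    callExpr(argumentCountIs(2), hasArgument(1, bindAs(integerLiteral(), "exponent")));
```

Consider a node modelling a call such as f(x, 4), i.e. a CallExpr whose children are a DeclRefExpr and an IntegerLiteral with value 4. For this node, the matcher succeeds and the bindings map "exponent" to the literal node. If the second argument is a variable instead, the match fails and nothing is bound.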
A complete list of all available matchers can be found at [2].
To speed up the creation of AST matchers, Clang comes with the command-line tool clang-query, which can be used to test matchers interactively and inspect the matched AST sections. The enable output detailed-ast command enables output of the AST section found by the matcher, and the match command creates and runs an AST matcher. The matcher syntax used in clang-query is similar to the C++ syntax.
The matcher can thus be refined interactively, piece by piece. For our goal of finding calls to std::pow that can be replaced by a call to the template function utils::pow, a matcher along the following lines does the job.
This matcher finds calls to std::pow that have a second argument (index 1), which may be an arbitrary expression. The called function must be named "pow" and be declared in the namespace std. We bind the arbitrary expression to the name "exponent", the called function to "callee", and the function call itself to "funcCall".
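The text describes the matcher but does not show it. A Clang matcher with these properties could look roughly like the one in the comment at the top of the following block. This is an assumption: isInStdNamespace() is one way to express "declared in namespace std", and hasDeclContext(namespaceDecl(hasName("std"))) would be another. The rest of the block is a self-contained toy with hypothetical types that checks the same conditions on a simplified call node. It also shows why a namespace alias is no obstacle: the matcher looks at the declaration the call resolves to, not at the spelling in the source.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Possible Clang matcher for the description (a sketch, not verified here):
//   callExpr(callee(functionDecl(hasName("pow"), isInStdNamespace()).bind("callee")),
//            hasArgument(1, expr().bind("exponent")))
//       .bind("funcCall")
// In clang-query, it can be tried out by prefixing it with the match command.

// Toy model: after name lookup, a call refers to the declaration it resolves
// to, so stdlib::pow(...) with `namespace stdlib = std;` has a callee in "std".
struct CalleeDecl {
    std::string name;
    std::string enclosingNamespace;
};

struct CallNode {
    const CalleeDecl* callee;
    std::vector<std::string> arguments;  // source text of each argument
};

// The three "bound" parts: the call, the callee and the exponent argument.
struct StdPowMatch {
    const CallNode* funcCall;
    const CalleeDecl* callee;
    std::string exponent;
};

std::optional<StdPowMatch> matchStdPow(const CallNode& call)
{
    // callee(functionDecl(hasName("pow"), <declared in namespace std>))
    if (call.callee == nullptr || call.callee->name != "pow" ||
        call.callee->enclosingNamespace != "std")
        return std::nullopt;
    // hasArgument(1, expr()): a second argument (index 1) must exist
    if (call.arguments.size() < 2)
        return std::nullopt;
    return StdPowMatch{&call, call.callee, call.arguments[1]};
}
```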
Analysis, diagnosis and automatic code correction
To do something with the matched code, we still need to register a MatchCallback for the matcher. The callback is a class we implement ourselves. It derives from MatchFinder::MatchCallback and implements the method run(const MatchFinder::MatchResult &Result), which is where our analysis of the matched code snippets takes place. In addition, we define a SupercedeStdPowAction class. It derives from the FixItAction class (so that we can apply our code fixes later) and contains both our MatchCallback and a MatchFinder, through which we initiate the search of the AST. Finally, we replace the clang::SyntaxOnlyAction in the main function with our SupercedeStdPowAction.
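The interplay of matcher, finder and callback can be modelled in a few lines. The following self-contained toy uses classes named after their Clang counterparts (MatchCallback, MatchFinder, MatchResult, and StdPowChecker for the checker). It is a hypothetical simplification, not Clang's actual declarations. The finder visits every node, and whenever a registered matcher accepts one, it calls run() on the associated callback.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy node and match result; Clang's MatchResult additionally carries
// the bound nodes, the ASTContext and the SourceManager.
struct Node {
    std::string kind;
};

struct MatchResult {
    const Node* node;
};

// Counterpart of MatchFinder::MatchCallback: subclasses implement run().
class MatchCallback {
public:
    virtual ~MatchCallback() = default;
    virtual void run(const MatchResult& result) = 0;
};

// Counterpart of MatchFinder: stores (matcher, callback) pairs and invokes
// the callback for every node its matcher accepts.
class MatchFinder {
public:
    using Matcher = std::function<bool(const Node&)>;

    void addMatcher(Matcher matcher, MatchCallback* callback)
    {
        m_entries.emplace_back(std::move(matcher), callback);
    }

    void matchAll(const std::vector<Node>& nodes)
    {
        for (const Node& node : nodes)
            for (auto& [matcher, callback] : m_entries)
                if (matcher(node))
                    callback->run(MatchResult{&node});
    }

private:
    std::vector<std::pair<Matcher, MatchCallback*>> m_entries;
};

// Toy checker: the real one analyses the bound nodes here; this one counts.
class StdPowChecker : public MatchCallback {
public:
    void run(const MatchResult&) override { ++matches; }
    int matches = 0;
};
```

In the real tool, the action's CreateASTConsumer() override typically returns the MatchFinder's newASTConsumer(), so that Clang runs the registered matchers on every translation unit it parses.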
We now fill the function StdPowChecker::run with our actual check code. First, we retrieve the AST nodes as pointers using the names assigned to the sub-matchers, for example with Result.Nodes.getNodeAs<clang::Expr>("exponent").
The objects obtained in this way provide extensive information about the entities they represent. Examples include the number, names and types of the function parameters, the type and value category (lvalue/rvalue) of an expression, and the value of an integer literal. Beyond literals, the value of any expression can be queried if it is known at compile time. In our case, we want to know whether the second argument could also be passed as a template argument. For this, it must be a constant expression, and exponent->isCXX11ConstantExpr(*Result.Context) gives us the answer. If it returns true, we know that utils::pow is applicable and is the more performant alternative.
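In plain C++ terms, the check asks whether the exponent would be accepted as a template argument. The following self-contained illustration uses hypothetical stand-ins: the macro EXPONENT, the constexpr function cubeExponent(), and a deliberately simple powN<N>() in place of utils::pow. It requires C++14 for the loop in a constexpr function.

```cpp
#define EXPONENT 2  // hypothetical macro that expands to an integer literal

constexpr int cubeExponent() { return 3; }  // hypothetical constexpr function

// Deliberately simple stand-in for utils::pow<N>(x).
template <int N>
constexpr double powN(double x)
{
    double result = 1.0;
    for (int i = 0; i < N; ++i)
        result *= x;
    return result;
}

// Each exponent below is a constant expression, so each line compiles.
constexpr double fromLiteral = powN<4>(2.0);
constexpr double fromMacro = powN<EXPONENT>(3.0);
constexpr double fromFunction = powN<cubeExponent()>(2.0);

// A runtime variable, by contrast, is rejected as a template argument:
//   int n = 4;
//   double bad = powN<n>(2.0);  // error: n is not a constant expression
```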
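To issue a warning just like a regular compiler warning, we use the DiagnosticsEngine, which we can access via the AST context with Result.Context->getDiagnostics(). A custom warning ID can be registered with getCustomDiagID(), and the warning is then emitted with Report().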
If we want not only to warn but to improve the code directly, we can attach a so-called FixItHint to the report. In our case, we need to reorder the arguments of the function call, so we need the source code of the arguments as strings. This can be obtained, for example, with clang::Lexer::getSourceText(), which returns the source text covered by a CharSourceRange.
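Conceptually, extracting the code of an argument means slicing the original source buffer from the start of the argument's first token to the end of its last token. The following self-contained sketch shows that idea. It uses hypothetical helpers (TokenRange, sourceText) that work on plain character offsets instead of Clang's SourceLocations, and it requires C++17.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical token range: where the first token starts, where the last
// token starts, and how long that last token is. Clang's token ranges
// likewise point at the *start* of the last token, which is why the Lexer
// has to measure that token to find where the text ends.
struct TokenRange {
    std::size_t begin;
    std::size_t lastTokenBegin;
    std::size_t lastTokenLength;
};

// Returns the source text covered by the token range.
constexpr std::string_view sourceText(std::string_view buffer, TokenRange r)
{
    const std::size_t end = r.lastTokenBegin + r.lastTokenLength;
    return buffer.substr(r.begin, end - r.begin);
}

// Example: in "double y = std::pow(x + 1, 3);" the base argument starts at
// offset 20 and its last token "1" sits at offset 24, so
//   sourceText(code, {20, 24, 1}) -> "x + 1"
```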
From this, we can build a FixItHint, for example with FixItHint::CreateReplacement(), by passing the character range of the function call together with the new code assembled from the argument code. We pass the FixItHint created in this way via the stream operator to the diagnostic object that was created earlier by the DiagEngine.Report() call. llvm::Twine helps to assemble strings efficiently.
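Stripped of the Clang machinery, the fix-it amounts to replacing the character range of the call with text assembled from the argument code. The following self-contained sketch (C++17) shows just that string manipulation. The helpers makeUtilsPowCall and applyReplacement are hypothetical. In the actual tool, the new text would be built with llvm::Twine and handed to Clang as a FixItHint instead of being spliced into a std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Assembles the replacement call from the argument source code.
std::string makeUtilsPowCall(std::string_view base, std::string_view exponent)
{
    std::string call = "utils::pow<";
    call += exponent;
    call += ">(";
    call += base;
    call += ')';
    return call;
}

// Replaces the half-open character range [begin, end) of `source`.
std::string applyReplacement(std::string source, std::size_t begin,
                             std::size_t end, std::string_view replacement)
{
    source.replace(begin, end - begin, replacement);
    return source;
}

// Example: the call std::pow(x + 1, 3) occupies offsets [11, 29) of
// "double y = std::pow(x + 1, 3);", so
//   applyReplacement(code, 11, 29, makeUtilsPowCall("x + 1", "3"))
//   -> "double y = utils::pow<3>(x + 1);"
```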
The Practical Test
After putting all parts together and compiling the code, we would also like to test our result on actual code. To avoid making it too easy for Clang, we pass std::pow a macro and a function call as exponents, each of which evaluates to an integer constant at compile time. In addition, we create an alias for the standard namespace and call std::pow through it.
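The test file itself is not reproduced here. A hypothetical input along these lines could look like the following, where the names EXPONENT, exponentFor and stdlib are made up. The regular expression from the beginning of the article would not match any of these three calls.

```cpp
#include <cassert>
#include <cmath>

#define EXPONENT 2  // hypothetical macro expanding to an integer constant

constexpr int exponentFor() { return 3; }  // hypothetical constexpr function

namespace stdlib = std;  // hypothetical alias for the standard namespace

double squared(double x) { return std::pow(x, EXPONENT); }     // macro as exponent
double cubed(double x) { return std::pow(x, exponentFor()); }  // function call as exponent
double fourth(double x) { return stdlib::pow(x, 4); }          // call via the alias
```

The fix-it reuses the source text of the arguments, so the tool would be expected to turn these calls into utils::pow<EXPONENT>(x), utils::pow<exponentFor()>(x) and utils::pow<4>(x).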
If the software we are analyzing also uses CMake as its build system, we can have CMake create a so-called compilation database with the parameter -DCMAKE_EXPORT_COMPILE_COMMANDS=ON. Our Clang tool can use it to obtain the necessary include paths and compiler flags. We pass this database to our tool by passing the build directory in which we previously ran CMake as a parameter. If no compilation database is available, we can pass the compiler flags to the tool manually by appending a double hyphen (--) followed by the compiler flags after the source files to be analyzed.
Conclusion
Putting all the pieces together, we have created a refactoring tool tailored to our project-specific needs with just under 100 lines of code. Unlike a purely text-based refactoring tool, our implementation is capable of interpreting macros, aliases and constexpr expressions. With Clang's LibTooling as the foundation, the whole world of static code analysis and full code understanding is at our disposal. Through the ASTContext, we have access to symbol tables, and with a single call to the CFG::buildCFG function, we can generate a control flow graph from the AST. The Preprocessor class allows us to inspect macro expansions and includes. In the other direction, clang::EmitLLVMOnlyAction gives us access to the LLVM Intermediate Representation, a language- and machine-independent abstraction of the generated machine code.
To get an overview of the capabilities of Clang's internal libraries, the "Internals Manual" of the Clang documentation [3] is recommended. The complete code of the refactoring tool created in this article can be found at [4].
Bibliography
[1] https://clang.llvm.org/docs/LibTooling.html
[2] https://clang.llvm.org/docs/LibASTMatchersReference.html
[3] http://clang.llvm.org/docs/InternalsManual.html
[4] https://github.com/akreuzkamp/kdab-supercede-stdpow-checker
Author
Anton Kreuzkamp is a software developer at KDAB, where he develops, among other things, tooling for the analysis of C++ and Qt-based software and works as a trainer and technical consultant. KDAB is one of the leading software consulting companies for architecture, development and design of Qt, C++ and OpenGL applications on desktop, embedded and mobile platforms. KDAB is also one of the largest independent contributors to Qt. KDAB's tools and extensive experience in building, debugging, profiling and porting complex applications help developers worldwide to realise successful projects.