One of the useful outcomes of the work Bertjan did on tooling for program understanding and refactoring is a list of considerations we can use to assess the suitability of new tools.
Requirements for a porting system
Section 1.3.5 of his thesis details the requirements for a similar porting system:
GR1: Scalability
The qt4to5 porting tool is scalable. It is designed by Google engineers to be used on large codebases, and to operate on multiple translation units at a time. As the tool is based on a compiler, if you have the resources to compile your code, you have the resources to port it.
GR2: C++ Language understanding
The tool has a full understanding of the C++ language. However, this does not mean that everything can easily be ported. For example, consider a template which calls intersect() on an object of its template parameter type T. In cases where T is a QRect, the intersect() should be changed to intersected(), but not if T is a QSet. This is determined by the caller, and the porting tool cannot port such cases automatically (see also FX6).
However, the intent of the tool is not to port everything automatically, but to port the boring and easily automatable parts automatically.
GR3: Simplicity of use
The tool works, but the command line arguments are not very convenient, and probably never can be. I've written a Python script to make it easier, but a graphical tool would be better:
The graphical tool would be able to tell the engineer using it which refactoring steps can be done before starting the switch to Qt 5 (e.g. porting away from Qt 3 support), so that the code can be recompiled in between.
A Qt-based tool could also link statically to the tooling framework and avoid calling external processes. It would be possible to integrate such tooling into Qt Creator.
The tooling is integrated with CMake already. The tool needs access to the actual command line that should be used to build each compilation unit. CMake provides that when run with -DCMAKE_EXPORT_COMPILE_COMMANDS=ON, which is available since CMake 2.8.5.
Currently it only works with the CMake "Unix Makefiles" and Ninja generators. The feature of creating such a compilation database could presumably be added to qmake too, but that has complications: for some porting steps you need to be compiling against Qt 4 (see FX1 below), so it would need to be added to the qmake in Qt 4 and probably Qt 5 too.
Alternatively, I also created a compiler wrapper which creates such a compilation database when invoked. There may even be a simpler solution by using the porting tool as the actual compiler.
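To make the role of the compilation database concrete, here is a minimal Python sketch of how a wrapper script might look up the compiler arguments for each source file. It assumes the compile_commands.json layout that CMake writes (a list of entries with "directory", "file" and either "command" or "arguments"). The function name load_compile_commands is hypothetical and is not taken from the porting tool's repository.

```python
import json
import os
import shlex


def load_compile_commands(build_dir):
    """Map each source file path to the compiler argument list used to build it.

    Hypothetical helper: assumes build_dir contains a compile_commands.json
    file in the JSON compilation database format.
    """
    db_path = os.path.join(build_dir, "compile_commands.json")
    with open(db_path, encoding="utf-8") as db_file:
        entries = json.load(db_file)

    commands = {}
    for entry in entries:
        # "file" may be relative to "directory", so normalise it first.
        source = os.path.normpath(os.path.join(entry["directory"], entry["file"]))
        # An entry carries either an "arguments" list or a "command" string.
        if "arguments" in entry:
            args = list(entry["arguments"])
        else:
            args = shlex.split(entry["command"])
        commands[source] = args
    return commands
```

A porting script could then pass commands[path] on to the tool for each file it wants to process, rather than guessing include paths and defines.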
GR4: Customizability
Although the porting tool can port enums, the upstream tooling in the clang repo cannot.
I had to extend it, which was quite trivial:
I created the EnumeratorConstant functor which can then be used to match AST nodes of type EnumConstantDecl, which basically means use of an enum.
That EnumeratorConstant can then be used just like the rest of the matching syntax, can contain inner matchers, etc. A matching expression written with it will find all uses of QSsl::SslProtocol::TlsV1.
So, the tooling and API are fully extensible.
GR5: Predictable minimal impact
The tool only makes changes to parts of the code that are specified in the matching expressions (such as the enum matcher above).
Therefore, the predictability of the impact of running the tool depends on the exactness of the matching expressions.
The tool might also try to edit your system headers, for example when renaming a virtual method or changing its arguments. To prevent that, the tool takes a source-dir as an argument and verifies that edited files are below the provided source dir. That feature is not fully used yet, however, and there is scope for clang upstream to add such a feature to ensure correct support for symlinks, relative paths, etc.
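The kind of check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's actual implementation (which is C++); the function name is_within_source_dir is hypothetical. It resolves symlinks and relative components before comparing, which are the cases mentioned as needing care.

```python
import os


def is_within_source_dir(path, source_dir):
    """Return True if path resolves to source_dir itself or a location below it."""
    # realpath resolves symlinks and relative components such as "..".
    real_path = os.path.realpath(path)
    real_root = os.path.realpath(source_dir)
    # commonpath compares whole path components, so "/src/app2" is not
    # mistaken for a child of "/src/app" the way a string prefix test would.
    return os.path.commonpath([real_path, real_root]) == real_root
```

An edit to a file for which this returns False (a system header, say) would simply be skipped.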
GR6: Transparency
The Python wrapper script provided in the repository creates git commits with commit messages describing the change being made at each step (micro commits).
The user of the tool can review all commits in gitk, or build any step by checking it out (or build all steps by running a git script).
Any other graphical user tool could do the same thing. The result of such an execution of the script can be seen in kdelibs.
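The micro-commit idea can be sketched as a small Python loop. This is an assumption about the shape of such a script rather than a copy of the one in the repository: the function run_porting_steps and the (description, apply_step) pairs are hypothetical names, and the command runner is passed in as a parameter so the sequence of git calls can be inspected without a real repository.

```python
import subprocess


def run_porting_steps(repo_dir, steps, run=subprocess.check_call):
    """Apply each porting step and record it as its own git commit.

    steps is a list of (description, apply_step) pairs, where apply_step
    is a callable that edits files below repo_dir.
    """
    for description, apply_step in steps:
        apply_step(repo_dir)
        # Stage modifications to files that git already tracks.
        run(["git", "add", "--update"], cwd=repo_dir)
        # One commit per step, with the step description as the message.
        # Note: git refuses to commit when the step changed nothing, in
        # which case check_call raises; a real script would handle that.
        run(["git", "commit", "--quiet", "-m", description], cwd=repo_dir)
```

Because each step ends up in its own commit, any single step can be reviewed, checked out and built, or reverted independently.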
Requirements of a fact extraction system
Section 2.2 details requirements for a fact extraction system. This is a necessary
component of a porting tool, as it relies on having accurate information to complete a port.
However, not every requirement of a fact extraction system is a requirement of a porting tool.
FX1: Fault tolerance
The tool has some fault tolerance, which comes from clang. If you remember Chandler's GoingNative talk, he mentioned that if clang encounters an error, it tries to ignore it and keep processing.
Indeed, the Python porting tool currently only runs cmake, but not make, which means that cmake dependency scanning and file generation are not invoked. That means that moc files and ui_ files are not generated. When running the tool, clang will give an error about missing includes for the moc files, but will continue to process the file.
However, while there is some fault tolerance, the specific code to be ported
does have to be correct. I think mostly we'd be porting code which already
compiles with Qt 4, so I don't think fault tolerance is a huge problem for a
porting tool (as opposed to a fact extraction system).
FX2: Completeness and correctness
All parseable code in the software targeted for porting is extracted correctly, because the tool is based on a compiler.
However, this is only true for one particular target platform at a time (see also FX9).
When used on Linux, anything in #ifdef Q_OS_WIN, for example, will not be ported. It may be possible to use Wine headers or a cross compilation build to port such code on a Linux host.
FX3: Compliance
The parser is compliant with at least most of C++03 and parts of C++11. The C++11 parts might be relevant if we ever have to port code which uses C++11 and Qt 4, which is not unheard of.
While clang can find C++11 features such as lambdas, the new tooling system does not yet have API for matching or processing them.
FX4: Cross references
This item is not relevant to our porting tool. It is only relevant to fact extraction systems, for example to find the number of uses of a method. The porting tool compiles each translation unit in isolation.
In our case the fact extraction framework is clang, and it extracts all available information.
FX5: Preprocessing
My investigation of the new clang tooling APIs did not include playing with preprocessor constructs much,
but I know that clang stores all information extracted from the preprocessor.
FX6: Coverage
The C++ grammar is partially context dependent, and that can lead to ambiguity, for example when using templates, as described before: is T a QImage or a QPaintDevice?
FX7: Output completeness
This item is not relevant to porting tools. This is only relevant to the fact extraction
framework. In our case the fact extraction framework is clang, and it
extracts all available information.
FX8: Performance and scalability
The Python wrapper script provided in the repository assumes the existence of a git repository, and uses git grep to find uses of a method which needs to be ported. This way not all files need to be processed for each step of porting, but only the ones which contain something to be ported or a false positive (e.g. QSet::intersect).
This works well for renaming, but perhaps other tricks and heuristics would be needed for more complex porting steps. For example, grep can't find QAtomicInt::operator int(), but if we grep for QAtomicInt, which might be used in a header file, and then grep for uses of that header file, we might get all relevant files that need to be ported to use QAtomicInt::loadAcquire().
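The two-stage heuristic for QAtomicInt could look something like the following Python sketch. To keep it self-contained it scans files directly instead of calling git grep, so it is a stand-in for the approach rather than what the wrapper script does. The names included_names and candidate_files are hypothetical, headers are matched by file name only, and only one level of inclusion is followed.

```python
import os
import re

# Matches lines such as: #include "counter.h" or #include <QAtomicInt>
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def included_names(text):
    """Return the base names of all files included by the given source text."""
    return {os.path.basename(name) for name in INCLUDE_RE.findall(text)}


def candidate_files(root, type_name, suffixes=(".h", ".cpp")):
    """Find files which mention type_name or include a header that does."""
    texts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as source:
                    texts[path] = source.read()

    # Stage 1: files that mention the type directly.
    direct = {path for path, text in texts.items() if type_name in text}
    # Stage 2: files that include one of the headers found in stage 1.
    headers = {os.path.basename(path) for path in direct if path.endswith(".h")}
    indirect = {path for path, text in texts.items() if included_names(text) & headers}
    return sorted(direct | indirect)
```

The result is still only a list of candidates: it can contain false positives, and the porting tool itself decides whether anything in each file actually needs to change.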
Any user interface tool should keep this git dependency in order to create the clean patches. Even for code bases which do not use git, the patches can still be created and reviewed in a local git repo created with git init.
FX9: Portability
Clang can currently only generate code on Unix systems, so it can't be used to generate Windows binaries.
There have been some Windows-specific patches on the mailing list though, and it's possible that the parser (the only part we need) works just fine on Windows.
This still needs to be fully investigated.
FX10: Availability
clang and the required tooling for the porting tool are covered by a permissive, non-copyleft licence. This is not reversible, but does not prevent the code from being part of proprietary forks. The licence choice assumes that doing so is not worth it anyway, as a social measure if not a legal one.
The actual code of the porting tooling is currently in a branch, not in clang trunk, which
is a minor temporary inconvenience.