(NOTE: this blog post has been edited many times since its original publication)
You've probably heard of valgrind before: its default tool (memcheck) is a real life saver, able to detect memory-related bugs in your code (leaks, double deletions, use of deleted memory, use of uninitialized memory, etc.).
Well, it turns out that valgrind also comes with a tool to detect race conditions between threads, in multithreaded applications. That tool is called helgrind.
Alternatives
Before we talk about helgrind, please know that these days I recommend thread sanitizer (TSAN) as the primary way to detect data races (it makes the application run much faster than helgrind, and doesn't have false positives). But it requires a 64-bit architecture, so if you're on 32-bit, or if you don't want to use TSAN for some reason, here's a howto about helgrind.
Basic usage
In theory, provided that you're on a Unix platform, using helgrind is as simple as running your application under valgrind with --tool=helgrind.
However, if you do just that on a Qt application, you'll end up digging through lots of false positives, making this a rather painful experience. So here's more information about the necessary steps to debug Qt applications with helgrind.
Suppressions
It wasn't possible to make helgrind perfect for Qt. In particular, helgrind has no way to distinguish a raw store to an int from an atomic store to that int, because on x86 there is no difference at the instruction level. For this reason, I used the poor man's solution: defining suppressions for all uses of the Qt atomic classes. If we can use helgrind to fix all the abuse of normal (non-atomic) variables in multithreaded apps, that's already a huge step forward, even if we (wrongly) tell it that "any use of the atomic API is fine".
Note that I only ever tested these suppressions on a debug build of Qt. I don't know if they work on a release (optimized and stripped) build of Qt.
The export should probably go into your ~/.zshrc or ~/.bashrc, so you have it set up once and for all.
Helgrind alias
In addition to detecting race conditions, helgrind also tries to detect potential deadlocks due to wrong locking order (A+B vs B+A). However, the QOrderedMutexLocker trick in Qt confuses helgrind because of its interesting use of tryLock(), so the lock order feature of helgrind has to be disabled for now, using --track-lockorders=no. See bug 243232. EDIT: as shown in the bug report, Milian added a suppression for this too, in kde.supp, so you don't really need to pass this command-line option anymore.
The default event dispatcher in Qt uses the glib event loop, which has its own races, which we're not really interested in. Easy solution: export QT_NO_GLIB=1
For these two reasons, I recommend adding this line to your ~/.zshrc or ~/.bashrc:
My contributions (historical note)
In case you're curious, I fixed the following issues in Qt:
- QFuture: race on d->state, fixed in Qt 5.0 (commit 7120cf16d)
- QThreadDataPrivate: canWait race, fixed in Qt 5.1 (commit bf3a5cc) (backported to Qt 4.8.5 in commit 815d7f0)
- QThread: race when setting the eventDispatcher, fixed in Qt 5.1 (commits f4609b2 and 85b25fc)
- QEventDispatcherUNIX: race on the interrupt bool, fixed in Qt 5.1 (commit 49d7e71)
- QEventLoop::exec()/exit() race, fixed in Qt 5.1 (commit 5a5a092)
- QThreadPool: races in activeThreadCount(), fixed in Qt 5.2 (commit 85b24bb2de)
- QThreadPool: race at time of thread expiry, fixed in Qt 5.3.0 (commit a9b6a78e54)
- qfreelist: race on v[at].next, fixed in Qt 5.3.1 (commit 8636bade17)
- qDebug: race on QLoggingCategory, fixed in Qt 5.3.2 (commit 884b381576)
- qDebug: race in qt_message_print, fixed in Qt 5.3.2 (commit 9ee27005ee)
- QJpegHandler: race condition due to static variable, fixed in Qt 5.5 (commit 211c6f3dc7)
- QThread/QThreadData: two races in destructors, fixed in Qt 5.6 (commit ec6556a2b9)
- QSignalSpy: race between wait() and emit from another thread, fixed in Qt 6.8 (commit c837cd7593)
- QObjectPrivate: race on ConnectionData contents, fixed in Qt 6.8 (commit 75d82afa0d)
The older your version of Qt, the more of these issues might show up.
For Qt 6.4.0, my colleague Giuseppe added TSAN annotations to QMutex (see this change in Qt), which means that you no longer need to rebuild Qt with TSAN enabled when using TSAN on your own application.
5 Comments
30 - Oct - 2014
Neil
Hello, I am using Qt 5.2.1 and was just wondering if I needed to patch anything in valgrind or Qt in order to use helgrind effectively, as it seems like the paper was written for Qt < 5.2? Currently, it seems like helgrind gives me a lot of errors that target Qt and not particularly the code that's been written.
30 - Oct - 2014
dfaure
Qt 5.2.1 is definitely fine. But you should make sure you're following the steps in the howto and you're using the latest kde.supp file from kde-dev-scripts git master. I fixed a few more things in there yesterday...
If you still get races from within Qt, it might be that some later patches need to be backported, we keep finding and fixing races inside Qt itself :-)
Email me (david.faure at kdab.com) the first race detected by helgrind and I'll be able to tell you where it comes from.
5 - Nov - 2014
Neil
Hello dfaure, I don't see anything in the howto that applies to qt 5.2. Would I need to follow the steps for 5.1? Is that okay? After I do this, I would email you the first race! Thank you!
5 - Nov - 2014
dfaure
The howto does mention Qt 5.2:
"If you're using Qt 5.2 or Qt 4.8.6 you can skip this section."
9 - Apr - 2024
Paul Floyd
Valgrind also contains a second thread hazard detection tool, DRD. Roughly speaking the difference is that Helgrind keeps a history of memory accesses so that it can report the contexts of all threads. DRD only stores the synchronization events so you only get the context of the thread where DRD saw a conflict. Not saving all that history makes DRD faster.
If you find any problems then please report them to https://bugs.kde.org. Fixing problems can take a long time though!