Today's blog post is about something that should be simple and apparently it causes trouble: how to declare a qHash
overload for a custom datatype. This is necessary when we want to use custom datatypes as keys in a QHash. From the documentation:
In other words, given a class like:
How do we implement its qHash()
overload? Let's find out! (Most of this applies to std::hash
too.)
Look at its operator==
The first thing to look at is the implementation of operator==
(or operator<=>
, coming C++20).
Assuming the implementation is sane, then the code will tell us which members are relevant for equality and which ones are not (caches, tags, whatever). The ones not relevant shall not be hashed (otherwise, you might end up with objects that compare equal and have different hashes).
If the type in question has a botched operator==
, then either fix it if possible, otherwise kill the corresponding qHash()
overload via = delete
. This will avoid having people re-adding it in client code.
If the type does not have an operator==
, or cannot have a meaningful one, do not provide qHash()
at all.
What's a "sane" implementation? Generally one that recursively uses plain operator==
on the member subobjects. Fuzzy comparisons are a bad idea, making the type not hashable efficiently.
Prepare for Qt 6 (optional)
Optional step: prepare for some source-incompatible changes in Qt 6 (the impact is still unknown; possibly just warnings). Qt 6 changed the return type of qHash()
from uint
to size_t
, so you can guard yourself with something like:
Declaring your qHash
overload
The idiomatic approach is:
The reason for going all the trouble with 1+2 is because friend declarations cannot have default parameters unless you're also defining the function (cf. [dcl.fct.default]). Although QHash
will never call your qHash
overload without a seed, you may want to unit test it, call it from your own containers that aren't seeded, etc.; so it's just "polite" to have the second argument defaulted.)
In case your implementation is entirely inline, you can skip 1+2 and go just with 3, adding the default argument there, and defining (not just declaring) qHash
in the body of the class. This would also make the qHash a hidden friend, which would be generally a good thing (see here, here and here for more information about hidden friends).
Using just 3 is not doable in general (e.g. if MyClass
is pimpled), in which case your qHash
must be out of line, so you have 1+2+3.
Whatever way you go, this will put your qHash
overload in the surrounding namespace of your class, which is what you want (qHash
is called via argument-dependent lookup in Qt).
Implementing your qHash
overload
Then, implement it (inline or outline depending on MyClass
). The idiomatic way to do so is to look at your operator==
:
Build your qHash
exactly like it: on the same fields, in the same order. This explains why you want qHash to be a friend, just like operator==
, to spare going through accessors:
QtPrivate::QHashCombine
is private API, but does not require any special buildsystem magic; it's in <qhashfunctions.h>
, a public header.
The same result can be obtained with the convenience that I have added in Qt 6:
Reusing the same fields as operator==
, in the same order, ensures the casual reader of the code that your hashing is correctly implemented.
If two objects are to be considered identical even if they have their data in a different order, then use QHashCombineCommutative
/ qHashMultiCommutative
. For instance, you must use the commutative versions if you have a class that represents a set of elements, and two sets are equal if they have the same elements, even if stored in a different order. This again can be detected by looking at operator==
and spotting a call to std::is_permutation
or similar.
However, if your class simply uses a set, then in your operator==
you will compare those sets via their own operator==
, which means that hashing for your class will still use the non-commutative qHashMulti
.
Things not to do in your qHash
implementation
Summing
A simple sum usually doesn't spread different values apart enough for the hashing to be effective.
This cancels information out (the seed, the fields themselves).
Getting creative with magic numbers
Completely "magic" prime sequence, raising many questions on the values used.
Being out of sync with your operator==
This is the same datatype as the previous example. How can you know, from a quick glance, if the qHash
overload is correct? (Not good, just correct; equal objects yield equal hashes.)
Creating lots of unnecessary copies/temporaries
This is because qMakePair
copies. Not necessarily a big deal in case of small, trivially copiable types; but qHashMulti
would just lead to cleaner code, and possibly spread the hashing more equally.
Hashing Qt datatypes for which Qt does not provide a qHash
overload
All the above discussion refers to the case where the datatype is a user-defined one. Sometimes we may need to build a QHash
using a Qt datatype as a key, and we find out that the data type does not have a qHash
overload. What do we do in this case?
- Report a bug. Any Qt datatype that provides an operator== should also provide a
qHash
overload. - Roll your own
qHash
, but protected by a Qt version check that blocks the usage of a higher version of Qt. That code must be revisited when Qt itself gets upgraded -- if a new version introduces a qHash
overload, your code may break in interesting ways (e.g. ODR violation). - ("Worst case scenario") Ditch
QHash
for std::unordered_map
and a custom hasher (following the same reasoning of 1+2, you don't want to specialize std::hash
for Qt datatypes), or use a wrapper type.
Hashing std::
datatypes via qHash
Unfortunately, you really shouldn't do it. First, it should be Qt's job at providing hashers for datatypes defined by the Standard, not yours, so the previous point applies.
Second, it's impossible to do it reliably, because a qHash
overload for such datatypes should be declared in the std namespace, and that's not allowed.
An overload in any other namespace will not work correctly. Given your rolled out qHash for an arbitrary std::foo
datatype:
Then merely using a QHash will lull us into a false sense of security, as this code will work:
But many other usages will be, unfortunately, broken. Suppose you just use the datatype as a data member of a class:
And then you want to define its qHash
overload just like we've seen before:
How is this possible? This is due to how ADL works, combined with 2-phase name lookup. Suppose that you have this little piece of code:
This code works, even if S
and f(S)
were declared after call_f()
. This is what gives us the power of using qHashMulti
(AKA call_f
) that can use qHash
overloads on arbitrary types (AKA f(S)
), even if these overloads got introduced after qHashMulti
's own definition.
If we change the code just slightly, however, we have a problem:
This code now does not compile (surprise!). The reason is that the call to f()
inside call_f()
will only look in some scopes:
- it will check if a
f(std::foo)
was declared before call_f()
(it wasn't) - it will check for a
f(std::foo)
in std::foo
's associated namespaces. Which is namespace std
, not the one we're in!
To make the example work, we would need to do something like this:
... which would be OK for just about any other namespace. Not for namespace std
, though: as explained before, we are not allowed to declare an arbitrary new function in that namespace.
Moving the definition of f(std::foo)
before call_f
would work (this would mean defining our qHash(std::foo)
before including the definition of qHashMulti
). But if we do so, we lose the convenience of using qHashMulti
(as well as qHash
for any other fundamental type or Qt type!).
You can play with this code here on Godbolt.
Long story short: due to qHash
design, it's not possible to conveniently and reliably hash datatypes defined by the Standard. You can use workarounds, e.g. define wrapper types and use such types as keys in your QHash
objects.
Bonus points
As in, nice things to have.
qHash
overloads should really really be noexcept
. Throwing from a hashing function sounds wrong on multiple levels.qHash
overloads are good candidates for being marked as pure functions, that is, functions that only read memory, never modify it (or have any visible side effects). Such markers should help the compiler optimizing the code further.Qt defines a couple of convenience macros for this. One is Q_DECL_PURE_FUNCTION
, that means that the function only reads global memory. There's also Q_DECL_CONST_FUNCTION
, as a even more strict version: the function only reads the values of the parameters, not any memory at all. Note that passing a reference/pointer
and reading through it makes a function PURE
, not CONST
(something like strlen
is PURE
). Use CONST
with a qHash overload where you'd pass an argument by value because it's a small trivially copiable type like QPoint
, QVector3D
or similar, otherwise use PURE
.
Why do I bother with all this?
Because declaring "ordinary" hashing functions should be straightforward, unless you have profiling data to believe that you can do better than the idiomatic way. I keep seeing code that makes it look dark magic. It's not!
Ideally, in some not-too-far future, we may reach a level where the compiler will autogenerate the hash function for us -- just like in C++20 it may autogenerate the implementation of operator==
for us!
Thanks for reading!
Trusted software excellence across embedded and desktop platforms
The KDAB Group is a globally recognized provider for software consulting, development and training, specializing in embedded devices and complex cross-platform desktop applications. In addition to being leading experts in Qt, C++ and 3D technologies for over two decades, KDAB provides deep expertise across the stack, including Linux, Rust and modern UI frameworks. With 100+ employees from 20 countries and offices in Sweden, Germany, USA, France and UK, we serve clients around the world.
1 Comment
30 - Jun - 2020
Marc Mutz
Nice summary, thanks! Three points I'd like to comment on:
First, you can always make the
qHash()
function a hidden friend, because you can define it inline to call a private (or even public) member function, sayMyType::hash(qhash_result_t seed = 0) const
.Second, beware of mixed-type equality comparison. A
std::string
isn't equality-comparable to astd::u16string
, so it's ok for them to just hash the bits, but aQLatin1String
is quality-comparable to aQString
, but they, too, just hash the bits these days, which means thatqHash(QLatin1String("foo")) != qHash(QString("foo"))
, even thoughQLatin1String("foo") == QString("foo")
. This is a gotcha often overlooked and, IMHO, one of the major stalling points of mixed-type hash table lookup. Try to do better in your own code bases.Third, please, never implement a global entity for a type you don't own. Don't add relational operators to types you don't own, don't add hashing, don't add
Q_DECLARE_TYPEINFO()
orQ_DECLARE_METATYPE()
, etc.If you find you need any of these things for a type you don't own, you have two options: a) wrap the type in a struct and add these things for that your struct, or else write custom hashers or relational operators as named lambdas and pass them into algorithms and data structures for use in lieu of
std::less
andstd::hash
(not an option with the Qt containers).And yes, both Qt and std should really get their act together and provide a complete set of "global entities" (even if
=delete
ed) for all their Regular Types.