In this article, I will take a look at one of the fundamental concepts introduced in Alex Stepanov and Paul McJones’ seminal book "Elements of Programming" (EoP for short) — that of a (Semi-)Regular Type and Partially-Formed State.
Using these, I shall try to derive rules for C++ implementations of what are commonly called "value types", focusing on the bare essentials, as I feel they have not been addressed in sufficient depth up to now: Special Member Functions.
Alex Stepanov and Paul McJones gave us a whole new way of looking at this, with a mathematical theory of types and algorithms quite unlike anything ever done before. Their achievement will forever change the way you look at computer programming, but eight years after its publication, the book still does not get the widespread adoption it deserves.
Setting The Stage
Special Member Functions, of course, are those member functions of a C++ class that the compiler can write for you: the default constructor, the copy and move constructors, the copy and move assignment operators, and the destructor.
A Regular Type in EoP roughly corresponds to the EqualityComparable C++ concept combined with CopyConstructible; see the book for more details.
A C++ Value Type is a type that is defined by its state, and its state alone (note that EoP has a very different definition of value type). Take an int as an example. Two int objects of value 5 will behave identically under all regular operations (simplified: all operations except for taking the object's address). Two Shape objects, however, both having the same position, color, texture, ... still may end up a square and a triangle when drawn on screen. A Shape object is defined by its behaviour as much as its state. We call such types polymorphic.
There are many shades of grey in between those two extremes; let's leave it at that crude distinction. See Designing value classes for modern C++ - Marc Mutz @ Meeting C++ 2014 for a somewhat more thorough treatment.
In this article, we will look at two different classes, Rect and Pen, and try to write their Special Member Functions, hopefully as Stepanov would have us do.
Rect and Pen
The first, Rect, is simple: it's an integral-coordinate rectangle class that we will define completely inline in the header file. Pen, however, will be quite a bit different: it will use the Pimpl Idiom to firewall its internals from users. See Pimp My Pimpl and Pimp My Pimpl — Reloaded for more on the idiom.
The first task for today is to write the default constructor.
Default Construction
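EoP says that a default constructor takes no arguments and leaves the object in a partially-formed state.
Ok, so what's a "partially-formed state"? Here comes the good part: according to EoP, an object is in a partially-formed state if it can be assigned to or destroyed.
The authors go on to say that any other operation on partially-formed objects is undefined. In particular, such objects do not, in general, represent a valid value of the type.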
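The motivation for EoP to require default-construction in the first place is programmer convenience: T a = b; should be equivalent to T a; a = b;, and the user of the type should get to choose whether to initialise a variable in a single expression or to declare it first and assign to it afterwards, for example in the branches of an if/else.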
Without default construction, if all the type's author gave are user-defined constructors that establish a valid value, the programmer would have to use the ternary operator, whether or not that fits with line length limitations and personal preferences.
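The two styles can be sketched as follows. This is only an illustration: Size, pickTernary and pickIfElse are hypothetical names made up for this example, not taken from EoP.

```cpp
// Hypothetical value type. The defaulted default constructor leaves w and h
// uninitialised, so a default-constructed Size is only partially formed.
struct Size {
    int w, h;
    Size() = default;
    Size(int w_, int h_) : w(w_), h(h_) {}
};

// Style 1: initialise in a single expression with the ternary operator.
// This works even for types without a default constructor.
inline Size pickTernary(bool wide) {
    Size s = wide ? Size(16, 9) : Size(4, 3);
    return s;
}

// Style 2: default-construct, then assign in each branch.
// s is partially formed until one of the assignments has run.
inline Size pickIfElse(bool wide) {
    Size s;
    if (wide)
        s = Size(16, 9);
    else
        s = Size(4, 3);
    return s;
}
```

Both functions return the same value for the same argument; the second form is only available if Size can be default-constructed.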
The comments at the end of the article contain even more reasons to support default construction.
A default constructor for Rect
So, let's try to write something for Rect:
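The listing that belongs here is not reproduced in this copy of the article. Judging from the discussion that follows, it was presumably along these lines; the member names x1, y1, x2, y2 and the accessors are assumptions made for this sketch.

```cpp
// Sketch of an integral-coordinate rectangle, defined entirely inline.
class Rect {
    int x1, y1, x2, y2;   // assumed representation: two corner points
public:
    // Establishes only a partially-formed state: the four members are
    // left uninitialised, just like a default-initialised int.
    Rect() = default;

    // Establishes a valid value.
    Rect(int x1_, int y1_, int x2_, int y2_)
        : x1(x1_), y1(y1_), x2(x2_), y2(y2_) {}

    // Hypothetical accessors; only meaningful on a valid value.
    int width() const  { return x2 - x1; }
    int height() const { return y2 - y1; }
};
```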
What do you think? Would you have written the Rect default constructor this way?
I can tell you I wouldn't have. Not until EoP opened my eyes. Remember that EoP only requires that the default constructor establish a partially-formed state, not a valid value. This should not surprise you. When in C++, do as the ints do:
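A minimal sketch of the parallel, assuming a Rect whose default constructor is defaulted (the struct below is a stand-in for illustration, and demo is a hypothetical helper):

```cpp
// Stand-in for the article's Rect, with public members to keep it short.
struct Rect {
    int x1, y1, x2, y2;
    Rect() = default;
    Rect(int a, int b, int c, int d) : x1(a), y1(b), x2(c), y2(d) {}
};

inline int demo() {
    int i;    // partially formed: may only be assigned to or destroyed
    Rect r;   // the same: all four coordinates are uninitialised

    i = 42;                  // assignment establishes a valid value
    r = Rect(0, 0, 10, 20);  // likewise for the Rect

    return i + r.x2 + r.y2;  // reading is fine only after the assignments
}
```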
In both cases, any use of the default-constructed object other than assignment or destruction is undefined, because the values of the objects are undefined (uninitialised).
If you feel uncomfortable with this implementation, you're letting your inner Java programmer get the better of you. Don't. This is C++. We embrace the undefined.
And, as Howard Hinnant writes in a reddit comment on this article, we give power to our users:
Next, let's try Pen.
A default constructor for Pen
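The class definition is not reproduced in this copy of the article. A minimal sketch that matches the questions below, assuming a raw d-pointer named d and a nested Private class; the width constructor, the deleted copy operations and isPartiallyFormed() are hypothetical additions to keep the example self-contained:

```cpp
class Pen {
    class Private;          // defined out of line, hidden from users
    Private *d = nullptr;   // nullptr marks the partially-formed state
public:
    Pen() noexcept = default;   // no allocation, so it cannot throw
    explicit Pen(int width);    // establishes a valid value (allocates)
    ~Pen();                     // out of line: needs the complete Private

    // Copying is left out of this sketch to keep it short.
    Pen(const Pen &) = delete;
    Pen &operator=(const Pen &) = delete;

    // Debugging aid for this example only.
    bool isPartiallyFormed() const noexcept { return !d; }
};

// --- what would live in the implementation file ---
class Pen::Private {
public:
    int width;
};

Pen::Pen(int width) : d(new Private{width}) {}
Pen::~Pen() { delete d; }   // deleting a null pointer is a no-op
```

Because d is initialised to nullptr, the destructor is well-defined for a default-constructed Pen, and the default constructor still does no real work.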
Should we have left Pen::d uninitialised, too?
No. Doing so would make destruction undefined.
Should we have new'ed a Pen::Private object into Pen::d in the default constructor?
That would be a no, too. We're not required to establish a valid value in the default constructor, so in the spirit of "don't pay for what you don't use", we only do the minimal work necessary to establish a partially-formed state.
To hammer this one home: should the implementation of an ordinary Pen member function check for d == nullptr?
No, for the third time. You can see at a glance in the source code whether an object is in a partially-formed state. There is no need for a runtime check, except for debugging purposes.
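One way to read "except for debugging purposes" is a plain assert, which disappears when NDEBUG is defined. The sketch below assumes that interpretation; setWidth() and width() are hypothetical members, and copying is again omitted:

```cpp
#include <cassert>

class Pen {
    class Private;
    Private *d = nullptr;
public:
    Pen() noexcept = default;
    explicit Pen(int width);
    ~Pen();
    Pen(const Pen &) = delete;
    Pen &operator=(const Pen &) = delete;

    void setWidth(int width);   // hypothetical setter
    int width() const;          // hypothetical getter
};

class Pen::Private {
public:
    int width;
};

Pen::Pen(int width) : d(new Private{width}) {}
Pen::~Pen() { delete d; }

void Pen::setWidth(int width) {
    // Precondition: *this holds a valid value. The assert documents this
    // and catches misuse in debug builds; with NDEBUG it compiles away,
    // so release builds do not branch on d.
    assert(d && "setWidth() called on a partially-formed Pen");
    d->width = width;
}

int Pen::width() const {
    assert(d && "width() called on a partially-formed Pen");
    return d->width;
}
```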
From the above, it follows that your default constructors should be noexcept. If your default constructors throw, they do too much. Of course, we're still talking Value Types here, so let no man say that yours truly told you to make the default constructors of your RAII types noexcept.
Move-Construction And Move-Assignment
For Rect, moving and copying are the same thing, and the compiler is in the best position to implement them for you:
Once more, Pen is a bit more interesting:
We put moved-from Pen objects into the partially-formed state. In other words: moving from an object has the same effect as default-construction. Can it get any simpler?
We delegate move-assignment to the move constructor:
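One common way to do this is to move into a temporary and swap; the sketch below assumes that approach, and its swap(), width() and isPartiallyFormed() members are hypothetical:

```cpp
#include <utility>

class Pen {
    class Private;
    Private *d = nullptr;
public:
    Pen() noexcept = default;
    explicit Pen(int width);
    ~Pen();

    Pen(Pen &&other) noexcept : d(other.d) { other.d = nullptr; }

    // Move-assignment expressed through the move constructor: move other
    // into a temporary, then swap the temporary with *this. The
    // temporary's destructor releases whatever *this held before.
    Pen &operator=(Pen &&other) noexcept {
        Pen moved(std::move(other));
        swap(moved);
        return *this;
    }

    void swap(Pen &other) noexcept { std::swap(d, other.d); }

    int width() const;   // only valid on a fully-formed Pen
    bool isPartiallyFormed() const noexcept { return !d; }
};

class Pen::Private {
public:
    int width;
};

Pen::Pen(int width) : d(new Private{width}) {}
Pen::~Pen() { delete d; }
int Pen::width() const { return d->width; }
```

After a = std::move(b), a holds b's former value and b is partially formed, so it may only be assigned to or destroyed.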
Note how all special member functions except the destructor are inline so far, yet we didn't break encapsulation of the Pen::Private class.
Controversy
Thanks in no small part to the ISO C++ standard, which requires (in [lib.types.movedfrom]) that moved-from standard library objects be left in a "valid but unspecified state" unless otherwise specified, the simple chain of reasoning described so far has fewer friends than you might think. And this is why I wrote this article.
You will probably meet a lot of resistance when trying to implement your default and move constructors this way. But think about it: What would a natural "default value" of your type be?
It's easy to fall for the next-best choice: for int, surely the default-constructed value should be zero, and we just have to put up with these partially-formed, nay: uninitialised, values because C sucks.
I disagree. If you are using the int additively, then, yes, zero is a good default value. But if you work with multiplication, then one would be the better fit.
Bottom line: for the vast majority of types, there is no natural default. If there isn't, then having to establish an arbitrarily chosen one on every default-construction operation is wasteful, so don't do it.
Instead, have the default constructor establish only a partially-formed state, and provide literals (or named factory functions for something more complex) for the different "default" values:
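The original example is not reproduced in this copy. As a sketch of the named-factory variant only, here is a Rect with two hypothetical factory functions, null() and unit(); the names and the values they return are assumptions chosen for illustration:

```cpp
class Rect {
    int x1, y1, x2, y2;
public:
    Rect() = default;   // partially formed, costs nothing
    Rect(int x1_, int y1_, int x2_, int y2_)
        : x1(x1_), y1(y1_), x2(x2_), y2(y2_) {}

    // Named "default" values: callers pick the one that fits their use
    // and pay for the initialisation only when they ask for it.
    static Rect null() { return Rect(0, 0, 0, 0); }
    static Rect unit() { return Rect(0, 0, 1, 1); }

    int width() const  { return x2 - x1; }
    int height() const { return y2 - y1; }
};
```

A caller then writes Rect r = Rect::unit(); when a specific starting value is needed, and plain Rect r; when the value will be assigned later anyway.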
Embracing Partially-Formed Objects
Partially-Formed Objects are nothing magical. They offer a simple description of the behaviour of C++ built-in types with respect to default construction, and of pimpl'ed objects with respect to move semantics, if implemented in the natural way.
In both cases, partially-formed objects are easily spotted in source code with local static reasoning, so demanding anything fancier than the bare minimum as the result of moving from an object or default-constructing one violates the C++ principle of "don't pay for what you don't use". As a corollary, keep your default constructors noexcept.
In a future instalment, we will look at a smart pointer that encodes these guidelines for use as a pimpl-pointer.
6 Comments
2 - Feb - 2017
dyp
Unfortunately, partially-formed objects may not even be returned from functions (unless they're returned as prvalues and C++17 mandatory copy-elision applies). This somewhat restricts the way you can subdivide your code into small functions. Using optional is a clunky but possible workaround. On the other hand, this guarantees that any T t = f(); produces a fully-formed state.
The "local static reasoning" can easily be performed by static analyzers, if we can teach them which constructors produce a partially-constructed object and which ones don't. The resulting checks are much better than initializing an object to some form of "default value" which might not be an appropriate default for all code paths. Dynamic analyzers are also able to find those issues.
Regarding default-initialization for if/else vs ?: -- I think that's not a very strong argument. If it was the sole argument for a partially-formed state, then the safety concerns (use after lack of proper initialization) would far outweigh this minor benefit IMHO. However, it is not the sole argument. It is far easier to perform aggregation if you have a partially constructed state, since the class might not have a default value to provide to its data member. The other way around - taking away the requirement to always be constructed to a fully-formed state - is harder to implement (using unions or optional), especially if you want to have defensive checks like assertions. One annoying example are MSVC's StdLib container classes, which allocate in their default constructors to implement debug iterators. If the allocator is stateful and needs that state to perform this debug allocation, then you have to pass it an allocator even if you never use the container.
Consider std::thread and std::*fstream, which do not represent a thread or file, respectively, in their partially-formed state, and you can use them easily in your own types which might only sometimes spawn a thread or open a file. Many people I've talked to want their classes to provide as many guarantees as possible, and the "never-empty" guarantee is very popular; this corresponds to deleting the default ctor of std::thread, for example. This "never-empty" guarantee, however, cannot be upheld with efficient move operations, since the Committee decided against Stepanov's idea of a copy-destructor (destructive move).
20 - Apr - 2017
Edward Welbourne
Rather than the if/else rationale for the default-constructed case, consider the case of initialization from an I/O stream,
and, of course, initialization of an array,
15 - May - 2017
Marc Mutz
FTR: I was using the rationale given by Stepanov. I agree there are more reasons to allow default-construction, incl. the ones you gave.
4 - Feb - 2018
alfC
Very well explained.
What do you think about not assignable types (like streams), should they be conditionally assignable if they are in a partially formed state? For example:
Was the "future installment" on smart pointers ever published?
5 - Feb - 2018
Marc Mutz
A stream is a handle/raii-like type, like a mutex or a thread. It makes some sense to allow moves on these kinds of types, even if copying is prohibited. It does force the representation to have external state, though, so a movable mutex class can't just use atomic operations on *this anymore, say, like a non-copyable/non-movable mutex class could.
15 - May - 2019
alfC
Every day I am more convinced about embracing partially formed states. To the point that now I think containers should be allowed in a partially formed state if the contained value is allowed to be in a partially formed state. For example "std::vector<T> v(20)" should contain objects in partially formed state (if "T t" is in a partially formed state). For a formed state we can always do "std::vector<T> v(20, {});" or "std::vector<T> v(20, T{});". In practice "std::vector<T> v(20)" should call "std::uninitialized_default_construct", and "std::vector<T> v(20, {})" should call "std::uninitialized_value_construct". Currently it is not possible to customize this behavior. (It is not the main reason, but "std::map" should also "default" initialize the Value keys.) What do you think?