For quite some time, I have been bothered by this thought: Individual programming languages (C++, Rust, Go, etc.) are traditionally viewed as walled gardens. If your main() function is written in C++, you had better find yourself C++ libraries like Qt to build the rest of your codebase with. Do you want to use Flutter to build your app's user interface? Get ready to build the logic in Flutter, too. Do you really want to use that Rust library to make your application safer? You get to either rewrite the whole app in Rust or build an ugly extern "C" wrapper around it that won't fit well in your object-oriented C++ code.
This has been the standard view on using multiple programming languages for many years. However, I've decided that this view is fundamentally flawed, because every compiled language uses the same set of concepts when it is compiled:
- Code is split up into functions that can be reused.
- Functions are identified by a string generated from the function name in the source code. For example, g++ generates _Z3foov as the identifier for void foo(). This string is always reproducible; for example, both Clang and GCC on Linux follow the Itanium C++ ABI convention for mangling function names.
- Functions are called by storing all parameters to that function in specific registers or at a specific location in memory and then using a call instruction or equivalent to move control to the function. For example, to call void foo() from earlier, the compiler converts a C++ statement foo(); into the assembly call _Z3foov. The assembler then replaces call with the appropriate opcode, and the assembler (or, for symbols defined in another object file, the linker) replaces _Z3foov with the location of the first instruction identified by _Z3foov.
- Functions return by storing their return value (if they have one) at a specific location and then using a ret instruction or equivalent.
- Classes and structs can be boiled down to a collection of primitive types (although some classes do have vtables).
- Class methods are just another function that happens to take a pointer to the class object as the first parameter. In other words, when you write this:
  class Foo { void foo(int bar); int baz; };
  your code actually compiles to something that is better represented this way:
  class Foo { int baz; }; void foo(Foo *this, int bar);
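The last bullet can be checked in ordinary C++. The following is a minimal sketch, not a description of what any particular compiler emits: FooData, foo_free, and check_equivalence are hypothetical names, and the method returns a value here only so the two forms can be compared.

```cpp
#include <cassert>

// The class as you would normally write it.
class Foo {
public:
    int foo(int bar) { return baz + bar; }
    int baz = 0;
};

// Hypothetical "lowered" form: plain data plus a free function whose
// first parameter plays the role of the implicit this pointer.
struct FooData {
    int baz;
};

int foo_free(FooData *self, int bar) { return self->baz + bar; }

// Both forms carry exactly one int of state, so they have the same size.
static_assert(sizeof(Foo) == sizeof(FooData), "same data, same size");

// Calls both forms with the same inputs and checks that they agree.
void check_equivalence() {
    Foo object;
    object.baz = 40;
    FooData data{40};
    assert(object.foo(2) == foo_free(&data, 2));
}
```

Calling check_equivalence() from main() passes silently, since the method call and the free-function call compute the same result from the same data.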
Since every compiled programming language uses the same concepts to compile, why can't they just interact?
Example
Before we go any further, I'd like to give an example of what we want to achieve: a C++ file whose main() calls a function foo(), and a Rust file that defines foo().
We want to be able to compile those files and get an executable file that prints Hello from Rust to stdout.
Now let's look at why this won't just work out of the box.
Name mangling, data layout, and standard libraries
The most obvious reason that compiled programming languages can't just interact with each other is syntax. C++ compilers don't understand Rust, and Rust compilers don't understand C++. Thus neither language can tell what functions or classes the other is making available.
Now, you might be saying "But if I use a C++ .h file to export functions and classes to other .cpp files, certainly I could make a .h file that tells C++ that there is a Rust function fn foo() out there!" If you did say (or at least think) that, congratulations! You are on the right track, but there are some other less obvious things we need to talk about.
The first major blocker to interoperability is name mangling. You can certainly make a .h file with a forward declaration of void foo();, but the C++ compiler will then look for a symbol called _Z3foov, while the Rust compiler will have mangled fn foo() into _ZN10rustmodule3foo17hdf3dc6f68b54be51E. Compiling the C++ code starts out OK, but once the linking stage is reached, the linker will not be able to find _Z3foov since it doesn't exist.
Obviously, we need to change how the name mangling behaves on one side or the other. We'll come back to this thought in a moment.
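To see what a mangled name encodes, it can help to demangle it. The sketch below assumes a GCC or Clang toolchain that follows the Itanium C++ ABI, where the runtime exposes abi::__cxa_demangle in the cxxabi.h header; the demangle helper itself is a hypothetical convenience wrapper, not part of any library.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cassert>
#include <cstdlib>    // std::free
#include <string>

// Returns the human-readable form of an Itanium-mangled symbol, or the
// input unchanged if the runtime reports that it cannot be demangled.
std::string demangle(const std::string &symbol) {
    int status = 0;
    char *text = abi::__cxa_demangle(symbol.c_str(), nullptr, nullptr, &status);
    if (status != 0) {
        return symbol;
    }
    assert(text != nullptr);  // status 0 means a buffer was returned
    std::string result(text);
    std::free(text);          // the buffer is allocated with malloc()
    return result;
}
```

With this helper, demangle("_Z3foov") yields foo(), which is the name the C++ side expects the linker to find.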
The second major blocker is data layout. Put simply, different compilers may treat the same struct declaration differently by putting its fields at different locations in memory.
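A quick way to make layout visible is to ask the compiler where it put each field. This is a minimal sketch; Record, RecordLayout, and record_layout are hypothetical names, and the exact offsets depend on the platform and compiler. For comparison, Rust's default struct representation makes no layout guarantees and may reorder fields unless the struct is marked #[repr(C)].

```cpp
#include <cstddef>      // offsetof, std::size_t
#include <cstdint>
#include <type_traits>

// Hypothetical record used only for illustration.
struct Record {
    std::uint8_t  tag;
    std::uint32_t length;
    std::uint16_t checksum;
};

// offsetof is only guaranteed to be meaningful for standard-layout types.
static_assert(std::is_standard_layout<Record>::value,
              "Record must be standard layout");

// Offsets of each field plus the total size, as chosen by this compiler.
struct RecordLayout {
    std::size_t tag;
    std::size_t length;
    std::size_t checksum;
    std::size_t size;
};

constexpr RecordLayout record_layout() {
    return {offsetof(Record, tag), offsetof(Record, length),
            offsetof(Record, checksum), sizeof(Record)};
}
```

The fields hold 7 bytes of data, but on common 64-bit toolchains the struct typically occupies more than that because of alignment padding. Any language that wants to read this struct has to agree on those offsets, not just on the field list.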
The third and final blocker I want to look at here is standard libraries. If you have a C++ function that returns a std::string, Rust won't be able to understand that. Instead, you need to implement some sort of converter that will convert C++ strings to Rust strings. Similarly, a Rust Vec object won't be usable from C++ unless you convert it to something C++ understands.
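One shape such a converter can take is a plain pointer-plus-length pair that both sides can describe without knowing the other's standard library. The sketch below shows only the C++ half, under the assumption that the other language would read the same two fields; StrSlice, to_slice, and from_slice are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical plain-data view: a pointer and a byte length, no ownership.
struct StrSlice {
    const char *ptr;
    std::size_t len;
};

// Borrow the bytes of a std::string. The view is valid only while the
// string is alive and unmodified.
StrSlice to_slice(const std::string &s) { return {s.data(), s.size()}; }

// Copy the bytes back into an owning std::string.
std::string from_slice(StrSlice slice) {
    assert(slice.ptr != nullptr || slice.len == 0);
    return slice.len == 0 ? std::string() : std::string(slice.ptr, slice.len);
}
```

The view borrows rather than copies, so whoever receives it must copy the bytes into its own string type before the original std::string goes away.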
Let's investigate how we can fix the first problem, name mangling.
extern "C" and why it sucks
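The easy way is to use the extern "C" feature that nearly every programming language has: declare foo() as extern "C" on the C++ side and mark the Rust definition as #[no_mangle] pub extern "C".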
This actually will compile and run (assuming you link all the proper standard libraries)! So why does extern "C" suck? Well, by using extern "C" you give up features like these:
It's possible to create wrappers around the extern "C" functions to crudely emulate these features, but I don't want complex wrappers that provide crude emulation. I want wrappers that directly plumb those features and are human readable! Furthermore, I don't want to have to change the existing source, which means that the ugly #[no_mangle] pub extern "C" must go!
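To give a feel for what such a wrapper looks like, here is a minimal sketch of a small C++ class flattened into an extern "C" interface. Counter, CounterHandle, and the counter_* functions are hypothetical names chosen for illustration. Functions with C linkage cannot be overloaded or be class members, so each method becomes a uniquely named free function and the object travels as an opaque pointer.

```cpp
#include <cassert>

// An ordinary C++ class with an overloaded method.
class Counter {
public:
    void add() { ++value_; }
    void add(int amount) { value_ += amount; }
    int value() const { return value_; }
private:
    int value_ = 0;
};

extern "C" {
    struct CounterHandle;  // opaque to callers; never defined

    CounterHandle *counter_new() {
        return reinterpret_cast<CounterHandle *>(new Counter());
    }
    // The two add() overloads need two distinct C names.
    void counter_add_one(CounterHandle *h) {
        assert(h != nullptr);
        reinterpret_cast<Counter *>(h)->add();
    }
    void counter_add_amount(CounterHandle *h, int amount) {
        assert(h != nullptr);
        reinterpret_cast<Counter *>(h)->add(amount);
    }
    int counter_value(const CounterHandle *h) {
        assert(h != nullptr);
        return reinterpret_cast<const Counter *>(h)->value();
    }
    // Ownership is manual: the caller must remember to free the handle.
    void counter_free(CounterHandle *h) {
        delete reinterpret_cast<Counter *>(h);
    }
}
```

Everything the class gave us for free (overload resolution, encapsulation, automatic destruction) now has to be re-created by hand on the calling side.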
Enter D
D is a programming language that has been around since 2001. Although it is not source compatible with C++, it is similar to C++. I personally like D for its intuitive syntax and great features, but for gluing Rust and C++ together, D stands out for two reasons: extern(C++) and pragma(mangle, "foo").
With extern(C++), you can tell D to use C++ name mangling for any symbol. Therefore, the following code will compile:
However, it gets better: we can use pragma(mangle, "foo") to manually override name mangling to anything we want! Therefore, the following code compiles:
With pragma(mangle, "foo") we can not only tell D how Rust mangled its function, but also create a function that Rust can see!
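For readers who want to experiment with the idea of overriding a symbol name without installing a D compiler, GCC and Clang offer a loosely comparable extension in C++: an asm label on a function declaration. This is an analogy, not D's pragma(mangle), and the sketch assumes an ELF target such as Linux, where C symbols get no extra prefix. The names answer, glue_demo_symbol, and call_through_raw_symbol are made up for the example.

```cpp
#include <cassert>

// Declare a C++ function but pin its symbol name with an asm label
// (a GCC/Clang extension). Without the label, the symbol would be the
// Itanium-mangled name of answer().
int answer() asm("glue_demo_symbol");

// The label has to appear on a declaration that precedes the definition.
int answer() { return 42; }

// A second declaration that reaches the same symbol by its raw name,
// the way code written in another language would see it.
extern "C" int glue_demo_symbol();

// Calls the function through its raw symbol and checks that both names
// lead to the same code.
int call_through_raw_symbol() {
    int value = glue_demo_symbol();
    assert(value == answer());
    return value;
}
```

The point is the same as in the D version: once you can choose the exact string the linker sees, a function can be exported under whatever name another compiler expects.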
You might be wondering why we had to tell Rust to override mangling of bar(). It's because Rust apparently won't apply any name mangling to bar() for the sole reason that it is in an extern block; in my testing, not even marking it as extern "Rust" made any difference. Go figure.
You also might be wondering why we can't use Rust's name mangling overrides instead of D's. Well, Rust only lets you override mangling on function forward declarations marked as extern, so you can't make a function defined in Rust masquerade as a C++ function.
Using D as the glue
We can now use D to glue our basic example together:
In this example, when main() calls foo() from C++, it is actually calling a D function that can then call the Rust function. It's a little ugly, but it's possibly the best solution available that leaves both the C++ and Rust code in pristine condition.
Automating the glue
Nobody wants to have to write a massive D file to glue together the C++ and Rust components, though. In fact, nobody even wants to write the C++ header files by hand. For that reason, I created a proof-of-concept tool called polyglot that can scan C++ code and generate wrappers for use from Rust and D. My eventual goal is to also wrap other languages, but as this is a personal project, I am not developing polyglot very quickly and it certainly is nowhere near the point of being ready for production use in serious projects. With that being said, it's really amazing to compile and run the examples and know that you are looking at multiple languages working together.
Next up
I originally planned to write on this topic in one blog post, but there are a lot of interesting things to cover, so I will stop here for now. In the next installment (part 2) of this series, we will take a look at how we can overcome the other two major blockers to language interoperability. Part 3 is also available.
11 Comments
7 - Dec - 2023
Lachu
Check out the language called Nim.
7 - Dec - 2023
Loren Burkholder
Nim looks like it could indeed be useful here; however, I think D is the better solution for this use case. This article is targeted at programmers who are already familiar with C++, and D has a much more C++-like syntax than Nim. However, it would be perfectly valid to use Nim here instead of D; in fact, any language that supports changing the name mangling of arbitrary symbols (both external and internal) could be used instead of D.
8 - Dec - 2023
Paulo Pinto
Maybe the title should be "Mixing C++, D and Rust for Fun and Profit"
8 - Dec - 2023
Loren Burkholder
Thanks for the suggestion! However, the emphasis here is on the interop between C++ and Rust. D is used merely as a glue layer; while you can easily expand the glue to bind to other languages including D, I'm using C++ and Rust since many people are interested in migrating their C++ codebases to Rust.
8 - Dec - 2023
Matheus Catarino
Excellent post. I was surprised and happy to see Dlang mentioned in this post.
The technique used in Dlang also reminded me that something similar is possible in Zig, which can also read mangled functions, probably by inheriting llvm-demangle.
I believe that Swift is the only one that does not require bindings, only reading the modulemap that contains the included header.
I'm curious about the polyglot project, and of course how it compares to the cxx-rs and cbindgen projects.
9 - Dec - 2023
Loren Burkholder
I'm a huge fan of D myself and I'm always happy to promote its use in any way possible. You are right, though, that D is not the only language that is usable as a glue layer; however, D is easy to understand for C++ programmers.
Good job for noticing that this is somewhat duplicating the cxx-rs effort, though! I plan to review some existing efforts like cxx-rs in a later installment of this series.
16 - Dec - 2023
BitSyndicate
Out of curiosity, how does D know about the calling conventions for Rust code, as far as I know those are version dependent and can change arbitrarily?
18 - Dec - 2023
Loren Burkholder
Good question! I actually was not aware that Rust calling conventions are unstable. However, it looks like you can configure Rust to compile as a C-compatible shared library, and the Rust docs also imply that compiling Rust as a static library will work as well (assuming you link the final executable to the Rust system libraries).
17 - Dec - 2023
Andy
Please start a youtube channel & explain.
Thank you.
17 - Dec - 2023
Marcus
People with D always want to put it where it doesn’t belong 😏
Have you tried to use a linker script with EXTERN and PROVIDE?
18 - Dec - 2023
Loren Burkholder
No, I haven't tried linker scripts. That does seem like an interesting option; I might have to research it to see how easy it is to integrate into a binding generator.