A few days ago, a customer using ourKD Soap library reported an interesting problem with it. The library worked perfectly to access the SOAP server, but there was some noticeable latency when doing a series of requests in short succession from the client. This latency should not have been there, as both the server and the client were in the same local network.
An investigation began, regarding whether the client code or KD Soap were doing something suboptimal, like reconnecting to the server all the time, or something like that. The actual reason turned out to be something much more profound: a 6-year-old bug in QNetworkAccessManager (at least, that's when it was reported. I think the bug is at least 10 years old...)
What's QNetworkAccessManager (or QNAM, for the friends)? It's the central class in Qt to issue HTTP/FTP requests (incl. HTTPS, HTTP/2, etc.); KD Soap uses it to implement SOAP over HTTP.
Did you know?
In Qt Creator, you can type just upper case letters of a class name (like QNAM, QAIM, QSFPM) -- and Creator will offer to autocomplete it for you!
So, "not our bug", problem solved... maybe for us, but certainly not for our customer!
Therefore, I took a shot at the issue anyhow, as personal curiosity. (Mandatory reminder: KDAB contributesa lot of code to Qt!)
The problem
Using a network sniffer like Wireshark, the problem quickly emerged: each SOAP request, transported through a POST method, was sent by QNAM split over multiple TCP segments.
Let's analyze the packet dump together. We have 192.168.1.1, which is the client running KD Soap, and 192.168.1.254 running a HTTP server.
In the first few exchanged segments, nothing special happens:
Segment 5 contains the first part of the first SOAP request.
Segment 6 quickly acknowledges the data sent by segment 5.
Segment 7 completes the first SOAP request (that is, completes the HTTP POST).
Segment 8 acknowledges the data sent in segment 7. (Note: Please ignore the length field -- it does not refer to the TCP data length, but to the request length; that's why the acknowledged data does not match.)
Segment 9 sends the result back to the client.
The whole exchange (establishing the connection, sending the request, getting an answer) took less than 3 milliseconds. Pretty reasonable, considering everything is happening inside the same local network.
Now this is when things start going wrong. The client did not make just one request; it made several, in quick succession. These requests got queued inside QNAM. QNAM implements HTTP persistent connections, so the TCP connection is not immediately shut down after the request has been served. Instead, once QNAM receives the reply, it's able to immediately send another request. This happens in the next exchange:
Segment 10 acknowledges the data received by the server, and sends part of a new request.
Segment 11 acknowledges the data for the new request.
Segment 12 completes the request (sending more data).
Segment 13 acknowledges the data sent to complete the request.
Segment 14 sends the result back to the client.
It looks pretty much like the last exchange, minus establishing the connection, doesn't it? There's however a small but important difference: if you carefully look at the timestamps, you may notice that this request took almost 50ms to be served! That's 10-20 times more than the first one!
If we keep going, we have another request, and this one also takes 50ms:
These delays would just keep piling up; in order to send 20 requests we take over one second, rather than less than 0.1s (as it should be). What's going on here? Where do these delays come from?
Diagnosis
As hinted in the bug report, the problem is a very subtle interaction between QNAM and the underlying TCP stack. The TCP protocol itself, as well as the API that operating systems offer to use it (e.g. the socket(2) APIs on Unix) feature a lot of knobs to fine tune TCP's behavior. This is intentional, and done in order to make it perform better: reduce latency, increase the used bandwidth, reduce the number of segments exchanged on the network. Some of these knobs are entirely opt-in because whether they make sense or not depends on the application.
A couple of these optimizations are usually enabled by default, and are the cause of the problem here.
The first one is called delayed acknowledgments. In order to reduce the load on the network, one side of a TCP connection may decide to wait a bit before sending out an acknowledgement for the data it has just received. (Of course, it must not wait too much, or the sender may trigger a retransmission thinking that the data it just sent got lost!) It's a pretty simple idea: give the other side the chance of sending more data; if this happens, the receiving side may send only one segment acknowledging all the data received. (Yes, this is perfectly legal in TCP.)
The second one is called Nagle's algorithm. This is an algorithm that again tries to reduce the network load by having one side not transmit data too often (until certain conditions are met). The idea is that if a TCP application has to send small chunks of data, it would be better for the TCP stack to buffer all the data and then send it out in one segment. Of course, sooner or later the data has to be sent, and the algorithm controls when.
How does QNAM manage to break all this? When QNAM has to issue an HTTP request with a body (such as POST or PUT), it does two different writes into the TCP socket: first it writes the HTTP headers, then it writes the body of the request. This "double write" causes an interesting chain of events:
Client: QNAM writes the HTTP headers into the socket
Client: the TCP socket sends the headers right away
Server: receives the data, but does not send the ACK yet, because of delayed acknowledgments
Client: QNAM writes the HTTP body into the socket
Client: the TCP socket does not transmit the HTTP body, because Nagle's algorithm says so (there is 1 pending acknowledgment)
Nothing happens for 50ms
Server: the delayed ack timer times out, so it sends the ACK (just for the HTTP headers)
Client: receives the ACK, Nagle's algorithm says it can transmit more data, so it sends the rest of the request
Server: receives the rest of the request; the request is complete, the SOAP server handles it, creates a reply, sends the ACK for the rest of the request and the reply
Steps 2-4 can happen in a slightly different order, but the result does not change: we have a 50ms delay, introduced by TCP, which is affecting our requests.
Workarounds
What can we do to fix this? This should really be fixable: the overall data we want to transmit (headers+body) fits in one TCP segment -- sending two and having to wait before transmitting the second sounds just wrong.
Note that we do not necessarily control the server, so disabling delayed ACKs is out of the question.
The first thing we could think of is to enable buffering on the TCP socket. QTcpSocket, like all QIODevice subclasses, can be opened in buffered or in unbuffered mode. By default, QNAM opens it in unbuffered mode: it does not want an extra copy of the data to be done (from the application, into the TCP socket, then into the kernel; and similarly the other way around). However, opening it in buffered mode does not help at all. What happens is that QTcpSocket copies the data into its own write buffer, and then immediately tries to flush it. Typically the kernel will allow the flush to happen (because there's not enough outgoing data filling kernel's outgoing buffers); and this results in the same chain of events, with each write (by QNAM) becoming a write into the kernel, and the second one being delayed by Nagle's algorithm.
On the client, we could think of disabling Nagle's algorithm. All operating systems offer a way to do so; the rationale is that if we have to frequently send small chunks of data we don't want to wait for acknowledgements to arrive before sending the next chunk (think, for instance, to a SSH client that has to send individual keystrokes). This option is typically called TCP_NODELAY, and it's even exposed in Qt by the QAbstractSocket::LowDelayOption socket option. For some reason, QNAM has a commented out call to set the option:
// improve performance since we get the request sent by the kernel ASAP
// socket->setSocketOption(QAbstractSocket::LowDelayOption, 1);
// We have this commented out now. It did not have the effect we wanted. If we want to
// do this properly, Qt has to combine multiple HTTP requests into one buffer
// and send this to the kernel in one syscall and then the kernel immediately sends
// it as one TCP packet because of TCP_NODELAY.
// However, this code is currently not in Qt, so we rely on the kernel combining
// the requests into one TCP packet.
Not exactly sure why this comment was made. (This code was written before the Qt source code history had been made public in 2011.) I was not willing to revisit this, so I decided to leave it alone, at least for for the moment being.
The Linux socket API has the ideal option for this use case. It's theTCP_CORK socket option. Its effects are to delay the sending of the data written into the socket until a timeout expires, or the option is toggled. This allows a client (or a server) to write in the socket multiple times, and the kernel will take care of coalescing the writes into fewer TCP segments. (The interesting bit is that the writes can happen using different system calls, such as write(2) for the headers followed by sendfile(2) for the body.). While this works, it's of course OS-specific -- there is nothing comparable for any other operating system -- so it's not a good candidate for fixing the problem in Qt (which is, after all, a cross platform toolkit).
Finally, the above comment in QNAM contains the actual solution. We can refactor QNAM to avoid writing the headers and then the rest of the request, using separate writes. Instead, it should try to combine them, if possible, into one single write. This should ensure that, once a complete request is sent by the client, the server can act on it and immediately write the reply.
I have implemented and submitted this last solution to Qt.
Enough talk! Give me the code!
Ok, ok... the patch that fixes the problem is here. At this time, it will only be present in Qt 6.0. Qt 5 is in "bugfix only" mode, and since this issue has been present throughout the entire Qt 5 history, it was not deemed safe to apply the change there as well.
Still, it doesn't mean you cannot just cherry-pick the patch into your local build of Qt 5, if your applications are affected. It should apply without issues. In fact, the customer was able to try it out and confirm that it solves the original issue they were seeing. Whew!
The moral lesson
At first I didn't want to write a blog post simply to discuss "yet another bugfix that landed into Qt".
But then, I decided to take this opportunity to also say something else. Many of our customers are (positively!) surprised by the huge skillset found within KDAB. By hiring us, customers don't get just someone to write some code for them; they get access to a great talent pool. Within KDAB there's enormous technical knowledge. Each engineer is of course broadly skilled in computer science and programming, but they then have tremendous specialized skills in one or more focus areas:
We value “T-shaped” people. That is, people who are both generalists (highly skilled at a broad set of valuable things — the top of the T) and also experts (among the best in their field within a narrow discipline — the vertical leg of the T).
Or, to use the words of my ex-colleague Jean-Michaël:
All the above bugfixing story is actually about knowing how things work, all the way from HTTP, into Qt's code, and down to low level details of TCP/IP programming. Does it mean that I'm just "that good" and happen to know just about everything? We both know that is not true.
This is possible, not only because KDAB hires T-shaped individuals, but also because it nurtures its employees so that their strong "vertical" skills can actually grow over time, while also embracing more "horizontal" skills. We are constantly encouraged in our professional growth, and that's not something always found out there.
The KDAB Group is a globally recognized provider for software consulting, development and training, specializing in embedded devices and complex cross-platform desktop applications. In addition to being leading experts in Qt, C++ and 3D technologies for over two decades, KDAB provides deep expertise across the stack, including Linux, Rust and modern UI frameworks. With 100+ employees from 20 countries and offices in Sweden, Germany, USA, France and UK, we serve clients around the world.
Giuseppe D’Angelo
Senior Software Engineer
Senior Software Engineer at KDAB. Giuseppe is a long-time contributor to Qt, having used Qt and C++ since 2000, and is an Approver in the Qt Project. His contributions in Qt range from containers and regular expressions to GUI, Widgets, and OpenGL. A free software passionate and UNIX specialist, before joining KDAB, he organized conferences on opensource around Italy. He holds a BSc in Computer Science.