Trusted Software Excellence across Desktop and Embedded
Take a glance at the areas of expertise where KDAB excels, ranging from swift troubleshooting, ongoing consulting, and training to multi-year, large-scale software development projects.
Find out why customers from innovative industries rely on our extensive expertise, including Medical, Biotech, Science, Renewable Energy, Transportation, Mobility, Aviation, Automation, Electronics, Agriculture and Defense.
High-quality Embedded Engineering across the Stack
To successfully develop an embedded device that meets your expectations regarding quality, budget and time to market, all parts of the project need to fit perfectly together.
Learn more about KDAB's expertise in embedded software development.
Where the capabilities of modern mobile devices or web browsers fall short, KDAB engineers help you expertly architect and build high-functioning desktop and workstation applications.
Extensible, Safety-compliant Software for the Medical Sector
Create intelligent, patient-focused medical software and devices and stay ahead with technology that adapts to your needs.
KDAB offers you expertise in developing a broad spectrum of clinical and home-healthcare devices, including, but not limited to, internal imaging systems, robotic surgery devices, ventilators, and non-invasive monitoring systems.
Building digital dashboards and cockpits with fluid animations and gesture-controlled touchscreens is a big challenge.
In over two decades of developing intricate UI solutions for cars, trucks, tractors, scooters, ships, airplanes, and more, the KDAB team has gained market-leading expertise in this realm.
Build on Advanced Expertise when creating Modern UIs
KDAB assists you in the creation of user-friendly interfaces designed specifically for industrial process control, manufacturing, and fabrication.
Our specialties encompass the custom design and development of HMIs that make your products accessible from embedded systems, remote desktops, and mobile devices on the move.
Legacy software is a growing but often ignored problem across all industries. KDAB helps you elevate your aging code base to meet the dynamic needs of the future.
Whether you want to migrate from an old to a modern GUI toolkit, update to a more recent version, or modernize your code base, you can rely on over 25 years of modernization experience.
KDAB offers a wide range of services to address your software needs including consulting, development, workshops and training tailored to your requirements.
Our expertise spans cross-platform desktop, embedded, and 3D application development, using proven technologies suited to the job.
When working with KDAB, the first-ever Qt consultancy, you benefit from a deep understanding of Qt internals that allows us to provide effective solutions, irrespective of the depth or scale of your Qt project.
Our Qt services include developing applications, building runtimes, mixing native and web technologies, and solving performance and porting problems.
KDAB helps you create commercial, scientific, or industrial desktop applications from scratch, or update their code or framework to benefit from modern features.
Discover clean, efficient solutions that precisely meet your requirements.
Boost your team's programming skills with in-depth, constantly updated, hands-on training courses delivered by active software engineers who love to teach and share their knowledge.
Our courses cover Modern C++, Qt/QML, Rust, 3D programming, Debugging, Profiling and more.
The collective expertise of KDAB's engineering team is at your disposal to help you choose the software stack for your project or master domain-specific challenges.
Our particular focus is on software technologies you use for cross-platform applications or for embedded devices.
Since 1999, KDAB has been the largest independent Qt consultancy worldwide and today is a Qt Platinum partner. Our experts can help you with any aspect of software development with Qt and QML.
KDAB specializes in Modern C++ development, with a focus on desktop applications, GUI, embedded software, and operating systems.
Our experts are industry-recognized contributors and trainers, leveraging C++'s power and relevance across these domains to deliver high-quality software solutions.
KDAB can guide you in incorporating Rust into your project, from integrating it alongside your existing C++ codebase to completely replacing your legacy code.
Unique Expertise for Desktop and Embedded Platforms
Whether you are using Linux, Windows, macOS, Android, iOS, or a real-time OS, KDAB helps you create performance-optimized applications on your preferred platform.
If you are planning to create projects with Slint, a lightweight alternative to standard GUI frameworks especially suited to low-end hardware, you can rely on KDAB's expertise as one of Slint's earliest adopters and an official Slint service partner.
KDAB has deep expertise in embedded systems which, coupled with Flutter proficiency, allows us to provide comprehensive support throughout the software development lifecycle.
Our engineers continually contribute to the Flutter ecosystem, for example by developing flutter-pi, one of the most widely used embedders.
KDAB invests significant time in exploring new software technologies to maintain its position as a software authority. Benefit from this research and eventually incorporate it into your own projects.
Start here to browse the KDAB website(s) and take advantage of useful developer resources such as blogs, publications, and videos about Qt, C++, Rust, 3D technologies like OpenGL and Vulkan, the KDAB developer tools, and more.
The KDAB YouTube channel has become a go-to source for developers looking for high-quality tutorials and information about software development with Qt/QML, C++, Rust, and other technologies.
Click to browse all KDAB videos directly on this website.
In over 25 years, KDAB has served hundreds of customers from various industries, many of whom have become long-term customers who value our unique expertise and dedication.
Learn more about KDAB as a company, understand why we are considered a trusted partner by many and explore project examples in which we have proven to be the right supplier.
The KDAB Group is a globally recognized provider of software consulting, development, and training, specializing in embedded devices and complex cross-platform desktop applications.
Read more about the history, the values, the team and the founder of the company.
When working with KDAB you can expect quality software and the desired business outcomes thanks to decades of experience gathered in hundreds of projects of different sizes in various industries.
Have a look at selected examples where KDAB has helped customers to succeed with their projects.
KDAB is committed to developing high-quality and high-performance software, and helping other developers deliver to the same high standards.
We create software with pride to improve your engineering and your business, making your products more resilient and maintainable with better performance.
KDAB was the first certified Qt consulting and software development company in the world, and it continues to deliver quality processes that meet or exceed the highest expectations.
At KDAB, we value practical software development experience and skills more highly than academic degrees. We strive to ensure equal treatment of all our employees regardless of age, ethnicity, gender, sexual orientation, or nationality.
Interested? Read more about working at KDAB and how to apply for a job in software engineering or business administration.
I'd like to talk about a common problem that we have seen come up in several projects: how to plot large datasets over time efficiently using Qt. This seems to be a general issue, especially in embedded situations, and it affects many people. Because of this, I thought we'd share one approach that we have found to work well in some situations; maybe it helps you out in your project.
Problem: Waveform Graphs of Large Data Sets
We have a situation where we are accumulating more than 500,000 samples of sensor data. The data is appended over time to a huge array. We want to visualize this growing amount of sensor information as a waveform graph:
To render this intelligently on low-end hardware, we cannot visit all the data points in a single render pass; we need to cheat. Instead of drawing lines between all the hundreds of thousands of data points, we draw each vertical pixel column as one line reaching from the minimum sample to the maximum sample of that column. With this method, we only need to render a few hundred lines instead of hundreds of thousands to achieve the same visual result. To pull off this trick, however, we need to query minmax(beginIndex, endIndex), the range of values for which to draw a line, very efficiently and very often. For this, I developed a MinMaxTree data structure that can answer this query at high speed. The effective API of such a data structure can look roughly like this (a sketch reconstructed from the description; the exact signatures are assumptions):
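template<typename T>
class MinMaxTree
{
public:
    void insert(T value);                                        // append one new sample
    MinMax<T> minmax(size_t beginIndex, size_t endIndex) const;  // min/max over [begin, end]
};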
This API allows you to insert a single data sample to store new sensor data. It also lets you retrieve the minimum and maximum given begin and end indices, which we can then use to render a line in our waveform.
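For illustration, the render pass described above might then look roughly like this. This is a hypothetical fragment, not code from the project: widgetWidth, sampleCount, toPixelY, and painter are assumed names, with painter being a QPainter.

// One vertical line per pixel column, from the column's minimum to its
// maximum (toPixelY maps a sample value to a y coordinate).
const size_t samplesPerColumn = sampleCount / widgetWidth;
for (int x = 0; x < widgetWidth; ++x) {
    const size_t begin = x * samplesPerColumn;
    const size_t end = begin + samplesPerColumn - 1;
    const MinMax<double> mm = tree.minmax(begin, end);
    painter.drawLine(x, toPixelY(mm.min), x, toPixelY(mm.max));
}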
Crafting an Append-Only Min–Max Tree as a Search Index
Remembering computer science classes, I figured that reusing prior calculations can save a lot of work (dynamic programming). In addition, given very large data sets, trees can cut your query times tremendously by intelligently arranging data, or, to put it differently, by allowing us to skip vast amounts of data quickly. The solution I explain in this blog post combines both approaches. Here is the tree I modeled on top of the data:
The tree summarizes ranges of data points in the underlying array. I designed the tree nodes to contain the minimum and maximum values of their range. Since parent nodes represent the data of their left and right children, they contain the minimum and maximum of their combined ranges. Finally, the root node contains the minimum and maximum of the entire data set. Because I want to keep the tree small and profit from caching effects, each node represents a sub-range or "bucket" of the array. The bucket size can be adjusted to best match the hardware's cache sizes. This keeps the tree flat while still enabling fast linear scans inside each bucket.
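For orientation, the data members implied by the snippets later in this post might look roughly like this (a sketch; BucketSize and the initial capacity are assumptions, and MinMax is the small struct shown further down):

#include <vector>

template<typename T>
class MinMaxTree
{
    // ...
    static constexpr size_t BucketSize = 64;      // tune to the target's caches (assumption)
    std::vector<T> m_dataInBuckets;               // the raw samples, in insertion order
    std::vector<MinMax<T>> m_treeOntopOfBuckets;  // in-order min/max tree over the buckets
    size_t m_treeCapacity = 2;                    // tree array size; the root sits at m_treeCapacity / 2
};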
Appending a Value, Updating the Tree
There are two tasks that come with insertion: updating the tree and, optionally, extending the tree. An appended value needs to update the tree structure above it: it must find and update its respective tree leaf, which, in turn, must inform its parent node, and so on. I hope it's easy to see that if a node's minimum and maximum do not change, it does not need to inform its parent node. With this optimization, the average-case complexity of insertion is very low. The code snippet below illustrates this recursive "bubbling up" of a value through the tree:
template<typename T>
void MinMaxTree<T>::bubbleUpMinMax(T value, size_t treeIndex)
{
    // updates the node and returns true if we altered its minmax
    auto minmax = [&](T value) -> bool {
        auto &node = m_treeOntopOfBuckets.at(treeIndex);
        const auto oldmin = node.min;
        const auto oldmax = node.max;
        node.min = std::min(value, node.min);
        node.max = std::max(value, node.max);
        return (value < oldmin || value > oldmax);
    };

    // we are at the root, break the recursion
    if (treeIndex == m_treeCapacity / 2) {
        minmax(value);
        return;
    }
    // update the node and optionally bubble up further
    else {
        if (minmax(value))
            bubbleUpMinMax(value, parent(treeIndex));
    }
}
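For context, the full insert might tie these pieces together roughly like this (a sketch; grow() is the doubling step described next, BucketSize is the assumed tuning constant from above, and the leaf-index formula comes from the in-order layout explained in the indexing section below):

template<typename T>
void MinMaxTree<T>::insert(T value)
{
    m_dataInBuckets.push_back(value);
    const size_t bucket = (m_dataInBuckets.size() - 1) / BucketSize;
    const size_t leaf = 2 * bucket + 1;  // leaves live at the odd in-order indices
    if (leaf >= m_treeCapacity)          // the new bucket falls outside the current tree
        grow();
    bubbleUpMinMax(value, leaf);         // update the leaf, then bubble upward
}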
The second problem when inserting a value is that our tree structure might need to grow, because the new value exceeds the capacity of the existing tree. Here, the tree needs to extend "sideways", leaving its complete old structure intact and forming a new node on top. For this, I do three things (sketched in code after the list):
double the tree's size and mirror its current structure to extend its reach,
make a new root,
copy the data from the old root into the new root.
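A minimal sketch of these three steps, assuming the in-order array layout described in the indexing section below (old nodes keep their indices, and the new root sits at the doubled m_treeCapacity / 2):

template<typename T>
void MinMaxTree<T>::grow()
{
    const size_t oldRoot = m_treeCapacity / 2;
    m_treeCapacity *= 2;
    // doubling the in-order array mirrors the old structure as the left
    // half; fresh nodes start out with the neutral min/max defaults
    m_treeOntopOfBuckets.resize(m_treeCapacity, MinMax<T>{});
    // the new root must summarize everything the old root covered
    m_treeOntopOfBuckets[m_treeCapacity / 2] = m_treeOntopOfBuckets[oldRoot];
}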
Now that we have doubled the tree's size, we can again insert data until we need to expand again. A note on the default values inside our nodes: I initialize all nodes with the highest possible value as the minimum and the lowest possible value as the maximum.
template<typename T>
struct MinMax
{
    T min = std::numeric_limits<T>::max();
    T max = std::numeric_limits<T>::lowest();  // it's not ::min(); this cost me hours
};
So when actual real values enter the array, they will change min and max, because they differ from these default extremes. BE WARNED! std::numeric_limits::min() represents the smallest positive representable value, not the lowest possible number. I learned this the hard way.
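A two-line illustration of the trap (this applies to floating-point types; for integer types, min() and lowest() agree):

#include <limits>

// min() is the smallest positive normalized value, not the most negative one:
static_assert(std::numeric_limits<double>::min() > 0.0, "min() is positive for doubles");
static_assert(std::numeric_limits<double>::lowest() < 0.0, "lowest() is the true bottom");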
Indexing: Squeezing the Tree into an Array
In our case, I wanted to optimize the tree accesses and its growth without a pointer-based implementation that links a node to its children. My first candidate was the commonly used trick for squeezing heaps into arrays: the left child of an element sits at 2 × index and the right child at 2 × index + 1. While this approach would have worked here as well, growing the tree would require inserting and making space at many places in the array. Instead, I went with an infix (in-order) indexing method, putting the tree into an array in "left node–root node–right node" order, like this:
This is nice for several reasons:
It eliminates pointer chasing.
Nodes and their parents are fairly close (in the lower tree).
Expanding the tree just doubles the array in size.
The root node will be in the middle of the array.
To access a node's leftChild and rightChild as well as its parent from a given index, we have to look at the binary representation of our indices. Note, for instance, how the position of the lowest set bit of an index encodes the node's height in the tree: leaves have it at the lowest position, the root at the highest. It would be a bit cumbersome to explain all the bit magic in text form; the code tells it best. leftChild: find the lowest set bit, unset it, and set the next-lower bit.
template<typename T>
size_t MinMaxTree<T>::leftChild(size_t index) const
{
    // precondition: index is not a leaf (its lowest set bit is not bit 0)
    int lsb = ffs(index);
    index &= ~(1UL << (lsb - 1));
    index |= 1UL << (lsb - 2);
    return index;
}
rightChild: find the lowest set bit and set the next-lower bit.
template<typename T>
size_t MinMaxTree<T>::rightChild(size_t index) const
{
    // precondition: index is not a leaf (its lowest set bit is not bit 0)
    int lsb = ffs(index);
    index |= 1UL << (lsb - 2);
    return index;
}
parent: find the lowest set bit, unset it, and set the next-higher bit.
template<typename T>
size_t MinMaxTree<T>::parent(size_t index) const
{
    int lsb = ffs(index);
    index &= ~(1UL << (lsb - 1));
    index |= 1UL << lsb;
    return index;
}
All three functions use ffs ("find first set"), provided by glibc in <strings.h>, which returns the one-based position of the least significant set bit. On Windows, the intrinsic _BitScanForward can be used instead. Similarly, I wrote a function that returns the actual range in our data array that a particular node covers.
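To make the bit tricks concrete, here is how they play out on a small tree of seven nodes. This is just an illustration of the three functions above; bit positions are zero-based.

// In-order layout, seven nodes, root at index 4 (binary 100):
//
//             4
//           /   \
//          2     6
//         / \   / \
//        1   3 5   7      <- leaves at the odd indices
//
// leftChild(4)  : lsb is bit 2 -> clear it, set bit 1 -> 010 = 2
// rightChild(4) : lsb is bit 2 -> set bit 1           -> 110 = 6
// parent(6)     : lsb is bit 1 -> clear it, set bit 2 -> 100 = 4
// parent(3)     : lsb is bit 0 -> clear it, set bit 1 -> 010 = 2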
Querying a Range
Now that we have all the tools at hand, we can finally implement the range query itself. We recursively query the tree nodes, starting from the root downward. We check how a node's range relates to our query range and, where necessary, query its children for the minimum and maximum, according to the following cases:

Case 1: The node's range lies outside the query range → return the neutral, default-initialized min/max.
Case 2: The node's range is fully enclosed in the query range → the best case; we simply return the node's min/max.
Case 3: The node's range is partly inside and partly outside the query range, or the node covers more than the range, and we can split → split, query the left and right children, and combine the results.
Case 4: The node's range overlaps the query range and we are at a leaf node → scan through the bucket.

Note that in code, the leaf check of case 4 must be handled before the split of case 3, because a leaf has no children to recurse into:
template<typename T>
MinMax<T> MinMaxTree<T>::queryNode(size_t nodeIndex, size_t rangeBegin, size_t rangeEnd) const
{
    // [...] calculate the node's range, store it in nodeBegin, nodeEnd

    // Case 1: node outside range, return the neutral MinMax
    //                |------range----|
    //  |---node---|                      or              |---node---|
    // this should never happen, but safe is safe
    if ((nodeEnd < rangeBegin) || (nodeBegin > rangeEnd))
        return MinMax<T>();

    // Case 2: range spans node fully, return the node's MinMax
    //  |---------range-----------|
    //       |------node------|
    if ((rangeBegin <= nodeBegin) && (rangeEnd >= nodeEnd))
        return m_treeOntopOfBuckets[nodeIndex];

    // Case 4: node is a leaf (odd index), re-scan its bucket for the subrange
    // (handled before the split, because a leaf has no children)
    //       |---------range-----------|
    //  |----node(leaf)----|        |----node(leaf)----|
    if (nodeIndex % 2 == 1) {
        MinMax<T> scanResult;
        for (size_t i = std::max(nodeBegin, rangeBegin); i <= std::min(nodeEnd, rangeEnd); i++) {
            scanResult.min = std::min(m_dataInBuckets[i], scanResult.min);
            scanResult.max = std::max(m_dataInBuckets[i], scanResult.max);
        }
        return scanResult;
    }

    // Case 3: node cuts range left or right, or covers it fully: split
    //       |------range----|
    //  |---node---|    or   |---node---|   or
    //     |--------node-------|
    const MinMax<T> leftMinMax = queryNode(leftChild(nodeIndex), rangeBegin, rangeEnd);
    const MinMax<T> rightMinMax = queryNode(rightChild(nodeIndex), rangeBegin, rangeEnd);
    MinMax<T> resultMinMax;
    resultMinMax.min = std::min(leftMinMax.min, rightMinMax.min);
    resultMinMax.max = std::max(leftMinMax.max, rightMinMax.max);
    return resultMinMax;
}
Results
Visualizing 300,000 data points took more than 2 seconds with the naïve approach (drawing lines from point to point) on the embedded test hardware. This approach brought the time for the queries down to about 3 ms, so the pure line rendering now even accounts for the bigger part of the cost (6 ms). We are again below the juicy target of 16 ms (one frame at 60 fps).
Further Optimization: Find Lowest Common Parent
In a similar setup, my colleague James Turner found that many of our node visits were actually spent searching down the tree for the lowest common node covering the full range. In these cases, we often ended up splitting a node into its two children, one of which contained the range fully while the other did not contain it at all. This initial search down the tree took most of the node visits, while the actual min/max scan (collecting all the nodes covering the range) took fewer. So instead of searching from the top, beginning at the root node, I now calculate the left and right leaves, determine the lowest common node spanning the full range (a simple loop of bit operations), and perform the query from there.
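A sketch of that shortcut, under the in-order layout from above; lowestCommonNode and BucketSize are assumed names. Leaves sit at odd indices (leaf = 2 × bucket + 1), and a node at index i whose lowest set bit has one-based position p covers leaves up to i + 2^(p-1) - 1, so we can climb from the left leaf until the right leaf is covered too:

template<typename T>
size_t MinMaxTree<T>::lowestCommonNode(size_t beginIndex, size_t endIndex) const
{
    size_t node = 2 * (beginIndex / BucketSize) + 1;           // left leaf
    const size_t rightLeaf = 2 * (endIndex / BucketSize) + 1;  // right leaf
    // climb until the node's subtree also covers the right leaf
    while (node + (1UL << (ffs(node) - 1)) - 1 < rightLeaf)
        node = parent(node);
    return node;  // start queryNode() here instead of at the root
}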
I hope you found this useful, and I am happy to hear your comments about this solution. At KDAB, we do this kind of optimization for low-end hardware all the time. Ask us if we can help you, or check out our Profiling Workshop to learn how to find and tackle performance bottlenecks in your own projects.
Christoph Sterz
Senior Software Engineer
Christoph Sterz is a Senior Software Engineer at KDAB.
Our hands-on Modern C++ training courses are designed to quickly familiarize newcomers with the language. They also update professional C++ developers on the latest changes in the language and standard library introduced in recent C++ editions.
4 Comments
22 - Nov - 2018
Alexander
Great post! It looks like you "(re-)invented" one of the polyline simplification algorithms. We implemented a couple of such algorithms in LabPlot last year (https://labplot.kde.org/2017/04/09/labplot-2-4-released/). Those are documented in https://phabricator.kde.org/T3507 - in case you're interested, you can check out this collection of useful links. I think KST also has something similar in place. Independent of this, your results with 300k points visualized within 2 seconds on the embedded hardware are still very impressive. Can you provide a small code snippet for how you benchmarked this?
23 - Nov - 2018
Christoph Sterz
Hi, Alexander. Thank you for your comment! It is interesting, and at the same time a bit sad, how often engineers need to reinvent concepts although solutions exist. At the same time, each approach can find new tricks and can be custom-tailored to the needs of the individual problem. I think by "within 2 seconds" you might mean the actual end result (3 ms for the queries)? (2 seconds was a single render pass in the original implementation, which was bad.) I just used
QElapsedTimer t; t.start(); […renderloop…] auto time = t.nsecsElapsed();
around the data-render pass, which consisted of about 150 pixel columns in one loop. Because I was curious, I did the measurement twice: once with the regular QPainter::drawLine() and once without the QPainter part. This is how I came to the results I describe in the text. Of course, QElapsedTimer has some overhead here, but in the realm of milliseconds, I considered it less of an issue. There is one more aspect that I did not describe in the blog: imagine an initial state with not much sensor data. As more and more data accumulates, the graph might need to switch from point-to-point rendering to this optimized rendering. I can imagine that in programs like LabPlot, handling this transition so that it does not appear as a visual glitch is quite challenging. PS: In the meantime, some people on the C++ subreddit found out a bit more about the theoretical concepts behind all of this: WP: Range Minimum Query and WP: Cartesian tree
23 - Nov - 2018
Uwe
Actually, Qwt does something similar when enabling QwtPlotCurve::FilterPointsAggressive (since Qwt 6.2). Maybe interesting in this context: the Qt raster paint engine had a nasty bug for a couple of versions when drawing a vertical line of 1 pixel width, which affected this optimization.
29 - Nov - 2018
Damian Dixon
Interesting approach that I will have to remember for future use. I've recently run into a similar issue of storing and displaying a huge number of data points (10 days of data, with two sets of data produced 4 times a second). However, I had to use C# and Live-Charts. The approach I took was to calculate the start and end indices into the circular buffer for the segment that I needed to display. The data was then thinned based on the length of the time segment being displayed. Needless to say, WPF made it hard to get the required performance.