We’ve explained some of the architectural elements of building your own firmware over-the-air (OTA) solution for IoT and embedded systems in a previous blog post. But what about the hardware requirements? Or processes? Let’s explore the issues you need to solve to build a robust OTA solution.
Maximizing flash longevity
Modern devices rely on solid-state storage built on NAND flash memory, which supports only a finite number of program-erase cycles. Day-to-day operations don’t typically compromise a drive’s lifespan because wear-leveling mechanisms within flash controllers distribute writes and erases across blocks to prolong device longevity.
However, for systems that rely heavily on frequent disk writes, the progressive loss of usable blocks as they wear out is a factor that can’t be overlooked. Persistent logging and database flushing are two examples that need a more strategic approach. Buffering writes and disabling the 'always flush' functionality in logging systems for production builds can reduce unnecessary wear. If you cannot reduce constant disk writes, allocate additional reserved space to give wear leveling more room to work, or consider NOR flash, which typically offers better endurance at a higher cost.
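As a rough illustration of buffering writes, here is a minimal Python sketch built on the standard library's logging.handlers.MemoryHandler. The function name make_buffered_logger, the capacity of 256 records, and the choice to flush immediately on errors are all assumptions made for this example, not recommendations for any particular product.

```python
import logging
import logging.handlers


def make_buffered_logger(name, log_path, capacity=256):
    """Return (logger, buffer_handler); records are held in RAM and written in batches."""
    # delay=True means the log file is not even opened until the first batch is written.
    file_handler = logging.FileHandler(log_path, delay=True)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )

    # MemoryHandler keeps records in memory and forwards them to the target
    # when `capacity` records have accumulated, or as soon as a record at
    # flushLevel (ERROR here, an assumption) or above arrives.
    buffer_handler = logging.handlers.MemoryHandler(
        capacity=capacity,
        flushLevel=logging.ERROR,
        target=file_handler,
    )

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(buffer_handler)
    logger.propagate = False  # keep records away from any root handlers
    return logger, buffer_handler
```

The trade-off is that records still sitting in the buffer are lost if the device loses power, so call buffer_handler.flush() before a planned reboot and keep the flush level low enough that the messages you care about reach storage promptly.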
Avoiding SD card risks
While SD cards are popular in hobbyist projects and consumer electronics, booting from them poses reliability concerns: contacts can loosen under vibration or thermal cycling, read performance varies from card to card, and a removable card puts the security of the flash image at risk. Opting for onboard flash avoids the risks associated with removable media and makes for a more robust device.
Timing your reboots
Determining the ideal time for firmware updates and subsequent reboots depends on the device’s operational requirements and usage patterns. Other factors to consider include the length of the reboot process, requisite downtime, uptime guarantees, and implications of device unavailability. For example, automotive software updates can be scheduled during periods when the vehicle is not normally in operation. Consumer devices that are routinely powered off can adopt even simpler strategies, since an occasional wait during a power cycle is usually acceptable to the consumer.
However, devices essential for continuous operation, like industrial controllers or healthcare monitoring systems, demand a more nuanced approach to avoid disruptions. For such devices, scheduling updates during predictable maintenance windows may be necessary. In cases where uptime is essential, obtaining user consent before initiating an update may be the best approach.
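To make the scheduling logic concrete, here is a minimal Python sketch of a reboot gate. The function names, the device_busy flag, and the rule that explicit user consent can stand in for a maintenance window are assumptions chosen for illustration; your product's policy may well differ.

```python
from datetime import datetime, time


def in_maintenance_window(now, start, end):
    """True if now falls in [start, end); supports windows that cross midnight."""
    current = now.time()
    if start <= end:
        return start <= current < end
    # Window wraps past midnight, e.g. 23:00 to 03:00.
    return current >= start or current < end


def may_reboot_for_update(now, start, end, device_busy, user_consented=False):
    """Decide whether an update reboot is allowed right now (hypothetical policy)."""
    if device_busy:
        # Never interrupt work in progress, whatever the clock says.
        return False
    # Assumption: explicit consent overrides the scheduled window.
    return user_consented or in_maintenance_window(now, start, end)


# Usage: a 23:00 to 03:00 window, checked at 01:15 on an idle device.
allowed = may_reboot_for_update(
    datetime(2024, 5, 1, 1, 15), time(23, 0), time(3, 0), device_busy=False
)
```

Use the device's local time for the check so that the window lines up with when the device is actually idle.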
Staying secure
Regularly tracking and addressing common vulnerabilities and exposures (CVEs), integrating security patches, and issuing updates are non-negotiable for product maintenance. This applies to the operating system, drivers, and third-party libraries. Reviewing your product’s software bill of materials (SBOM) every six months can help identify critical vulnerabilities across the entire software stack.
Distributing your updates
To avoid overwhelming server resources, split firmware updates into manageable batches to balance them across your user base. Geographical segmentation allows for time zone-appropriate update scheduling, which can be beneficial for customer support operations. Once you have a batch release mechanism in place, try leveraging test groups or beta users to gather feedback on releases before rolling them out to all users.
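One common way to implement batching is to hash each device ID into a stable bucket and raise a rollout percentage over time. The Python sketch below assumes string device IDs, 100 buckets, and a hypothetical beta_devices set that always receives a release first; none of these details come from a specific update server.

```python
import hashlib


def rollout_bucket(device_id, release_id, buckets=100):
    """Map a device to a stable bucket in [0, buckets) for a given release."""
    # Mixing in the release ID reshuffles devices for every release,
    # so the same units are not always first in line.
    digest = hashlib.sha256(f"{release_id}:{device_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def is_eligible(device_id, release_id, rollout_percent, beta_devices=frozenset()):
    """True if the device should be offered this release at the current stage."""
    if device_id in beta_devices:
        # Beta testers get every release regardless of the percentage.
        return True
    return rollout_bucket(device_id, release_id) < rollout_percent


# Usage: offer release "2.4.0" to roughly 10% of the fleet plus beta testers.
offered = is_eligible("device-0042", "2.4.0", 10, beta_devices={"device-0042"})
```

Because the bucket is derived from the device ID rather than stored, raising rollout_percent from 10 to 50 only adds devices; any device that was eligible at 10% stays eligible.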
Integrating and automating updates
Although it’s tempting, don’t delay integrating update functionality into your product – add it early in the development cycle. This avoids compressed testing cycles and potential oversights, allows you to distribute bug fixes and features during development and testing, and gives the update feature a thorough workout before you’re relying on it in the field.
A robust build process for firmware updates should minimize manual intervention by automating versioning, image signing, server uploads, and deployment initiation. Streamlined automation lets you deploy new updates with confidence, including in high-pressure situations like urgent security patches or critical bug fixes.
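As a sketch of what the automated packaging step might look like, the Python example below builds a release manifest and authenticates it. Note the stand-in: it uses an HMAC from the standard library where a production pipeline would normally use an asymmetric signature, so that devices hold only a public key. The manifest fields and function names are hypothetical.

```python
import hashlib
import hmac
import json


def build_manifest(image, version):
    """Describe a firmware image so a device can check what it downloaded."""
    return {
        "version": version,
        "size": len(image),
        "sha256": hashlib.sha256(image).hexdigest(),
    }


def sign_manifest(manifest, key):
    """Return an authentication tag over a canonical JSON encoding of the manifest."""
    # sort_keys and fixed separators make the encoding deterministic.
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # Stand-in only: HMAC-SHA256 instead of a real asymmetric signature.
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_release(image, manifest, tag, key):
    """Check the manifest tag first, then the image against the manifest."""
    if not hmac.compare_digest(sign_manifest(manifest, key), tag):
        return False
    if len(image) != manifest["size"]:
        return False
    return hmac.compare_digest(hashlib.sha256(image).hexdigest(), manifest["sha256"])
```

Running these steps from a build script or CI job, rather than by hand, means the version, hash, and signature always describe the exact image that gets uploaded.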
Making device-unique keys
Using a single cryptographic key across devices simplifies manufacturing, but it introduces a significant security risk: if that one key is exposed, every product is endangered. Assigning each device a unique key, stored securely within the device, is a better approach. While device private keys should be read-only under normal conditions, it’s essential to make provisions for key replacement in case it’s ever needed.
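To show the bookkeeping involved, here is a minimal Python sketch of a provisioning registry that issues one random secret per device and supports replacement. It is a simplification: it generates symmetric secrets with the standard library's secrets module, whereas a real product would more likely generate an asymmetric key pair inside a secure element and register only the public half. The KeyRegistry and KeyRecord names are hypothetical.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyRecord:
    key: bytes    # 32 random bytes, unique to one device
    version: int  # incremented each time the key is replaced


class KeyRegistry:
    """In-memory stand-in for a manufacturing-side key database."""

    def __init__(self):
        self._records = {}

    def provision(self, device_id):
        """Issue the first key for a device; refuse to provision it twice."""
        if device_id in self._records:
            raise ValueError(f"{device_id} is already provisioned")
        record = KeyRecord(key=secrets.token_bytes(32), version=1)
        self._records[device_id] = record
        return record

    def replace(self, device_id):
        """Issue a fresh key for an existing device and bump its version."""
        old = self._records[device_id]
        record = KeyRecord(key=secrets.token_bytes(32), version=old.version + 1)
        self._records[device_id] = record
        return record

    def current(self, device_id):
        return self._records[device_id]
```

Tracking a version alongside each key lets the server tell which key a device is currently using while a replacement is still propagating.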
And there’s more…
It takes a lot to create an effective OTA strategy, and we’ve only touched on a few of the necessary considerations. To get more info on this topic, check out our best practice guide, Updates Outside the App Store.