Founder, CEO and consultant, Code Archers AB,
Feb 2017 - present, Stockholm, Sweden
Open source projects:
- Author and maintainer of a SHA-256 implementation. It is used in WinBtrfs, among others, as well as in some industrial applications.
- After some basic and advanced courses in Haskell to improve my knowledge of functional programming, I am currently exploring the C++ ranges library (C++20, 23 and 26), which for instance enables more immutability and lazy evaluation, through a personal C++ project in the field of cryptography. Immutability, in particular, is generally considered a way to drastically decrease the number of possible bugs. I will probably release the source code for that project at some point in the future.
Customer assignments:
- For several years, I have successfully been recommending and encouraging the broader adoption of modern C++ (up to C++20) in a large embedded software code base. In multiple practical cases, I have demonstrated how it can be used to improve type safety and to provide compile-time configurability (through C++ templates and concepts), which allows a dramatic increase in code sharing between applications without any performance impact. I have made extensive use of constexpr and static_assert to implement "compile-time unit tests", a very powerful mechanism made possible by modern C++.
- Integration of a module for differential control and traction control, generated from Matlab Simulink, in a rig control system for a mine loader. Implemented a failure mode, which deactivates the features when a critical sensor error is detected, and logs such errors.
- Implemented a fairly generic time-based filter system, which is used both to solve a race condition in a module handling a gear turn knob, and to filter out transient out-of-range CAN J1939 parameter values during startup and device reboot.
- Development, maintenance and troubleshooting of an automated test framework in Python for a mine rig control system (RCS). The test programs start the RCS on the host, in an environment that emulates inputs and to a certain extent also simulates the controlled units (combustion engine or electric motor, inverters, hydraulics systems, transmission control unit, battery management system, etc.). The RCS is a Linux-based system and communicates with the controlled devices over CAN J1939 and CANopen. Example: creation of a new pattern, based on the Python with-statement, for events that are expected to occur after a certain delay. The new pattern entirely eliminates a certain category of race conditions, while being syntactically convenient.
- Implemented a Python program to generate C++ integration code for control modules generated from Matlab Simulink (also C++ code). For every such model, the program saves several hours of work for the responsible software developers. The program is modular, with a module that parses the Simulink generated code into an intermediary data representation, and a module that generates C++ code from that representation. Such models are routinely used in the target system, a mine rig control system, for power control, differential lock control, traction control, etc.
- Implemented support for a new J1939 IMU (inertial measurement unit) device, in a mining truck control system, meant to be used for load weighing adjustment.
- Refactored a solution for distribution of application messages over an SPI bus to use unique pointers instead of manually allocated/deallocated buffers. Manual allocation/deallocation is error-prone (risk of memory leaks) and is generally discouraged in modern C++. This implementation showed how standard unique pointers can be used even when dynamic allocation is based on ad-hoc memory pools (to avoid fragmentation). This refactoring also allowed the removal of a large number of lines of code with no loss of functionality, which decreases the maintenance cost.
- Design and implementation of a solution for application-level message distribution between a single master and multiple slaves on a communication bus. A general model was designed, where multiple OS threads on a master shall be able to communicate with multiple OS threads running on multiple physical slaves. Enumerated message types are mapped at run time to true C++ types, enabling type safety and avoiding bugs.
- In a project initiated to work around component shortage in a highly successful product, implementation of an MCU-based replacement for an FPGA. Two blocks are implemented:
  - A communication protocol master for an RS-485 multidrop bus.
  - Handlers for three SPI analog-to-digital converters.
- Design and implementation of a generic thread-safe serial flash storage solution, with support for rotating logs and an arbitrary key-value store. The exact contents of the storage, the so-called "flash map", are configured at compile time with the help of C++17 variadic templates. This is used to replace ad-hoc solutions in multiple applications, which allows the removal of several hundred lines of code in each concerned application, without loss of functionality.
- Design and implementation of a solution to synchronize the real-time scheduler and data transmission of a slave to synchronization packets received from a bus master. The real-time synchronization algorithm involves inferring the master period and dynamically adjusting the current slot length, within configurable margins. The synchronization of transmission involves optimal use of multiple peripherals (timer, DMA and UART). The code is written in C (update of legacy device drivers) and C++17.
- Refactoring of a torque handling module written in C and using static structures into configurable and reusable C++ classes. The primary goal was to correct multiple race conditions, some of them due to incorrect handling of battery glitches, but at the same time, care was taken to make the objects adaptable to similar needs in the same application and in other similar applications. The result was a successful removal of multiple bugs, and the ability to replace multiple similar modules by shared classes. This was also an occasion to introduce more modern C++ (C++17) in the applications.
- Design and implementation of a bootloader for a Cortex-M MCU, as a workaround for a blocking issue introduced by the MCU vendor in a new revision of the MCU. The bootloader shipped with the MCU has a critical bug which the vendor is unable to fix. Fortunately, there was enough free Flash memory to load an alternative bootloader during production, which emulates the vendor's SPI-based data protocol.
- Participation in the integration of a new digital torque transducer in an electrical nutrunner. The transducer is attached to the tap and rotates with it. This gives higher accuracy, but makes the transmission of electrical signals more challenging. A slip ring is used. The communication is asynchronous, half-duplex, over a single wire. Design and implementation of the data protocol for software update on master and slave. Design and implementation of a bootloader which runs from RAM. Design and implementation of the software solution for the reception of real-time samples with a fixed frequency. Since there is no strict synchronization between the sender and the receiver, an adaptive algorithm was designed and implemented, based on a DMA circular buffer.
- Design and implementation of a solution to avoid race conditions in the detection and handling of battery connections/disconnections on a battery-powered nutrunner. In particular, proper handling of series of glitches required the implementation of a specific real-time state machine. This is quite a typical example of a problem in which the lucky case is pretty much irrelevant in the design of a proper solution. Robustness can only be achieved by solving the worst case, at every step.
- Troubleshooting of Wi-Fi disconnection issues with a radio module provided by an external vendor. In order to prove that the issue was in the vendor's module, I isolated it by implementing a simple TCP client and server in Python on Raspberry Pi boards, and captured the air traffic in Wireshark.
- Design and implementation of a solution for retransmission of message segments over an SPI link. The purpose was to improve the robustness of transmission in case of ESD issues that lead to CRC errors. The main design criterion was minimization of the impact on CPU time, which was a scarce resource. The solution only used one additional bit of signaling on the link.
- Design and implementation of a solution for update of software on multiple sub-systems in a new embedded product. A master was implemented in C++ under Linux. Since the master only has direct contact with one of the slaves, a software update relay was implemented on that slave, in C++ under FreeRTOS. All communication between master, relay and slaves was based on SPI. One important design criterion was code sharing: the addition of a new slave only requires the development of a thin layer of adapter code.
- Design and implementation of an EEPROM emulation layer for NOR-flash chips. This involved studying ideas available from academia (Algorithms and Data Structures for Flash Memories, in particular), and developing the data structures and algorithms for reading, writing, and garbage collection. One of the main design criteria was reliable wear leveling. Unit testing was based on a RAM-emulated NOR-flash area, and was crucial in ensuring stability and reliability of the solution. The reference use case used an SPI flash chip, although the solution in itself made no assumptions on the available interface, which was accessed through an abstract API.
- Solved some real-time overrun issues in a distributed software oscilloscope module. The running time of the most time-critical run-to-completion function, running on a 16-bit DSP, was drastically reduced, from 28 microseconds to 6 microseconds.
- Design and implementation of an encryption solution for a 16-bit DSP. This was necessary to protect secret data located on an SD card. I selected AES as the obvious state-of-the-art solution, and chose the tiny-AES-c implementation. Since no obvious value was available for the initialization vector (IV), I chose to use a cryptographically secure hash computed from a unique serial number. Since I found no SHA-2 implementation that was small enough for the 16-bit DSP, I created and published a public domain SHA-256 implementation on my own time, and used it for this solution. The SHA-256 implementation includes extensive unit tests that are automatically run on Travis.
- Design and implementation of a mutex solution for a custom scheduler on a 16-bit DSP. Special attention was given to starvation-freedom and performance.
- Performance optimization in a custom scheduler on a 16-bit DSP. This involved a modification of the scheduler's context switching algorithm.
- Implementation of various support tools in Python for the needs of the project team. These involved custom SD card partitioning and formatting, serial communication over USB, etc.
- Proposed and partly implemented the adaptation of a legacy binary protocol to support a new user-level interface for an industrial process. This involved the prioritization of features, a clever reuse and extension of the existing protocol, as well as an incremental implementation and regression testing at every step, in order to deliver the functionality on time for a crucial demonstration to the customer.
- Design and implementation of a solution to store and retrieve critical records on an SD card over SPI in an industrial application. This involved some performance optimizations in order to meet the response time requirements, as well as wear testing over several million records.
- Design and implementation of a solution to update software on five distinct CPUs in a distributed embedded system from an SD card over SPI, while answering status requests from a PC application over a UART.
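
Illustration of the "compile-time unit tests" and of the compile-time configured "flash map" mentioned above. This is a minimal sketch under stated assumptions, not the customer code: the names RotatingLog, KeyValueStore, FlashMap and round_up, the back-to-back layout, and the 4096-byte erase sector are all hypothetical and chosen only for the example. It shows how a variadic template can describe a storage layout, and how static_assert can check that layout during compilation.

```cpp
#include <cstddef>

// Hypothetical region descriptors: each one only states how many bytes it needs.
template <std::size_t Bytes>
struct RotatingLog { static constexpr std::size_t size = Bytes; };

template <std::size_t Bytes>
struct KeyValueStore { static constexpr std::size_t size = Bytes; };

// Round a byte count up to a whole number of erase sectors.
constexpr std::size_t round_up(std::size_t bytes, std::size_t sector) {
    return (bytes + sector - 1) / sector * sector;
}

// Hypothetical "flash map": regions laid out back to back, each one
// padded to the (assumed) erase-sector size.
template <std::size_t SectorSize, typename... Regions>
struct FlashMap {
    static_assert(sizeof...(Regions) > 0, "a flash map needs at least one region");
    static constexpr std::size_t count = sizeof...(Regions);

    // Start offset of region number `index` (valid for 0 <= index <= count).
    static constexpr std::size_t offset_of(std::size_t index) {
        constexpr std::size_t sizes[] = {Regions::size...};
        std::size_t offset = 0;
        for (std::size_t i = 0; i < index; ++i) {
            offset += round_up(sizes[i], SectorSize);
        }
        return offset;
    }

    // Total number of bytes occupied by all regions, padding included.
    static constexpr std::size_t total_size() { return offset_of(count); }
};

// Example configuration (sizes are made up for the illustration).
using Map = FlashMap<4096, RotatingLog<10000>, KeyValueStore<512>, RotatingLog<4096>>;

// "Compile-time unit tests": a wrong layout stops the build.
static_assert(Map::count == 3, "three regions configured");
static_assert(Map::offset_of(0) == 0, "first region starts at offset 0");
static_assert(Map::offset_of(1) == 12288, "10000 bytes are padded to 3 sectors");
static_assert(Map::offset_of(2) == 16384, "512 bytes are padded to 1 sector");
static_assert(Map::total_size() == 20480, "whole map spans 5 sectors");
```

Because every check is a static_assert, the tests cost nothing at run time and cannot be skipped: a configuration that violates them fails to compile.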
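
Illustration of how standard unique pointers can be combined with a memory pool, as in the SPI message distribution refactoring mentioned above. Again a minimal sketch under assumptions, not the actual implementation: Buffer, BufferPool, Release, the 64-byte buffer size and the linear search for a free slot are hypothetical, and the sketch is not thread-safe. The idea shown is a custom deleter that hands the buffer back to its pool instead of calling delete.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical fixed-size message buffer.
struct Buffer {
    std::array<unsigned char, 64> bytes{};
    std::size_t length = 0;
};

// Hypothetical pool of N preallocated buffers; no heap allocation is involved.
template <std::size_t N>
class BufferPool {
public:
    // Deleter that returns the buffer to its pool instead of calling delete.
    struct Release {
        BufferPool* pool = nullptr;
        void operator()(Buffer* buffer) const noexcept { pool->release(buffer); }
    };
    using Handle = std::unique_ptr<Buffer, Release>;

    // Hands out a free buffer, or an empty handle when the pool is exhausted.
    Handle acquire() noexcept {
        for (std::size_t i = 0; i < N; ++i) {
            if (!in_use_[i]) {
                in_use_[i] = true;
                storage_[i].length = 0;
                return Handle(&storage_[i], Release{this});
            }
        }
        return Handle(nullptr, Release{this});
    }

    // Number of buffers currently available.
    std::size_t free_count() const noexcept {
        std::size_t n = 0;
        for (bool used : in_use_) {
            if (!used) ++n;
        }
        return n;
    }

private:
    // Called by the deleter; marks the slot as free again.
    void release(Buffer* buffer) noexcept {
        assert(buffer >= storage_.data() && buffer < storage_.data() + N);
        in_use_[static_cast<std::size_t>(buffer - storage_.data())] = false;
    }

    std::array<Buffer, N> storage_{};
    std::array<bool, N> in_use_{};
};
```

A handle obtained from acquire() releases its buffer automatically when it goes out of scope or is reset, so a forgotten manual deallocation can no longer leak a buffer. The pool must outlive every handle it has given out.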