Online detection and diagnosis for radiation-induced errors in COTS microprocessors
Microprocessors (uPs) are the backbone of digital electronic systems commonly present in spacecraft.
They exhibit complex and varied failure modes that must be handled in different ways, particularly across different criticality levels. We propose a new error detection and diagnosis approach, based on the trace interface, that could be applied to any uP technology. The trace interface is a resource commonly found in modern uPs for software debugging and application profiling. For these purposes, it provides relevant data with low latency in a non-intrusive manner. It is left unused once application development is completed, so it can be reused with no extra overhead. The approach addresses both the radiation hardening and the testability challenges of COTS uPs.
The idea is to leverage the information available at the trace interface to detect and diagnose errors in uPs, supporting several development phases at component and system level:
- At the design phase, by providing error detection and diagnosis results during technology characterization tests.
- At the qualification phase, by detecting errors, advising on their severity, and providing diagnosis information that helps enhance a given application to meet its dependability requirements.
- At the operational phase, by working side by side with a uP to check the integrity of the executed application, raise an alert upon error, and provide diagnosis information so that the necessary corrective action can be taken with low latency.
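As an illustration of the operational-phase idea only, the following minimal Python sketch checks a stream of program-flow addresses, as a trace interface might report them, against a table of legal control-flow transfers. It is not the detection method of the IP core described here: the table `ALLOWED_SUCCESSORS`, the `Diagnosis` record, the function `check_trace`, and all addresses are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical control-flow table: each traced address maps to the set of
# addresses that may legally follow it. In practice such a table would be
# derived from the application binary; these values are made up.
ALLOWED_SUCCESSORS = {
    0x1000: {0x1004},
    0x1004: {0x1008, 0x1020},  # conditional branch with two legal targets
    0x1008: {0x1000},
    0x1020: {0x1000},
}


@dataclass
class Diagnosis:
    index: int    # position of the offending entry in the trace
    source: int   # address the transfer came from
    target: int   # address the transfer went to
    reason: str   # coarse classification of the error


def check_trace(trace, allowed=ALLOWED_SUCCESSORS):
    """Return one Diagnosis per illegal transition found in the trace."""
    findings = []
    for i in range(1, len(trace)):
        src, dst = trace[i - 1], trace[i]
        if src not in allowed:
            # Execution is already outside the known program flow.
            findings.append(Diagnosis(i, src, dst, "unknown source address"))
        elif dst not in allowed[src]:
            # Known source, but the observed target is not a legal successor.
            findings.append(Diagnosis(i, src, dst, "illegal control-flow transfer"))
    return findings


if __name__ == "__main__":
    good = [0x1000, 0x1004, 0x1008, 0x1000]
    bad = [0x1000, 0x1004, 0x1010]  # 0x1010 is not a legal target of 0x1004
    print(check_trace(good))
    print(check_trace(bad))
```

In this sketch, a non-empty result corresponds to raising an alert, and the fields of each `Diagnosis` stand in for the diagnosis information that a corrective action could use.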
This solution is currently under development at ARQUIMEA as an IP core compatible with the ARM Cortex-A9 processor core. It has been successfully demonstrated in a laboratory environment (TRL4), achieving a high error detection rate (up to 99.9%) and useful diagnosis information, as published in research journals. Because ARM processor cores share common trace resources, the solution could be extended to a wider range of these cores, which are becoming increasingly attractive to the space industry.