Flagging Critical Components to Prevent Transient Faults in Real-Time Systems

This paper proposes the use of metrics in design space exploration that highlight where in the structure of the model and at what point in the behaviour, prevention is needed against transient faults. Previous approaches to tackle transient faults focused on recovery after detection. Almost no research has been directed towards preventive measures. But in real-time systems, hard deadlines are performance requirements that absolutely must be met and a missed deadline constitutes an erroneous action and a possible system failure. This paper proposes the use of metrics to assess the system design to flag where transient faults may have significant impact. These tools then allow the design to be changed to minimize that impact, and they also flag where particular design techniques – such as coding of communications or memories – need to be applied in later stages of design.