Presentation


MaxiMals: A Low-cost and Effective Technique for Corrupted Values Correction in Vision Transformers

Session

Challenging the Autonomy Challenges

Description

Vision Transformers (ViTs) are highly accurate Machine Learning (ML) models. However, their large size and complexity increase the expected error rate due to hardware faults. Measuring the error rate of large ViT models is challenging, as conventional microarchitectural fault simulations can take years to produce statistically significant data. This paper proposes a two-level evaluation based on data collected through more than 70 hours of neutron beam experiments and more than 600 hours of software fault simulation. We consider 12 ViT models executed on two NVIDIA GPU architectures. We first characterize the fault model in the ViT kernels to identify the faults that are most likely to propagate to the output. We then design dedicated procedures, efficiently integrated into the ViT, to locate and correct these faults. We propose Maximum corrupted Malicious values (MaxiMals), an experimentally tuned, low-cost mitigation solution that reduces the impact of transient faults on ViTs. We demonstrate that MaxiMals can correct 90.7% of critical failures, with execution time overheads as low as 5.61%.

Authors

Lucas Roquet

University of Rennes

Fernando Fernandes dos Santos