Presentation

MaxiMals: A Low-cost and Effective Technique for Corrupted Values Correction in Vision Transformers
Description
Vision Transformers (ViTs) are highly accurate Machine Learning (ML) models. However, their large size and complexity increase the expected error rate due to hardware faults. Measuring the error rate of large ViT models is challenging, as conventional microarchitectural fault simulations can take years to produce statistically significant data. This paper proposes a two-level evaluation based on data collected through more than 70 hours of neutron beam experiments and more than 600 hours of software fault simulation. We consider 12 ViT models executed on two NVIDIA GPU architectures. We first characterize the fault model in ViT kernels to identify the faults most likely to propagate to the output. We then design dedicated procedures, efficiently integrated into the ViT, to locate and correct these faults. We propose Maximum corrupted Malicious values (MaxiMals), an experimentally tuned, low-cost mitigation solution that reduces the impact of transient faults on ViTs. We demonstrate that MaxiMals can correct 90.7% of critical failures with execution time overheads as low as 5.61%.
Event Type
Research Manuscript
Time
Wednesday, June 26, 1:30pm - 1:45pm PDT
Location
3001, 3rd Floor
Topics
Autonomous Systems
Keywords
Autonomous Systems (Automotive, Robotics, Drones)