BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240626T180034Z
LOCATION:3001\, 3rd Floor
DTSTART;TZID=America/Los_Angeles:20240626T133000
DTEND;TZID=America/Los_Angeles:20240626T134500
UID:dac_DAC 2024_sess108_RESEARCH1617@linklings.com
SUMMARY:Cross-Layer Reliability Evaluation and Efficient Hardening of Larg
 e Vision Transformers Models
DESCRIPTION:Research Manuscript\n\nLucas Roquet (University of Rennes), Fe
 rnando Fernandes dos Santos (INRIA), Paolo Rech (University of Trento), Ma
 rcello Traiola and Olivier Sentieys (INRIA), and Angeliki Kritikakou (Univ
 ersité de Rennes)\n\nVision Transformers (ViTs) are highly accurate Machin
 e Learning (ML) models. However, their large size and complexity increa
 se the expected error rate due to hardware faults. Measuring the error rat
 e of large ViT models is challenging, as conventional microarchitectural f
 ault simulations can take years to produce statistically significant data.
  This paper proposes a two-level evaluation based on data collected throug
 h more than 70 hours of neutron beam experiments and more than 600 hours o
 f software fault simulation. We consider 12 ViT models executed on 2 NVIDI
 A GPU architectures. We first characterize the fault model in ViT's kernel
 s to identify the faults that are more likely to propagate to the output. 
 We then design dedicated procedures efficiently integrated into the ViT to
  locate and correct these faults. We propose Maximum corrupted Malicious v
 alues (MaxiMals), an experimentally tuned low-cost mitigation solution to 
 reduce the impact of transient faults on ViTs. We demonstrate that MaxiMal
 s can correct 90.7% of critical failures, with execution time overheads as
  low as 5.61%.\n\nTopic: Autonomous Systems\n\nKeyword: Autonomous Systems
  (Automotive, Robotics, Drones)\n\nSession Chair: Hokeun Kim (Arizona Stat
 e University)
END:VEVENT
END:VCALENDAR
