Session
It's Not 8b Retro-Gaming, It's State-Of-The-Art Architectures Using Quantization, Sparsity, and Compression!
Description: This session presents state-of-the-art work in architecture design, focusing on optimization techniques such as quantization, sparsity, pruning, and compression for DNN accelerators. The session begins with a series of presentations on quantization, an increasingly popular and energy-efficient technique for deep neural networks (DNNs). It then turns to other hot topics, including exploiting and optimizing sparsity and pruning, with a focus on the ever-popular transformer attention architecture.
Event Type: Research Manuscript
Time: Thursday, June 27, 10:30am - 12:00pm PDT
Location: 3003, 3rd Floor
Topics: AI, Design, AI/ML Architecture Design
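As background for the session's theme, the following is a minimal sketch of symmetric per-tensor 8-bit quantization, the kind of technique the description alludes to; the helper names (quantize_int8, dequantize) are illustrative only and are not drawn from any of the session's papers.

import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: map float weights to int8 in [-127, 127]."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Illustrative usage: quantize a random weight tensor and check the error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.max(np.abs(w - dequantize(q, scale))))

Storing weights as int8 rather than float32 cuts memory traffic roughly 4x, which is the main source of the energy savings the description mentions; the papers in the session explore far more sophisticated variants of this basic idea.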
Presentations