Presentation

Drift: Leveraging Distribution-based Dynamic Precision Quantization for Efficient Deep Neural Network Acceleration
Description
Quantization is one of the most hardware-efficient ways to reduce inference costs for deep neural network (DNN) models. Nevertheless, with the continuous growth of DNN model size, existing static quantization methods fail to sufficiently exploit the sparsity of models. Motivated by the pervasive dynamism in data tensors across DNN models, we propose a dynamic precision quantization algorithm to further reduce computational costs. Furthermore, to address the shortcomings of existing precision-flexible accelerators, we design a novel accelerator, Drift, and achieve online scheduling to efficiently support dynamic precision execution. Evaluation results show that Drift achieves 2.85x speedup and 3.12x energy saving over existing precision-flexible accelerators.
Event Type
Research Manuscript
Time
Thursday, June 27, 10:45am - 11:00am PDT
Location
3003, 3rd Floor
Topics
AI
Design
Keywords
AI/ML Architecture Design