Presentation

Drift: Leveraging Distribution-based Dynamic Precision Quantization for Efficient Deep Neural Network Acceleration
Description
Quantization is one of the most hardware-efficient ways to reduce inference costs for deep neural network (DNN) models. Nevertheless, with the continuous growth of DNN model size, existing static quantization methods fail to sufficiently exploit the sparsity of models. Motivated by the pervasive dynamism in data tensors across DNN models, we propose a dynamic precision quantization algorithm to further reduce computational costs. Furthermore, to address the shortcomings of existing precision-flexible accelerators, we design a novel accelerator, Drift, and achieve online scheduling to efficiently support dynamic precision execution. Evaluation results show that Drift achieves 2.85x speedup and 3.12x energy saving over existing precision-flexible accelerators.
Event Type
Research Manuscript
Time
Thursday, June 27, 10:45am - 11:00am PDT
Location
3003, 3rd Floor
Topics
AI
Design
Keywords
AI/ML Architecture Design