Presentation


INSPIRE: Accelerating Deep Neural Networks via Hardware-friendly Index-Pair Encoding

Session: Do-More-with-Less: Optimizing AI Models for Inference Efficiencies

Description: Deep neural network (DNN) inference consumes significant computing resources and development effort as model sizes grow. Quantization is a promising technique for reducing the computation and memory cost of DNNs. Most existing quantization methods rely on fixed-point integer or floating-point types, which require more bits to maintain model accuracy. In contrast, variable-length quantization, which combines high precision for large-magnitude values (i.e., outliers) with low precision for normal values, offers algorithmic advantages but introduces significant hardware overhead due to variable-length encoding and decoding. Moreover, because of these outliers, existing quantization methods are less effective when applied to both (dynamic) activations and (static) weights.
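To make the variable-length idea concrete, here is a minimal Python sketch of outlier-aware mixed-precision quantization. It is not the method of any particular accelerator: the function names, the magnitude threshold, the 4-bit/8-bit split, and the 1-bit outlier flag are all assumptions chosen for illustration. It shows why codes end up with different lengths, which is the source of the encoding and decoding overhead mentioned above.

```python
def quantize_uniform(x, scale, bits):
    """Symmetric uniform quantization of x to a signed integer code."""
    qmax = 2 ** (bits - 1) - 1
    q = round(x / scale)
    return max(-qmax, min(qmax, q))


def mixed_precision_quantize(values, threshold, low_bits=4, high_bits=8):
    """Encode each value as (is_outlier, code).

    Values with |v| > threshold (assumed outlier rule) get high_bits,
    all other values get low_bits. threshold must be positive.
    """
    max_abs = max(abs(v) for v in values)
    low_scale = threshold / (2 ** (low_bits - 1) - 1)
    high_scale = max(max_abs, threshold) / (2 ** (high_bits - 1) - 1)
    encoded = []
    for v in values:
        if abs(v) > threshold:
            encoded.append((True, quantize_uniform(v, high_scale, high_bits)))
        else:
            encoded.append((False, quantize_uniform(v, low_scale, low_bits)))
    return encoded, low_scale, high_scale


def dequantize(encoded, low_scale, high_scale):
    """Map each code back to a real value using the scale of its class."""
    return [code * (high_scale if is_outlier else low_scale)
            for is_outlier, code in encoded]


def encoded_bits(encoded, low_bits=4, high_bits=8):
    """Total storage, assuming one flag bit per value plus its code bits."""
    return sum(1 + (high_bits if is_outlier else low_bits)
               for is_outlier, _ in encoded)


# Hypothetical data: four normal values and one outlier.
values = [0.1, -0.3, 0.5, 8.0, -0.2]
encoded, low_scale, high_scale = mixed_precision_quantize(values, threshold=1.0)
restored = dequantize(encoded, low_scale, high_scale)
total_bits = encoded_bits(encoded)
```

Because each code's width depends on its flag, a decoder must inspect the flag before it knows where the next code starts; a fixed-width format avoids that step but spends the high bit width on every value.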

In this work, we propose INSPIRE, an algorithm/architecture co-designed solution that employs Index-Pair (INP) quantization and handles outliers globally with low hardware overhead and high performance gains. The key insight of INSPIRE lies in identifying typical features associated with important values, encoding them as indexes, and precomputing the corresponding results for efficient storage in a lookup table. During inference, the results for inputs with a paired index can be retrieved directly from the table without further computation. Furthermore, we design a unified processing element architecture for INSPIRE and highlight its seamless integration with existing DNN accelerators. As a result, the INSPIRE-based accelerator surpasses state-of-the-art quantization accelerators with a $9.31\times$ speedup and an $81.3\%$ energy reduction while maintaining superior model accuracy.
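The general idea of replacing arithmetic with lookups on a pair of indexes can be sketched as follows. This is only an illustration under stated assumptions, not INSPIRE's actual encoding, which the abstract does not specify: the codebooks, the nearest-entry index assignment, and the names `nearest_index`, `build_product_table`, and `lut_dot` are all hypothetical.

```python
def nearest_index(value, codebook):
    """Index of the codebook entry closest to value (assumed encoding rule)."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - value))


def build_product_table(weight_codebook, act_codebook):
    """Precompute every weight-activation product once, offline."""
    return [[w * a for a in act_codebook] for w in weight_codebook]


def lut_dot(weight_idx, act_idx, table):
    """Dot product using only table lookups and additions."""
    return sum(table[wi][ai] for wi, ai in zip(weight_idx, act_idx))


# Hypothetical codebooks; the last entry of each stands in for an outlier.
weight_codebook = [-1.0, -0.5, 0.0, 0.5, 1.0, 4.0]
act_codebook = [0.0, 1.0, 2.0, 6.0]
table = build_product_table(weight_codebook, act_codebook)

weights = [0.5, -1.0, 4.0]
activations = [2.0, 0.0, 1.0]
weight_idx = [nearest_index(w, weight_codebook) for w in weights]
act_idx = [nearest_index(a, act_codebook) for a in activations]

result = lut_dot(weight_idx, act_idx, table)
```

In this sketch the table holds one entry per index pair, so its size is the product of the two codebook sizes. Outliers are handled by giving them their own codebook entries rather than a wider bit width, so every index has the same length.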

Authors

Fangxin Liu

Shanghai Jiao Tong University

Ning Yang

Shanghai Jiao Tong University

Zhiyan Song

University of Shanghai for Science and Technology

Zongwu Wang

Shanghai Jiao Tong University

Haomin Li

Shanghai Jiao Tong University

Shiyuan Huang