
Addition is Most You Need: Efficient Floating-Point SRAM Compute-in-Memory by Harnessing Mantissa Addition
Description
The compute-in-memory (CIM) paradigm holds great promise for efficiently accelerating machine learning workloads. Among memory devices, static random-access memory (SRAM) stands out as a practical choice due to its exceptional reliability in the digital domain and balanced performance. Recently, there has been growing interest in accelerating floating-point (FP) deep neural networks (DNNs) with SRAM CIM because of their critical importance in DNN training and high-accuracy inference. This paper proposes an efficient SRAM CIM macro for FP DNNs. To realize the design, we identify a lightweight approach that decomposes conventional FP mantissa multiplication into two parts: mantissa sub-addition (sub-ADD) and mantissa sub-multiplication (sub-MUL). Our study shows that while mantissa sub-MUL is compute-intensive, it contributes to only a minority of FP products, whereas mantissa sub-ADD, although compute-light, accounts for the majority of FP products. Recognizing that "Addition is Most You Need", we develop a hybrid-domain SRAM CIM macro that handles mantissa sub-ADD accurately in the digital domain while improving the energy efficiency of mantissa sub-MUL through analog computing. Experiments on the MLPerf benchmark demonstrate a remarkable energy-efficiency improvement of 8.7×∼9.3× (7.3×∼8.2×) in inference (training) over a fully digital FP baseline without any accuracy loss, showcasing its great potential for FP DNN acceleration.
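
The sketch below is a minimal illustration (not the authors' implementation) of the decomposition described in the abstract, assuming IEEE-style normalized significands 1.m with m in [0, 1): the significand product (1 + m_a)(1 + m_b) expands into an addition-only term (1 + m_a + m_b) and a multiplication-only term (m_a · m_b). Function names and example values are illustrative only.

```python
# Minimal sketch of the mantissa decomposition described in the abstract.
# Assumes normalized significands 1.m with fractional part m in [0, 1).

def decompose_mantissa_product(m_a: float, m_b: float):
    """Split (1 + m_a) * (1 + m_b) into an addition-only part (sub-ADD)
    and a multiplication-only part (sub-MUL)."""
    sub_add = 1.0 + m_a + m_b   # sub-ADD: cheap to compute exactly in digital logic
    sub_mul = m_a * m_b         # sub-MUL: costly, but bounded above by 1
    return sub_add, sub_mul

if __name__ == "__main__":
    # Example: significands 1.25 and 1.75 (m_a = 0.25, m_b = 0.75)
    m_a, m_b = 0.25, 0.75
    sub_add, sub_mul = decompose_mantissa_product(m_a, m_b)
    product = sub_add + sub_mul  # equals (1 + m_a) * (1 + m_b) = 2.1875
    print(f"sub-ADD = {sub_add}, sub-MUL = {sub_mul}, product = {product}")
    # sub-ADD is always >= 1 while sub-MUL is always < 1, which is the
    # arithmetic intuition behind handling sub-ADD exactly in digital logic
    # and relegating sub-MUL to lower-precision analog computation.
```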
Event Type
Research Manuscript
Time
Wednesday, June 26, 11:00am - 11:15am PDT
Location
3002, 3rd Floor
Topics
Design
Keywords
In-memory and Near-memory Computing Architectures, Applications and Systems