
Addition is Most You Need: Efficient Floating-Point SRAM Compute-in-Memory by Harnessing Mantissa Addition
Description
The compute-in-memory (CIM) paradigm holds great promise for efficiently accelerating machine learning workloads. Among memory devices, static random-access memory (SRAM) stands out as a practical choice due to its exceptional reliability in the digital domain and balanced performance. Recently, there has been growing interest in accelerating floating-point (FP) deep neural networks (DNNs) with SRAM CIM because of their critical importance in DNN training and high-accuracy inference. This paper proposes an efficient SRAM CIM macro for FP DNNs. To realize the design, we identify a lightweight approach that decomposes conventional FP mantissa multiplication into two parts: mantissa sub-addition (sub-ADD) and mantissa sub-multiplication (sub-MUL). Our study shows that while mantissa sub-MUL is compute-intensive, it contributes to only a minority of FP products, whereas mantissa sub-ADD, although compute-light, accounts for the majority of FP products. Recognizing that "Addition is Most You Need", we develop a hybrid-domain SRAM CIM macro that handles mantissa sub-ADD accurately in the digital domain while improving the energy efficiency of mantissa sub-MUL through analog computing. Experiments on the MLPerf benchmark demonstrate a remarkable energy-efficiency improvement of 8.7×∼9.3× (7.3×∼8.2×) in inference (training) over a fully digital FP baseline without any accuracy loss, showcasing its great potential for FP DNN acceleration.
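
The sketch below is a minimal illustration (not the authors' implementation) of the decomposition described in the abstract, assuming IEEE-style normalized significands 1.m with m in [0, 1): the significand product (1 + m_a)(1 + m_b) expands into an addition-only term (1 + m_a + m_b) and a multiplication-only term (m_a · m_b). Function names and example values are illustrative only.

```python
# Minimal sketch of the mantissa decomposition described in the abstract.
# Assumes normalized significands 1.m with fractional part m in [0, 1).

def decompose_mantissa_product(m_a: float, m_b: float):
    """Split (1 + m_a) * (1 + m_b) into an addition-only part (sub-ADD)
    and a multiplication-only part (sub-MUL)."""
    sub_add = 1.0 + m_a + m_b   # sub-ADD: cheap to compute exactly in digital logic
    sub_mul = m_a * m_b         # sub-MUL: costly, but bounded above by 1
    return sub_add, sub_mul

if __name__ == "__main__":
    # Example: significands 1.25 and 1.75 (m_a = 0.25, m_b = 0.75)
    m_a, m_b = 0.25, 0.75
    sub_add, sub_mul = decompose_mantissa_product(m_a, m_b)
    product = sub_add + sub_mul  # equals (1 + m_a) * (1 + m_b) = 2.1875
    print(f"sub-ADD = {sub_add}, sub-MUL = {sub_mul}, product = {product}")
    # sub-ADD is always >= 1 while sub-MUL is always < 1, which is the
    # arithmetic intuition behind handling sub-ADD exactly in digital logic
    # and relegating sub-MUL to lower-precision analog computation.
```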
Event Type
Research Manuscript
Time
Wednesday, June 26, 11:00am - 11:15am PDT
Location
3002, 3rd Floor
Topics
Design
Keywords
In-memory and Near-memory Computing Architectures, Applications and Systems