RL-PTQ: RL-based Mixed Precision Quantization for Hybrid Vision Transformers
Description
Existing quantization approaches incur significant accuracy loss when compressing hybrid transformers to low bit-widths. This paper presents RL-PTQ, a novel post-training quantization (PTQ) framework that uses reinforcement learning (RL). Our focus is on determining the most effective bit-width and observer for each quantization configuration, tailoring mixed precision by grouping layers and addressing the challenges of quantizing hybrid transformers. RL-PTQ achieves the highest quantized accuracy for MobileViTs among prior PTQ methods. Furthermore, on a processing-in-memory (PIM) architecture, our quantized model improves energy efficiency by 10.1× and 22.6× over the baseline model running on a state-of-the-art PIM accelerator and on a GPU, respectively.
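To make the per-group search concrete, the sketch below shows one possible way to explore (bit-width, observer) choices for grouped layers. This is a minimal illustration only, not the paper's method: the RL agent is stood in for by a simple epsilon-greedy loop, and evaluate_config, the candidate lists, and the group count are all hypothetical placeholders.

```python
# Illustrative sketch only; all names and values here are hypothetical
# placeholders, not RL-PTQ's actual implementation.
import random

BIT_WIDTHS = [4, 6, 8]                        # candidate bit-widths per layer group
OBSERVERS = ["minmax", "percentile", "mse"]   # candidate observers per layer group
NUM_GROUPS = 5                                # e.g., grouped layers of a hybrid ViT

def evaluate_config(config):
    """Hypothetical stand-in for measuring quantized accuracy.

    A real pipeline would quantize the model with the chosen per-group
    (bit-width, observer) pairs and report validation accuracy; this toy
    score merely prefers higher bit-widths.
    """
    return sum(bits for bits, _ in config) / (8 * len(config))

def random_config():
    return [(random.choice(BIT_WIDTHS), random.choice(OBSERVERS))
            for _ in range(NUM_GROUPS)]

# Epsilon-greedy search over per-group choices, standing in for the RL agent.
best, best_score = random_config(), -1.0
epsilon = 0.3
for step in range(200):
    if random.random() < epsilon:
        candidate = random_config()            # explore a fresh configuration
    else:
        candidate = list(best)                 # exploit: mutate one group
        g = random.randrange(NUM_GROUPS)
        candidate[g] = (random.choice(BIT_WIDTHS), random.choice(OBSERVERS))
    score = evaluate_config(candidate)
    if score > best_score:
        best, best_score = candidate, score

print("best per-group (bit-width, observer):", best)
```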
Event Type
Research Manuscript
Time
Tuesday, June 25, 2:06pm - 2:24pm PDT
Location
3001, 3rd Floor
Topics
AI
Keywords
AI/ML Algorithms