RL-PTQ: RL-based Mixed Precision Quantization for Hybrid Vision Transformers
Description
Existing quantization approaches incur significant accuracy loss when compressing hybrid transformers to low bit-widths. This paper presents RL-PTQ, a novel post-training quantization (PTQ) framework that uses reinforcement learning (RL). Our focus is on determining the most effective bit-width and observer for each quantization configuration, tailoring mixed precision by grouping layers and addressing the challenges of quantizing hybrid transformers. RL-PTQ achieves the highest quantized accuracy for MobileViTs among prior PTQ methods. Furthermore, on a processing-in-memory (PIM) architecture, our quantized model improves energy efficiency by 10.1× and 22.6× over the baseline model running on a state-of-the-art PIM accelerator and on a GPU, respectively.
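To make the per-group search concrete, the sketch below shows one possible way to explore (bit-width, observer) choices for grouped layers. This is a minimal illustration only, not the paper's method: the RL agent is stood in for by a simple epsilon-greedy loop, and evaluate_config, the candidate lists, and the group count are all hypothetical placeholders.

```python
# Illustrative sketch only; all names and values here are hypothetical
# placeholders, not RL-PTQ's actual implementation.
import random

BIT_WIDTHS = [4, 6, 8]                        # candidate bit-widths per layer group
OBSERVERS = ["minmax", "percentile", "mse"]   # candidate observers per layer group
NUM_GROUPS = 5                                # e.g., grouped layers of a hybrid ViT

def evaluate_config(config):
    """Hypothetical stand-in for measuring quantized accuracy.

    A real pipeline would quantize the model with the chosen per-group
    (bit-width, observer) pairs and report validation accuracy; this toy
    score merely prefers higher bit-widths.
    """
    return sum(bits for bits, _ in config) / (8 * len(config))

def random_config():
    return [(random.choice(BIT_WIDTHS), random.choice(OBSERVERS))
            for _ in range(NUM_GROUPS)]

# Epsilon-greedy search over per-group choices, standing in for the RL agent.
best, best_score = random_config(), -1.0
epsilon = 0.3
for step in range(200):
    if random.random() < epsilon:
        candidate = random_config()            # explore a fresh configuration
    else:
        candidate = list(best)                 # exploit: mutate one group
        g = random.randrange(NUM_GROUPS)
        candidate[g] = (random.choice(BIT_WIDTHS), random.choice(OBSERVERS))
    score = evaluate_config(candidate)
    if score > best_score:
        best, best_score = candidate, score

print("best per-group (bit-width, observer):", best)
```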
Event Type
Research Manuscript
Time
Tuesday, June 25, 2:06pm - 2:24pm PDT
Location
3001, 3rd Floor
Topics
AI
Keywords
AI/ML Algorithms