Presentation

A Real-time Execution System of Multimodal Transformer through PIM-GPU Collaboration
Description
Multimodal transformers excel in various applications but face significant challenges, such as high memory consumption and limited data reuse, that hinder real-time performance. To address these issues, we propose a compiler oriented toward processing-in-memory (PIM)-GPU collaboration that optimizes the acceleration of multimodal transformers. The PIM-GPU synergy adapts well to multimodal transformers and reduces execution time through dynamic programming algorithms. In addition, we introduce a tailored PIM allocation algorithm for variable-length inputs to further increase efficiency. Experimental results show an average end-to-end speedup of 15x.
Event Type
Research Manuscript
Time
Tuesday, June 25, 10:30am - 10:45am PDT
Location
3012, 3rd Floor
Topics
Embedded Systems
Keywords
Time-Critical and Fault-Tolerant System Design