On the Design of Novel Attention Mechanism for Enhanced Efficiency of Transformers
Description
We present a new XOR-based attention function for efficient hardware implementation of transformers. While standard attention relies on matrix multiplication, we propose replacing this computation with bitwise XOR operations. We mathematically analyze the information-theoretic properties of multiplication-based attention, demonstrating that it preserves input entropy, and then show that XOR-based attention approximately preserves the entropy of its input. Across various simple tasks, including arithmetic, sorting, translation, and text generation, we show performance comparable to baseline methods using scaled GPT models. XOR-based attention shows substantial improvements in power, latency, and area compared to the multiplication-based attention function.
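The abstract does not give the exact XOR formulation. The sketch below illustrates one plausible reading, in which queries and keys are quantized to fixed-width integers and attention scores are derived from the XOR/popcount (Hamming-style) similarity instead of dot products. The names `xor_attention`, `quantize`, and the `bits` parameter are illustrative assumptions, not the paper's API, and the baseline is shown only for contrast.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(Q, K, V):
    """Baseline scaled dot-product attention (multiplication-based)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # dense matrix multiplication
    return softmax(scores) @ V

def xor_attention(Q, K, V, bits=8):
    """Hypothetical XOR-based attention sketch.

    Assumption: queries and keys are quantized to `bits`-bit integers and
    similarity is the number of matching bits (bit width minus the popcount
    of the XOR), used in place of the dot product. The paper's exact
    quantization and scoring scheme may differ.
    """
    def quantize(X):
        Xn = (X - X.min()) / (X.max() - X.min() + 1e-9)  # map to [0, 1]
        return np.round(Xn * (2 ** bits - 1)).astype(np.uint8)

    Qq, Kq = quantize(Q), quantize(K)
    xor = Qq[:, None, :] ^ Kq[None, :, :]                 # (n_q, n_k, d) bitwise XOR
    mismatched = np.unpackbits(xor, axis=-1).sum(axis=-1)  # popcount per (query, key)
    scores = (bits * Q.shape[-1] - mismatched) / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

# Toy usage: 4 tokens, model dimension 8
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(standard_attention(Q, K, V).shape)   # (4, 8)
print(xor_attention(Q, K, V).shape)        # (4, 8)
```

In hardware, the appeal of such a scheme is that XOR and popcount map to simple gates, avoiding the multiplier arrays that dominate the power and area of multiplication-based attention.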
Event Type
Research Manuscript
Time
Wednesday, June 26, 2:15pm - 2:30pm PDT
Location
3003, 3rd Floor
Topics
AI
Design
Keywords
AI/ML System and Platform Design