On the Design of Novel Attention Mechanism for Enhanced Efficiency of Transformers
Description
We present a new XOR-based attention function for efficient hardware implementation of transformers. While standard attention relies on matrix multiplication, we propose replacing this computation with bitwise XOR operations. We mathematically analyze the information-theoretic properties of multiplication-based attention, demonstrating that it preserves input entropy, and then show that XOR-based attention approximately preserves the entropy of its input. Across various simple tasks, including arithmetic, sorting, translation, and text generation, we show performance comparable to baseline methods using scaled GPT models. XOR-based attention shows substantial improvements in power, latency, and area compared to the multiplication-based attention function.
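The abstract does not give the exact XOR formulation. The sketch below illustrates one plausible reading, in which queries and keys are quantized to fixed-width integers and attention scores are derived from the XOR/popcount (Hamming-style) similarity instead of dot products. The names `xor_attention`, `quantize`, and the `bits` parameter are illustrative assumptions, not the paper's API, and the baseline is shown only for contrast.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(Q, K, V):
    """Baseline scaled dot-product attention (multiplication-based)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # dense matrix multiplication
    return softmax(scores) @ V

def xor_attention(Q, K, V, bits=8):
    """Hypothetical XOR-based attention sketch.

    Assumption: queries and keys are quantized to `bits`-bit integers and
    similarity is the number of matching bits (bit width minus the popcount
    of the XOR), used in place of the dot product. The paper's exact
    quantization and scoring scheme may differ.
    """
    def quantize(X):
        Xn = (X - X.min()) / (X.max() - X.min() + 1e-9)  # map to [0, 1]
        return np.round(Xn * (2 ** bits - 1)).astype(np.uint8)

    Qq, Kq = quantize(Q), quantize(K)
    xor = Qq[:, None, :] ^ Kq[None, :, :]                 # (n_q, n_k, d) bitwise XOR
    mismatched = np.unpackbits(xor, axis=-1).sum(axis=-1)  # popcount per (query, key)
    scores = (bits * Q.shape[-1] - mismatched) / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

# Toy usage: 4 tokens, model dimension 8
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(standard_attention(Q, K, V).shape)   # (4, 8)
print(xor_attention(Q, K, V).shape)        # (4, 8)
```

In hardware, the appeal of such a scheme is that XOR and popcount map to simple gates, avoiding the multiplier arrays that dominate the power and area of multiplication-based attention.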
Event Type
Research Manuscript
Time
Wednesday, June 26, 2:15pm - 2:30pm PDT
Location
3003, 3rd Floor
Topics
AI
Design
Keywords
AI/ML System and Platform Design