BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240626T180035Z
LOCATION:3003\, 3rd Floor
DTSTART;TZID=America/Los_Angeles:20240626T144500
DTEND;TZID=America/Los_Angeles:20240626T150000
UID:dac_DAC 2024_sess125_RESEARCH255@linklings.com
SUMMARY:MoC: A Morton-Code-Based Fine-Grained Quantization for Acceleratin
 g Point Cloud Neural Networks
DESCRIPTION:Research Manuscript\n\nXueyuan Liu, Zhuoran Song, Hao Chen, Xi
 ng Li, and Xiaoyao Liang (Shanghai Jiao Tong University)\n\nPoint Cloud Ne
 ural Networks (PCNNs) play an essential role in various 3D applicat
 ions, some of which are time-sensitive and safety-critical. However, th
 e large number of unordered points with lengthy features results in hea
 vy computational workloads, keeping them far from real-time processi
 ng. To addr
 ess this challenge, we propose MoC, a Morton-code-based fine-grained quant
 ization for accelerating PCNNs. Specifically, we utilize Morton code to ca
 pture the spatial locality among points. Then, we gather nearby points wit
 h similar features into a region. Considering the similarity in features o
 f nearby points, we propose to decompose features into base and offsets, w
 here the offsets fall within a narrow range. Building upon this, we introd
 uce a two-level mixed-precision quantization. In the first level, we quant
 ize offsets with low precision, while keeping the base in high precision t
 o ensure accuracy. For the second level, noticing the different data distr
 ibution of offsets across various regions, we employ two types of low prec
 ision at the region level, which provides opportunities to further acceler
 ate feature computations. To support our algorithm, we design a hardware a
 rchitecture that parallelizes the Morton code path with the critical path.
  In our extensive experiments on various datasets, our algorithm-architect
 ure co-designed method demonstrates 12x, 6.3x, 4.7x, 3.8x, 3.4x and 2.8x s
 peedup and 19.3x, 9.7x, 6.0x, 5.2x, 4.6x and 4.1x energy savings over CP
 U, server GPU, edge GPU, and the state-of-the-art ASICs PointAcc, MARS a
 nd PRADA, respectively, with negligible accuracy loss.\n\nTopic: AI, Desi
 gn\n\nKeyword: AI/ML Sy
 stem and Platform Design\n\nSession Chairs: Amin Firoozshahian (Rain AI) a
 nd Thierry Tambe (Stanford University)
END:VEVENT
END:VCALENDAR
