BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240626T180034Z
LOCATION:Level 2 Lobby
DTSTART;TZID=America/Los_Angeles:20240625T180000
DTEND;TZID=America/Los_Angeles:20240625T190000
UID:dac_DAC 2024_sess236_RESEARCH1425@linklings.com
SUMMARY:A Hierarchical Dataflow-Driven Heterogeneous Architecture for Wire
 less Baseband Processing
DESCRIPTION:Work-in-Progress Poster\n\nLimin Jiang\, Yi Shi\, Haiqin Hu\,
  Qingyu Deng\, Siyi Xu\, Yintao Liu\, Feng Yuan\, Si Wang\, Yihao Shen\,
  Fangfang Ye\, Shan Cao\, and Zhiyuan Jiang (Shanghai University)\n\n
 Wireless baseband processing (WBP) is a key element of wireless
  communications\, with a series of signal processing modules to improve
  data throughput and counter channel fading. Conventional hardware
  solutions\, such as digital signal processors (DSPs) and\, more
  recently\, graphics processing units (GPUs)\, provide various degrees
  of parallelism\, yet both fail to take into account the cyclical and
  consecutive character of WBP. Furthermore\, the large amount of data
  in WBP cannot be processed quickly on symmetric multiprocessors (SMPs)
  due to the unpredictability of memory latency. To address this issue\,
  we propose a hierarchical dataflow-driven architecture to accelerate
  WBP. A pack-and-ship approach is presented under a non-uniform memory
  access (NUMA) architecture to allow the subordinate tiles to operate
  in a bundled access-and-execute manner. We also propose a multi-level
  dataflow model and the related scheduling scheme to manage and
  allocate the heterogeneous hardware resources. Experimental results
  demonstrate that our prototype achieves 2× and 2.3× speedup in terms
  of normalized throughput and single-tile clock cycles compared with
  GPU and DSP counterparts in several critical WBP benchmarks.
  Additionally\, a link-level throughput of 288 Mbps can be achieved
  with a 45-core configuration.\n\nTopic: AI\, Autonomous Systems\,
  Cloud\, Design\, EDA\, Embedded Systems\, IP\, Security
END:VEVENT
END:VCALENDAR
