BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20240626T180033Z
LOCATION:Level 2 Lobby
DTSTART;TZID=America/Los_Angeles:20240626T180000
DTEND;TZID=America/Los_Angeles:20240626T190000
UID:dac_DAC 2024_sess237_RESEARCH1341@linklings.com
SUMMARY:Accelerating Large-scale Sparse LU Factorization for RF Circuit Si
 mulation
DESCRIPTION:Work-in-Progress Poster\n\nGuofeng Feng, Hongyu Wang, Zhuoqian
 g Guo, Mingzhen Li, and Tong Zhao (State Key Lab of Processors, Institute 
 of Computing Technology, Chinese Academy of Sciences); Zhou Jin (Super Sci
 entific Software Laboratory, China University of Petroleum-Beijing); and W
 eile Jia, Guangming Tan, and Ninghui Sun (State Key Lab of Processors, Ins
 titute of Computing Technology, Chinese Academy of Sciences)\n\nSparse LU 
 factorization is an indispensable building block of circuit simulati
 on and dominates the simulation time, especially for large-scale cir
 cuits. RF circuits have received increasing attention with the evolu
 tion of ubiquitous wireless communication (e.g., 5G and WiFi). RF si
 mulation matrices exhibit a distinctive pattern of structured dense b
 locks; prior works have overlooked this pattern, leading to underuti
 lization of computational resources. In this paper, by exploiting th
 e block structure, we propose a novel blocked format for the L and U
  factors and redesign large-scale sparse LU factorization accordingl
 y, leveraging the data locality inherent in RF matrices. The data-fo
 rmat transformation is streamlined, strategically eliminating redund
 ant data movement and costly indirect memory accesses. Moreover, vec
 tor operations are converted into matrix operations, enabling effici
 ent data reuse and enhancing data-level parallelism. Experimental re
 sults show that our method achieves superior performance to state-of
 -the-art implementations.\n\nTopic: AI, Autonomous Systems, Cloud, D
 esign, EDA, Embedded Systems, IP, Security
END:VEVENT
END:VCALENDAR
