
Control Flow Divergence Optimization by Exploiting Tensor Cores
Description
Kernels are scheduled on Graphics Processing Units (GPUs) at the granularity of a warp, a group of concurrently executing threads. When a kernel contains conditional branches, threads within a warp may execute different branches sequentially, resulting in considerable utilization loss and unpredictable execution time, a problem known as control flow divergence. This paper proposes a novel method that predicts each thread's execution path before the kernel launch by deploying a branch prediction network on the GPU's tensor cores, which can run in parallel with the CUDA cores. Combined with a well-designed thread data reorganization algorithm, this solution mitigates the GPU's control flow divergence problem.
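The abstract does not include implementation details, but the idea of reorganizing thread data around pre-launch branch predictions can be illustrated with a minimal CUDA sketch. Here, a hypothetical `pred` array stands in for the output of the branch prediction network, and `build_remap`/`remap` are illustrative names for the reorganization step; this is an assumption-based sketch, not the authors' implementation.

```cuda
#include <algorithm>
#include <numeric>
#include <vector>
#include <cuda_runtime.h>

// Host-side reorganization (sketch): group element indices by their predicted
// branch outcome (0 = "else" path, 1 = "if" path) so that elements expected to
// take the same path end up packed into the same warps.
std::vector<int> build_remap(const std::vector<int>& pred) {
    std::vector<int> idx(pred.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::stable_sort(idx.begin(), idx.end(),
                     [&](int a, int b) { return pred[a] < pred[b]; });
    return idx;
}

// Device kernel (sketch): each thread reads a remapped element, so the branch
// below is ideally uniform within a warp when the predictions are accurate.
__global__ void divergent_kernel(const float* in, float* out,
                                 const int* remap, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int i = remap[tid];
    float x = in[i];

    if (x > 0.0f) {
        out[i] = x * x;   // "taken" path
    } else {
        out[i] = -x;      // "not taken" path
    }
}
```

In this sketch the prediction and reordering happen before the kernel launch, which is where the paper's use of tensor cores running alongside the CUDA cores would fit; how the network itself is evaluated is not shown here.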
Event Type
Research Manuscript
Time
Wednesday, June 26, 4:15pm - 4:30pm PDT
Location
3001, 3rd Floor
Topics
Embedded Systems
Keywords
Embedded Software