Tzung-Han Juang

Maximizing Data and Hardware Reuse for HLS with Early-Stage Symbolic Partitioning

While traditional HLS (High-Level Synthesis) converts “high-level” C-like programs into hardware automatically, producing high-performance designs still requires hardware expertise. Optimizations such as data partitioning can have a large impact on performance since they directly affect data reuse patterns and the ability to reuse hardware. However, optimizing partitioning is a difficult process since minor changes in the parameter choices can lead to totally unpredictable performance. Functional array-based languages have been proposed instead of C-based approaches, as they offer stronger performance guarantees. This paper proposes to follow a similar approach and exposes a divide-and-conquer primitive at the algorithmic level to let users partition any arbitrary computation. The compiler is then free to explore different partition shapes to maximize both data and hardware reuse automatically. The main challenge remains that the impact of partitioning is only known much later in the compilation flow. This is due to the hard-to-predict effects of the many optimizations applied during compilation. To solve this problem, the partitioning is expressed using a set of symbolic tunable parameters, introduced early in the compilation pipeline. A symbolic performance model is then used in the last compilation stage to predict performance based on the possible values of the tunable parameters. Using this approach, a design space exploration is conducted on an Intel Arria 10 FPGA (Field Programmable Gate Array), and competitive performance is achieved on the classical VGG and TinyYolo neural networks.

2025-01-16

ACM Transactions on Architecture and Code Optimization (TACO) (published)

doi.org

Maximizing Data and Hardware Reuse for HLS with Early-Stage Symbolic Partitioning

Tzung-Han Juang

Christophe Dubach

2025-01-01

ACM Trans. Archit. Code Optim. (published)

doi.org

Let Coarse-Grained Resources Be Shared: Mapping Entire Neural Networks on FPGAs

Tzung-Han Juang

Christof Schlaak

Christophe Dubach

2023-09-09

ACM Transactions on Embedded Computing Systems (published)

doi.org

Memory-Aware Functional IR for Higher-Level Synthesis of Accelerators

Christof Schlaak

Tzung-Han Juang

Christophe Dubach

Specialized accelerators deliver orders of magnitude higher performance than general-purpose processors. The ever-changing nature of modern workloads is pushing the adoption of Field Programmable Gate Arrays (FPGAs) as the substrate of choice. However, FPGAs are hard to program directly using Hardware Description Languages (HDLs). Even modern high-level HDLs, e.g., Spatial and Chisel, still require hardware expertise. This article adopts functional programming concepts to provide a hardware-agnostic higher-level programming abstraction. During synthesis, these abstractions are mechanically lowered into a functional Intermediate Representation (IR) that defines a specific hardware design point. This novel IR expresses different forms of parallelism and standard memory features such as asynchronous off-chip memories or synchronous on-chip buffers. Exposing such features at the IR level is essential for achieving high performance. The viability of this approach is demonstrated on two stencil computations and by exploring the optimization space of matrix-matrix multiplication. Starting from a high-level representation for these algorithms, our compiler produces low-level VHSIC Hardware Description Language (VHDL) code automatically. Several design points are evaluated on an Intel Arria 10 FPGA, demonstrating the ability of the IR to exploit different hardware features. This article also shows that the designs produced are competitive with highly tuned OpenCL implementations and outperform hardware-agnostic OpenCL code.

2022-01-31

ACM Transactions on Architecture and Code Optimization (TACO) (published)

doi.org
