DSAGEN: Democratizing Spatial Accelerator Research

Organizers

Jian Weng, Sihao Liu, Vidushi Dadu, Tony Nowatzki
PolyArch Research Group
University of California, Los Angeles
Date/Time: Saturday, Oct. 17, 4 PM to 7 PM PST

Resources

Please download the binary release of our infrastructure, or build it from source. Both include the full stack: the extended ISA, binary linker, compiler, hardware simulator, RTL generator, Chipyard integration, and benchmarks. Please make sure the infrastructure runs before the tutorial.
To set up the environment, you need to:
  • Use this Dockerfile to instantiate a container with all the dependent packages installed. If you are not familiar with Docker, please refer to the quick tour below.
  • The commands below are intended for the tutorial. If you want to use our framework in your day-to-day research, we strongly recommend following the instructions on our project wiki.
  • $ zsh # Please DO use zsh; otherwise, source setup.sh may not behave as expected
  • $ cd ~
  • $ wget "[the binary download link]" -O dsa-release.zip # note: DO NOT omit the quotes
  • $ unzip dsa-release.zip
  • $ source dsa-framework/setup.sh
  • $ git clone https://github.com/polyarch/dsa-examples # Examples for programming
  • $ git clone https://github.com/polyarch/dsa-cgra-gen # For hardware generation
  • $ cd dsa-examples/manual/01_vector_add && ./run.sh answer.out # Verify your installation
NOTE: If the binary release link is too slow, an alternative link is here.

Overview

Fig. 1: Synthesizing Programmable Accelerators

Because the benefits of transistor scaling are waning, specialized accelerators have become a major research focus, owing to their promising performance and energy savings. While effective, accelerators require intensive hardware and software engineering, and this effort must be repeated whenever the target application domain shifts.
Ideally, one would be able to generate accelerators from the behavior of the applications, where those applications are specified through a set of stable, user-friendly programming interfaces. In other words, we need a high-level synthesis flow for programmable accelerators. Figure 1 shows this paradigm of synthesizing programmable accelerators. In this tutorial, we will present our approach to programmable accelerator synthesis along with a research framework, DSAGEN: a full-stack infrastructure that includes compilation, simulation, and RTL implementation.
The first principle of our approach is to define a useful but restricted design space. Specifically, we use decoupled-spatial accelerators, where memory accesses are decoupled from the computation pipelines, and the underlying hardware network, storage, and timing are exposed in the ISA. The second principle is to enable a rich accelerator design space by specifying architectures as a composition of simple primitives, including memories, processing elements, and network/synchronization components. An architecture instance can be represented as a graph, the architecture description graph (ADG), where each node is a hardware primitive. The ADG serves as the abstraction for both the compiler (which derives the ISA from it) and RTL generation.
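To make the ADG idea concrete, here is a minimal Python sketch. It assumes a toy representation: the `ADG` class, the primitive-kind strings, and the node names are all hypothetical illustrations, not DSAGEN's actual ADG format. The sketch only shows an architecture as a directed graph of primitives, plus one connectivity query over that graph.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical primitive kinds, loosely following the categories named above:
# memories, processing elements, network switches, and synchronization elements.
KINDS = {"memory", "pe", "switch", "sync"}


@dataclass
class ADG:
    """A toy architecture description graph (hypothetical, for illustration):
    nodes are hardware primitives, directed edges are the wires between them."""
    kinds: dict = field(default_factory=dict)  # node name -> primitive kind
    edges: dict = field(default_factory=dict)  # node name -> set of successor names

    def add_node(self, name, kind):
        if kind not in KINDS:
            raise ValueError(f"unknown primitive kind: {kind}")
        self.kinds[name] = kind
        self.edges.setdefault(name, set())

    def connect(self, src, dst):
        if src not in self.kinds or dst not in self.kinds:
            raise KeyError("both endpoints must be added before connecting them")
        self.edges[src].add(dst)

    def reachable_pes(self, start):
        """Breadth-first search: the PEs reachable from `start`
        (including `start` itself if it is a PE), sorted by name."""
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in self.edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return sorted(n for n in seen if self.kinds[n] == "pe")


# Compose a tiny instance: one memory feeding two PEs through a sync element and a switch.
adg = ADG()
adg.add_node("spad", "memory")
adg.add_node("in_port", "sync")
adg.add_node("sw0", "switch")
for pe in ("pe0", "pe1"):
    adg.add_node(pe, "pe")
adg.connect("spad", "in_port")
adg.connect("in_port", "sw0")
adg.connect("sw0", "pe0")
adg.connect("sw0", "pe1")
adg.connect("pe0", "sw0")  # pe0's result can be routed back into the network
```

Here `adg.reachable_pes("spad")` returns `['pe0', 'pe1']`. A query like this is a simplified stand-in for the routing questions a spatial compiler must answer when mapping a dataflow graph onto a given architecture. Adding or removing primitives changes only the graph, not the query code.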
DSAGEN Framework: This approach is embodied in our framework, DSAGEN, which is shown in Figure 2. DSAGEN targets C programs annotated with custom but application-neutral pragmas. The compiler infrastructure uses Clang and LLVM as a frontend, and ultimately represents programs as a decoupled dataflow graph plus memory streams. A low-level, assembly-like interface is also provided for ninja programmers. We include a custom spatial-architecture compiler and backend. The hardware design space includes many spatial-architecture optimizations from prior works [1]–[4]. The compiler backend generates programs embedded in the RISC-V ISA, which handles control. DSAGEN supports multicore simulation in gem5, and it uses Chisel for hardware generation.
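The split between a dataflow graph and memory streams can be modeled in a few lines of plain Python. This is a minimal sketch under our own assumptions. `AffineStream`, `vadd_dfg`, and `run_decoupled` are hypothetical names, and the sketch is neither DSAGEN's pragma syntax nor its control intrinsics. It only illustrates the decoupling, using the vector-add example from the tutorial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffineStream:
    """Hypothetical 1-D memory stream: `length` elements, starting at `start`, `stride` apart."""
    start: int
    stride: int
    length: int

    def addresses(self):
        return [self.start + i * self.stride for i in range(self.length)]


def vadd_dfg(a, b):
    """The dataflow graph: a single add node, with no knowledge of addresses or loops."""
    return a + b


def run_decoupled(memory, in_a, in_b, out, dfg):
    """Software model of decoupled execution: the memory side only walks address
    streams, and the compute side only sees a sequence of values.
    (Assumes the output region does not overlap the input regions.)"""
    a_vals = (memory[addr] for addr in in_a.addresses())
    b_vals = (memory[addr] for addr in in_b.addresses())
    results = map(dfg, a_vals, b_vals)
    for addr, value in zip(out.addresses(), results):
        memory[addr] = value


# Vector add c[i] = a[i] + b[i] on a flat "memory", with a at 0, b at 4, and c at 8.
mem = [1, 2, 3, 4, 10, 20, 30, 40, 0, 0, 0, 0]
run_decoupled(mem, AffineStream(0, 1, 4), AffineStream(4, 1, 4), AffineStream(8, 1, 4), vadd_dfg)
print(mem[8:])  # [11, 22, 33, 44]
```

The add node never computes an address. The same DFG can therefore be reused with different strides or lengths by changing only the stream descriptors, and this separation is what it means for memory access to be decoupled from the compute pipeline.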

Syllabus and Schedule

Fig. 2: The stack of DSAGEN

Introduction (20min): [slides]
  • The Decoupled-Spatial Programming Paradigm
  • The Principle of Composing Hardware Primitives
  • DSAGEN: A Framework for Decoupled-Spatial Research
Basic Programming of DSAGEN (40min): [slides]
  • Vector Add
    • Hardware/Software Interface Overview
    • Writing a Dataflow Graph
    • Writing the Control Intrinsics
  • Vector Normalization
    • Signaled Accumulation
    • Concurrent DFGs
    • Additional Control Intrinsics
5-Min Break
Advanced Programming of DSAGEN (50min): [slides] [videos] code: #CWsm3wS
  • An Introduction to Data-Dependent Specialization
  • Hands on Exercise: Sparse Dot-Product
  • Multicore Implementation for SpMSpV
Automated and Modular Compilation (20min): [slides]
  • Pragma-Hinted Compilation
  • The Compilation Pipeline
  • Modular Compilation
5-Min Break
Composing your own architecture (40min): [slides]
  • The Scala-embedded DSL for composing your own architecture
  • Integrating the spatial accelerator into Chipyard!
Hacking DSAGEN for your own research (60min): [slides]
  • Adding a New Instruction Capability to the PEs
  • Extending the RISC-V ISA
Though I really like this section, it has been cut for time. If you are interested, please refer to the slides.

Docker Usage

Related Papers

  1. V. Dadu and T. Nowatzki, “Towards general purpose acceleration by exploiting common data-dependence forms,” in 52nd MICRO, 2019.
  2. T. Nowatzki, V. Gangadhar, N. Ardalani, and K. Sankaralingam, “Stream-dataflow acceleration,” in 44th ISCA, 2017.
  3. J. Weng, S. Liu, V. Dadu, Z. Wang, and T. Nowatzki, “DSAGEN: Synthesizing programmable spatial accelerators,” in ISCA, 2020.
  4. J. Weng, S. Liu, Z. Wang, V. Dadu, and T. Nowatzki, “A hybrid systolic-dataflow architecture for inductive matrix algorithms,” in HPCA, 2020.