Tutorial on Software-Hardware Codesign with CORDIC

Similar documents
Lab 4: Convolutional Neural Networks Due Friday, November 3, 2017, 11:59pm

ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University

Lab 1: CORDIC Design Due Friday, September 8, 2017, 11:59pm

The Design of Sobel Edge Extraction System on FPGA

PYNQ Radio Final Report Team Members: Harveen Kaur, Rajat Gupta, Rohit Kulkarni, Vishwesh Rege

TP : System on Chip (SoC) 1

SoC Systeme ultra-schnell entwickeln mit Vivado und Visual System Integrator

ECE 5775 (Fall 17) High-Level Digital Design Automation. Hardware-Software Co-Design

The QR code here provides a shortcut to go to the course webpage.

SoC Systeme ultra-schnell entwickeln mit Vivado und Visual System Integrator

Exploring OpenCL Memory Throughput on the Zynq

Copyright 2014 Xilinx

Creating a Processor System Lab

Lab 1 - Zynq RTL Design Flow

Advanced module: Video en/decoder on Virtex 5

Santa Fe (MAXREFDES5#) MicroZed Quick Start Guide

Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA

Analyzing the Generation and Optimization of an FPGA Accelerator using High Level Synthesis

Optimizing HW/SW Partition of a Complex Embedded Systems. Simon George November 2015.

ECE 5745 Complex Digital ASIC Design, Spring 2017 Lab 2: Sorting Accelerator

CADENCE TUTORIAL. San Diego State University, Department of Electrical and Computer Engineering. Amith Dharwadkar and Ashkan Ashrafi

An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware

ESE532: System-on-a-Chip Architecture. Today. Message. Graph Cycles. Preclass 1. Reminder

Experiment 3. Digital Circuit Prototyping Using FPGAs

Estimating Accelerator Performance and Events

Building an Embedded Processor System on a Xilinx Zync FPGA (Profiling): A Tutorial

Creating a base Zynq design with Vivado IPI

Design of Digital Circuits

Introduction to Embedded System Design using Zynq

Vivado Design Suite User Guide

Description: Write VHDL code for full_adder.vhd with inputs from switches and outputs to LEDs.

Method We follow- How to Get Entry Pass in SEMICODUCTOR Industries for 3rd year engineering. Winter/Summer Training

Quick Start Guide ZedboardOLED Display Controller IP v1.0

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

Vivado Design Suite User Guide

Lab 5. Using Fpro SoC with Hardware Accelerators Fast Sorting

Simplify System Complexity

ECEN 449: Microprocessor System Design Department of Electrical and Computer Engineering Texas A&M University. Laboratory Exercise #1 Using the Vivado

Lab 4: Audio Pipeline on the FPGA

MATLAB/Simulink 기반의프로그래머블 SoC 설계및검증

A 3-D CPU-FPGA-DRAM Hybrid Architecture for Low-Power Computation

System-On-Chip Design with the Leon CPU The SOCKS Hardware/Software Environment

ECE 2400 Computer Systems Programming, Fall 2018 PA2: List and Vector Data Structures

Zynq-7000 All Programmable SoC: Embedded Design Tutorial. A Hands-On Guide to Effective Embedded System Design

ECE 2400 Computer Systems Programming, Fall 2017 Programming Assignment 3: Sorting Algorithms

discrete logic do not

Simplify System Complexity

ISim Hardware Co-Simulation Tutorial: Accelerating Floating Point Fast Fourier Transform Simulation

Midterm Exam. Solutions

University of California, Davis Department of Electrical and Computer Engineering. EEC180B DIGITAL SYSTEMS Spring Quarter 2018

Adding Custom IP to the System

FPGA Implementation and Validation of the Asynchronous Array of simple Processors

Basic HLS Tutorial. using C++ language and Vivado Design Suite to design two frequencies PWM. modulator system

Lab Exercise 4 System on chip Implementation of a system on chip system on the Zynq

ECE 5775 High-Level Digital Design Automation, Fall 2016 School of Electrical and Computer Engineering, Cornell University

Design AXI Master IP using Vivado HLS tool

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Improving Area and Resource Utilization Lab

The SOCks Design Platform. Johannes Grad

Use Vivado to build an Embedded System

Simple VERILOG example using VIVADO 2015 with ZYBO FPGA board v 0.1

Vivado Design Suite Tutorial: High-Level Synthesis

CS CS Tutorial 2 2 Winter 2018

ECE 2400 Computer Systems Programming, Fall 2017 Programming Assignment 2: List and Vector Data Structures

LegUp HLS Tutorial for Microsemi PolarFire Sobel Filtering for Image Edge Detection

ZYBO Video Workshop. Paris, FRANCE

NEW FPGA DESIGN AND VERIFICATION TECHNIQUES MICHAL HUSEJKO IT-PES-ES

CE 435 Embedded Systems Spring 2018

Lab 3 Sequential Logic for Synthesis. FPGA Design Flow.

10/02/2015 Vivado Linux Basic System

ECE 4750 Computer Architecture, Fall 2017 Lab 1: Iterative Integer Multiplier

ESE566A Modern System-on-Chip Design, Spring 2017 ECE 566A Modern System-on-Chip Design, Spring 2017 Lab 3: Cache Design Due: Mar 24, 5:00 pm

FCUDA-SoC: Platform Integration for Field-Programmable SoC with the CUDAto-FPGA

Hardware In The Loop (HIL) Simulation for the Zynq-7000 All Programmable SoC Author: Umang Parekh

EE 361L Digital Systems and Computer Design Laboratory

Performance Verification for ESL Design Methodology from AADL Models

Vivado Tutorial. Introduction. Objectives. Procedure

ELEC 204 Digital System Design LABORATORY MANUAL

This guide is used as an entry point into the Petalinux tool. This demo shows the following:

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

Xilinx Vivado/SDK Tutorial

GigaX API for Zynq SoC

Using FPGAs as Microservices

Gisselquist Technology, LLC

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Cover TBD. intel Quartus prime Design software

ELE409 SPRING2018 LAB0

Implementation of Hardware Accelerators on Zynq

CMPEN 331 Computer Organization and Design, Lab 4 Due Wednesday April 5, 2017 at 7:0 am (Drop box on Canvas)

CSE P567 - Winter 2010 Lab 1 Introduction to FGPA CAD Tools

Lab: Setting up PL-App with a Raspberry Pi

The guide to Xillybus Block Design Flow for non-hdl users

81920**slide. 1Developing the Accelerator Using HLS

The board is powered by the USB connection, so to turn it on or off you plug it in or unplug it, respectively.

Avnet Zynq Mini Module Plus Embedded Design

Multi-core microcontroller design with Cortex-M processors and CoreSight SoC

Evolution of Implementation Technologies. ECE 4211/5211 Rapid Prototyping with FPGAs. Gate Array Technology (IBM s) Programmable Logic

Veloce2 the Enterprise Verification Platform. Simon Chen Emulation Business Development Director Mentor Graphics

1. Introduction EE108A. Lab 1: Combinational Logic: Extension of the Tic Tac Toe Game

LED display manager documentation

Transcription:

ECE5775 High-Level Digital Design Automation, Fall 2017 School of Electrical Computer Engineering, Cornell University Tutorial on Software-Hardware Codesign with CORDIC 1 Introduction So far in ECE5775 you have worked with Vivado HLS to automatically compile C/C++ programs into hardware in the form of RTL. You would then check the quality of the generated RTL by examining the HLS reports produced by the tool and/or performing some kind of simulation. This means that up until now in the coursework, you have not actually executed anything on real hardware. Furthremore, you have not had to deal with how to interface a software program with a hardware accelerator. In this tutorial you will learn how to take a modified version of the CORDIC design from Lab 1 and implement a full-fledged software-hardware system. You will then test the system on Zedboard, which is an FPGA development board that features the Xilinx Zynq-7000 programmable system-on-chip (SoC) device. Specifically, the Zynq SoC integrates an ARM microprocessor and reconfigurable logics on a single chip. The goal of this tutorial is to prepare you for Lab 3, where you will be designing a similar system, except targeting a different application in digit recognition. In addition, some of you may choose to use the same system setup for the final project as well. 2 Materials You are given a zip file named cordic tutorial.zip, which contains the following directories: ecelinux: contains a Vivado HLS project to synthesize the CORDIC hardware module. This should be done on ecelinux. zedboard: contains a C++ project to build the CORDIC host program. The host program should be run on the Zedboard after programming the FPGA. 3 System Overview A diagram of the system used in this tutorial is shown in Figure 1. The system roughly consists of three parts: Host Program The C/C++ program running on the ARM processor which sends data to the FPGA and receives results. The host must ensure the data exchange is in a format consistent with the FPGA interface. Hardware Accelerator The hardware module implemented on the FPGA reconfigurable fabrics. 1

Input FIFO Hardware Accelerator Host Program Output FIFO FPGA ARM CPU Figure 1: Block diagram of the software-hardware system. Input & Output FIFOs Two 32-bit FIFOs are used as the (latency-insensitive) communication media between the hardware and software components. These two FIFOs are exposed as two Linux I/O devices on the host. For this tutorial, all three components have been provided for you. The application we are targeting is a modified version of CORDIC from Lab 1. The hardware platform is the Zedboard. Your task is to synthesize the accelerator, generate a bitstream, and use it to program the Zynq FPGA. Then you will execute the host program on the ARM processor to invoke the FPGA accelerator. Please carefully read through the explanations in Section 5 to understand the provided code. 4 Zedboard Environment and Login There are ten Zedboard units accessible from ecelinux via ssh. You must login to ecelinux before you login to the Zedboards even if you use the lab computers. Since each Zedboard runs Linux using the embedded ARM processor, you should be comfortable with the work environment once you are logged in. The hostnames of the boards are: zhang-zedboard-xx.ece.cornell.edu where xx is any number between 01 and 10. All students have an account on each Zedboard with their NetID as the username and the first word in their last name (first letter capitalized) as the initial password. For example, the password for Amy Xuan Wang is Wang. However, the user accounts and file systems are not linked. So you can choose to either stick to the same Zedboard or back up your files using git or scp before you log out. Be sure to ssh into each Zedboard to set your permanent password before using the board in any way. To ensure two students are not simultaneously attempting to program one FPGA (Linux reboot is required after each FPGA reconfiguration), we have restricted each Zedboard such that only ONE student can login at a time. Before you attempt to login, please first check the occupancy of the available boards using the following link: http://www.csl.cornell.edu/courses/ece5775/zedboard.html This also means that you cannot open multiple ssh sessions for the Zedboard and that you cannot scp to a Zedboard while logged into it from another terminal. If 2

you get strange error messages from running these commands please first check for this issue. When you log out from a Zedboard, it will be possible for another student to log in, at which point you will no longer be able to access your code. We heavily recommend you back up your files using version control (such as git) or by copying the files back to ecelinux every time before you log out. 5 Source Code Explanation This section provides additional comments on several important source files to help you develop a better understanding of the overall system setup. typedefs.h Most of the typedef statements here should be familiar to you from Lab 1. However, note that we changed the bitwidth of the fixed-point types to 40. This is to demonstrate how to transfer wide data to the FPGA. ecelinux/cordic.cpp First note that the name of the top-level function has been changed to dut, which stands for design under test. This is to make scripting the flow simpler. The arguments of dut are a pair of hls::stream<bit32 t> objects. These templated streams are part of Xilinx s hls stream library; they represent the input and output FIFOs in Figure 1 and provide read() and write() methods as seen inside the function. dut reads the input angle through strm in, calls the cordic function, and write the results to strm out. The <bit32 t> indicate the data type which is stored in the buffers. A 32-bit integer is chosen to simplify the CPU- FPGA interface, as such types are standard in software programming. However, this presents two problems: (1) our interface is only 32-bit wide while the input angle is 40 bits; (2) our input is in integer format (ap uint) while the input angle to cordic is of fixed-point type which contains fractional bits. This is resolved using the bit slicing methods provided by the ap int and ap fixed data types. We use these methods to assign the appropriate (raw) bits from input to theta type, and similarly, assign cos sin type to output. zedboard/host.cpp This file contains the program responsible for invoking the hardware module. The Zedboard system is set up so that the FIFO channels connecting the FPGA are exposed to the Linux OS as two device files, which can be can be written to and read from like any regular files. /dev/xillybus_write_32 /dev/xillybus_read_32 We begin by opening the two files. To access them we simply use the C functions write and read, which take in a file handle, a pointer to the data, number of bytes to send/receive, and returns the number of bytes successfully communicated. We again use bit assignment to convert the input angle from theta type to a 64-bit integer (int64 t. Unlike in cordic.cpp, we can send the 64-bit value with a single function call instead of two seperate transactions. This is because the write and read functions actually breaks up the data for you and transfers one byte at a time. In host.cpp we send one angle at a time, wait for the hardware module to produce a result, and read the result before sending the next angle. This is actually very inefficient in our 3

system setup the communication channel between the ARM processor and the FPGA has a long latency but high throughput. Because we send only one piece of data each time and wait for the result, we incur this latency for every angle. zedboard/host batch.cpp This file provides a much more efficient implementation of the host program. The only difference is that we first write all 180 angles to the input FIFO of the CORDIC hardware module, then read all 180 sets of results. This allows us to take advantage of the high throughput of the processor-fpga communication channel and greatly reduce the runtime. Both host.cpp and host batch.cpp use a timer object to measure the runtime of invoking the hardware module (including data transfer). The timer is straightfoward to use, and will print the total time recorded when the program ends. This allows us to see the difference in execution time between the two host programs. 6 Tutorial Instructions 6.1 Generating the Bitstream As in previous labs, you will begin by synthesizing the C++ source files into Verilog using the Vivado HLS tool. On ecelinux do: unzip cordic_tutorial.zip cd cordic_tutorial/ecelinux source /classes/ece5775/setup-ece5775.sh make vivado_hls run.tcl # unzip the archive # setup env # builds and runs the csim # generate Verilog for CORDIC Compiling the csim is optional, but it will allow you to verify that the FPGA is producing the correct results. You can examine the directory cordic.prj/solution1/syn/verilog to check that the Verilog files have been generated. The next step is to implement the hardware circuit described by the Verilog on an the FPGA. This is done by Vivado (not Vivado HLS), and is usually quite complicated. Fortunately, we provide a script which automates this process. source run_bitstream.sh # generate FPGA bitstream Vivado will first map the circuit logic to components like wires, gates, and registers. It will then perform place and route, which places each component into a physical look-up table (LUT), and makes connections between LUTs using the FPGA s programmable interconnect. Finally, Vivado generates the bitstream this file contains the configuration bits which are sent to the LUTs and switches on the FPGA in order to (re)configure the FPGA to become the CORDIC circuit. Due to how complex this process is, this command may take upwards of 15 minutes. When it finishes, it will copy the bitstream file xillydemo.bit to the current directory. 4

6.2 Programming the FPGA To program the FPGA, we first copy the bitstream file onto the Zedboard. Then we will mount the SD card on the board (exposing it to Linux as a writeable directory) and move the bitstream file onto the card. When the Zedboard boots up, the it will read the bitstream from the SD card and reconfigure the FPGA. # Copy the bitstream from ecelinux to Zedboard scp xillydemo.bit <user>@zhang-zedboard-xx.ece.cornell.edu:~ # Login to the Zedboard ssh <user>@zhang-zedboard-0x.ece.cornell.edu # Copy the bitstream file to the SD card and reboot the Zedboard mount /mnt/sd cp xillydemo.bit /mnt/sd sudo reboot The Zedboard will restart. Wait about 30 seconds and login again the FPGA will be fully programmed after the reboot. If you encounter any issues with scp or ssh, please check the status of the Zedboard and try another Zedboard. If you encounter any issue copying the bitstream to the SD card, please first reboot the board and attempt the commands again. 6.3 Running The CORDIC Example Use scp to copy cordic tutorial.zip to the Zedboard, then do: unzip cordic_tutorial.zip cd cordic_tutorial/zedboard make./cordic-fpga make fpga_batch./cordic-fpga-batch # unzip the archive # build FPGA host program # run the baseline host # build FPGA host program (batch mode) # run the batched host You should see the same accumulated error as when you ran the csim on ecelinux. Furthermore, host batch should run much faster than host. 7 Acknowledgement The baseline FPGA+Linux setup used in tutorial is based on the Xillinux distribution provided by Xillybus Ltd. (http://xillybus.com/xillinux) References [1] Xilinx Inc., Introduction to FPGA Design with Vivado High-Level Synthesis, 2013. 5