Hardware Oriented Security
SRC-7 Programming Basics and Pipelining
Miaoqing Huang
University of Arkansas
Fall 2014
Outline
- Basics of SRC-7 Programming
- Pipelining
Framework of Program Running on SRC-7
[Figure: software side (main.c) and hardware side (map 1.mc, map 2.mc, map n.mc), with the MAP files instantiating hardware macros]
- The hardware part of an application may be distributed across multiple bitstreams
- Each bitstream is specified by a MAP function
- A MAP function is written in a high-level language, i.e., MAP C
- Complicated operations can be implemented using hardware modules
- Multiple modules can be instantiated in a single MAP file
- Data access to memory is generally implemented in MAP C
Basic Flow of MAP Function
[Figure: MAP processor block diagram with a Controller (Altera Stratix II EP2S130), User Logic 1 and User Logic 2 (Altera Stratix II EP2S180), Global Common Memory (1 GB, 4.2 GB/s), and 16 banks of On-Board Memory (64 MB); link labels shown: 7.2 GB/s, 12.8 GB/s, 4.8 GB/s, 19.2 GB/s, 256b]
- Each MAP function is defined in a MAP C file
- All the code in a MAP C file is converted into a hardware description language
  - Complicated data structures and programming models, such as recursive calls, are not supported
- There is no operating system or run-time support on the MAP processor
  - Users need to handle data communication, data access, and data operations explicitly
- Basic flow:
  1. Move data onto the on-board memory (OBM)
  2. Process the data
  3. Move the result back to main memory
- Small pieces of data can be stored on the FPGA using Block RAM
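The three-step basic flow can be emulated on an ordinary host in plain C. This is only a sketch under stated assumptions: the arrays `obm_in` and `obm_out`, the capacity `OBM_WORDS`, the function name `map_flow_sketch`, and the placeholder computation are all hypothetical, and `memcpy` merely stands in for the explicit data transfers that real MAP C code would have to issue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OBM_WORDS 1024  /* hypothetical bank capacity, in 64-bit words */

/* Stand-ins for two on-board memory banks (assumption: plain arrays). */
static long long obm_in[OBM_WORDS];
static long long obm_out[OBM_WORDS];

/* Host-side emulation of the three-step flow of a MAP function. */
void map_flow_sketch(size_t n, const long long src[], long long res[])
{
    assert(n <= OBM_WORDS);  /* the data must fit in the emulated bank */

    /* Step 1: move input data from main memory onto the OBM. */
    memcpy(obm_in, src, n * sizeof(long long));

    /* Step 2: process the data; user logic reads and writes the OBM. */
    for (size_t i = 0; i < n; i++) {
        obm_out[i] = obm_in[i] * obm_in[i] + 1;  /* placeholder computation */
    }

    /* Step 3: move the result back to main memory. */
    memcpy(res, obm_out, n * sizeof(long long));
}
```

Calling `map_flow_sketch(4, src, res)` with `src = {1, 2, 3, 4}` fills `res` with each input squared plus one. The point of the sketch is the ordering of the explicit copy-in, compute, and copy-out steps, not the computation itself.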
Where Are the Data?
[Figure: system diagram with microprocessors (μP) and their memory attached through SNAP to the MAP processor (Controller, User Logic 1 and 2, Global Common Memory, 16 banks of On-Board Memory, chaining GPIO), plus PCI-X and Gigabit Ethernet links to disk, storage area network, local area network, and wide area network]
- Data can be stored in main memory (i.e., host memory), global common memory, and on-board memory (OBM)
  - The memory systems are separate
  - Data transfer between memories is explicit
- Global common memory is accessible to both the microprocessor and the FPGA
- Data transfer into and out of the OBM has to be explicitly initiated by user logic
- On-board memory is the main place for user logic to store data
  - Implemented using SRAM
  - Supports pipelined data access, with some limitations
More on MAP Function

    #include <libmap.h>
    void poly (int n, long long dt_source[], long long dt_res[], int mapno) {... }

- The return type of a MAP function has to be void
- Use square brackets [] to declare an array of data to be transferred
  - The size of the data to be transferred is specified by the user explicitly
- Pointers are still allowed in a MAP function
  - Pointer arithmetic is NOT allowed
  - Scalar variables can be returned using pointers

    void poly (long long dt_source[], long long *tproc, int mapno) {... *tproc = x - y; }
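The signature rules above can be illustrated with a plain-C analogue that compiles with any C compiler. It is a sketch, not MAP C: `libmap.h` is deliberately left out, and the function name `poly_sketch`, the `checksum` parameter, and the polynomial itself are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>

/* Plain-C analogue of a MAP function: void return type, arrays declared
   with [], and one scalar handed back through a pointer. */
void poly_sketch(int n, long long dt_source[], long long dt_res[],
                 long long *checksum, int mapno)
{
    long long sum = 0;

    (void)mapno;     /* would select a MAP on real hardware; unused here */
    assert(n >= 0);  /* the caller states the transfer size explicitly */

    for (int i = 0; i < n; i++) {
        long long x = dt_source[i];         /* indexed access, no pointer arithmetic */
        dt_res[i] = 3 * x * x + 2 * x + 1;  /* placeholder polynomial */
        sum += dt_res[i];
    }

    *checksum = sum;  /* scalar result returned via pointer */
}
```

The caller passes the address of an ordinary variable, e.g. `poly_sketch(3, src, res, &total, 0)`, and reads the scalar from `total` after the call returns.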
Pipelining
- A pipeline is a set of data processing elements connected in series, so that the output of one element is the input of the next
  - Each element carries out one part of a larger, more complicated operation
- Pipelining is the most common technique in hardware design for achieving high performance
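The definition above can be made concrete with a small software model of a three-element pipeline. This is a sketch under assumptions: the stage operations (add 1, multiply by 2, subtract 3), the names `stage_op` and `run_pipeline`, and the `valid` flags are all invented for illustration; real hardware would implement each element as logic followed by a register.

```c
#include <assert.h>
#include <stddef.h>

#define STAGES 3

/* Each element performs one part of the whole operation (x + 1) * 2 - 3. */
static long long stage_op(int stage, long long v)
{
    switch (stage) {
    case 0:  return v + 1;
    case 1:  return v * 2;
    default: return v - 3;
    }
}

/* Feeds n inputs through the pipeline, one per clock cycle.
   Returns the number of clock cycles used. */
size_t run_pipeline(size_t n, const long long in[], long long out[])
{
    long long reg[STAGES] = {0};  /* register at the output of each stage */
    int valid[STAGES] = {0};      /* does the register hold real data? */
    size_t cycles = 0, produced = 0;

    assert(in != NULL && out != NULL);

    while (produced < n) {
        /* Update back to front so each stage reads its predecessor's
           value from the previous cycle. */
        for (int s = STAGES - 1; s > 0; s--) {
            valid[s] = valid[s - 1];
            if (valid[s]) reg[s] = stage_op(s, reg[s - 1]);
        }
        valid[0] = cycles < n;
        if (valid[0]) reg[0] = stage_op(0, in[cycles]);
        cycles++;

        /* The last register delivers one finished result per cycle. */
        if (valid[STAGES - 1]) out[produced++] = reg[STAGES - 1];
    }
    return cycles;
}
```

In this model the first result appears after three cycles, and every later cycle delivers one more result, because all three elements work on different inputs at the same time.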
Why do we need pipelining?
- To improve throughput
- Mechanic shop vs. car assembly line
  - Mechanic shop
    - The mechanic needs to do everything
    - It takes hours to fix just one car
    - Sometimes it takes days!
  - Car assembly line
    - Many workers work together
    - Each worker puts just one or a few components into the car
    - One assembly line can produce hundreds or thousands of cars per day
Classic Five-Stage RISC Pipeline
Five stages:
1. Instruction fetch: a 32-bit instruction is fetched from the cache
2. Decode: determine the function of the instruction
3. Execute: carry out the instruction
4. Memory access: access memory if necessary
   - Always check the cache first if there is one
5. Writeback: write the result into the register file
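The benefit of overlapping the five stages can be estimated with a standard idealized cycle count. The sketch below assumes that every stage takes exactly one clock cycle and that there are no stalls, hazards, or cache misses; the function names are hypothetical.

```c
#include <assert.h>

/* Cycles to finish n instructions when a new one enters every cycle:
   the first needs `stages` cycles, each later one finishes 1 cycle after
   its predecessor. */
unsigned long long pipelined_cycles(unsigned long long n, unsigned stages)
{
    assert(stages > 0);
    return n == 0 ? 0 : stages + n - 1;
}

/* Cycles when each instruction must pass through all stages before the
   next one may start. */
unsigned long long sequential_cycles(unsigned long long n, unsigned stages)
{
    assert(stages > 0);
    return n * stages;
}
```

For 1000 instructions on five stages this gives 1004 cycles pipelined versus 5000 cycles without overlap, so under these ideal assumptions the speedup approaches the number of stages as the instruction count grows.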
Superpipeline in Modern Microprocessors
- The instruction pipeline on the Pentium 4 consists of 20 stages
  - Up to 20 instructions can be in flight simultaneously, one per stage
  - The latency of each stage is very short
  - The processor can run at a very high frequency, e.g., 3-4 GHz
- So, we should be happy. But we are not. Why?
  - Each instruction performs a very basic operation
    - E.g., addition, multiplication, bit shift
  - A complicated operation may take thousands of instructions
    - E.g., DES encryption, image processing operations
- Use hardware to design a very long pipeline that can accommodate one complicated operation
Solving Data Dependence in a Pipeline
- Use shift registers to hold inputs that are not used until a later stage
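One reading of this technique, assuming the dependence arises because some inputs arrive together but are consumed at different stages, is sketched below in plain C. The three-stage computation `(a + b) * c - d`, the function name `delayed_input_pipeline`, and all register names are hypothetical. Input `c` is delayed by one cycle and input `d` passes through a two-deep shift register so that each reaches its stage together with the matching partial result.

```c
#include <assert.h>
#include <stddef.h>

/* Three-stage pipeline computing out[i] = (a[i] + b[i]) * c[i] - d[i].
   Stage 1 adds, stage 2 multiplies, stage 3 subtracts. */
void delayed_input_pipeline(size_t n, const long long a[], const long long b[],
                            const long long c[], const long long d[],
                            long long out[])
{
    long long sum_reg = 0;          /* stage 1 -> stage 2 pipeline register */
    long long prod_reg = 0;         /* stage 2 -> stage 3 pipeline register */
    long long c_delay = 0;          /* holds c for one cycle */
    long long d_delay[2] = {0, 0};  /* shift register holding d for two cycles */

    assert(a != NULL && b != NULL && c != NULL && d != NULL && out != NULL);

    /* Stages are evaluated back to front so that each one sees the values
       latched in the previous cycle. */
    for (size_t cycle = 0; cycle < n + 2; cycle++) {
        /* Stage 3: d has been shifted twice and is now aligned. */
        if (cycle >= 2) out[cycle - 2] = prod_reg - d_delay[1];

        /* Stage 2: consume the delayed c and shift d one step further. */
        if (cycle >= 1 && cycle <= n) {
            prod_reg = sum_reg * c_delay;
            d_delay[1] = d_delay[0];
        }

        /* Stage 1: use a and b now; save c and d for later stages. */
        if (cycle < n) {
            sum_reg = a[cycle] + b[cycle];
            c_delay = c[cycle];
            d_delay[0] = d[cycle];
        }
    }
}
```

Without the delay registers, stage 3 would subtract the `d` belonging to an element two positions ahead of the product it is holding. The depth of each shift register equals the number of stages the input has to wait.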