Hardware Oriented Security


Hardware Oriented Security: SRC-7 Programming Basics and Pipelining
Miaoqing Huang, University of Arkansas, Fall 2014

Outline
Basics of SRC-7 Programming
Pipelining

Framework of Program Running on SRC-7
[Figure: main.c on the software side; map 1.mc, map 2.mc, ..., map n.mc on the hardware side, with hardware macros (macro m, macro n, macro p, macro q, macro x, macro y) instantiated inside the MAP files.]
The hardware part of an application may be distributed across multiple bitstreams.
Each bitstream is specified by a MAP function.
A MAP function is written in a high-level language, i.e., MAP C.
Complicated operations can be implemented using hardware modules.
Multiple modules can be instantiated in a single MAP file.
Data access to memory is generally implemented in MAP C.

Basic Flow of MAP Function
[Figure: MAP processor block diagram with User Logic 1 and User Logic 2 (Altera Stratix II EP2S180), a controller (Altera Stratix II EP2S130), 16 banks of on-board memory (64 MB), and global common memory (1 GB); the links are labeled with bandwidths between 4.2 GB/s and 19.2 GB/s.]
Each MAP function is defined in a MAP C file.
All the code in a MAP C file is converted into a hardware description language.
Complicated data structures and programming models, such as recursive calls, are not supported.
There is no operating system or run-time support on the MAP processor.
Users need to handle data communication, data access, and data operations explicitly.
Basic flow:
    1. Move the data onto the on-board memory (OBM).
    2. Process the data.
    3. Move the result back to the main memory.
Small pieces of data can be stored on the FPGA in Block RAM.
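The three-step flow above can be sketched in plain C that runs on an ordinary host. This is only an emulation under stated assumptions: the two static arrays stand in for OBM banks, memcpy stands in for the explicit transfers, and the names square_all, obm_in, obm_out, and OBM_WORDS are hypothetical. None of this is the MAP C API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OBM_WORDS 1024  /* hypothetical bank capacity, in 64-bit words */

/* Stand-ins for two on-board memory banks (SRAM on the real hardware). */
static long long obm_in[OBM_WORDS];
static long long obm_out[OBM_WORDS];

/* Host-only emulation of the flow: stage in, compute, stage out. */
void square_all(int n, const long long src[], long long res[])
{
    /* The caller states the transfer size explicitly; it must fit in a bank. */
    assert(n >= 0 && n <= OBM_WORDS);

    /* Step 1: move the input from main memory onto on-board memory. */
    memcpy(obm_in, src, (size_t)n * sizeof obm_in[0]);

    /* Step 2: process the data where the user logic can reach it. */
    for (int i = 0; i < n; i++)
        obm_out[i] = obm_in[i] * obm_in[i];

    /* Step 3: move the result back to main memory. */
    memcpy(res, obm_out, (size_t)n * sizeof obm_out[0]);
}
```

Keeping the staging buffers separate from the caller's arrays mirrors the point that the memory systems are separate and nothing moves between them unless the code says so.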

Where are the data?
[Figure: system diagram showing microprocessor (μP) nodes with memory, SNAP, and PCI-X; Gigabit Ethernet links to disk, storage area, local area, and wide area networks; chaining GPIO; and the MAP processor with User Logic 1 and User Logic 2 (Altera Stratix II EP2S180), a controller (Altera Stratix II EP2S130), 16 banks of on-board memory (64 MB), and global common memory (1 GB).]
Data can be stored in main memory (i.e., host memory), global common memory, and on-board memory (OBM).
The memory systems are separate, and data transfer between them is explicit.
Global common memory is accessible to both the microprocessor and the FPGA.
Data transfers into and out of the OBM have to be explicitly initiated by user logic.
On-board memory is the main place for user logic to store data.
    It is implemented using SRAM.
    It supports pipelined data access, with some limitations.

More on MAP function
    #include <libmap.h>
    void poly (int n, long long dt_source[], long long dt_res[], int mapno) {... }
The return type of a MAP function has to be void.
Use square brackets [] to declare an array of data to be transferred.
The size of the data to be transferred is specified explicitly by the user.
Pointers are still allowed in a MAP function.
Pointer arithmetic is NOT allowed.
Scalar variables can be returned using pointers.

More on MAP function (continued)
A scalar result is returned by writing through a pointer parameter:
    void poly (long long dt_source[], long long *tproc, int mapno) {... *tproc = x - y; }
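As a minimal sketch of the "void function, scalar out through a pointer" pattern, here is a plain C function that compiles without libmap.h. The function name range_of and its body are hypothetical; only the shape of the signature and the final `*tproc = x - y;` follow the slide. Arrays are accessed by index only, in keeping with the rule against pointer arithmetic.

```c
#include <assert.h>

/* Hypothetical example: writes (max - min) of the input through tproc.
 * The function itself returns void, as a MAP function must. */
void range_of(int n, const long long dt_source[], long long *tproc)
{
    assert(n > 0 && tproc != 0);

    long long x = dt_source[0];  /* running maximum */
    long long y = dt_source[0];  /* running minimum */

    for (int i = 1; i < n; i++) {
        if (dt_source[i] > x) x = dt_source[i];
        if (dt_source[i] < y) y = dt_source[i];
    }

    *tproc = x - y;  /* the scalar result leaves through the pointer */
}
```

The caller passes the address of an ordinary variable, for example `range_of(4, data, &r);`, and reads the result from `r` afterwards.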


Pipelining
A pipeline is a set of data processing elements connected in series, so that the output of one element is the input of the next.
Each element carries out one part of a larger, complicated operation.
Pipelining is the most common technique in hardware design for achieving high performance.

Why do we need pipelining?
To improve throughput: mechanic shop vs. car assembly line.
Mechanic shop
    The mechanic needs to do everything.
    It takes hours to fix just one car; sometimes it takes days.
Car assembly line
    Many workers work together.
    Each worker puts just one or a few components into the car.
    One assembly line can produce hundreds or thousands of cars per day.
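The throughput argument can be made concrete with a small cycle count. The sketch below assumes an idealized pipeline: every stage takes exactly one cycle, a new item enters every cycle, and there are no stalls. The function names are hypothetical.

```c
#include <assert.h>

/* One worker does all k steps for each of the n items in turn. */
long long serial_cycles(long long n, long long k)
{
    assert(n >= 0 && k > 0);
    return n * k;
}

/* k-stage pipeline: the first item needs k cycles to come out,
 * then one more item completes in every following cycle. */
long long pipelined_cycles(long long n, long long k)
{
    assert(n >= 0 && k > 0);
    return (n == 0) ? 0 : k + (n - 1);
}
```

For 1000 items and 5 stages this gives 5000 cycles without pipelining and 1004 with it. The time for a single item is unchanged (5 cycles); only the rate at which finished items appear improves.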

Classic Five-Stage RISC Pipeline
Five stages:
1. Instruction fetch: a 32-bit instruction is fetched from the cache.
2. Decode: determine the function of the instruction.
3. Execute: carry out the instruction.
4. Memory access: access memory if necessary, always checking the cache first if there is one.
5. Writeback: write the result into the register file.
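To see how the five stages overlap, the following sketch reports which stage an instruction occupies in a given cycle. It assumes an ideal pipeline in which one instruction is issued per cycle and nothing stalls; the function names and the 0-based numbering are choices made for this example.

```c
#include <assert.h>

enum { NUM_STAGES = 5 };

static const char *const stage_names[NUM_STAGES] = {
    "Instruction fetch", "Decode", "Execute", "Memory access", "Writeback"
};

/* Stage index (0..4) occupied by instruction `instr` in cycle `cycle`,
 * both counted from 0, or -1 if the instruction is not in the pipeline.
 * Assumes instruction i enters the fetch stage in cycle i. */
int stage_index(int instr, int cycle)
{
    assert(instr >= 0 && cycle >= 0);
    int s = cycle - instr;
    return (s >= 0 && s < NUM_STAGES) ? s : -1;
}

/* Human-readable name for a valid stage index. */
const char *stage_name(int index)
{
    assert(index >= 0 && index < NUM_STAGES);
    return stage_names[index];
}
```

In cycle 4, instructions 0 through 4 occupy Writeback, Memory access, Execute, Decode, and Instruction fetch respectively, so all five stages are busy at once.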

Superpipeline in Modern Microprocessors
The instruction pipeline on the Pentium 4 consists of 20 stages.
Up to 20 instructions can be in flight simultaneously.
The latency of each stage is very short.
The processor can run at a very high frequency, e.g., 3-4 GHz.

So, we should be happy. But we are not. Why?

Each instruction performs only a very basic operation, e.g., addition, multiplication, or bit shift.
A complicated operation, such as DES encryption or an image processing operation, may take thousands of instructions.
Solution: use hardware to build a very long pipeline that can accommodate one complicated operation.
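A rough back-of-the-envelope comparison shows why a long, dedicated pipeline can win despite a much slower clock. The numbers in the usage note are illustrative assumptions, not measurements: a 3 GHz processor that retires one instruction per cycle and needs 1000 instructions per operation, against a hardware pipeline clocked at 100 MHz that emits one finished operation per cycle once it is full.

```c
#include <assert.h>

/* Completed operations per second for a unit that finishes one
 * operation every `cycles_per_result` clock cycles (steady state). */
double results_per_second(double clock_hz, double cycles_per_result)
{
    assert(clock_hz > 0.0 && cycles_per_result > 0.0);
    return clock_hz / cycles_per_result;
}
```

Under these assumptions, `results_per_second(3e9, 1000.0)` is 3 million operations per second, while `results_per_second(100e6, 1.0)` is 100 million, even though the hardware clock is 30 times slower.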

Solving Data Dependence in the Pipeline
Use shift registers to hold the inputs that have not been used yet.
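The slide's figures are not available in this transcript, so the following is only one plausible reading of the idea, sketched in C. Suppose the result is g(x) + x, where g is a pipelined unit with a latency of three cycles (here g(x) = x * x, and both the function and the latency are hypothetical). The raw x is not needed until g(x) emerges, so it travels through a shift register of the same depth and arrives at the adder in the same cycle.

```c
#include <assert.h>

#define LATENCY 3  /* hypothetical latency of the pipelined unit, in cycles */

typedef struct {
    long long unit[LATENCY];   /* values in flight inside the pipelined unit */
    long long delay[LATENCY];  /* shift register carrying the matching raw inputs */
} pipe_state;

/* Advance one clock cycle: accept a new input x and return g(x_old) + x_old,
 * where x_old is the input accepted LATENCY cycles earlier. */
long long clock_tick(pipe_state *p, long long x)
{
    assert(p != 0);

    /* The oldest entry leaves both structures in the same cycle,
     * so the two operands of the adder belong to the same input. */
    long long result = p->unit[LATENCY - 1] + p->delay[LATENCY - 1];

    /* Shift everything one position toward the output. */
    for (int i = LATENCY - 1; i > 0; i--) {
        p->unit[i] = p->unit[i - 1];
        p->delay[i] = p->delay[i - 1];
    }

    /* The work is modeled as done on entry; only the timing matters here. */
    p->unit[0] = x * x;
    p->delay[0] = x;

    return result;
}
```

Starting from a zeroed state and feeding 2, 3, 4, the first three calls return 0 while the pipeline fills; the next three calls return 6, 12, and 20. Without the shift register, the adder would pair g(2) with whatever input happens to arrive three cycles later.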
