Re-configurable VLIW processor for streaming data

International Workshop NGNT 97

V. Iossifov
Studiengang Technische Informatik, FB Ingenieurwissenschaften 1, FHTW Berlin

G. Megson
School of Computer Science, Cybernetics and Electronic Engineering, University of Reading

Abstract
This paper describes the ISA-level design of a re-configurable VLIW processor for streaming data applications with alternating data width. It covers the design of the re-configurable data stream processor, the design of the VLIW processor for the re-configurable approach, the data, control and address path design of the configurable VLIW, and the VLIW re-configuration procedure that generates the FPGA code. It closes with open problems and concluding remarks.

Keywords
Hardware Genetic Algorithm Research at RUCS, VLIW processor, the FPGA code, Streaming Data

1 The Re-configurable Computing Approach

This paper describes the ISA-level design of a re-configurable VLIW processor for streaming data applications with alternating data width. The design is based on three earlier designs: the Hardware Genetic Algorithm Research at RUCS, Reading [1]; the free configurable RISC processor for streaming data applications with different data widths at FHTW Berlin [3]; and the Freedom CPU Project [5], which provides the host CPU.

1.1 Programmable Processors

The stored-program processor with an ISA is the basis of computer architecture for at least two reasons. First, it allows non-permanent customisation and application development after fabrication. Second, it reuses the same active computing resources in time, so that large computations can be supported on small amounts of processing hardware. To make this possible, architects have continued to rely on large memories, which hold task descriptions and intermediate data economically, and on small amounts of active processing hardware, which is heavily multiplexed to perform the actual computations.
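As a rough illustration of reusing one processing resource in time versus composing dedicated operators in space, the sketch below evaluates the quadratic y = Ax^2 + Bx + C (the expression used in Fig. 1) both ways. The function names, the four-step program and the operator counts are assumptions made for this example only; they are not taken from [2] and do not reproduce the circuit drawn in Fig. 1.

```python
def temporal_eval(a, b, c, x):
    """Evaluate a*x**2 + b*x + c on one shared ALU, one operation per cycle.

    The 'program' is a short instruction sequence (Horner form); None means
    'use the accumulator as the left operand'.
    """
    program = [("mul", a, x), ("add", None, b), ("mul", None, x), ("add", None, c)]
    acc = 0
    cycles = 0
    for op, lhs, rhs in program:
        left = acc if lhs is None else lhs
        acc = left * rhs if op == "mul" else left + rhs
        cycles += 1  # the single ALU is reused on every step
    return acc, cycles


def spatial_eval(a, b, c, x):
    """Evaluate the same expression with one dedicated operator per operation."""
    x_sq = x * x      # multiplier 1
    ax2 = a * x_sq    # multiplier 2
    bx = b * x        # multiplier 3
    y = (ax2 + bx) + c  # adders 1 and 2
    return y, {"multipliers": 3, "adders": 2}


value_t, cycles = temporal_eval(2, 3, 4, 5)
value_s, operators = spatial_eval(2, 3, 4, 5)
```

Both functions return the same value; the temporal version pays in cycles on a single unit, while the spatial version pays in the number of operators that have to exist at the same time.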
The efficiency of the architecture for different data formats tells us what the architecture can provide when the task requirements match the architectural assumptions. If the task requires native manipulation of small data words on a large-word machine, the machine will deliver only a fraction of its peak performance.

Fig. 1. Spatial vs. temporal computation for the expression y = Ax^2 + Bx + C [2].

1.2 Re-configurable devices

Re-configurable devices can be configured after fabrication to solve any computational task. These devices are best exemplified today by FPGAs. In these re-configurable devices, tasks are implemented by spatially composing primitive operations and operators, with the possibility of changing the operator hardware over time, rather than by temporally composing instruction sequences as in Princeton-style processors. A re-configurable processor on an FPGA can perform different operations on each bit, so re-configurable devices can be optimised to the data width of streaming data flows. The central theme of this work is to combine the advantages of non-von-Neumann architectures with the advantages of re-configurable processing elements.

2 Design of re-configurable data stream processor

2.1 Configurable general-purpose devices

Configurable architectures can perform any of a number of different operations. Once an instruction has been "configured" into the device, it is not changed while a data stream of the same data type continues. A configuration context is the collection of FPGA control bits that describe the behaviour of a general-purpose computing device during one operation cycle of a few instructions, for a data stream with a defined data width. One programming stream for a conventional FPGA, containing instructions for every array element along with the interconnect, composes a "configuration context".

Integer data streams with variable data width appear in applications such as video and 3D software algorithms, video encoding/decoding that operates on blocks of data, and FIR filter algorithms that operate on a stream of data.

The re-configurable VLIW processor to be developed has to compute integer numbers of 8-, 16-, 32- and 64-bit data width using dedicated register files and ALUs in parallel. The register files, internal buses and ALUs are re-configurable to the required data width.

2.2 The re-configurable streaming data approach

Streaming Data applications require maximum performance for architectures with a customised number of instructions.
This paper [3] explores the possibility of partially customising the instruction set of VLIW processors for embedded Streaming Data applications by exploiting FPGA technology. In particular, the formal methodology presented in [4] is modified for the custom instruction sets used by Streaming Data algorithms, in order to select their computational hot spots. The novelty of the proposed method is that it adapts the Control Graph analysis of [4] to a given Streaming Data application whose operands have different data widths and which is to be implemented on a re-configurable R-CPU on an FPGA. A skeleton of the proposed design flow is depicted in [3], Figure 2.

This development focuses first, following [4], on constructing a theoretical model and a strategy for identifying the customised Streaming Data operations to be implemented on re-configurable R-CPUs with different data widths. A new op-code, denoted in [4] as the fpga-opcode, is generated for each such operation and replaces the relevant segment of computation in the translation from high-level code into machine code. The new fpga-opcode is made available to the compiler as an extension of the machine instruction set, together with the information needed for scheduling, such as the latency of the fpga-opcode. With this target architecture, the computational procedure becomes that of extracting from the application algorithm the segments of computation that are to be implemented as fpga-opcodes.

This approach, proposed in [4] and re-designed in this paper, identifies the Streaming Data instructions from the Control Graph (CG) of the application, from which suitable sub-graphs for operations with the same data width are extracted. Analysing the CG of the application algorithm identifies the Streaming Data instructions to be mapped onto the parallel R-CPU. The aim is to identify sub-algorithms with Streaming Data instructions and to map them usefully onto a dedicated R-CPU [3],

[4]. The Binary Input and Unary Output (BIUO) nodes of the CG have at most two inputs and a fan-out equal to one.

2.3 Formal definition of a BIUO

A formal definition of a BIUO sub Control Graph $B_{i/j}$ is as follows. Denote by $G_i = \langle V_{i,j}; E_{i/j} \rangle$ a sub-graph, where $V_{i,j}$ is the set of nodes in $G_i$ with $i \in \{0, 1, 2\}$ input edges and $j \in \{1\}$ output edges, and $E_{i/j}$ is the set of all edges in $G_i$ departing from such nodes. An edge $e_{i/j} \in E_{i/j}$ is described by its source node ($v_{i,j} \in V_{i,j}$) and its destination node ($v_{i,j} \in V_{i,j}$), and it is denoted by $e_{i/j}(k, l)$. If for all $v_{i,j} \in V_{i,j}$ it is true that $e_{i/j}(k, l) \in E_{i/j} \Rightarrow v_{i,j} \in V_{i,j}$, then $G$ is BIUO. Any node in $V_{i,j}$ may have incoming edges originating from nodes not belonging to $V_{i,j}$.

The above property can be used as the basis for an algorithm (described in [3]) that extracts the Streaming Data operation nodes (BIUO) from all computational hot spot nodes in the CG. The upper bound on a CG built of BIUO nodes is a binary tree, with all the topological properties of a binary tree. If $n$ is the number of operands, then $|V_{i,j}| = n-1$ and $|E_{i/j}| = 2n$.

BIUO nodes extracting lemma

Lemma 1 in [4] has to be converted for BIUO nodes as follows: all BIUOs in the CG are either BIUO or contained in a BIUO. The proof is immediate. In the following, the algorithm of [4] for identifying all BIUOs in a CG is modified for BIUO operations and for the re-configurable PU to be generated for these operations:

    {
      for all Node in Nodes_to_be_analysed do {
        Generate_BIUO(Node)
        Nodes_to_be_analysed -= Nodes_in_BIUO
      }
    }

    Generate_BIUO_nodes(Node) {
      for (node_index = number_of_nodes; node_index > 0; node_index--)
        if (fan-out == 0 && fan-in == 0) { Generate_fixed_PU_Node }
        else if (fan-out == 1 && fan-in == 1) { Generate_BIUO_PU_Node }
        else if (fan-out == 1 && fan-in == 2) { Generate_BIUO_PU_Node }
        else Generate_fixed_PU_Node
    }

Fig. 2. Pseudo-code for the generation of all BIUO within the CG.
The algorithm operates in two steps: first, a node is chosen to be the exit node; then the program calls a function that builds the BIUO related to that exit node. Exit nodes are chosen upwards, i.e. starting from the exits of the CG. Initially, the set Nodes_to_be_analysed coincides with the set of nodes of the CG. When a BIUO has been generated, its nodes are removed from the Nodes_to_be_analysed set. The function Generate_BIUO starts from the chosen exit node and recursively tries to include its parents in the BIUO being generated. Recursion ends when the encountered node is non-legal (e.g. it is a non-streaming Data instruction) or has a non re-convergent fan-out. Like the algorithm proposed in [4], the proposed algorithm has a complexity that is linear in the number of nodes of the examined CG.
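The two steps described above can be sketched in runnable form as follows. This is a minimal illustration under stated assumptions, not the implementation from [3] or [4]: the CG is assumed to be given as a dictionary mapping every node to the list of its parents, `streaming` is an assumed set marking the legal (Streaming Data) nodes, and a node is treated as a BIUO node exactly when it is legal, has a fan-out of one and a fan-in of one or two, following the classification in Fig. 2. The function name `extract_biuo_groups` and the example graph are hypothetical.

```python
def extract_biuo_groups(parents, streaming):
    """Group BIUO nodes of a control graph, walking upwards from the exits.

    parents:   dict mapping every node to the list of its parent nodes
    streaming: set of nodes that are legal Streaming Data operations
    Returns (groups, fixed): lists of BIUO node groups and of fixed-PU nodes.
    """
    children = {node: [] for node in parents}
    for node, preds in parents.items():
        for p in preds:
            children[p].append(node)

    def is_biuo_node(node):
        # Fig. 2 classification: fan-out 1 and fan-in 1 or 2, legal nodes only
        return (node in streaming
                and len(children[node]) == 1
                and len(parents[node]) in (1, 2))

    # Order the nodes so that exits come first (reverse topological order).
    pending = {node: len(children[node]) for node in parents}
    order = [node for node in parents if pending[node] == 0]
    for node in order:  # the list grows while it is being traversed
        for p in parents[node]:
            pending[p] -= 1
            if pending[p] == 0:
                order.append(p)

    to_analyse = set(parents)
    groups, fixed = [], []
    for exit_node in order:
        if exit_node not in to_analyse:
            continue  # already absorbed into an earlier BIUO group
        if not is_biuo_node(exit_node):
            fixed.append(exit_node)
            to_analyse.discard(exit_node)
            continue
        group, stack = [], [exit_node]
        while stack:  # try to include parents; stop at non-BIUO nodes
            node = stack.pop()
            if node in to_analyse and is_biuo_node(node):
                to_analyse.discard(node)
                group.append(node)
                stack.extend(parents[node])
        groups.append(group)
    return groups, fixed


# Hypothetical CG for y = A*x*x + B*x + C followed by a non-streaming store.
parents = {
    "A": [], "B": [], "C": [], "x": [],
    "m1": ["A", "x"], "m2": ["m1", "x"], "m3": ["B", "x"],
    "a1": ["m2", "m3"], "a2": ["a1", "C"], "store": ["a2"],
}
streaming = {"m1", "m2", "m3", "a1", "a2"}
groups, fixed = extract_biuo_groups(parents, streaming)
```

In this example the five arithmetic nodes end up in a single group, while the operands and the store are left for the fixed processing unit. Each node and each edge is handled a constant number of times, which is consistent with the linear complexity stated above.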

3 Design of VLIW processor for the re-configurable approach

3.1 Re-configurable RISC CPU for variable data widths - the calculator

The re-configurable CPU core is a two-address machine with a RISC ISA and an orthogonal GPR register file. The address bus is 16 bits wide. The data buses are 8, 16, 32 and 64 bits wide for the different units (ALU, GPR).

3.2 Re-configurable Systolic array - the data width sorter

The re-configurable systolic array, the data width sorter, is based on the hardware research in [6->1]. The research in Genetic Algorithms (GA) is centred on the development of a novel design which uses systolic arrays. The generic concept is extended by exploiting the pipeline principle to design a device that is independent of the lengths of the chromosomes used in a particular problem. The systolic arrays themselves are easily scalable to implement different population sizes. Prototype systolic array cells have been designed and targeted to the Xilinx XC4000 FPGA [1].

3.3 Re-configurable VLIW-CPU instruction set and format

The first task in designing the instruction set is to decide which instructions join the instruction set for the data stream approach, so as to ensure ISA and EXO compatibility of the processor. Each VLIW instruction has 8 major fields:

The Systolic sorter field controls the systolic operation ALU and the global LOAD/STORE operations via the crossbar. The information on the streaming data type sorted onto each data output of the systolic sorter is coded as output in the FPGA Condition Code Registers of the systolic sorter.

The R-CPUa, R-CPUb, R-CPUc and R-CPUd fields control the functions of the four R-CPUs. The R-CPU is a two-address machine.

The FPU_memory and FPU_control fields control the 32-bit RISC Fixed Processor Unit (FPU) in performing LOAD/STORE and/or control operations [5].

The FPGA-code field contains the FPGA-SRAM images of the RPU and systolic units.
The VLIW control code is given in [3].

Consider, for example, the following instruction format:

    function:     F-CPU | Systolic sorter | R-CPU | R-CPU | R-CPU | R-CPU | FPGA code | VLIW control
    size (bits):  32    | 8, 8 free       | 16/24 | 16/24 | 16/24 | 16/24 | 8         | 6/8

Fig. 3. The VLIW-CPU instruction format.

4 Data, control and address path design of the configurable VLIW

The VLIW core implements the host function for the systolic sorter and the four re-configurable R-CPU calculators. Furthermore, the VLIW core executes all ALU, control and LOAD/STORE instructions in the program that are not streaming data instructions. The task of the VLIW core is to synchronise, out of order, the operations of the R-CPUs and the systolic sorter, to execute the FPGA-code that re-configures the R-CPUs, and to invoke the LOAD/STORE operations for the systolic sorter (Fig. 4).

The crossbar between the R-CPU data registers, the main memory and the execution units is a central part of the VLIW architecture. The R-CPU data register set is read-only through this device, which virtually provides it with more than four ports. The crossbar extends the read ports of the R-CPU data register set, making four "vertical" buses for all R-CPUs, and each bus is connected to one of the input ports of the dual-port memory by "horizontal" buses. It also performs some width formatting (byte, word, etc.). Accessing an R-CPU data register takes two cycles

from the time the register number has been decoded: one cycle for the register set and another for the crossbar.

Fig. 4. The VLIW-CPU architecture.

5 Generating the FPGA code - VLIW re-configurable procedure

The task of the systolic sorter is to generate a condition code for the different data widths as the result of sorting the streaming data. The compiler drives reconfigurations of the FPGA prior to execution of the application code, or possibly at the beginning of every section of code that requires reconfiguration. Some systolic-sorter-driven procedure designs for activating the fpga-code in the FPU are discussed in [3].

6 Open problems and concluding remarks

This paper presents the ISA-level behavioural design of a "Re-configurable VLIW processor for data streams with variable word width". The following topics remain open problems: the behavioural description of the systolic array sorter, of the data RAM, of the VLIW crossbar, and of the re-configurable data buses in the VLIW.

7 References

[1] Bland, I.M., Megson, G.M., The systolic array genetic algorithm, an example of systolic arrays as a reconfigurable design methodology, Proc. 6th IEEE Symposium on FPGAs for Custom Computing Machines (FCCM98), IEEE Computer Society. ISBN , August
[2] DeHon, Andre, Re-configurable Architectures for General-Purpose Computing, A.I. Technical Report No. 1586, M.I.T. Artificial Intelligence Lab., Oct
[3] Iossifov, V., Megson, G.M., Re-configurable VLIW processor for data streams with variable word width, Technical report RUCS, University of Reading, July
[4] Pozzi, L., Methodologies for design of Application-Specific Re-configurable VLIW Processors, PhD Thesis, Politecnico di Milano, Dip. di Elettronica e Informazione, Jan
[5] Freedom CPU Project F-CPU:
[6] What Is Re-configurable Computing?


More information

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Presenter: Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Outline Overview of M.I.T. Raw processor

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 3 Fundamentals in Computer Architecture Computer Architecture Part 3 page 1 of 55 Prof. Dr. Uwe Brinkschulte,

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Chapter 4. MARIE: An Introduction to a Simple Computer. Chapter 4 Objectives. 4.1 Introduction. 4.2 CPU Basics

Chapter 4. MARIE: An Introduction to a Simple Computer. Chapter 4 Objectives. 4.1 Introduction. 4.2 CPU Basics Chapter 4 Objectives Learn the components common to every modern computer system. Chapter 4 MARIE: An Introduction to a Simple Computer Be able to explain how each component contributes to program execution.

More information

Basic Processing Unit: Some Fundamental Concepts, Execution of a. Complete Instruction, Multiple Bus Organization, Hard-wired Control,

Basic Processing Unit: Some Fundamental Concepts, Execution of a. Complete Instruction, Multiple Bus Organization, Hard-wired Control, UNIT - 7 Basic Processing Unit: Some Fundamental Concepts, Execution of a Complete Instruction, Multiple Bus Organization, Hard-wired Control, Microprogrammed Control Page 178 UNIT - 7 BASIC PROCESSING

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

Computer Architecture

Computer Architecture Computer Architecture Context and Motivation To better understand a software system, it is mandatory understand two elements: - The computer as a basic building block for the application - The operating

More information

55:132/22C:160, HPCA Spring 2011

55:132/22C:160, HPCA Spring 2011 55:132/22C:160, HPCA Spring 2011 Second Lecture Slide Set Instruction Set Architecture Instruction Set Architecture ISA, the boundary between software and hardware Specifies the logical machine that is

More information

A Scalable Multiprocessor for Real-time Signal Processing

A Scalable Multiprocessor for Real-time Signal Processing A Scalable Multiprocessor for Real-time Signal Processing Daniel Scherrer, Hans Eberle Institute for Computer Systems, Swiss Federal Institute of Technology CH-8092 Zurich, Switzerland {scherrer, eberle}@inf.ethz.ch

More information

Incremental Reconfiguration for Pipelined Applications

Incremental Reconfiguration for Pipelined Applications Incremental Reconfiguration for Pipelined Applications Herman Schmit Dept. of ECE, Carnegie Mellon University Pittsburgh, PA 15213 Abstract This paper examines the implementation of pipelined applications

More information

Instruction Set Architecture. "Speaking with the computer"

Instruction Set Architecture. Speaking with the computer Instruction Set Architecture "Speaking with the computer" The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture Digital Design

More information

Computer Systems Organization

Computer Systems Organization The IAS (von Neumann) Machine Computer Systems Organization Input Output Equipment Stored Program concept Main memory storing programs and data ALU operating on binary data Control unit interpreting instructions

More information

Copyright 2007 Society of Photo-Optical Instrumentation Engineers. This paper was published in Proceedings of SPIE (Proc. SPIE Vol.

Copyright 2007 Society of Photo-Optical Instrumentation Engineers. This paper was published in Proceedings of SPIE (Proc. SPIE Vol. Copyright 2007 Society of Photo-Optical Instrumentation Engineers. This paper was published in Proceedings of SPIE (Proc. SPIE Vol. 6937, 69370N, DOI: http://dx.doi.org/10.1117/12.784572 ) and is made

More information

Blog -

Blog - . Instruction Codes Every different processor type has its own design (different registers, buses, microoperations, machine instructions, etc) Modern processor is a very complex device It contains Many

More information

Computer Architecture

Computer Architecture Computer Architecture Computer Architecture Hardware INFO 2603 Platform Technologies Week 1: 04-Sept-2018 Computer architecture refers to the overall design of the physical parts of a computer. It examines:

More information

Processor (I) - datapath & control. Hwansoo Han

Processor (I) - datapath & control. Hwansoo Han Processor (I) - datapath & control Hwansoo Han Introduction CPU performance factors Instruction count - Determined by ISA and compiler CPI and Cycle time - Determined by CPU hardware We will examine two

More information

ENGN1640: Design of Computing Systems Topic 03: Instruction Set Architecture Design

ENGN1640: Design of Computing Systems Topic 03: Instruction Set Architecture Design ENGN1640: Design of Computing Systems Topic 03: Instruction Set Architecture Design Professor Sherief Reda http://scale.engin.brown.edu School of Engineering Brown University Spring 2014 Sources: Computer

More information

An Instruction Stream Compression Technique 1

An Instruction Stream Compression Technique 1 An Instruction Stream Compression Technique 1 Peter L. Bird Trevor N. Mudge EECS Department University of Michigan {pbird,tnm}@eecs.umich.edu Abstract The performance of instruction memory is a critical

More information

New Advances in Micro-Processors and computer architectures

New Advances in Micro-Processors and computer architectures New Advances in Micro-Processors and computer architectures Prof. (Dr.) K.R. Chowdhary, Director SETG Email: kr.chowdhary@jietjodhpur.com Jodhpur Institute of Engineering and Technology, SETG August 27,

More information

EE 3170 Microcontroller Applications

EE 3170 Microcontroller Applications EE 3170 Microcontroller Applications Lecture 4 : Processors, Computers, and Controllers - 1.2 (reading assignment), 1.3-1.5 Based on slides for ECE3170 by Profs. Kieckhafer, Davis, Tan, and Cischke Outline

More information

The Von Neumann Architecture. Designing Computers. The Von Neumann Architecture. CMPUT101 Introduction to Computing - Spring 2001

The Von Neumann Architecture. Designing Computers. The Von Neumann Architecture. CMPUT101 Introduction to Computing - Spring 2001 The Von Neumann Architecture Chapter 5.1-5.2 Von Neumann Architecture Designing Computers All computers more or less based on the same basic design, the Von Neumann Architecture! CMPUT101 Introduction

More information

Universität Dortmund. ARM Architecture

Universität Dortmund. ARM Architecture ARM Architecture The RISC Philosophy Original RISC design (e.g. MIPS) aims for high performance through o reduced number of instruction classes o large general-purpose register set o load-store architecture

More information

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations Chapter 4 The Processor Part I Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations

More information

ETH, Design of Digital Circuits, SS17 Review Session Questions I

ETH, Design of Digital Circuits, SS17 Review Session Questions I ETH, Design of Digital Circuits, SS17 Review Session Questions I Instructors: Prof. Onur Mutlu, Prof. Srdjan Capkun TAs: Jeremie Kim, Minesh Patel, Hasan Hassan, Arash Tavakkol, Der-Yeuan Yu, Francois

More information

CMPUT101 Introduction to Computing - Summer 2002

CMPUT101 Introduction to Computing - Summer 2002 7KH9RQ1HXPDQQ$UFKLWHFWXUH Chapter 5.1-5.2 Von Neumann Architecture 'HVLJQLQJ&RPSXWHUV All computers more or less based on the same basic design, the Von Neumann Architecture! CMPUT101 Introduction to Computing

More information

Designing Computers. The Von Neumann Architecture. The Von Neumann Architecture. The Von Neumann Architecture

Designing Computers. The Von Neumann Architecture. The Von Neumann Architecture. The Von Neumann Architecture Chapter 5.1-5.2 Designing Computers All computers more or less based on the same basic design, the Von Neumann Architecture! Von Neumann Architecture CMPUT101 Introduction to Computing (c) Yngvi Bjornsson

More information

Real instruction set architectures. Part 2: a representative sample

Real instruction set architectures. Part 2: a representative sample Real instruction set architectures Part 2: a representative sample Some historical architectures VAX: Digital s line of midsize computers, dominant in academia in the 70s and 80s Characteristics: Variable-length

More information

Module 2: Introduction to AVR ATmega 32 Architecture

Module 2: Introduction to AVR ATmega 32 Architecture Module 2: Introduction to AVR ATmega 32 Architecture Definition of computer architecture processor operation CISC vs RISC von Neumann vs Harvard architecture AVR introduction AVR architecture Architecture

More information

Multiple Instruction Issue. Superscalars

Multiple Instruction Issue. Superscalars Multiple Instruction Issue Multiple instructions issued each cycle better performance increase instruction throughput decrease in CPI (below 1) greater hardware complexity, potentially longer wire lengths

More information

Computer Organization

Computer Organization INF 101 Fundamental Information Technology Computer Organization Assistant Prof. Dr. Turgay ĐBRĐKÇĐ Course slides are adapted from slides provided by Addison-Wesley Computing Fundamentals of Information

More information

Major Advances (continued)

Major Advances (continued) CSCI 4717/5717 Computer Architecture Topic: RISC Processors Reading: Stallings, Chapter 13 Major Advances A number of advances have occurred since the von Neumann architecture was proposed: Family concept

More information

Network-on-Chip Micro-Benchmarks

Network-on-Chip Micro-Benchmarks Network-on-Chip Micro-Benchmarks Zhonghai Lu *, Axel Jantsch *, Erno Salminen and Cristian Grecu * Royal Institute of Technology, Sweden Tampere University of Technology, Finland Abstract University of

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Guoping Qiu School of Computer Science The University of Nottingham http://www.cs.nott.ac.uk/~qiu 1 The World of Computers Computers are everywhere Cell phones Game consoles

More information

Pipeline and Vector Processing 1. Parallel Processing SISD SIMD MISD & MIMD

Pipeline and Vector Processing 1. Parallel Processing SISD SIMD MISD & MIMD Pipeline and Vector Processing 1. Parallel Processing Parallel processing is a term used to denote a large class of techniques that are used to provide simultaneous data-processing tasks for the purpose

More information

Design of memory efficient FIFO-based merge sorter

Design of memory efficient FIFO-based merge sorter LETTER IEICE Electronics Express, Vol.15, No.5, 1 11 Design of memory efficient FIFO-based merge sorter Youngil Kim a), Seungdo Choi, and Yong Ho Song Department of Electronics and Computer Engineering,

More information

Structured Datapaths. Preclass 1. Throughput Yield. Preclass 1

Structured Datapaths. Preclass 1. Throughput Yield. Preclass 1 ESE534: Computer Organization Day 23: November 21, 2016 Time Multiplexing Tabula March 1, 2010 Announced new architecture We would say w=1, c=8 arch. March, 2015 Tabula closed doors 1 [src: www.tabula.com]

More information

Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks

Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks Zhining Huang, Sharad Malik Electrical Engineering Department

More information

Introduction to reconfigurable systems

Introduction to reconfigurable systems Introduction to reconfigurable systems Reconfigurable system (RS)= any system whose sub-system configurations can be changed or modified after fabrication Reconfigurable computing (RC) is commonly used

More information

A Streaming Multi-Threaded Model

A Streaming Multi-Threaded Model A Streaming Multi-Threaded Model Extended Abstract Eylon Caspi, André DeHon, John Wawrzynek September 30, 2001 Summary. We present SCORE, a multi-threaded model that relies on streams to expose thread

More information

COMPUTER ARCHITECTURE AND ORGANIZATION Register Transfer and Micro-operations 1. Introduction A digital system is an interconnection of digital

COMPUTER ARCHITECTURE AND ORGANIZATION Register Transfer and Micro-operations 1. Introduction A digital system is an interconnection of digital Register Transfer and Micro-operations 1. Introduction A digital system is an interconnection of digital hardware modules that accomplish a specific information-processing task. Digital systems vary in

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture Complex Pipelining: Superscalar Prof. Michel A. Kinsy Summary Concepts Von Neumann architecture = stored-program computer architecture Self-Modifying Code Princeton architecture

More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

CC312: Computer Organization

CC312: Computer Organization CC312: Computer Organization Dr. Ahmed Abou EL-Farag Dr. Marwa El-Shenawy 1 Chapter 4 MARIE: An Introduction to a Simple Computer Chapter 4 Objectives Learn the components common to every modern computer

More information