Re-configurable VLIW processor for streaming data
International Workshop NGNT, pp. 97-101

V. Iossifov
Studiengang Technische Informatik, FB Ingenieurwissenschaften 1, FHTW Berlin

G. Megson
School of Computer Science, Cybernetics and Electronic Engineering, University of Reading

Abstract
This paper describes the ISA-level design of a re-configurable VLIW processor for streaming data applications with alternating data width. It covers:
- Design of a re-configurable data stream processor.
- Design of a VLIW processor for the re-configurable approach.
- Data, control and address path design of the configurable VLIW.
- Generating the FPGA code: the VLIW re-configuration procedure.
- Open problems and concluding remarks.

Keywords
Hardware Genetic Algorithm Research at RUCS, VLIW processor, the FPGA code, Streaming Data

1 The Re-configurable Computing Approach
This paper describes the ISA-level design of a re-configurable VLIW processor for streaming data applications with alternating data width. The design is based on the original designs of the Hardware Genetic Algorithm Research at RUCS, Reading [1], the free configurable RISC processor for streaming data applications with different data widths at FHTW Berlin [3], and the Freedom CPU Project [5] for the host CPU.

1.1 Programmable Processors
The stored-program processor with an ISA is the basis of computer architecture for at least two reasons:
- It allows non-permanent customisation and application development after fabrication.
- It reuses the same active computing resources over time in order to support large computations on small amounts of processing hardware.
To make this possible, architects continue to rely on large memories, which economically hold task descriptions and intermediate data, and on small amounts of active processing hardware, which is heavily multiplexed to perform the actual computations.
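The time-multiplexing described above can be illustrated with a minimal sketch. It assumes a hypothetical three-address micro-program and a single shared ALU (the names `PROGRAM`, `ALU` and `run` are illustrative and not part of the paper), and it evaluates the expression y = Ax^2 + Bx + C that the paper uses in Fig. 1. The task description lives in memory as a list of instructions, and the one ALU is reused once per time step.

```python
"""Temporal computation sketch: one ALU reused over several time steps."""

# Hypothetical micro-program for y = A*x^2 + B*x + C.
# Each tuple is (operation, destination, source1, source2).
PROGRAM = [
    ("mul", "t0", "x", "x"),    # t0 = x * x
    ("mul", "t0", "A", "t0"),   # t0 = A * x^2
    ("mul", "t1", "B", "x"),    # t1 = B * x
    ("add", "t0", "t0", "t1"),  # t0 = A*x^2 + B*x
    ("add", "y", "t0", "C"),    # y  = A*x^2 + B*x + C
]

# The only active processing resource: a two-operation ALU.
ALU = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def run(program, registers):
    """Execute the program on the single shared ALU; return (registers, steps)."""
    regs = dict(registers)
    steps = 0
    for op, dst, src1, src2 in program:
        regs[dst] = ALU[op](regs[src1], regs[src2])
        steps += 1  # one ALU use per time step
    return regs, steps


regs, steps = run(PROGRAM, {"A": 2, "B": 3, "C": 4, "x": 5})
print(regs["y"], steps)  # prints: 69 5
```

A spatial implementation would instead instantiate three multipliers and two adders and wire them together, trading hardware area for the five sequential steps counted here.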
The efficiency of an architecture for different data formats tells us what the architecture can provide when the task requirements match the architectural assumptions. If the task requires the native manipulation of small data words on a large-word machine, it achieves only a fraction of the machine's peak performance.

Fig. 1. Spatial vs. temporal computation for the expression $y = Ax^2 + Bx + C$ [2].

1.2 Re-configurable devices
Re-configurable devices can be configured after fabrication to solve any computational task. These devices are best exemplified today by FPGAs. In these re-configurable devices, tasks are implemented by spatially composing primitive operations and operators, with the possibility of changing the hardware of the operators over time, rather than by temporally composing instruction sequences as in Princeton-style processors. A re-configurable processor on an FPGA can perform a different operation on each bit, so re-configurable devices can be optimised to the data width of streaming data flows. The central theme of this work is to combine the advantages of non-von-Neumann architectures with the advantages of re-configurable processing elements.

2 Design of re-configurable data stream processor

2.1 Configurable general-purpose devices
Configurable architectures can perform any of a number of different operations. Once an instruction has been "configured" into the device, it is not changed while a data stream of the same data type continues. A configuration context is the collection of FPGA control bits that describe the behaviour of a general-purpose computing device during one operation cycle of a few instructions, for a data stream with a defined data width. One programming stream for a conventional FPGA, containing instructions for every array element along with the interconnect, composes a "configuration context". Integer data streams with variable data width appear in applications such as:
- Video and 3D software algorithms
- Video encoding/decoding that operates on blocks of data
- FIR filter algorithms that operate on a stream of data
The re-configurable VLIW processor to be developed has to compute on integers of 8-, 16-, 32- and 64-bit data width using dedicated register files and ALUs in parallel. The register files, internal busses and ALUs are re-configurable to the required data width.

2.2 The re-configurable streaming data approach
Streaming data applications require maximum performance from architectures with a customised number of instructions.
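To make the data-width argument of Sections 1.1 and 2.1 concrete, the following sketch computes how much of a fixed 64-bit datapath a narrow operand uses, and shows what a datapath configured into independent lanes would compute. The function names and the 64-bit word size are assumptions chosen for illustration; the paper does not specify how its ALUs are partitioned.

```python
def datapath_utilisation(data_bits, machine_bits):
    """Fraction of a machine_bits-wide datapath used by one data_bits-wide operand."""
    return data_bits / machine_bits


def packed_add(a, b, lane_bits, word_bits=64):
    """Add two words lane by lane; each lane wraps around independently."""
    if word_bits % lane_bits != 0:
        raise ValueError("lane width must divide the word width")
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, word_bits, lane_bits):
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane_sum & mask) << shift  # drop the carry out of the lane
    return result


for width in (8, 16, 32, 64):
    # data width, utilisation of a fixed 64-bit datapath, lanes when re-configured
    print(width, datapath_utilisation(width, 64), 64 // width)

# With 8-bit lanes the carry out of the low lane is discarded,
# with 16-bit lanes it propagates inside the lane.
print(hex(packed_add(0x01FF, 0x0101, 8)))   # prints: 0x200
print(hex(packed_add(0x01FF, 0x0101, 16)))  # prints: 0x300
```

An 8-bit stream on a fixed 64-bit machine uses one eighth of the datapath per operation, while a datapath re-configured into eight 8-bit lanes processes eight stream elements per operation.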
This paper [3] explores the possibility of enabling partial customisation of the instruction set of VLIW processors for embedded streaming data applications by exploiting FPGA technology. In particular, the formal methodology presented in [4] is modified for the custom instruction sets used by streaming data algorithms, in order to select their computational hot spots. The novelty of the proposed method is that the Control Graph analysis of [4] is customised to a given streaming data application with operands of different data widths, to be implemented on a re-configurable R-CPU on an FPGA. A skeleton of the proposed design flow is depicted in [3], Figure 2.

Following [4], this development focuses first on the construction of a theoretical model and of a strategy to identify the customised streaming data operations to be implemented on a re-configurable R-CPU with different data widths. A new op-code, denoted in [4] as the fpga-opcode, is generated correspondingly, and it replaces the relevant segment of computation in the translation from high-level code into machine code. The new fpga-opcode is made available to the compiler as an extension of the machine instruction set, together with information that must be known for scheduling, such as the latency of the fpga-opcode. With this target architecture, the computational procedure becomes that of extracting from the application algorithm the segments of computation that are to be implemented as fpga-opcodes.

This approach, proposed in [4] and re-designed in this paper, identifies the streaming data instructions based on the Control Graph (CG) corresponding to the application, from which suitable sub-graphs for operations with the same data width are extracted. Analysing the CG of the application algorithm identifies the streaming data instructions to be mapped onto the parallel R-CPUs. The aim is to identify sub-algorithms with streaming data instructions and their useful mapping onto a dedicated R-CPU [3], [4]. The Binary Input and Unary Output (BIUO) nodes of the CG have at most two inputs and a fan-out equal to one.

2.3 Formal definition of a BIUO
A formal definition of a BIUO sub Control Graph $B_{i/j}$ is as follows. Let $G_i = \langle V_{i,j}; E_{i/j} \rangle$ be a sub-graph, where $V_{i,j}$ is the set of nodes in $G_i$ with $i \in \{0, 1, 2\}$ input edges and $j \in \{1\}$ output edges, and $E_{i/j}$ is the set of all edges in $G_i$ departing from such nodes. An edge $e_{i/j} \in E_{i/j}$ is described by its source node ($v_{i,j} \in V_{i,j}$) and its destination node ($v_{i,j} \in V_{i,j}$), and it is denoted by $e_{i/j}(k, l)$. If for all $v_{i,j} \in V_{i,j}$ it is true that $e_{i/j}(k, l) \in E_{i/j}$; $v_{i,j} \in V_{i,j}$, then $G_i$ is BIUO. Any node in $V_{i,j}$ may have incoming edges originating from nodes not belonging to $V_{i,j}$.

The above property can be used as the basis for an algorithm (described in [3]) that extracts streaming data operation nodes (BIUO) from all computational hot-spot nodes in the CG. The upper bound on a CG built of BIUO nodes is a binary tree, with all the topological properties of a binary tree. If $n$ is the number of operands, then $|V_{i,j}| = n-1$ and $|E_{i/j}| = 2n$.

2.4 BIUO nodes extracting lemma
Lemma 1 in [4] has to be converted for BIUO nodes as follows: all BIUOs in the CG are either BIUO or contained in a BIUO. The proof is immediate. In the following, the algorithm of [4] for the identification of all BIUOs in a CG is modified for BIUO operations and for the re-configurable PU to be generated for these operations:

    for all Node in Nodes_to_be_analysed do {
        Generate_BIUO(Node)
        Nodes_to_be_analysed -= Nodes_in_BIUO
    }

    Generate_BIUO_nodes(Node) {
        for (node_index = number_of_nodes; node_index > 0; node_index--) {
            if (fan_out == 0 && fan_in == 0)
                Generate_fixed_PU_Node
            else if (fan_out == 1 && fan_in == 1)
                Generate_BIUO_PU_Node
            else if (fan_out == 1 && fan_in == 2)
                Generate_BIUO_PU_Node
            else
                Generate_fixed_PU_Node
        }
    }

Fig. 2. Pseudo-code for the generation of all BIUOs within the CG.
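The node classification in Fig. 2 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the edge-list representation of the CG, the function name `classify_nodes` and the string labels are assumptions. The decision rule itself follows the figure: a node with fan-out 1 and fan-in 1 or 2 maps to a BIUO PU, and every other node maps to a fixed PU.

```python
from collections import Counter


def classify_nodes(nodes, edges):
    """Map each CG node to 'BIUO_PU' or 'fixed_PU' from its fan-in and fan-out.

    nodes: iterable of node names.
    edges: iterable of (source, destination) pairs.
    """
    fan_in = Counter(dst for _, dst in edges)
    fan_out = Counter(src for src, _ in edges)
    kinds = {}
    for node in nodes:
        # Counter returns 0 for nodes that never appear as source or destination.
        if fan_out[node] == 1 and fan_in[node] in (1, 2):
            kinds[node] = "BIUO_PU"
        else:
            kinds[node] = "fixed_PU"
    return kinds


# Hypothetical CG for out = (a * b) + c.
nodes = ["a", "b", "c", "mul", "add", "out"]
edges = [("a", "mul"), ("b", "mul"), ("mul", "add"), ("c", "add"), ("add", "out")]
for name, kind in classify_nodes(nodes, edges).items():
    print(name, kind)
```

In this example only `mul` and `add` are classified as BIUO PU nodes. The operand sources have no fan-in and the sink has no fan-out, so they fall through to the fixed PU branch, as in the final `else` of the figure.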
The algorithm operates in two steps: first, a node is chosen to be the exit node; then the program activates a function which builds the BIUO related to that exit node. Exit nodes are chosen upwards, i.e. starting from the exits of the CG. Initially, the set Nodes_to_be_analysed coincides with the set of nodes of the CG. When a BIUO has been generated, its nodes are removed from the Nodes_to_be_analysed set. The function Generate_BIUO starts from the chosen exit node and recursively tries to include its parents in the BIUO being generated. Recursion ends when the encountered node is non-legal (e.g. it is a non-streaming data instruction) or has a non-re-convergent fan-out. Like the algorithm proposed in [4], the proposed algorithm has a complexity that is linear in the number of nodes of the examined CG.
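The two-step procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of [3] or [4]: the CG is a hypothetical dictionary mapping each node to its parents, legality is given as a set of streaming data nodes, and the "non-re-convergent fan-out" test is approximated by requiring a fan-out of exactly one, so re-convergent paths inside a BIUO are not merged.

```python
def extract_biuos(parents, legal):
    """Group legal CG nodes into BIUO sub-graphs, grown upwards from exit nodes.

    parents: dict mapping every node to the list of its parent (input) nodes.
    legal:   set of nodes that are streaming data operations.
    """
    children = {node: [] for node in parents}
    for node, node_parents in parents.items():
        for parent in node_parents:
            children[parent].append(node)

    # Depth-first topological order: every parent precedes its children.
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for parent in parents[node]:
            visit(parent)
        order.append(node)

    for node in parents:
        visit(node)

    to_analyse = set(parents)
    biuos = []

    def grow(node, members):
        """Recursively include parents that are legal, unclaimed and BIUO-shaped."""
        members.add(node)
        for parent in parents[node]:
            if (parent in to_analyse and parent not in members
                    and parent in legal
                    and len(parents[parent]) <= 2
                    and len(children[parent]) == 1):
                grow(parent, members)

    # Reversed order visits children before parents, i.e. exits first.
    for exit_node in reversed(order):
        if exit_node not in to_analyse:
            continue
        if exit_node not in legal:
            to_analyse.discard(exit_node)
            continue
        members = set()
        grow(exit_node, members)
        to_analyse -= members
        biuos.append(members)
    return biuos


# Hypothetical CG: st = store(m1 + m2); sq feeds both u and v (fan-out 2).
parents = {"m1": [], "m2": [], "add": ["m1", "m2"], "st": ["add"],
           "sq": [], "u": ["sq"], "v": ["sq"]}
legal = {"m1", "m2", "add", "sq", "u", "v"}
print(sorted(sorted(group) for group in extract_biuos(parents, legal)))
```

Each node is claimed by at most one BIUO and each edge is inspected a bounded number of times, which is consistent with the linear complexity stated above. The node `sq` ends up in its own group because its fan-out of two stops the recursion from both `u` and `v`.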
3 Design of VLIW processor for the re-configurable approach

3.1 Re-configurable RISC CPU for variable data widths - the calculator
The re-configurable CPU core is a two-address machine with a RISC ISA and an orthogonal GPR register file:
- Address bus width of 16 bit
- Data bus widths of 8, 16, 32 and 64 bit for the different units (ALU, GPR)

3.2 Re-configurable systolic array - the data width sorter
The re-configurable systolic array, the data width sorter, is based on the hardware research in [1]. The research in Genetic Algorithms (GA) is centred on the development of a novel design which uses systolic arrays. The generic concept is extended by exploiting the pipeline principle to design a device that is independent of the lengths of the chromosomes being used in a particular problem. The systolic arrays themselves are easily scalable to implement different population sizes. Prototype systolic array cells have been designed and targeted to the Xilinx XC4000 FPGA [1].

3.3 Re-configurable VLIW-CPU instruction set and format
The first task in designing the instruction set is to decide which instructions join the instruction set for the data stream approach, in order to ensure ISA and EXO compatibility of the processor. Each VLIW instruction has 8 major fields:
- The systolic sorter field controls the systolic operation ALU and the global LOAD/STORE operations via the crossbar. The information on the streaming data type sorted on every data output of the systolic sorter is coded as output in the FPGA Condition Code Registers of the systolic sorter.
- The R-CPUa, R-CPUb, R-CPUc and R-CPUd fields control the functions of the four R-CPUs. Each R-CPU is a two-address machine.
- The FPU_memory and FPU_control fields control the 32-bit RISC Fixed Processor Unit (FPU) in performing LOAD/STORE and/or control operations [5].
- The FPGA-code field contains the FPGA-SRAM images of the RPU and systolic units.
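The field list above, together with the sizes given in Fig. 3, can be turned into a small packing sketch. The layout below is an assumption chosen for illustration: it picks the 16-bit variant of the R-CPU fields and the 8-bit variant of the VLIW control field, treats the 8 free sorter bits as a separate field, and orders the fields as in the figure. The names in `FIELDS` and the functions `pack` and `unpack` are hypothetical.

```python
# Hypothetical field layout, most significant field first: (name, width in bits).
FIELDS = [
    ("f_cpu", 32),
    ("sorter", 8),
    ("free", 8),
    ("r_cpu_a", 16),
    ("r_cpu_b", 16),
    ("r_cpu_c", 16),
    ("r_cpu_d", 16),
    ("fpga_code", 8),
    ("vliw_control", 8),
]
WORD_BITS = sum(width for _, width in FIELDS)  # 128 bits with the widths above


def pack(values):
    """Pack a dict of field values into one instruction word (unset fields are 0)."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Split an instruction word back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


instruction = pack({"f_cpu": 0xDEADBEEF, "sorter": 0x12, "r_cpu_a": 0x0A01,
                    "r_cpu_d": 0x0D04, "fpga_code": 0x7F, "vliw_control": 0x03})
print(WORD_BITS, hex(instruction))
print(unpack(instruction)["r_cpu_d"] == 0x0D04)  # prints: True
```

With the 24-bit R-CPU variant the same table would give a 160-bit word, so only the widths in `FIELDS` would need to change.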
The VLIW control code is described in [3]. Consider, for example, the following instruction format:

    function:     F-CPU | Systolic sorter | R-CPU | R-CPU | R-CPU | R-CPU | FPGA code | VLIW control
    size (bits):  32    | 8, 8 free       | 16/24 | 16/24 | 16/24 | 16/24 | 8         | 6/8

Fig. 3. The VLIW-CPU instruction format.

4 Data, control and address path design of the configurable VLIW
The VLIW core implements the host function for the systolic sorter and the four re-configurable R-CPU calculators. Furthermore, the VLIW core executes all ALU, control and LOAD/STORE instructions in the program that are not streaming data instructions. The task of the VLIW core is to synchronise, out of order, the operations of the R-CPUs and the systolic sorter, to execute the FPGA-code that re-configures the R-CPUs, and to invoke the LOAD/STORE operations for the systolic sorter (Fig. 4).

The crossbar between the R-CPU data registers, the main memory and the execution units is a central part of the VLIW architecture. The R-CPU data register set is read-only through this device, which virtually provides it with more than four ports. The crossbar extends the read ports of the R-CPU data register set, making four "vertical" buses for all R-CPUs, and each bus is connected to one of the input ports of the dual-port memory with "horizontal" buses. It also performs some width formatting (byte, word, etc.). Accessing an R-CPU data register takes two cycles from the time the register number has been decoded: one cycle for the register set and another for the crossbar.

Fig. 4. The VLIW-CPU architecture.

5 Generating the FPGA code - VLIW re-configurable procedure
The task of the systolic sorter is to generate a condition code for the different data widths as the result of sorting the streaming data. The compiler drives reconfigurations of the FPGA prior to execution of the application code, or possibly at the beginning of every section of code that requires reconfiguration. Some systolic-sorter-driven procedure designs for activating the fpga-code in the FPU are discussed in [3].

6 Open problems and concluding remarks
This paper presents the ISA-level behavioural design of a "Re-configurable VLIW processor for data streams with variable word width". The following topics remain open problems: the behavioural description of the systolic array sorter, of the data RAM, of the VLIW crossbar, and of the re-configurable data busses in the VLIW.

7 References
[1] Bland, I.M., Megson, G.M., The systolic array genetic algorithm, an example of systolic arrays as a reconfigurable design methodology, Proc. 6th IEEE Symposium on FPGAs for Custom Computing Machines (FCCM98), IEEE Computer Society, August.
[2] DeHon, Andre, Re-configurable Architectures for General-Purpose Computing, A.I. Technical Report No. 1586, M.I.T. Artificial Intelligence Lab., Oct.
[3] Iossifov, V., Megson, G.M., Re-configurable VLIW processor for data streams with variable word width, Technical report RUCS, University of Reading, July.
[4] Pozzi, L., Methodologies for design of Application-Specific Re-configurable VLIW Processors, PhD Thesis, Politecnico di Milano, Dip. di Elettronica e Informazione, Jan.
[5] Freedom CPU Project F-CPU:
[6] What Is Re-configurable Computing?
More informationComputer Architecture
Computer Architecture Computer Architecture Hardware INFO 2603 Platform Technologies Week 1: 04-Sept-2018 Computer architecture refers to the overall design of the physical parts of a computer. It examines:
More informationProcessor (I) - datapath & control. Hwansoo Han
Processor (I) - datapath & control Hwansoo Han Introduction CPU performance factors Instruction count - Determined by ISA and compiler CPI and Cycle time - Determined by CPU hardware We will examine two
More informationENGN1640: Design of Computing Systems Topic 03: Instruction Set Architecture Design
ENGN1640: Design of Computing Systems Topic 03: Instruction Set Architecture Design Professor Sherief Reda http://scale.engin.brown.edu School of Engineering Brown University Spring 2014 Sources: Computer
More informationAn Instruction Stream Compression Technique 1
An Instruction Stream Compression Technique 1 Peter L. Bird Trevor N. Mudge EECS Department University of Michigan {pbird,tnm}@eecs.umich.edu Abstract The performance of instruction memory is a critical
More informationNew Advances in Micro-Processors and computer architectures
New Advances in Micro-Processors and computer architectures Prof. (Dr.) K.R. Chowdhary, Director SETG Email: kr.chowdhary@jietjodhpur.com Jodhpur Institute of Engineering and Technology, SETG August 27,
More informationEE 3170 Microcontroller Applications
EE 3170 Microcontroller Applications Lecture 4 : Processors, Computers, and Controllers - 1.2 (reading assignment), 1.3-1.5 Based on slides for ECE3170 by Profs. Kieckhafer, Davis, Tan, and Cischke Outline
More informationThe Von Neumann Architecture. Designing Computers. The Von Neumann Architecture. CMPUT101 Introduction to Computing - Spring 2001
The Von Neumann Architecture Chapter 5.1-5.2 Von Neumann Architecture Designing Computers All computers more or less based on the same basic design, the Von Neumann Architecture! CMPUT101 Introduction
More informationUniversität Dortmund. ARM Architecture
ARM Architecture The RISC Philosophy Original RISC design (e.g. MIPS) aims for high performance through o reduced number of instruction classes o large general-purpose register set o load-store architecture
More informationChapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations
Chapter 4 The Processor Part I Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations
More informationETH, Design of Digital Circuits, SS17 Review Session Questions I
ETH, Design of Digital Circuits, SS17 Review Session Questions I Instructors: Prof. Onur Mutlu, Prof. Srdjan Capkun TAs: Jeremie Kim, Minesh Patel, Hasan Hassan, Arash Tavakkol, Der-Yeuan Yu, Francois
More informationCMPUT101 Introduction to Computing - Summer 2002
7KH9RQ1HXPDQQ$UFKLWHFWXUH Chapter 5.1-5.2 Von Neumann Architecture 'HVLJQLQJ&RPSXWHUV All computers more or less based on the same basic design, the Von Neumann Architecture! CMPUT101 Introduction to Computing
More informationDesigning Computers. The Von Neumann Architecture. The Von Neumann Architecture. The Von Neumann Architecture
Chapter 5.1-5.2 Designing Computers All computers more or less based on the same basic design, the Von Neumann Architecture! Von Neumann Architecture CMPUT101 Introduction to Computing (c) Yngvi Bjornsson
More informationReal instruction set architectures. Part 2: a representative sample
Real instruction set architectures Part 2: a representative sample Some historical architectures VAX: Digital s line of midsize computers, dominant in academia in the 70s and 80s Characteristics: Variable-length
More informationModule 2: Introduction to AVR ATmega 32 Architecture
Module 2: Introduction to AVR ATmega 32 Architecture Definition of computer architecture processor operation CISC vs RISC von Neumann vs Harvard architecture AVR introduction AVR architecture Architecture
More informationMultiple Instruction Issue. Superscalars
Multiple Instruction Issue Multiple instructions issued each cycle better performance increase instruction throughput decrease in CPI (below 1) greater hardware complexity, potentially longer wire lengths
More informationComputer Organization
INF 101 Fundamental Information Technology Computer Organization Assistant Prof. Dr. Turgay ĐBRĐKÇĐ Course slides are adapted from slides provided by Addison-Wesley Computing Fundamentals of Information
More informationMajor Advances (continued)
CSCI 4717/5717 Computer Architecture Topic: RISC Processors Reading: Stallings, Chapter 13 Major Advances A number of advances have occurred since the von Neumann architecture was proposed: Family concept
More informationNetwork-on-Chip Micro-Benchmarks
Network-on-Chip Micro-Benchmarks Zhonghai Lu *, Axel Jantsch *, Erno Salminen and Cristian Grecu * Royal Institute of Technology, Sweden Tampere University of Technology, Finland Abstract University of
More informationCOMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle
More informationComputer Systems Architecture
Computer Systems Architecture Guoping Qiu School of Computer Science The University of Nottingham http://www.cs.nott.ac.uk/~qiu 1 The World of Computers Computers are everywhere Cell phones Game consoles
More informationPipeline and Vector Processing 1. Parallel Processing SISD SIMD MISD & MIMD
Pipeline and Vector Processing 1. Parallel Processing Parallel processing is a term used to denote a large class of techniques that are used to provide simultaneous data-processing tasks for the purpose
More informationDesign of memory efficient FIFO-based merge sorter
LETTER IEICE Electronics Express, Vol.15, No.5, 1 11 Design of memory efficient FIFO-based merge sorter Youngil Kim a), Seungdo Choi, and Yong Ho Song Department of Electronics and Computer Engineering,
More informationStructured Datapaths. Preclass 1. Throughput Yield. Preclass 1
ESE534: Computer Organization Day 23: November 21, 2016 Time Multiplexing Tabula March 1, 2010 Announced new architecture We would say w=1, c=8 arch. March, 2015 Tabula closed doors 1 [src: www.tabula.com]
More informationManaging Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks
Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks Zhining Huang, Sharad Malik Electrical Engineering Department
More informationIntroduction to reconfigurable systems
Introduction to reconfigurable systems Reconfigurable system (RS)= any system whose sub-system configurations can be changed or modified after fabrication Reconfigurable computing (RC) is commonly used
More informationA Streaming Multi-Threaded Model
A Streaming Multi-Threaded Model Extended Abstract Eylon Caspi, André DeHon, John Wawrzynek September 30, 2001 Summary. We present SCORE, a multi-threaded model that relies on streams to expose thread
More informationCOMPUTER ARCHITECTURE AND ORGANIZATION Register Transfer and Micro-operations 1. Introduction A digital system is an interconnection of digital
Register Transfer and Micro-operations 1. Introduction A digital system is an interconnection of digital hardware modules that accomplish a specific information-processing task. Digital systems vary in
More informationEC 513 Computer Architecture
EC 513 Computer Architecture Complex Pipelining: Superscalar Prof. Michel A. Kinsy Summary Concepts Von Neumann architecture = stored-program computer architecture Self-Modifying Code Princeton architecture
More informationCS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS
CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight
More informationCC312: Computer Organization
CC312: Computer Organization Dr. Ahmed Abou EL-Farag Dr. Marwa El-Shenawy 1 Chapter 4 MARIE: An Introduction to a Simple Computer Chapter 4 Objectives Learn the components common to every modern computer
More information