Lecture 8: RISC & Parallel Computers


Lecture 8: RISC & Parallel Computers
- RISC vs CISC computers
- Parallel computers
- Final remarks
Zebo Peng, IDA, LiTH

Introduction
The Reduced Instruction Set Computer (RISC) is an important innovation in computer architecture. It is an attempt to deliver more computing power by simplifying the instruction set of the processor. The opposite trend is the Complex Instruction Set Computer (CISC), i.e., the conventional design. Note, however, that most recent processors are not purely RISC or CISC, but combine the advantages of both approaches, e.g., later versions of the PowerPC and recent Pentium processors.

Main features of CISC
The main idea is to make machine instructions similar to high-level language (HLL) statements, e.g., the CASE (switch) instruction on the VAX computer.
- A large number of instructions (> 200), with complex instructions and data types.
- Many and complex addressing modes (e.g., indirect addressing is common).
- Microprogramming is used to implement the control unit, due to its complexity.

Arguments for CISC
- A rich instruction set can simplify the compiler by providing instructions that match HLL statements. This works well if the number of HLLs is small.
- A program that is smaller in size, thanks to complex machine instructions, may perform better:
  - It occupies less memory and needs fewer instruction fetches, which matters when memory space is limited.
  - Fewer instructions are executed, which may reduce execution time.
- Execution efficiency can be improved by implementing operations in microcode rather than machine code.

Problems with CISC
- A large instruction set requires complex and time-consuming hardware steps to decode and execute instructions.
- Some complex machine instructions may not match HLL statements, in which case they are of little use. This becomes a major problem as the number of HLLs grows.
- Memory access is a major bottleneck, due to the complex addressing modes and multiple memory-access instructions.
- The irregularity of instruction execution reduces the efficiency of instruction pipelining.
- It also leads to a more complex design task, and thus a longer time-to-market and the use of older technology.

Instruction Execution Features (1)
Several studies have been conducted to determine the execution characteristics of machine instructions. They show:
- Frequency of machine instructions executed:
  - moves of data: 33%
  - conditional branches: 20%
  - arithmetic/logic operations: 16% (together ca. 70%)
  - all others: between 0.1% and 10% each
- Addressing modes:
  - the majority of instructions use simple addressing modes;
  - complex addressing modes (memory indirect, indexed + indirect, etc.) are used by only ~18% of the instructions.
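The percentages above come from dynamic measurements: an instruction is counted every time it is executed, not once per occurrence in the program text. A minimal sketch of such a tally over a toy instruction trace (the trace and category names are hypothetical illustrations, not from the studies cited):

```c
/* Sketch of how a dynamic instruction-mix study tallies an executed trace.
 * The trace contents and category names are made up for illustration. */
#include <stdio.h>

enum op_class { OP_MOVE, OP_BRANCH, OP_ALU, OP_OTHER, OP_CLASSES };

int main(void) {
    /* A toy "trace" of executed instruction classes. */
    enum op_class trace[] = { OP_MOVE, OP_ALU, OP_MOVE, OP_BRANCH,
                              OP_MOVE, OP_OTHER, OP_ALU, OP_BRANCH };
    int n = sizeof trace / sizeof trace[0];
    int count[OP_CLASSES] = {0};

    for (int i = 0; i < n; i++)
        count[trace[i]]++;            /* dynamic count: one per execution */

    const char *name[] = { "moves", "branches", "alu", "other" };
    for (int c = 0; c < OP_CLASSES; c++)
        printf("%-8s %5.1f%%\n", name[c], 100.0 * count[c] / n);
    return 0;
}
```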

Instruction Execution Features (2)
- Operand types:
  - ca. 80% of the operands are scalars (integers, reals, characters, etc.); the rest (ca. 20%) are arrays/structures, and 90% of those are global variables.
  - ca. 80% of the scalars are local variables.
- Conclusions:
  1) The majority of operands are local variables of scalar type, which can be stored in registers.
  2) Implementing many registers in an architecture is beneficial, since it reduces the number of memory accesses.

Procedure Calls
Studies have also been done on the percentage of CPU execution time spent on different HLL statements. Most of the time is spent executing CALLs and RETURNs in HLL programs. Even though only about 15% of the executed HLL statements are CALLs or RETURNs, they consume most of the execution time because of their complexity: a CALL or RETURN is compiled into a relatively long sequence of machine instructions with many memory references, since the contents of registers must be saved to memory and restored from it at each procedure CALL and RETURN.

Evaluation Conclusions
- An overwhelming dominance of simple operations (ALU and moves) over complex operations.
- A dominance of simple addressing modes.
- A high frequency of operand accesses; on average, each instruction references 1.9 operands.
- Most of the referenced operands are scalars (and can therefore be stored in registers) and are local variables or parameters.
- Optimizing the procedure CALL/RETURN mechanism promises large performance benefits.
RISC computers were developed to take advantage of these results, and thus to deliver better performance!

Main Characteristics of RISC
- A small number of simple instructions (desirably < 100).
  - Only simple, small decode and execution hardware is required.
  - A hard-wired controller suffices, rather than microprogramming.
  - The CPU takes less silicon area to implement, and also runs faster.
- Execution of one instruction per clock cycle.
  - The instruction pipeline performs more efficiently, due to simple instructions and similar execution patterns.
- Complex operations are executed as a sequence of simple instructions; in the case of CISC they are executed as a single or a few complex instructions, as illustrated by the sketch below.
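To see what "a sequence of simple instructions" means in practice, consider the classic textbook statement a = b + c. The C function below carries, as comments, a hypothetical CISC memory-to-memory encoding next to its RISC load/store equivalent; the mnemonics are illustrative and not taken from any specific ISA:

```c
/* One HLL statement, two instruction-set styles (mnemonics illustrative). */
int a, b, c;

void add_example(void) {
    a = b + c;
    /* CISC (one complex instruction, three memory operands):
     *     ADD   a, b, c
     *
     * RISC (a sequence of simple steps; only LOAD/STORE touch memory):
     *     LOAD  r1, b
     *     LOAD  r2, c
     *     ADD   r3, r1, r2    ; register-to-register
     *     STORE a, r3
     */
}
```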

Main Characteristics of RISC (Cont'd)
- Load-and-store architecture:
  - Only LOAD and STORE instructions reference data in memory; all other instructions operate only on registers (register-to-register instructions).
- Only a few simple addressing modes are used, e.g., register, direct, register indirect, and displacement.
- Instructions are of fixed length and uniform format:
  - Loading and decoding of instructions is simple and fast; there is no need to wait until the length of an instruction is known before starting to decode it.
  - Decoding is simplified, because the opcode and address fields are located in the same position in all instructions.

Main Characteristics of RISC (Cont'd)
- A large number of registers can be implemented:
  - Variables and intermediate results can be stored in them to reduce memory accesses.
  - All local variables of procedures and the passed parameters can also be stored in registers.
- The large number of registers is possible because the reduced complexity of the processor leaves silicon space on the chip to implement them. This is usually not the case with CISC machines.

Register Windows
A large number of registers is usually very useful. However, if the contents of all registers must be saved at every procedure call, more registers mean longer delays. A solution to this problem is to divide the register file into a set of fixed-size windows:
- Each window is assigned to a procedure.
- Windows of adjacent procedures overlap, to allow parameter passing (see the sketch below).

Main Advantages of RISC
The best support is given by optimizing the most used and most time-consuming aspects of the architecture:
- frequently executed instructions;
- simple memory references;
- procedure call/return; and
- pipeline design.
Consequently, we get:
- high performance for many applications;
- less design complexity;
- reduced power consumption; and
- reduced design cost and time-to-market, which also allows the use of newer technology.
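To make the window overlap described under Register Windows concrete, here is a minimal sketch in C of a SPARC-style circular register file. The sizes are illustrative assumptions (8 windows of 16 new registers each, plus 8 globals, with the caller's 8 "out" registers aliasing the callee's 8 "in" registers):

```c
/* Sketch of overlapping register windows (illustrative sizes). */
#include <stdio.h>

#define NWINDOWS 8
#define REGFILE  (8 + NWINDOWS * 16)   /* 8 globals + 16 new regs per window */

static int regfile[REGFILE];
static int cwp = 0;                    /* current window pointer */

/* Map a per-window register index 0..23 (8 in, 8 local, 8 out) to the
 * physical register file; adjacent windows overlap by 8 registers. */
static int phys(int r) {
    return 8 + (cwp * 16 + r) % (NWINDOWS * 16);
}

int main(void) {
    regfile[phys(16)] = 42;   /* caller writes an "out" register (16..23) */
    cwp++;                    /* procedure CALL: slide the window forward */
    printf("callee in0 = %d\n", regfile[phys(0)]);  /* same physical reg */
    cwp--;                    /* RETURN: slide the window back */
    return 0;
}
```

Because the caller's out registers and the callee's in registers are the same physical registers, parameters are passed without any memory traffic; only when all windows are in use must a window be spilled to memory.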

Criticism of RISC
- An operation might need two, three, or more instructions to accomplish:
  - more memory access cycles might be needed;
  - execution speed may be reduced for certain applications.
- It usually leads to longer programs, which need more memory space to store. This is less of a problem now, with the development of memory technologies.
- RISC makes it more difficult to write machine code and assembly programs, so programming becomes more time-consuming. Luckily, very little low-level programming is done nowadays.

Historical CISC/RISC Examples

                            CISC machines                    RISC machines
  Characteristic          IBM 370/168  VAX 11/780  i486      SPARC    MIPS R4000
  Year developed          1973         1978        1989      1987     1991
  No. of instructions     208          303         235       69       94
  Inst. size (bytes)      2-6          2-57        1-11      4        4
  Addressing modes        4            22          11        1        1
  No. of G.P. registers   16           16          8         40-520   32
  Cont. M. size (Kbits)   420          480         246       -        -

("-" means hardwired control, i.e., no control memory.)

A CISC Example: Pentium I
32-bit integer processor:
- Registers: 8 general; 6 segment registers for segmented addresses (16 bits); 2 status/control; 1 instruction pointer (PC).
- Two instruction pipelines:
  - the u-pipe can execute any instruction;
  - the v-pipe executes only simple instructions (its control is hardwired!).
- On-chip floating-point unit.
- Pre-fetch buffer for instructions (16 bytes of instructions).
- Branch target buffer (512-entry branch history table).
- Microprogrammed control.
- Instruction set:
  - Number of instructions: 235 (+ 57 MMX instructions); e.g., FYL2XP1 x y computes y * log2(x + 1) in 313 clock cycles.
  - Instruction size: 1-12 bytes (3.2 on average).
  - Addressing modes: 11.

CISC Pentium I (Cont'd)
- Memory organization:
  - Address length: 32 bits (4 GB address space).
  - Virtual memory supported by paging.
  - On-chip separate 8 KB I-cache and 8 KB D-cache; the D-cache can be configured as write-through or write-back.
  - L2 cache: 256 KB or 512 KB.
- Instruction execution:
  - Execution models: reg-reg, reg-mem, and mem-mem.
  - Five-stage instruction pipeline: Fetch, Decode1, Decode2 (address generation), Execute, and Write-Back.
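As a sense of how much work a single complex CISC instruction can encode: FYL2XP1, mentioned above, evaluates y * log2(x + 1) in one opcode. The same computation written portably in C is a library call that itself compiles to many simple instructions on a load/store machine (a minimal sketch, not code from the lecture):

```c
/* What the single x87 instruction FYL2XP1 computes, expressed in C. */
#include <math.h>
#include <stdio.h>

int main(void) {
    double x = 0.5, y = 3.0;
    printf("%f\n", y * log2(x + 1.0));   /* log2 is from C99 <math.h> */
    return 0;
}
```

On Unix-like systems this typically links with -lm.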

A RISC Example: SPARC
Scalable Processor Architecture, a 32-bit processor:
- An Integer Unit (IU) executes all instructions.
- A Floating-Point Unit (FPU) co-processor works concurrently with the IU on floating-point arithmetic.
- 69 basic instructions.
- A linear, 32-bit virtual address space (4 GB).
- 40 to 520 general-purpose 32-bit registers, grouped into 2 to 32 overlapping register windows of 16 registers each, plus 8 global registers.
- Scalability: the latest high-end processors (2016) are Fujitsu's SPARC64 X+, designed for 64-bit operation.

Lecture 8: RISC & Parallel Computers
- RISC vs CISC computers
- Parallel computers
- Final remarks

Motivation for Parallel Processing
Traditional computers cannot meet the high performance requirements of many applications, such as:
- simulation of large, complex systems in physics, economics, biology, ...;
- distributed databases with search functions;
- artificial intelligence and autonomous systems;
- visualization and multimedia.
Such applications are often characterized by a huge amount of numerical computation and/or a high volume of input data. The only way to deliver the needed performance is to put many processors in a single computer system. Hardware and silicon technology also make it possible to build machines with a huge degree of parallelism cost-effectively.

Flynn's Classification of Computers
Based on the nature of the instruction flow executed by a computer and the data flow on which the instructions operate. The multiplicity of instruction streams and data streams gives four classes:
- Single instruction, single data stream (SISD)
- Multiple instruction, single data stream (MISD)
- Single instruction, multiple data stream (SIMD)
- Multiple instruction, multiple data stream (MIMD)

Single Instruction, Multiple Data (SIMD)
- A single machine instruction stream.
- Simultaneous execution on different sets of data.
- A large number of processing elements (PEs) is usually implemented.
- Lockstep synchronization among the processing elements.
- The processing elements can either have their own private data memories, or share a common memory via an interconnection network.
- Array and vector processors are the most common examples of SIMD machines.

Vector Processor Architecture
(Figure: an instruction fetch and decode unit dispatches scalar instructions to the scalar unit, with its scalar registers and scalar functional units, and vector instructions to the vector unit, with its vector registers and vector functional units; both units access a common memory system.)
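The kind of loop a vector or array processor is built for is one where every iteration applies the same operation to different data. A minimal sketch using the classic DAXPY kernel (a standard illustration, not taken from the slides):

```c
/* DAXPY (y = a*x + y): every iteration performs the same operation on
 * different data, so a vector machine can run it as one vector instruction
 * and a SIMD array can run the iterations in lockstep across its PEs. */
#include <stdio.h>

#define N 8

void daxpy(int n, double a, const double *x, double *y) {
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void) {
    double x[N] = {1, 2, 3, 4, 5, 6, 7, 8}, y[N] = {0};
    daxpy(N, 2.0, x, y);
    for (int i = 0; i < N; i++)
        printf("%g ", y[i]);       /* 2 4 6 8 10 12 14 16 */
    printf("\n");
    return 0;
}
```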

Array Processors
- Built with N identical processing elements and a number of memory modules.
- All PEs are under the control of a single control unit and execute instructions in lockstep.
- Processing units and memory elements communicate with each other through an interconnection network; different topologies can be used, e.g., a mesh.
- The complexity of the control unit is at the same level as in a uniprocessor system. The control unit is usually itself implemented as a computer with its own registers, local memory and ALU (the host computer).

Global Memory Organization
(Figure: the control unit broadcasts the instruction stream (IS) to PE1, PE2, ..., PEn; the PEs reach the shared memory modules M1, M2, ..., Mk and the I/O system through an interconnection network.)

Dedicated Memory Organization
(Figure: the control unit, with its own memory Mcont, broadcasts the instruction stream to PE1, PE2, ..., PEn, each with a private local memory M1, M2, ..., Mn; the PEs communicate with each other and with the I/O system through an interconnection network.)

Performance Measurement Units
MIPS = Millions of Instructions Per Second.
FLOPS = FLoating-point Operations Per Second:
- kiloflops (kflops) = 10^3
- megaflops (MFLOPS) = 10^6
- gigaflops (GFLOPS) = 10^9
- teraflops (TFLOPS) = 10^12
- petaflops (PFLOPS) = 10^15
- exaflops (EFLOPS) = 10^18
- zettaflops (ZFLOPS) = 10^21
- yottaflops (YFLOPS) = 10^24
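Turning these units into a measurement is just a division of an operation count by elapsed time. A minimal sketch with made-up numbers:

```c
/* Converting an operation count and wall-clock time into FLOPS.
 * The numbers are invented for illustration. */
#include <stdio.h>

int main(void) {
    double flop_count = 2.0e12;   /* e.g., 10^9 iterations x 2000 flops each */
    double seconds    = 4.0;      /* measured wall-clock time */
    double flops = flop_count / seconds;
    printf("%.2f GFLOPS\n", flops / 1e9);   /* prints 500.00 GFLOPS */
    return 0;
}
```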

Multiple Instruction, Multiple Data (MIMD)
- It consists of a set of processors that simultaneously execute different instruction sequences on different sets of data.
- The MIMD class can be further divided:
  - Shared memory (tightly coupled):
    - Symmetric multiprocessor (SMP)
    - Non-uniform memory access (NUMA)
  - Distributed memory (loosely coupled) = clusters

MIMD with Shared Memory
(Figure: control units CU1, CU2, ..., CUn feed instruction streams (IS) to processing elements PE1, PE2, ..., PEn, whose data streams (DS) all go to one shared memory.)
Tightly coupled, not very scalable. Typically called multi-processor systems.
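A minimal shared-memory MIMD sketch using POSIX threads: each thread plays the role of one processor, and all of them access the same array directly, with no message passing (a generic illustration, not an API from the lecture):

```c
/* Shared-memory MIMD in miniature: NTHREADS workers sum a shared array. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N 1000000

static double data[N];               /* the shared memory */
static double partial[NTHREADS];     /* one result slot per worker */

static void *worker(void *arg) {
    long id = (long)arg;
    double s = 0.0;
    for (long i = id; i < N; i += NTHREADS)   /* interleaved partition */
        s += data[i];
    partial[id] = s;                 /* each worker writes only its slot */
    return NULL;
}

int main(void) {
    for (long i = 0; i < N; i++) data[i] = 1.0;
    pthread_t t[NTHREADS];
    for (long id = 0; id < NTHREADS; id++)
        pthread_create(&t[id], NULL, worker, (void *)id);
    double sum = 0.0;
    for (long id = 0; id < NTHREADS; id++) {
        pthread_join(t[id], NULL);   /* wait, then read that worker's result */
        sum += partial[id];
    }
    printf("sum = %.0f\n", sum);     /* 1000000 */
    return 0;
}
```

Compile with -pthread on most Unix-like systems.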

MIMD with Distributed Memory
(Figure: control units CU1, CU2, ..., CUn feed instruction streams to processing elements PE1, PE2, ..., PEn, each with its own local memory LM1, LM2, ..., LMn; the nodes communicate over an interconnection network.)
Loosely coupled, with message passing; scalable. Typically called multi-computer systems.

Symmetric Multiprocessors (Ex)
- A set of processors of similar capacity.
- They share the same memory: a single memory or a set of memory modules.
- Memory access time is the same for each processor.
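In the distributed-memory case, data must move between nodes explicitly. A minimal message-passing sketch using MPI, a standard message-passing library (given here as an illustration, not something prescribed by the lecture):

```c
/* Distributed-memory MIMD in miniature: each node holds only its local
 * value; data moves between nodes explicitly via MPI communication. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = rank + 1.0;   /* a value in this node's local memory */
    double total = 0.0;
    /* Combine all local values at node 0; there is no shared address space,
     * so the library moves the data with messages under the hood. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d nodes = %g\n", size, total);
    MPI_Finalize();
    return 0;
}
```

Typically built with mpicc and launched with mpirun -np <nodes>.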

Non-Uniform Memory Access (NUMA)
- Each node is, in general, an SMP.
- Each node has its own main memory.
- Each processor has its own L1 and L2 caches.

NUMA (Cont'd)
- Nodes are connected by some networking facility.
- Each processor sees a single addressable memory space: each memory location has a unique system-wide address.

NUMA (Cont'd)
On a Load X, the memory request order is:
1. L1 cache (local to the processor);
2. L2 cache (local to the processor);
3. main memory (local to the node);
4. remote memory (via the interconnection network).
Different processors therefore access different regions of memory at different speeds!

Loosely Coupled MIMD: Clusters
Cluster: a set of computers connected, typically over a high-bandwidth local area network, and used as a multi-computer system.
- A group of interconnected stand-alone computers that work together as a unified resource.
- Each computer is called a node; a node can itself be a multiprocessor, such as an SMP.
- Message passing is used for communication between nodes; there is no longer a shared memory space.
- Ex.: Google uses several clusters of off-the-shelf PCs to provide a cheap solution for its services; more than 15,000 nodes form a cluster.

Lecture 8: RISC & Parallel Computers
- RISC vs CISC computers
- Parallel computers
- Final remarks

The Carry-Away Messages
Computers are data processing machines built on the von Neumann architecture principle (CPU and memory):
- Memory access is the bottleneck -> cache memory and the memory hierarchy.
- Growing instruction complexity means a complicated controller -> microprogramming and the RISC architecture.
- Sequential execution of instructions limits speed -> instruction pipelining and superscalar execution.
- Parallel processing improves the performance of computer systems dramatically: the main trend in computers.

Written Examination: Saturday(!), Jan 13
- Don't forget to register for the exam.
- Closed book; but an English/Swedish dictionary is permitted (NOTE: no electronic dictionaries).
- Answers can be written in English and/or Swedish.
- The exam covers only the topics discussed in the lectures; follow the lecture notes when preparing for it.
- Two previous exams and the reading instructions are available at the course website.

Question & Feedback
- Dedicated time for answering exam-related questions: Thursday, January 11, 8:15-12:00, at Zebo's office, Building B.
- Please provide feedback on the course: by e-mail, and by filling in the course evaluation, please!
