Number of processing elements (PEs). Computing power of each element. Amount of physical memory used. Data access, Communication and Synchronization

Size: px
Start display at page:

Download "Number of processing elements (PEs). Computing power of each element. Amount of physical memory used. Data access, Communication and Synchronization"

Transcription

1 Parallel Computer Architecture A parallel computer is a collection of processing elements that cooperate to solve large problems fast Broad issues involved: Resource Allocation: Number of processing elements (PEs). Computing power of each element. Amount of physical memory used. Data access, Communication and Synchronization How the elements cooperate and communicate. How data is transmitted between processors. Abstractions and primitives for cooperation. Performance and Scalability Performance enhancement of parallelism: Speedup. Scalabilty of performance to larger systems/problems. #1 lec # 1 Spring

2 The Need And Feasibility of Parallel Computing Application demands: More computing cycles: Scientific computing: CFD, Biology, Chemistry, Physics,... General-purpose computing: Video, Graphics, CAD, Databases, Transaction Processing, Gaming Mainstream multithreaded programs, are similar to parallel programs Technology Trends Number of transistors on chip growing rapidly Clock rates expected to go up but only slowly Architecture Trends Instruction-level parallelism is valuable but limited Coarser-level parallelism, as in MPs, the most viable approach Economics: Today s microprocessors have multiprocessor support eliminating the need for designing expensive custom PEs Lower parallel system cost. Multiprocessor systems to offer a cost-effective replacement of uniprocessor systems in mainstream computing. #2 lec # 1 Spring

3 Scientific Computing Demand #3 lec # 1 Spring

4 Scientific Supercomputing Trends Proving ground and driver for innovative architecture and advanced techniques: Market is much smaller relative to commercial segment Dominated by vector machines starting in 70s Meanwhile, microprocessors have made huge gains in floating-point performance High clock rates. Pipelined floating point units. Instruction-level parallelism. Effective use of caches. Large-scale multiprocessors replace vector supercomputers Well under way already #4 lec # 1 Spring

5 Raw Uniprocessor Performance: 10,000 LINPACK CRAY n = 1,000 CRAY n = 100 Micro n = 1,000 Micro n = 100 LINPACK (MFLOPS) 1, CRAY 1s Ymp Xmp/416 Xmp/14se T94 C90 DEC 8200 IBM Power2/990 MIPS R4400 DEC Alpha HP9000/735 DEC Alpha AXP HP 9000/750 IBM RS6000/ MIPS M/2000 MIPS M/120 Sun 4/ #5 lec # 1 Spring

6 10,000 Raw Parallel Performance: MPP peak CRA Y peak LINPACK 1,000 ASCI Red LINP ACK (GFLOPS) CM-2 CM-200 Paragon XP/S MP (6768) Paragon XP/S MP (1024) CM-5 T3D Delta T932(32) Paragon XP/S C90(16) Ymp/832(8) ipsc/860 ncube/2(1024) 1 Xmp /416(4) #6 lec # 1 Spring

7 General Technology Trends Microprocessor performance increases 50% - 100% per year Transistor count doubles every 3 years DRAM size quadruples every 3 years Sun MIPS M/120 MIPS M2000 IBM RS HP DEC alpha Integer FP #7 lec # 1 Spring

8 Clock Frequency Growth Rate 1,000 Clock rate (MHz) i8086 i8008 i4004 i8080 i80286 i80386 R10000 Pentium Currently increasing 30% per year #8 lec # 1 Spring

9 Transistor Count Growth Rate 100,000,000 Transistors 10,000,000 1,000, ,000 10,000 i80286 i8080 i8008 i4004 i8086 i80386 R3000 R2000 R10000 Pentium 1, million transistors on chip by early 2000 s A.D. Transistor count grows much faster than clock rate - Currently 40% per year #9 lec # 1 Spring

10 System Attributes to Performance Performance benchmarking is program-mix dependent. Ideal performance requires a perfect machine/program match. Performance measures: Cycles per instruction (CPI) Total CPU time = T = C x t = C / f = I c x CPI x t I c = Instruction count = I c x (p + m x k) x t p = Instruction decode cycles t = CPU cycle time m = Memory cycles k = Ratio between memory/processor cycles C = Total program clock cycles f = clock rate MIPS Rate = I c / (T x 10 6 ) = f / (CPI x 10 6 ) = f x I c /(C x 10 6 ) Throughput Rate: W p = f /(I c x CPI) = (MIPS) x 10 6 /I c Performance factors: (I c, p, m, k, t) are influenced by: instruction-set architecture, compiler design, CPU implementation and control, cache and memory hierarchy. #10 lec # 1 Spring

11 CPU Performance Trends The microprocessor is currently the most natural building block for multiprocessor systems in terms of cost and performance. 100 Supercomputers 10 Performance 1 Mainframes Minicomputers Microprocessors #11 lec # 1 Spring

12 Parallelism in Microprocessor VLSI Generations 100,000,000 Bit-level parallelism Instruction-level Thread-level (?) Transistors 10,000,000 1,000, ,000 i80286 R10000 Pentium i80386 R3000 R2000 i ,000 i8080 i8008 i4004 1, #12 lec # 1 Spring

13 The Goal of Parallel Computing Goal of applications in using parallel machines: Speedup Speedup (p processors) = Performance (p processors) Performance (1 processor) For a fixed problem size (input data set), performance = 1/time Speedup fixed problem (p processors) = Time (1 processor) Time (p processors) #13 lec # 1 Spring

14 Elements of Modern Computers Computing Problems Algorithms and Data Structures Mapping Hardware Architecture Programming High-level Languages Binding (Compile, Load) Operating System Applications Software Performance Evaluation #14 lec # 1 Spring

15 Elements of Modern Computers 1 Computing Problems: Numerical Computing: Science and technology numerical problems demand intensive integer and floating point computations. Logical Reasoning: Artificial intelligence (AI) demand logic inferences and symbolic manipulations and large space searches. 2 Algorithms and Data Structures Special algorithms and data structures are needed to specify the computations and communication present in computing problems. Most numerical algorithms are deterministic using regular data structures. Symbolic processing may use heuristics or non-deterministic searches. Parallel algorithm development requires interdisciplinary interaction. #15 lec # 1 Spring

16 Elements of Modern Computers 3 Hardware Resources Processors, memory, and peripheral devices form the hardware core of a computer system. Processor instruction set, processor connectivity, memory organization, influence the system architecture. 4 Operating Systems Manages the allocation of resources to running processes. Mapping to match algorithmic structures with hardware architecture and vice versa: processor scheduling, memory mapping, interprocessor communication. Parallelism exploitation at: algorithm design, program writing, compilation, and run time. #16 lec # 1 Spring

17 Elements of Modern Computers 5 System Software Support Needed for the development of efficient programs in highlevel languages (HLLs.) Assemblers, loaders. Portable parallel programming languages User interfaces and tools. 6 Compiler Support Preprocessor compiler: Sequential compiler and low-level library of the target parallel computer. Precompiler: Some program flow analysis, dependence checking, limited optimizations for parallelism detection. Parallelizing compiler: Can automatically detect parallelism in source code and transform sequential code into parallel constructs. #17 lec # 1 Spring

18 Approaches to Parallel Programming Programmer Programmer Source code written in sequential languages C, C++ FORTRAN, LISP.. Source code written in concurrent dialects of C, C++ FORTRAN, LISP.. Parallelizing compiler Concurrency preserving compiler Parallel object code (a) Implicit Parallelism Concurrent object code (b) Explicit Parallelism Execution by runtime system Execution by runtime system #18 lec # 1 Spring

19 Sequential Scalar I/E Overlap Lookahead Evolution of Computer Architecture Functional Parallelism Multiple Func. Units Pipeline I/E: Instruction Fetch and Execute SIMD: Single Instruction stream over Multiple Data streams MIMD: Multiple Instruction streams over Multiple Data streams Implicit Vector Memory-to -Memory SIMD Explicit Vector Register-to -Register MIMD Associative Processor Processor Array Multicomputer Massively Parallel Processors (MPPs) Multiprocessor

20 Parallel Architectures History Historically, parallel architectures tied to programming models Divergent architectures, with no predictable pattern of growth. Application Software Systolic Arrays Dataflow System Software SIMD Architecture Message Passing Shared Memory #20 lec # 1 Spring

21 Programming Models Programming methodology used in coding applications Specifies communication and synchronization Examples: Multiprogramming: No communication or synchronization at program level Shared memory address space: Message passing: Explicit point to point communication Data parallel: More regimented, global actions on data Implemented with shared address space or message passing #21 lec # 1 Spring

22 Flynn s 1972 Classification of Computer Architecture Single Instruction stream over a Single Data stream (SISD): Conventional sequential machines. Single Instruction stream over Multiple Data streams (SIMD): Vector computers, array of synchronized processing elements. Multiple Instruction streams and a Single Data stream (MISD): Systolic arrays for pipelined execution. Multiple Instruction streams over Multiple Data streams (MIMD): Parallel computers: Shared memory multiprocessors. Multicomputers: Unshared distributed memory, message-passing used instead. #22 lec # 1 Spring

23 Flynn s Classification of Computer Architecture Fig. 1.3 page 12 in Advanced Computer Architecture: Parallelism, Scalability, Programmability, Hwang, #23 lec # 1 Spring

24 Current Trends In Parallel Architectures The extension of computer architecture to support communication and cooperation: OLD: Instruction Set Architecture NEW: Communication Architecture Defines: Critical abstractions, boundaries, and primitives (interfaces) Organizational structures that implement interfaces (hardware or software) Compilers, libraries and OS are important bridges today #24 lec # 1 Spring

25 Modern Parallel Architecture Layered Framework CAD Database Scientific modeling Parallel applications Multiprogramming Shared address Message passing Data parallel Programming models Compilation or library Operating systems support Communication abstraction User/system boundary Communication hardware Hardware/software boundary Physical communication medium #25 lec # 1 Spring

26 Shared Address Space Parallel Architectures Any processor can directly reference any memory location Communication occurs implicitly as result of loads and stores Convenient: Location transparency Similar programming model to time-sharing in uniprocessors Except processes run on different processors Good throughput on multiprogrammed workloads Naturally provided on a wide range of platforms Wide range of scale: few to hundreds of processors Popularly known as shared memory machines or model Ambiguous: Memory may be physically distributed among processors #26 lec # 1 Spring

27 Shared Address Space (SAS) Model Process: virtual address space plus one or more threads of control Portions of address spaces of processes are shared Virtual address spaces for a collection of processes communicating via shared addresses Machine physical address space P n pr i vat e Load P n P 0 P 1 P 2 Common physical addresses St or e Shared portion of address space P 2 pr i vat e Private portion of address space P 1 pr i vat e P 0 pr i vat e Writes to shared address visible to other threads (in other processes too) Natural extension of the uniprocessor model: Conventional memory operations used for communication Special atomic operations needed for synchronization OS uses shared memory to coordinate processes #27 lec # 1 Spring

28 Models of Shared-Memory Multiprocessors The Uniform Memory Access (UMA) Model: The physical memory is shared by all processors. All processors have equal access to all memory addresses. Distributed memory or Nonuniform Memory Access (NUMA) Model: Shared memory is physically distributed locally among processors. The Cache-Only Memory Architecture (COMA) Model: A special case of a NUMA machine where all distributed main memory is converted to caches. No memory hierarchy at each processor. #28 lec # 1 Spring

29 Models of Shared-Memory Multiprocessors Uniform Memory Access (UMA) Model Mem Mem Mem Mem I/O ctrl Interconnect Interconnect I/O ctrl I/O devices Interconnect: Bus, Crossbar, Multistage network P: Processor M: Memory C: Cache D: Cache directory Processor Processor Network Network M $ M $ P P M $ P D C P D C P D C P Distributed memory or Nonuniform Memory Access (NUMA) Model Cache-Only Memory Architecture (COMA) #29 lec # 1 Spring

30 Uniform Memory Access Example: Intel Pentium Pro Quad Interrupt contr oller CPU 256-KB L 2 $ P-Pr o module P-Pr o module P-Pro module Bus interface P-Pr o bus (64-bit data, 36-bit addr ess, 66 MHz) PCI bridge PCI bridge Memory contr oller PCI I/O cards PCI bus PCI bus MIU 1-, 2-, or 4-way interleaved DRAM All coherence and multiprocessing glue in processor module Highly integrated, targeted at high volume Low latency and bandwidth #30 lec # 1 Spring

31 Uniform Memory Access Example: SUN Enterprise P $ P $ CPU/mem cards $ 2 $ 2 Mem ctrl Bus interface/switch Gigaplane bus (256 data, 41 addr ess, 83 MHz) Bus interface I/O car ds 100bT, SCSI SBUS SBUS SBUS 2 FiberChannel 16 cards of either type: processors + memory, or I/O All memory accessed over bus, so symmetric Higher bandwidth, higher latency bus #31 lec # 1 Spring

32 Distributed Shared-Memory Multiprocessor System Example: Cray T3E Exter nal I/O P $ Mem Mem ctrl and NI XY Switch Scale up to 1024 processors, 480MB/s links Memory controller generates communication requests for nonlocal references No hardware mechanism for coherence (SGI Origin etc. provide this) Z #32 lec # 1 Spring

33 Message-Passing Multicomputers Comprised of multiple autonomous computers (nodes). Each node consists of a processor, local memory, attached storage and I/O peripherals. Programming model more removed from basic hardware operations Local memory is only accessible by local processors. A message passing network provides point-to-point static connections among the nodes. Inter-node communication is carried out by message passing through the static connection network Process communication achieved using a message-passing programming environment. #33 lec # 1 Spring

34 Message-Passing Abstraction Match Receive Y, P, t Send X, Q, t Address Y Address X Local pr ocess addr ess space Local pr ocess addr ess space Process P Process Q Send specifies buffer to be transmitted and receiving process Recv specifies sending process and application storage to receive into Memory to memory copy, but need to name processes Optional tag on send and matching rule on receive User process names local data and entities in process/tag space too In simplest form, the send/recv match achieves pairwise synch event Many overheads: copying, buffer management, protection #34 lec # 1 Spring

35 Message-Passing Example: IBM SP-2 Power 2 CPU IBM SP-2 node L 2 $ Memory bus General interconnection network formed from 8-port switches Memory controller 4-way interleaved DRAM Made out of essentially complete RS6000 workstations Network interface integrated in I/O bus (bandwidth limited by I/O bus) MicroChannel bus I/O i860 NIC DMA NI DRAM #35 lec # 1 Spring

36 Message-Passing Example: Intel Paragon i860 L 1 $ i860 L 1 $ Intel Paragon node Memory bus (64-bit, 50 MHz) Mem ctrl DMA Sandia s Intel Paragon XP/S-based Supercomputer 4-way interleaved DRAM Driver NI 2D grid network with processing node attached to every switch 8 bits, 175 MHz, bidirectional #36 lec # 1 Spring

37 Message-Passing Programming Tools Message-passing programming libraries include: Message Passing Interface (MPI): Provides a standard for writing concurrent message-passing programs. MPI implementations include parallel libraries used by existing programming languages. Parallel Virtual Machine (PVM): Enables a collection of heterogeneous computers to used as a coherent and flexible concurrent computational resource. PVM support software executes on each machine in a userconfigurable pool, and provides a computational environment of concurrent applications. User programs written for example in C, Fortran or Java are provided access to PVM through the use of calls to PVM library routines. #37 lec # 1 Spring

38 Data Parallel Systems SIMD in Flynn taxonomy Programming model Operations performed in parallel on each element of data structure Logically single thread of control, performs sequential or parallel steps Control processor Conceptually, a processor is associated with each data element Architectural model Array of many simple, cheap processors each with little memory Processors don t sequence through instructions Attached to a control processor that issues instructions Specialized and general communication, cheap global synchronization Some recent machines: Thinking Machines CM-1, CM-2 (and CM-5) Maspar MP-1 and MP-2, PE PE PE PE PE PE PE PE PE #38 lec # 1 Spring

39 Dataflow Architectures Represent computation as a graph of essential dependences Logical processor at each node, activated by availability of operands Message (tokens) carrying tag of next instruction sent to next processor Tag compared with others in matching store; match fires execution 1 b c e a = (b +1) (b c) d = c e f = a d + d Dataflow graph a Network f Token store Program store Waiting Matching Instruction fetch Execute Form token Network Token queue Network #39 lec # 1 Spring

40 Systolic Architectures Replace single processor with an array of regular processing elements Orchestrate data flow for high throughput with less memory access M M PE PE PE PE Different from pipelining Nonlinear array structure, multidirection data flow, each PE may have (small) local instruction and data memory Different from SIMD: each PE may do something different Initial motivation: VLSI enables inexpensive special-purpose chips Represent algorithms directly by chips connected in regular pattern #40 lec # 1 Spring

Convergence of Parallel Architecture

Convergence of Parallel Architecture Parallel Computing Convergence of Parallel Architecture Hwansoo Han History Parallel architectures tied closely to programming models Divergent architectures, with no predictable pattern of growth Uncertainty

More information

Parallel Programming Models and Architecture

Parallel Programming Models and Architecture Parallel Programming Models and Architecture CS 740 September 18, 2013 Seth Goldstein Carnegie Mellon University History Historically, parallel architectures tied to programming models Divergent architectures,

More information

Introduction to Parallel Processing

Introduction to Parallel Processing Introduction to Parallel Processing Parallel Computer Architecture: Definition & Broad issues involved A Generic Parallel Computer Architecture The Need And Feasibility of Parallel Computing Scientific

More information

Parallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Parallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Elements of a Parallel Computer Hardware Multiple processors Multiple

More information

ECE 669 Parallel Computer Architecture

ECE 669 Parallel Computer Architecture ECE 669 arallel Computer Architecture Lecture 2 Architectural erspective Overview Increasingly attractive Economics, technology, architecture, application demand Increasingly central and mainstream arallelism

More information

Uniprocessor Computer Architecture Example: Cray T3E

Uniprocessor Computer Architecture Example: Cray T3E Chapter 2: Computer-System Structures MP Example: Intel Pentium Pro Quad Lab 1 is available online Last lecture: why study operating systems? Purpose of this lecture: general knowledge of the structure

More information

Chapter 2: Computer-System Structures. Hmm this looks like a Computer System?

Chapter 2: Computer-System Structures. Hmm this looks like a Computer System? Chapter 2: Computer-System Structures Lab 1 is available online Last lecture: why study operating systems? Purpose of this lecture: general knowledge of the structure of a computer system and understanding

More information

Parallel Computing Platforms

Parallel Computing Platforms Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

Three basic multiprocessing issues

Three basic multiprocessing issues Three basic multiprocessing issues 1. artitioning. The sequential program must be partitioned into subprogram units or tasks. This is done either by the programmer or by the compiler. 2. Scheduling. Associated

More information

Learning Curve for Parallel Applications. 500 Fastest Computers

Learning Curve for Parallel Applications. 500 Fastest Computers Learning Curve for arallel Applications ABER molecular dynamics simulation program Starting point was vector code for Cray-1 145 FLO on Cray90, 406 for final version on 128-processor aragon, 891 on 128-processor

More information

Introduction to Parallel Processing

Introduction to Parallel Processing Introduction to Parallel Processing Parallel Computer Architecture: Definition & Broad issues involved A Generic Parallel Computer Architecture The Need And Feasibility of Parallel Computing Scientific

More information

Introduction to Parallel Processing

Introduction to Parallel Processing Introduction to Parallel Processing Parallel Computer Architecture: Definition & Broad issues involved A Generic Parallel Computer Architecture The Need And Feasibility of Parallel Computing Scientific

More information

Evolution and Convergence of Parallel Architectures

Evolution and Convergence of Parallel Architectures History Evolution and Convergence of arallel Architectures Historically, parallel architectures tied to programming models Divergent architectures, with no predictable pattern of growth. Todd C. owry CS

More information

Number of processing elements (PEs). Computing power of each element. Amount of physical memory used. Data access, Communication and Synchronization

Number of processing elements (PEs). Computing power of each element. Amount of physical memory used. Data access, Communication and Synchronization Parallel Computer Architecture A parallel computer is a collection of processing elements that cooperate to solve large problems fast Broad issues involved: Resource Allocation: Number of processing elements

More information

NOW Handout Page 1 NO! Today s Goal: CS 258 Parallel Computer Architecture. What will you get out of CS258? Will it be worthwhile?

NOW Handout Page 1 NO! Today s Goal: CS 258 Parallel Computer Architecture. What will you get out of CS258? Will it be worthwhile? Today s Goal: CS 258 Parallel Computer Architecture Introduce you to Parallel Computer Architecture Answer your questions about CS 258 Provide you a sense of the trends that shape the field CS 258, Spring

More information

Conventional Computer Architecture. Abstraction

Conventional Computer Architecture. Abstraction Conventional Computer Architecture Conventional = Sequential or Single Processor Single Processor Abstraction Conventional computer architecture has two aspects: 1 The definition of critical abstraction

More information

Introduction to Parallel Processing

Introduction to Parallel Processing Introduction to Parallel Processing Parallel Computer Architecture: Definition & Broad issues involved A Generic Parallel Computer Architecture The Need And Feasibility of Parallel Computing Scientific

More information

Introduction to Parallel Processing

Introduction to Parallel Processing Introduction to Parallel Processing Parallel Computer Architecture: Definition & Broad issues involved A Generic Parallel Computer Architecture The Need And Feasibility of Parallel Computing Scientific

More information

Introduction to Parallel Processing

Introduction to Parallel Processing Introduction to Parallel Processing Parallel Computer Architecture: Definition & Broad issues involved A Generic Parallel Computer Architecture The Need And Feasibility of Parallel Computing Why? Scientific

More information

Parallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam

Parallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Parallel Computer Architectures Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Outline Flynn s Taxonomy Classification of Parallel Computers Based on Architectures Flynn s Taxonomy Based on notions of

More information

Issues in Multiprocessors

Issues in Multiprocessors Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores SPARCCenter, SGI Challenge, Cray T3D, Convex Exemplar, KSR-1&2, today s CMPs message

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently

More information

Issues in Multiprocessors

Issues in Multiprocessors Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores message passing explicit sends & receives Which execution model control parallel

More information

Computer Architecture

Computer Architecture Computer Architecture Chapter 7 Parallel Processing 1 Parallelism Instruction-level parallelism (Ch.6) pipeline superscalar latency issues hazards Processor-level parallelism (Ch.7) array/vector of processors

More information

Introduction to Parallel Processing

Introduction to Parallel Processing Introduction to Parallel Processing Parallel Computer Architecture: Definition & Broad issues involved A Generic Parallel Computer Architecture The Need And Feasibility of Parallel Computing Scientific

More information

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer MIMD Overview Intel Paragon XP/S Overview! MIMDs in the 1980s and 1990s! Distributed-memory multicomputers! Intel Paragon XP/S! Thinking Machines CM-5! IBM SP2! Distributed-memory multicomputers with hardware

More information

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

Computing architectures Part 2 TMA4280 Introduction to Supercomputing Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:

More information

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University A.R. Hurson Computer Science and Engineering The Pennsylvania State University 1 Large-scale multiprocessor systems have long held the promise of substantially higher performance than traditional uniprocessor

More information

Convergence of Parallel Architectures

Convergence of Parallel Architectures History Historically, parallel architectures tied to programming models Divergent architectures, with no predictable pattern of growth. Systolic Arrays Dataflow Application Software System Software Architecture

More information

CS/ECE 757: Advanced Computer Architecture II (Parallel Computer Architecture) Introduction (Chapter 1)

CS/ECE 757: Advanced Computer Architecture II (Parallel Computer Architecture) Introduction (Chapter 1) CS/ECE 757: Advanced Computer Architecture II (arallel Computer Architecture) Introduction (Chapter 1) Copyright 2001 Mark D. Hill University of Wisconsin-Madison Slides are derived from work by Sarita

More information

Unit 9 : Fundamentals of Parallel Processing

Unit 9 : Fundamentals of Parallel Processing Unit 9 : Fundamentals of Parallel Processing Lesson 1 : Types of Parallel Processing 1.1. Learning Objectives On completion of this lesson you will be able to : classify different types of parallel processing

More information

What is a parallel computer?

What is a parallel computer? 7.5 credit points Power 2 CPU L 2 $ IBM SP-2 node Instructor: Sally A. McKee General interconnection network formed from 8-port switches Memory bus Memory 4-way interleaved controller DRAM MicroChannel

More information

Advanced Computer Architecture. The Architecture of Parallel Computers

Advanced Computer Architecture. The Architecture of Parallel Computers Advanced Computer Architecture The Architecture of Parallel Computers Computer Systems No Component Can be Treated In Isolation From the Others Application Software Operating System Hardware Architecture

More information

Scalable Distributed Memory Machines

Scalable Distributed Memory Machines Scalable Distributed Memory Machines Goal: Parallel machines that can be scaled to hundreds or thousands of processors. Design Choices: Custom-designed or commodity nodes? Network scalability. Capability

More information

Lecture 9: MIMD Architecture

Lecture 9: MIMD Architecture Lecture 9: MIMD Architecture Introduction and classification Symmetric multiprocessors NUMA architecture Cluster machines Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is

More information

Multi-core Programming - Introduction

Multi-core Programming - Introduction Multi-core Programming - Introduction Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,

More information

Module 5 Introduction to Parallel Processing Systems

Module 5 Introduction to Parallel Processing Systems Module 5 Introduction to Parallel Processing Systems 1. What is the difference between pipelining and parallelism? In general, parallelism is simply multiple operations being done at the same time.this

More information

WHY PARALLEL PROCESSING? (CE-401)

WHY PARALLEL PROCESSING? (CE-401) PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:

More information

ECE5610/CSC6220 Models of Parallel Computers. Recap: What is Parallel Computer?

ECE5610/CSC6220 Models of Parallel Computers. Recap: What is Parallel Computer? ECE5610/CSC6220 Models of Parallel Computers Professor Cheng-Zhong Xu Department of Electrical/Computer Engineering Wayne State University Recap: What is Parallel Computer? A parallel computer is a collection

More information

Parallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization

Parallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization Computer Architecture Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Parallel Processing http://www.yildiz.edu.tr/~naydin 1 2 Outline Multiple Processor

More information

Lecture 9: MIMD Architectures

Lecture 9: MIMD Architectures Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is connected

More information

Parallel Processors. Session 1 Introduction

Parallel Processors. Session 1 Introduction Parallel Processors Session 1 Introduction Applications of Parallel Processors Structural Analysis Weather Forecasting Petroleum Exploration Fusion Energy Research Medical Diagnosis Aerodynamics Simulations

More information

Parallel Architecture Fundamentals

Parallel Architecture Fundamentals arallel Architecture Fundamentals Topics CS 740 September 22, 2003 What is arallel Architecture? Why arallel Architecture? Evolution and Convergence of arallel Architectures Fundamental Design Issues What

More information

Lecture1: Introduction. Administrative info

Lecture1: Introduction. Administrative info Lecture1: Introduction 1 Administrative info Welcome to ECE669! Welcome off-campus students! Csaba Andras Moritz, Associate Professor @ ECE/UMASS Questions/discussions/email questions are welcome! My Focus:

More information

Parallel Architecture. Hwansoo Han

Parallel Architecture. Hwansoo Han Parallel Architecture Hwansoo Han Performance Curve 2 Unicore Limitations Performance scaling stopped due to: Power Wire delay DRAM latency Limitation in ILP 3 Power Consumption (watts) 4 Wire Delay Range

More information

COSC 6385 Computer Architecture - Multi Processor Systems

COSC 6385 Computer Architecture - Multi Processor Systems COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:

More information

Parallel Computer Architecture Spring Shared Memory Multiprocessors Memory Coherence

Parallel Computer Architecture Spring Shared Memory Multiprocessors Memory Coherence Parallel Computer Architecture Spring 2018 Shared Memory Multiprocessors Memory Coherence Nikos Bellas Computer and Communications Engineering Department University of Thessaly Parallel Computer Architecture

More information

CMSC 611: Advanced. Parallel Systems

CMSC 611: Advanced. Parallel Systems CMSC 611: Advanced Computer Architecture Parallel Systems Parallel Computers Definition: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems

More information

Parallel Processors. The dream of computer architects since 1950s: replicate processors to add performance vs. design a faster processor

Parallel Processors. The dream of computer architects since 1950s: replicate processors to add performance vs. design a faster processor Multiprocessing Parallel Computers Definition: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast. Almasi and Gottlieb, Highly Parallel

More information

Chap. 4 Multiprocessors and Thread-Level Parallelism

Chap. 4 Multiprocessors and Thread-Level Parallelism Chap. 4 Multiprocessors and Thread-Level Parallelism Uniprocessor performance Performance (vs. VAX-11/780) 10000 1000 100 10 From Hennessy and Patterson, Computer Architecture: A Quantitative Approach,

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core. Prof. Onur Mutlu Carnegie Mellon University

Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core. Prof. Onur Mutlu Carnegie Mellon University 18-742 Spring 2011 Parallel Computer Architecture Lecture 4: Multi-core Prof. Onur Mutlu Carnegie Mellon University Research Project Project proposal due: Jan 31 Project topics Does everyone have a topic?

More information

Parallel Computing: Parallel Architectures Jin, Hai

Parallel Computing: Parallel Architectures Jin, Hai Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer

More information

Chapter 9 Multiprocessors

Chapter 9 Multiprocessors ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University

More information

Cache Coherence in Bus-Based Shared Memory Multiprocessors

Cache Coherence in Bus-Based Shared Memory Multiprocessors Cache Coherence in Bus-Based Shared Memory Multiprocessors Shared Memory Multiprocessors Variations Cache Coherence in Shared Memory Multiprocessors A Coherent Memory System: Intuition Formal Definition

More information

Why Multiprocessors?

Why Multiprocessors? Why Multiprocessors? Motivation: Go beyond the performance offered by a single processor Without requiring specialized processors Without the complexity of too much multiple issue Opportunity: Software

More information

Multiprocessors - Flynn s Taxonomy (1966)

Multiprocessors - Flynn s Taxonomy (1966) Multiprocessors - Flynn s Taxonomy (1966) Single Instruction stream, Single Data stream (SISD) Conventional uniprocessor Although ILP is exploited Single Program Counter -> Single Instruction stream The

More information

Parallel Computer Architecture

Parallel Computer Architecture Parallel Computer Architecture What is Parallel Architecture? A parallel computer is a collection of processing elements that cooperate to solve large problems fast Some broad issues: Resource Allocation:»

More information

COSC 6385 Computer Architecture - Thread Level Parallelism (I)

COSC 6385 Computer Architecture - Thread Level Parallelism (I) COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month

More information

Processor Architecture and Interconnect

Processor Architecture and Interconnect Processor Architecture and Interconnect What is Parallelism? Parallel processing is a term used to denote simultaneous computation in CPU for the purpose of measuring its computation speeds. Parallel Processing

More information

Lecture 9: MIMD Architectures

Lecture 9: MIMD Architectures Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction A set of general purpose processors is connected together.

More information

Parallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors?

Parallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors? Parallel Computers CPE 63 Session 20: Multiprocessors Department of Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection of processing

More information

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to

More information

Dheeraj Bhardwaj May 12, 2003

Dheeraj Bhardwaj May 12, 2003 HPC Systems and Models Dheeraj Bhardwaj Department of Computer Science & Engineering Indian Institute of Technology, Delhi 110 016 India http://www.cse.iitd.ac.in/~dheerajb 1 Sequential Computers Traditional

More information

Lecture 8: RISC & Parallel Computers. Parallel computers

Lecture 8: RISC & Parallel Computers. Parallel computers Lecture 8: RISC & Parallel Computers RISC vs CISC computers Parallel computers Final remarks Zebo Peng, IDA, LiTH 1 Introduction Reduced Instruction Set Computer (RISC) is an important innovation in computer

More information

Introduction. What is Parallel Architecture? Why Parallel Architecture? Evolution and Convergence of Parallel Architectures. Fundamental Design Issues

Introduction. What is Parallel Architecture? Why Parallel Architecture? Evolution and Convergence of Parallel Architectures. Fundamental Design Issues What is Parallel Architecture? Why Parallel Architecture? Evolution and Convergence of Parallel Architectures Fundamental Design Issues 2 What is Parallel Architecture? A parallel computer is a collection

More information

Multiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448. The Greed for Speed

Multiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448. The Greed for Speed Multiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448 1 The Greed for Speed Two general approaches to making computers faster Faster uniprocessor All the techniques we ve been looking

More information

Parallel Architectures

Parallel Architectures Parallel Architectures Instructor: Tsung-Che Chiang tcchiang@ieee.org Department of Science and Information Engineering National Taiwan Normal University Introduction In the roughly three decades between

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations

More information

3/24/2014 BIT 325 PARALLEL PROCESSING ASSESSMENT. Lecture Notes:

3/24/2014 BIT 325 PARALLEL PROCESSING ASSESSMENT. Lecture Notes: BIT 325 PARALLEL PROCESSING ASSESSMENT CA 40% TESTS 30% PRESENTATIONS 10% EXAM 60% CLASS TIME TABLE SYLLUBUS & RECOMMENDED BOOKS Parallel processing Overview Clarification of parallel machines Some General

More information

Chapter 18 Parallel Processing

Chapter 18 Parallel Processing Chapter 18 Parallel Processing Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple data stream - SIMD Multiple instruction, single data stream - MISD

More information

Lecture 7: Parallel Processing

Lecture 7: Parallel Processing Lecture 7: Parallel Processing Introduction and motivation Architecture classification Performance evaluation Interconnection network Zebo Peng, IDA, LiTH 1 Performance Improvement Reduction of instruction

More information

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Lecture 24 Mahadevan Gomathisankaran April 29, 2010 04/29/2010 Lecture 24 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student

More information

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

Why Parallel Architecture

Why Parallel Architecture Why Parallel Architecture and Programming? Todd C. Mowry 15-418 January 11, 2011 What is Parallel Programming? Software with multiple threads? Multiple threads for: convenience: concurrent programming

More information

What are Clusters? Why Clusters? - a Short History

What are Clusters? Why Clusters? - a Short History What are Clusters? Our definition : A parallel machine built of commodity components and running commodity software Cluster consists of nodes with one or more processors (CPUs), memory that is shared by

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems 1 License: http://creativecommons.org/licenses/by-nc-nd/3.0/ 10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems To enhance system performance and, in some cases, to increase

More information

CPS104 Computer Organization and Programming Lecture 20: Superscalar processors, Multiprocessors. Robert Wagner

CPS104 Computer Organization and Programming Lecture 20: Superscalar processors, Multiprocessors. Robert Wagner CS104 Computer Organization and rogramming Lecture 20: Superscalar processors, Multiprocessors Robert Wagner Faster and faster rocessors So much to do, so little time... How can we make computers that

More information

COSC4201 Multiprocessors

COSC4201 Multiprocessors COSC4201 Multiprocessors Prof. Mokhtar Aboelaze Parts of these slides are taken from Notes by Prof. David Patterson (UCB) Multiprocessing We are dedicating all of our future product development to multicore

More information

Lecture 7: Parallel Processing

Lecture 7: Parallel Processing Lecture 7: Parallel Processing Introduction and motivation Architecture classification Performance evaluation Interconnection network Zebo Peng, IDA, LiTH 1 Performance Improvement Reduction of instruction

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Lecture 23 Mahadevan Gomathisankaran April 27, 2010 04/27/2010 Lecture 23 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student

More information

Multiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache.

Multiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache. Multiprocessors why would you want a multiprocessor? Multiprocessors and Multithreading more is better? Cache Cache Cache Classifying Multiprocessors Flynn Taxonomy Flynn Taxonomy Interconnection Network

More information

Objectives of the Course

Objectives of the Course Objectives of the Course Parallel Systems: Understanding the current state-of-the-art in parallel programming technology Getting familiar with existing algorithms for number of application areas Distributed

More information

Overview. Processor organizations Types of parallel machines. Real machines

Overview. Processor organizations Types of parallel machines. Real machines Course Outline Introduction in algorithms and applications Parallel machines and architectures Overview of parallel machines, trends in top-500, clusters, DAS Programming methods, languages, and environments

More information

Handout 3 Multiprocessor and thread level parallelism

Handout 3 Multiprocessor and thread level parallelism Handout 3 Multiprocessor and thread level parallelism Outline Review MP Motivation SISD v SIMD (SIMT) v MIMD Centralized vs Distributed Memory MESI and Directory Cache Coherency Synchronization and Relaxed

More information

Fundamentals of Computer Design

Fundamentals of Computer Design Fundamentals of Computer Design Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University

More information

Chapter Seven Morgan Kaufmann Publishers

Chapter Seven Morgan Kaufmann Publishers Chapter Seven Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored as a charge on capacitor (must be

More information

Fundamentals of Computers Design

Fundamentals of Computers Design Computer Architecture J. Daniel Garcia Computer Architecture Group. Universidad Carlos III de Madrid Last update: September 8, 2014 Computer Architecture ARCOS Group. 1/45 Introduction 1 Introduction 2

More information

Parallel Computers. c R. Leduc

Parallel Computers. c R. Leduc Parallel Computers Material based on B. Wilkinson et al., PARALLEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers c 2002-2004 R. Leduc Why Parallel Computing?

More information

EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor

EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration I/O MultiProcessor Summary 2 Virtual memory benifits Using physical memory efficiently

More information

Intro to Multiprocessors

Intro to Multiprocessors The Big Picture: Where are We Now? Intro to Multiprocessors Output Output Datapath Input Input Datapath [dapted from Computer Organization and Design, Patterson & Hennessy, 2005] Multiprocessor multiple

More information

Dr. Joe Zhang PDC-3: Parallel Platforms

Dr. Joe Zhang PDC-3: Parallel Platforms CSC630/CSC730: arallel & Distributed Computing arallel Computing latforms Chapter 2 (2.3) 1 Content Communication models of Logical organization (a programmer s view) Control structure Communication model

More information

Chapter 18. Parallel Processing. Yonsei University

Chapter 18. Parallel Processing. Yonsei University Chapter 18 Parallel Processing Contents Multiple Processor Organizations Symmetric Multiprocessors Cache Coherence and the MESI Protocol Clusters Nonuniform Memory Access Vector Computation 18-2 Types

More information

SMD149 - Operating Systems - Multiprocessing

SMD149 - Operating Systems - Multiprocessing SMD149 - Operating Systems - Multiprocessing Roland Parviainen December 1, 2005 1 / 55 Overview Introduction Multiprocessor systems Multiprocessor, operating system and memory organizations 2 / 55 Introduction

More information

Overview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy

Overview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy Overview SMD149 - Operating Systems - Multiprocessing Roland Parviainen Multiprocessor systems Multiprocessor, operating system and memory organizations December 1, 2005 1/55 2/55 Multiprocessor system

More information

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1

More information

Multi-Processor / Parallel Processing

Multi-Processor / Parallel Processing Parallel Processing: Multi-Processor / Parallel Processing Originally, the computer has been viewed as a sequential machine. Most computer programming languages require the programmer to specify algorithms

More information

Parallel Architectures

Parallel Architectures Parallel Architectures Part 1: The rise of parallel machines Intel Core i7 4 CPU cores 2 hardware thread per core (8 cores ) Lab Cluster Intel Xeon 4/10/16/18 CPU cores 2 hardware thread per core (8/20/32/36

More information