PARALLEL COMPUTER ARCHITECTURES
|
|
- Benedict Reed
- 5 years ago
- Views:
Transcription
1 8 ARALLEL COMUTER ARCHITECTURES 1
2 CU Shared memory (a) (b) Figure 8-1. (a) A multiprocessor with 16 CUs sharing a common memory. (b) An image partitioned into 16 sections, each being analyzed by a different CU.
3 M M M M rivate memory CU CU M M M M Messagepassing interconnection network M M Messagepassing interconnection network M M M M M M (a) (b) Figure 8-2. (a) A multicomputer with 16 CUs, each with each own private memory. (b) The bit-map image of Fig. 8-1 split up among the 16 memories.
4 Machine 1 Machine 2 Machine 1 Machine 2 Machine 1 Machine 2 Application Application Application Application Application Application Language run-time system Language run-time system Language run-time system Language run-time system Language run-time system Language run-time system Operating system Operating system Operating system Operating system Operating system Operating system Hardware Hardware Hardware Hardware Hardware Hardware Shared memory (a) Shared memory (b) Shared memory (c) Figure 8-3. Various layers where shared memory can be implemented. (a) The hardware. (b) The operating system. (c) The language runtime system.
5 (a) (b) (c) (d) (e) (f) (g) (h) Figure 8-4. Various topologies. The heavy dots represent switches. The CUs and memories are not shown. (a) A star. (b) A complete interconnect. (c) A tree. (d) A ring. (e) A grid. (f) A double torus. (g) A cube. (h) A 4D hypercube.
6 CU 1 A Input port Output port B End of packet C D Four-port switch Middle of packet CU 2 Front of packet Figure 8-5. An interconnection network in the form of a fourswitch square grid. Only two of the CUs are shown.
7 CU 1 Four-port switch Input port Output port A B A B A B Entire packet C D C D C D CU 2 Entire packet Entire packet (a) (b) (c) Figure 8-6. Store-and-forward packet switching.
8 CU 1 A B CU 2,CU 4 CU 3 C D Four-port switch Input port Output buffer Figure 8-7. Deadlock in a circuit-switched interconnection network.
9 60 50 Linear speedup N-body problem 40 Speedup 30 Awari Number of CUs Skyline matrix inversion Figure 8-8. Real programs achieve less than the perfect speedup indicated by the dotted line.
10 n CUs active Inherently sequential part otentially parallelizable part 1 CU active f 1 f f 1 f T ft (1 f)t/n (a) (b) Figure 8-9. (a) A program has a sequential part and a parallelizable part. (b) Effect of running part of the program in parallel.
11 CU Bus (a) (b) (c) (d) Figure (a) A 4-CU bus-based system. (b) A 16-CU bus-based system. (c) A 4-CU grid-based system. (d) A 16- CU grid-based system.
12 Work queue 1 2 Synchronization point Synchronization point 7 8 rocess 9 (a) (b) (c) (d) Figure Computational paradigms. (a) ipeline. (b) hased computation. (c) Divide and conquer. (d) Replicated worker.
13 hysical Logical (hardware) (software) Examples Multiprocessor Shared variables Image processing as in Fig. 8-1 Multiprocessor Message passing Message passing simulated with buffers in memory Multicomputer Shared variables DSM, Linda, Orca, etc. on an S/2 or a C network Multicomputer Message passing VM or MI on an S/2 or a network of Cs Figure Combinations of physical and logical sharing.
14 Instruction streams Data streams Name Examples 1 1 SISD Classical Von Neumann machine 1 Multiple SIMD Vector supercomputer, array processor Multiple 1 MISD Arguably none Multiple Multiple MIMD Multiprocessor, multicomputer Figure Flynn s taxonomy of parallel computers.
15 arallel computer architectures SISD SIMD MISD MIMD (Von Neumann)? Vector processor Array processor Multiprocessors Multicomputers UMA COMA NUMA M COW Bus Switched CC-NUMA NC-NUMA Grid Hypercube Shared memory Message passing Figure A taxonomy of parallel computers.
16 Input vectors Vector ALU Figure A vector ALU.
17 Operation Examples A i = f 1 (B i ) f 1 = cosine, square root Scalar = f 2 (A) f 2 = sum, minimum A i = f 3 (B i, C i ) f 3 = add, subtract A i = f 4 (scalar,b i ) f 4 = multiply B i by a constant Figure Various combinations of vector and scalar operations.
18 Step Name Values 1 Fetch operands Adjust exponent Execute subtraction Normalize result Figure Steps in a floating-point subtraction.
19 Cycle Step Fetch operands B 1,C 1 B 2,C 2 B 3,C 3 B 4,C 4 B 5,C 5 B 6,C 6 B 7,C 7 Adjust exponent B 1,C 1 B 2,C 2 B 3,C 3 B 4,C 4 B 5,C 5 B 6,C 6 Execute operation B 1 + C 1 B 2 + C 2 B 3 + C 3 B 4 + C 4 B 5 + C 5 Normalize result B 1 + C 1 B 2 + C 2 B 3 + C 3 B 4 + C 4 Figure A pipelined floating-point adder.
20 A B S T 8 24-Bit address registers Bit holding registers for addresses 8 64-Bit scalar registers Bit holding registers for scalars 64 Elements per register 8 64-Bit vector registers ADD ADD ADD ADD MUL Address units BOOLEAN SHIFT MUL RECI. BOOLEAN SHIFT O. COUNT Scalar integer units Scalar/vector floatng-point units Vector integer units Figure Registers and functional units of the Cray-1
21 1 CU Write Write 200 Read 2x x 3 W100 W200 R3 = 200 R3 = 200 R4 = 200 W100 R3 = 100 W200 R4 = 200 R3 = 200 W200 R4 = 200 W100 R3 = 100 R4 = 100 Read 2x R4 = 200 R4 = 200 R3 = (a) (b) (c) (d) Figure (a) Two CUs writing and two CUs reading a common memory word. (b) - (d) Three possible ways the two writes and four reads might be interleaved in time.
22 Write CU A 1A 1B 1C 1D 1E 1F CU B 2A 2B 2C 2D CU C 3A 3B 3C Synchronization point Time Figure Weakly consistent memory uses synchronization operations to divide time into sequential epochs.
23 Shared memory rivate memory Shared memory CU CU M CU CU M CU CU M Cache Bus (a) (b) (c) Figure Three bus-based multiprocessors. (a) Without caching. (b) With caching. (c) With caching and private memories.
24 Action Local request Remote request Read miss Fetch data from memory Read hit Use data from local cache Write miss Update data in memory Write hit Update cache and memory Invalidate cache entry Figure The write through cache coherence protocol. The empty boxes indicate that no action is taken.
25 (a) CU 1 CU 2 CU 3 A Memory CU 1 reads block A Exclusive Cache Bus (b) CU 1 CU 2 CU 3 A Memory CU 2 reads block A Shared Shared Bus (c) CU 1 CU 2 CU 3 A Memory CU 2 writes block A Modified Bus (d) CU 1 CU 2 CU 3 Memory A A Shared Shared Bus CU 3 reads block A (e) CU 1 CU 2 CU 3 A Memory CU 2 writes block A Modified Bus (f) CU 1 CU 2 CU 3 A Memory CU 1 writes block A Modified Bus Figure The MESI cache coherence protocol.
26 Memories Crosspoint switch is open (b) CUs Crosspoint switch is closed Closed crosspoint switch Open crosspoint switch (c) (a) Figure (a) An 8 8 crossbar crosspoint. (c) A closed crosspoint. switch. (b) An open
27 16 16 Crossbar switch (Gigaplane-XB) Transfer unit is 64-byte cache block Board had 4 GB + 4 CUs Four address buses for snooping UltraSARC CU GB memory module Figure The Sun Enterprise symmetric multiprocessor.
28 A B X Y Module Address Opcode Value (a) (b) Figure (a) A 2 2 switch. (b) A message format.
29 3 Stages CUs Memories b 1A 2A b 3A b B 1C b 2B 2C 3B 3C a a D a 2D a 3D 111 Figure An omega switching network.
30 CU Memory CU Memory CU Memory CU Memory MMU Local bus Local bus Local bus Local bus System bus Figure A NUMA machine based on two levels of buses. The Cm* was the first multiprocessor to use this design.
31 Directory Node 0 Node 1 Node 255 CU Memory Local bus CU Memory Local bus CU Memory Local bus Interconnection network (a) Bits Node Block Offset (b) (c) Figure (a) A 256-node directory-based multiprocessor. (b) Division of a 32-bit memory address into fields. (c) The directory at node 36.
32 CU with cache Intercluster interface Memory Intercluster bus (nonsnooping) 0 D D D D D D D D D D D D D D D D Local bus (snooping) Cluster Directory (a) Cluster Block State This is the directory for cluster 13. This bit tells whether cluster 0 has block 1 of the memory homed here in any of its caches Uncached, shared, modified (b) Figure (a) The DASH architecture. (b) A DASH directory.
33 Quad board with 4 entium ros and up to 4 GB of RAM Snooping bus interface Directory controller 32-MB cache RAM Directory Data pump IQ board SCI ring RAM CU Figure The NUMA-Q multiprocessor.
34 Local memory table at home node Bits Back State Tag Fwd Back State Tag Fwd Back State Tag Fwd 0 Node 4 cache directory Node 9 cache directory Node 22 cache directory Figure SCI chains all the holders of a given cache line together in a doubly-linked list. In this example, a line is shown cached at three nodes.
35 CU Memory Node Local interconnect Disk and I/O Local interconnect Disk and I/O Communication processor High-performance interconnection network Figure A generic multicomputer.
36 Network Disk Tape GigaRing Alpha Mem Alpha Mem Alpha Mem Shell Control + E registers Control + E registers Control + E registers Node Commun. processor Commun. processor Commun. processor Full-duplex 3D torus Figure The Cray Research T3E.
37 64-Bit local bus Kestrel board 38 ro ro 64 MB I/O NIC 32 ro ro 64 MB I/O NIC 2 64-Bit local bus (a) (b) Figure The Intel/Sandia Option Red system. (a) The kestrel board. (b) The interconnection network.
38 CU group CU group CU group Time (a) (b) (c) Figure Scheduling a COW. (a) FIFO. (b) Without headof-line blocking. (c) Tiling. The shaded areas indicate idle CUs.
39 CU CU CU Backplane acket going east (a) acket going west Line card Ethernet (b) Switch Figure (a) Three computers on an Ethernet. (b) An Ethernet switch.
40 CU Cell 5 6 acket 7 8 ort Virtual circuit ATM switch Figure Sixteen CUs connected by four ATM switches. Two virtual circuits are shown.
41 Globally shared virtual memory consisting of 16 pages Memory CU 0 CU 1 CU 2 CU 3 (a) Network CU 0 CU 1 CU 2 CU 3 (b) CU 0 CU 1 CU 2 CU 3 (c) Figure A virtual address space consisting of 16 pages spread over four nodes of a multicomputer. (a) The initial situation. (b) After CU 0 references page 10. (c) After CU 1 references page 10, here assumed to be a read-only page.
42 ( abc, 2,5) ( matrix-1, 1, 6, 3.14) ( family, is sister, Carolyn, Elinor) Figure Three Linda tuples.
43 Object implementation stack; top:integer; # storage for the stack stack: array [integer 0..N-1] of integer; operation push(item: integer); function returning nothing begin stack[top] := item; push item onto the stack top := top + 1; # increment the stack pointer end; operation pop( ): integer; # function returning an integer begin guard top > 0 do # suspend if the stack is empty top := top - 1; # decrement the stack pointer return stack[top]; # return the top item od; end; begin top := 0; end; # initialization Figure A simplified ORCA stack object, with internal data and two operations.
Computer Architecture
Computer Architecture Chapter 7 Parallel Processing 1 Parallelism Instruction-level parallelism (Ch.6) pipeline superscalar latency issues hazards Processor-level parallelism (Ch.7) array/vector of processors
More informationDr. Joe Zhang PDC-3: Parallel Platforms
CSC630/CSC730: arallel & Distributed Computing arallel Computing latforms Chapter 2 (2.3) 1 Content Communication models of Logical organization (a programmer s view) Control structure Communication model
More informationChapter 9 Multiprocessors
ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University
More informationMultiple Processor Systems. Lecture 15 Multiple Processor Systems. Multiprocessor Hardware (1) Multiprocessors. Multiprocessor Hardware (2)
Lecture 15 Multiple Processor Systems Multiple Processor Systems Multiprocessors Multicomputers Continuous need for faster computers shared memory model message passing multiprocessor wide area distributed
More informationCPS104 Computer Organization and Programming Lecture 20: Superscalar processors, Multiprocessors. Robert Wagner
CS104 Computer Organization and rogramming Lecture 20: Superscalar processors, Multiprocessors Robert Wagner Faster and faster rocessors So much to do, so little time... How can we make computers that
More information10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems
1 License: http://creativecommons.org/licenses/by-nc-nd/3.0/ 10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems To enhance system performance and, in some cases, to increase
More informationComputer parallelism Flynn s categories
04 Multi-processors 04.01-04.02 Taxonomy and communication Parallelism Taxonomy Communication alessandro bogliolo isti information science and technology institute 1/9 Computer parallelism Flynn s categories
More informationChapter 11. Introduction to Multiprocessors
Chapter 11 Introduction to Multiprocessors 11.1 Introduction A multiple processor system consists of two or more processors that are connected in a manner that allows them to share the simultaneous (parallel)
More informationParallel Computer Architectures. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam
Parallel Computer Architectures Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Outline Flynn s Taxonomy Classification of Parallel Computers Based on Architectures Flynn s Taxonomy Based on notions of
More informationParallel Architecture. Sathish Vadhiyar
Parallel Architecture Sathish Vadhiyar Motivations of Parallel Computing Faster execution times From days or months to hours or seconds E.g., climate modelling, bioinformatics Large amount of data dictate
More informationLecture 2 Parallel Programming Platforms
Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple
More informationLecture 24: Virtual Memory, Multiprocessors
Lecture 24: Virtual Memory, Multiprocessors Today s topics: Virtual memory Multiprocessors, cache coherence 1 Virtual Memory Processes deal with virtual memory they have the illusion that a very large
More informationNormal computer 1 CPU & 1 memory The problem of Von Neumann Bottleneck: Slow processing because the CPU faster than memory
Parallel Machine 1 CPU Usage Normal computer 1 CPU & 1 memory The problem of Von Neumann Bottleneck: Slow processing because the CPU faster than memory Solution Use multiple CPUs or multiple ALUs For simultaneous
More informationModule 5 Introduction to Parallel Processing Systems
Module 5 Introduction to Parallel Processing Systems 1. What is the difference between pipelining and parallelism? In general, parallelism is simply multiple operations being done at the same time.this
More informationOverview. Processor organizations Types of parallel machines. Real machines
Course Outline Introduction in algorithms and applications Parallel machines and architectures Overview of parallel machines, trends in top-500, clusters, DAS Programming methods, languages, and environments
More informationParallel Programming Platforms
arallel rogramming latforms Ananth Grama Computing Research Institute and Department of Computer Sciences, urdue University ayg@cspurdueedu http://wwwcspurdueedu/people/ayg Reference: Introduction to arallel
More informationNon-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors.
CS 320 Ch. 17 Parallel Processing Multiple Processor Organization The author makes the statement: "Processors execute programs by executing machine instructions in a sequence one at a time." He also says
More informationCOSC 6385 Computer Architecture - Multi Processor Systems
COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:
More informationMultiprocessors - Flynn s Taxonomy (1966)
Multiprocessors - Flynn s Taxonomy (1966) Single Instruction stream, Single Data stream (SISD) Conventional uniprocessor Although ILP is exploited Single Program Counter -> Single Instruction stream The
More informationEmbedded Systems Architecture. Multiprocessor and Multicomputer Systems
Embedded Systems Architecture Multiprocessor and Multicomputer Systems M. Eng. Mariusz Rudnicki 1/57 Multiprocessor Systems 2/57 Multiprocessor Systems 3/57 Multiprocessor Systems 4/57 UMA SMP Architectures
More informationMultiprocessors & Thread Level Parallelism
Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction
More informationChapter 1. Introduction: Part I. Jens Saak Scientific Computing II 7/348
Chapter 1 Introduction: Part I Jens Saak Scientific Computing II 7/348 Why Parallel Computing? 1. Problem size exceeds desktop capabilities. Jens Saak Scientific Computing II 8/348 Why Parallel Computing?
More informationScalability and Classifications
Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static
More informationAdvanced Parallel Architecture. Annalisa Massini /2017
Advanced Parallel Architecture Annalisa Massini - 2016/2017 References Advanced Computer Architecture and Parallel Processing H. El-Rewini, M. Abd-El-Barr, John Wiley and Sons, 2005 Parallel computing
More informationMULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming
MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance
More informationMULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming
MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance
More informationCPS 303 High Performance Computing. Wensheng Shen Department of Computational Science SUNY Brockport
CPS 303 High Performance Computing Wensheng Shen Department of Computational Science SUNY Brockport Chapter 2: Architecture of Parallel Computers Hardware Software 2.1.1 Flynn s taxonomy Single-instruction
More informationThree basic multiprocessing issues
Three basic multiprocessing issues 1. artitioning. The sequential program must be partitioned into subprogram units or tasks. This is done either by the programmer or by the compiler. 2. Scheduling. Associated
More informationCS 770G - Parallel Algorithms in Scientific Computing Parallel Architectures. May 7, 2001 Lecture 2
CS 770G - arallel Algorithms in Scientific Computing arallel Architectures May 7, 2001 Lecture 2 References arallel Computer Architecture: A Hardware / Software Approach Culler, Singh, Gupta, Morgan Kaufmann
More informationCS Parallel Algorithms in Scientific Computing
CS 775 - arallel Algorithms in Scientific Computing arallel Architectures January 2, 2004 Lecture 2 References arallel Computer Architecture: A Hardware / Software Approach Culler, Singh, Gupta, Morgan
More informationMIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer
MIMD Overview Intel Paragon XP/S Overview! MIMDs in the 1980s and 1990s! Distributed-memory multicomputers! Intel Paragon XP/S! Thinking Machines CM-5! IBM SP2! Distributed-memory multicomputers with hardware
More informationIntroduction to parallel computing
Introduction to parallel computing 2. Parallel Hardware Zhiao Shi (modifications by Will French) Advanced Computing Center for Education & Research Vanderbilt University Motherboard Processor https://sites.google.com/
More informationUnit 9 : Fundamentals of Parallel Processing
Unit 9 : Fundamentals of Parallel Processing Lesson 1 : Types of Parallel Processing 1.1. Learning Objectives On completion of this lesson you will be able to : classify different types of parallel processing
More informationPipeline and Vector Processing 1. Parallel Processing SISD SIMD MISD & MIMD
Pipeline and Vector Processing 1. Parallel Processing Parallel processing is a term used to denote a large class of techniques that are used to provide simultaneous data-processing tasks for the purpose
More informationComputer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors
Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently
More informationParallel Architectures
Parallel Architectures Part 1: The rise of parallel machines Intel Core i7 4 CPU cores 2 hardware thread per core (8 cores ) Lab Cluster Intel Xeon 4/10/16/18 CPU cores 2 hardware thread per core (8/20/32/36
More informationCOSC 6374 Parallel Computation. Parallel Computer Architectures
OS 6374 Parallel omputation Parallel omputer Architectures Some slides on network topologies based on a similar presentation by Michael Resch, University of Stuttgart Spring 2010 Flynn s Taxonomy SISD:
More informationLecture 2. Memory locality optimizations Address space organization
Lecture 2 Memory locality optimizations Address space organization Announcements Office hours in EBU3B Room 3244 Mondays 3.00 to 4.00pm; Thurs 2:00pm-3:30pm Partners XSED Portal accounts Log in to Lilliput
More informationMulti-Processor / Parallel Processing
Parallel Processing: Multi-Processor / Parallel Processing Originally, the computer has been viewed as a sequential machine. Most computer programming languages require the programmer to specify algorithms
More informationCopyright 2010, Elsevier Inc. All rights Reserved
An Introduction to Parallel Programming Peter Pacheco Chapter 2 Parallel Hardware and Parallel Software 1 Roadmap Some background Modifications to the von Neumann model Parallel hardware Parallel software
More informationCOSC 6374 Parallel Computation. Parallel Computer Architectures
OS 6374 Parallel omputation Parallel omputer Architectures Some slides on network topologies based on a similar presentation by Michael Resch, University of Stuttgart Edgar Gabriel Fall 2015 Flynn s Taxonomy
More informationSMP and ccnuma Multiprocessor Systems. Sharing of Resources in Parallel and Distributed Computing Systems
Reference Papers on SMP/NUMA Systems: EE 657, Lecture 5 September 14, 2007 SMP and ccnuma Multiprocessor Systems Professor Kai Hwang USC Internet and Grid Computing Laboratory Email: kaihwang@usc.edu [1]
More informationBlueGene/L (No. 4 in the Latest Top500 List)
BlueGene/L (No. 4 in the Latest Top500 List) first supercomputer in the Blue Gene project architecture. Individual PowerPC 440 processors at 700Mhz Two processors reside in a single chip. Two chips reside
More informationComputing architectures Part 2 TMA4280 Introduction to Supercomputing
Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:
More informationFLYNN S TAXONOMY OF COMPUTER ARCHITECTURE
FLYNN S TAXONOMY OF COMPUTER ARCHITECTURE The most popular taxonomy of computer architecture was defined by Flynn in 1966. Flynn s classification scheme is based on the notion of a stream of information.
More informationWHY PARALLEL PROCESSING? (CE-401)
PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:
More informationTypes of Parallel Computers
slides1-22 Two principal types: Types of Parallel Computers Shared memory multiprocessor Distributed memory multicomputer slides1-23 Shared Memory Multiprocessor Conventional Computer slides1-24 Consists
More informationParallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Elements of a Parallel Computer Hardware Multiple processors Multiple
More informationObjectives of the Course
Objectives of the Course Parallel Systems: Understanding the current state-of-the-art in parallel programming technology Getting familiar with existing algorithms for number of application areas Distributed
More informationPerformance study example ( 5.3) Performance study example
erformance study example ( 5.3) Coherence misses: - True sharing misses - Write to a shared block - ead an invalid block - False sharing misses - ead an unmodified word in an invalidated block CI for commercial
More informationPortland State University ECE 588/688. Cray-1 and Cray T3E
Portland State University ECE 588/688 Cray-1 and Cray T3E Copyright by Alaa Alameldeen 2014 Cray-1 A successful Vector processor from the 1970s Vector instructions are examples of SIMD Contains vector
More informationJNTUWORLD. 1. Discuss in detail inter processor arbitration logics and procedures with necessary diagrams? [15]
Code No: 09A50402 R09 Set No. 2 1. Discuss in detail inter processor arbitration logics and procedures with necessary diagrams? [15] 2. (a) Discuss asynchronous serial transfer concept? (b) Explain in
More informationINTELLIGENCE PLUS CHARACTER - THAT IS THE GOAL OF TRUE EDUCATION UNIT-I
UNIT-I 1. List and explain the functional units of a computer with a neat diagram 2. Explain the computer levels of programming languages 3. a) Explain about instruction formats b) Evaluate the arithmetic
More informationComputer organization by G. Naveen kumar, Asst Prof, C.S.E Department 1
Pipelining and Vector Processing Parallel Processing: The term parallel processing indicates that the system is able to perform several operations in a single time. Now we will elaborate the scenario,
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 11
More informationParallel Numerics, WT 2013/ Introduction
Parallel Numerics, WT 2013/2014 1 Introduction page 1 of 122 Scope Revise standard numerical methods considering parallel computations! Required knowledge Numerics Parallel Programming Graphs Literature
More informationPIPELINE AND VECTOR PROCESSING
PIPELINE AND VECTOR PROCESSING PIPELINING: Pipelining is a technique of decomposing a sequential process into sub operations, with each sub process being executed in a special dedicated segment that operates
More informationChapter 18 Parallel Processing
Chapter 18 Parallel Processing Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple data stream - SIMD Multiple instruction, single data stream - MISD
More informationNon-Uniform Memory Access (NUMA) Architecture and Multicomputers
Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico February 29, 2016 CPD
More informationCSE Introduction to Parallel Processing. Chapter 4. Models of Parallel Processing
Dr Izadi CSE-4533 Introduction to Parallel Processing Chapter 4 Models of Parallel Processing Elaborate on the taxonomy of parallel processing from chapter Introduce abstract models of shared and distributed
More informationParallel Computing Platforms
Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)
More informationCrossbar switch. Chapter 2: Concepts and Architectures. Traditional Computer Architecture. Computer System Architectures. Flynn Architectures (2)
Chapter 2: Concepts and Architectures Computer System Architectures Disk(s) CPU I/O Memory Traditional Computer Architecture Flynn, 1966+1972 classification of computer systems in terms of instruction
More informationCOSC 6385 Computer Architecture - Thread Level Parallelism (I)
COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month
More informationNon-Uniform Memory Access (NUMA) Architecture and Multicomputers
Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico September 26, 2011 CPD
More informationIntroduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano
Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed
More informationLearning Curve for Parallel Applications. 500 Fastest Computers
Learning Curve for arallel Applications ABER molecular dynamics simulation program Starting point was vector code for Cray-1 145 FLO on Cray90, 406 for final version on 128-processor aragon, 891 on 128-processor
More informationIntro to Multiprocessors
The Big Picture: Where are We Now? Intro to Multiprocessors Output Output Datapath Input Input Datapath [dapted from Computer Organization and Design, Patterson & Hennessy, 2005] Multiprocessor multiple
More informationDHANALAKSHMI SRINIVASAN INSTITUTE OF RESEARCH AND TECHNOLOGY. Department of Computer science and engineering
DHANALAKSHMI SRINIVASAN INSTITUTE OF RESEARCH AND TECHNOLOGY Department of Computer science and engineering Year :II year CS6303 COMPUTER ARCHITECTURE Question Bank UNIT-1OVERVIEW AND INSTRUCTIONS PART-B
More informationParallel Architecture, Software And Performance
Parallel Architecture, Software And Performance UCSB CS240A, T. Yang, 2016 Roadmap Parallel architectures for high performance computing Shared memory architecture with cache coherence Performance evaluation
More informationChapter 17 - Parallel Processing
Chapter 17 - Parallel Processing Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ Luis Tarrataca Chapter 17 - Parallel Processing 1 / 71 Table of Contents I 1 Motivation 2 Parallel Processing Categories
More informationNon-Uniform Memory Access (NUMA) Architecture and Multicomputers
Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing MSc in Information Systems and Computer Engineering DEA in Computational Engineering Department of Computer
More informationHandout 3 Multiprocessor and thread level parallelism
Handout 3 Multiprocessor and thread level parallelism Outline Review MP Motivation SISD v SIMD (SIMT) v MIMD Centralized vs Distributed Memory MESI and Directory Cache Coherency Synchronization and Relaxed
More informationRISC Processors and Parallel Processing. Section and 3.3.6
RISC Processors and Parallel Processing Section 3.3.5 and 3.3.6 The Control Unit When a program is being executed it is actually the CPU receiving and executing a sequence of machine code instructions.
More informationParallel Architectures
Parallel Architectures CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Parallel Architectures Spring 2018 1 / 36 Outline 1 Parallel Computer Classification Flynn s
More informationParallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization
Computer Architecture Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Parallel Processing http://www.yildiz.edu.tr/~naydin 1 2 Outline Multiple Processor
More informationEmbedded Systems Architecture. Computer Architectures
Embedded Systems Architecture Computer Architectures M. Eng. Mariusz Rudnicki 1/18 A taxonomy of computer architectures There are many different types of architectures, and it is worth considering some
More informationChapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationanced computer architecture CONTENTS AND THE TASK OF THE COMPUTER DESIGNER The Task of the Computer Designer
Contents advanced anced computer architecture i FOR m.tech (jntu - hyderabad & kakinada) i year i semester (COMMON TO ECE, DECE, DECS, VLSI & EMBEDDED SYSTEMS) CONTENTS UNIT - I [CH. H. - 1] ] [FUNDAMENTALS
More informationChap. 4 Multiprocessors and Thread-Level Parallelism
Chap. 4 Multiprocessors and Thread-Level Parallelism Uniprocessor performance Performance (vs. VAX-11/780) 10000 1000 100 10 From Hennessy and Patterson, Computer Architecture: A Quantitative Approach,
More informationPerformance of Computer Systems. CSE 586 Computer Architecture. Review. ISA s (RISC, CISC, EPIC) Basic Pipeline Model.
Performance of Computer Systems CSE 586 Computer Architecture Review Jean-Loup Baer http://www.cs.washington.edu/education/courses/586/00sp Performance metrics Use (weighted) arithmetic means for execution
More informationParallel Architectures
Parallel Architectures Instructor: Tsung-Che Chiang tcchiang@ieee.org Department of Science and Information Engineering National Taiwan Normal University Introduction In the roughly three decades between
More informationTaxonomy of Parallel Computers, Models for Parallel Computers. Levels of Parallelism
Taxonomy of Parallel Computers, Models for Parallel Computers Reference : C. Xavier and S. S. Iyengar, Introduction to Parallel Algorithms 1 Levels of Parallelism Parallelism can be achieved at different
More informationA Multiprocessor system generally means that more than one instruction stream is being executed in parallel.
Multiprocessor Systems A Multiprocessor system generally means that more than one instruction stream is being executed in parallel. However, Flynn s SIMD machine classification, also called an array processor,
More informationComputer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors
Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently
More informationSMD149 - Operating Systems - Multiprocessing
SMD149 - Operating Systems - Multiprocessing Roland Parviainen December 1, 2005 1 / 55 Overview Introduction Multiprocessor systems Multiprocessor, operating system and memory organizations 2 / 55 Introduction
More informationOverview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy
Overview SMD149 - Operating Systems - Multiprocessing Roland Parviainen Multiprocessor systems Multiprocessor, operating system and memory organizations December 1, 2005 1/55 2/55 Multiprocessor system
More informationParallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors?
Parallel Computers CPE 63 Session 20: Multiprocessors Department of Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection of processing
More informationChapter 18. Parallel Processing. Yonsei University
Chapter 18 Parallel Processing Contents Multiple Processor Organizations Symmetric Multiprocessors Cache Coherence and the MESI Protocol Clusters Nonuniform Memory Access Vector Computation 18-2 Types
More informationCS4961 Parallel Programming. Lecture 4: Memory Systems and Interconnects 9/1/11. Administrative. Mary Hall September 1, Homework 2, cont.
CS4961 Parallel Programming Lecture 4: Memory Systems and Interconnects Administrative Nikhil office hours: - Monday, 2-3PM - Lab hours on Tuesday afternoons during programming assignments First homework
More informationCS 265. Computer Architecture. Wei Lu, Ph.D., P.Eng.
CS 265 Computer Architecture Wei Lu, Ph.D., P.Eng. CS 265 Midterm #1 Monday, Oct 18, 12:00pm-1:45pm, SCI 163 Questions on essential terms and concepts of Computer Architecture Mathematical questions on
More information2. Parallel Architectures
2. Parallel Architectures 2.1 Objectives To introduce the principles and classification of parallel architectures. To discuss various forms of parallel processing. To explore the characteristics of parallel
More informationCSE 392/CS 378: High-performance Computing - Principles and Practice
CSE 392/CS 378: High-performance Computing - Principles and Practice Parallel Computer Architectures A Conceptual Introduction for Software Developers Jim Browne browne@cs.utexas.edu Parallel Computer
More informationDheeraj Bhardwaj May 12, 2003
HPC Systems and Models Dheeraj Bhardwaj Department of Computer Science & Engineering Indian Institute of Technology, Delhi 110 016 India http://www.cse.iitd.ac.in/~dheerajb 1 Sequential Computers Traditional
More informationMultiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache.
Multiprocessors why would you want a multiprocessor? Multiprocessors and Multithreading more is better? Cache Cache Cache Classifying Multiprocessors Flynn Taxonomy Flynn Taxonomy Interconnection Network
More informationParallel Computing Introduction
Parallel Computing Introduction Bedřich Beneš, Ph.D. Associate Professor Department of Computer Graphics Purdue University von Neumann computer architecture CPU Hard disk Network Bus Memory GPU I/O devices
More informationUNIT I (Two Marks Questions & Answers)
UNIT I (Two Marks Questions & Answers) Discuss the different ways how instruction set architecture can be classified? Stack Architecture,Accumulator Architecture, Register-Memory Architecture,Register-
More informationMotivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism
Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the
More informationCS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it
Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1
More informationComputer Organization. Chapter 16
William Stallings Computer Organization and Architecture t Chapter 16 Parallel Processing Multiple Processor Organization Single instruction, single data stream - SISD Single instruction, multiple data
More informationNumber of processing elements (PEs). Computing power of each element. Amount of physical memory used. Data access, Communication and Synchronization
Parallel Computer Architecture A parallel computer is a collection of processing elements that cooperate to solve large problems fast Broad issues involved: Resource Allocation: Number of processing elements
More information