NOW Handout Page 1. Recap: Gigaplane Bus Timing. Scalability

Size: px
Start display at page:

Download "NOW Handout Page 1. Recap: Gigaplane Bus Timing. Scalability"

Transcription

1 Recap: Gigaplane Bus Timing Address Rd A Rd B Scalability State Arbitration 1 4,5 2 Share ~Own 6 Own 7 A D A D A D A D A D A D A D A D CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Status Data OK D 0 D 1 Cancel 3/3/99 CS258 S99 2 Enterprise rocessor and emory System 2 procs per board, external L2 caches, 2 mem banks with x-bar Data lines buffered through UDB to drive internal 1.3 GB/s UA bus Wide path to memory so full 64-byte line in 1 mem cycle (2 bus cyc) Addr controller adapts proc and bus protocols, does cache coherence its tags keep a subset of states needed by bus (e.g. no /E distinction) emory (16 72-bit SIS) L 2 UDB s UltraSparc L 2 UDB s UltraSparc FiberChannel module (2) SBUS 25 Hz SysIO SBUS slots SysIO 10/100 Ethernet 64 Fast wide SCSI Enterprise I/O System I/O board has same bus interface ASICs as processor boards But internal bus half as wide, and no memory path Only cache block sized transactions, like processing boards Uniformity simplifies design ASICs implement single-block cache, follows coherence protocol Two independent 64-bit, 25 Hz Sbuses One for two dedicated FiberChannel modules connected to disk One for Ethernet and fast wide SCSI Can also support three SBUS interface cards for arbitrary peripherals erformance and cost of I/O scale with no. of I/O boards D-tags Address controller Address controller Data controller (crossbar) Data controller (crossbar) Data 288 Control Address Data 288 Control Address 3/3/99 CS258 S99 3 Gigaplane connector Gigaplane connector 3/3/99 CS258 S99 4 Limited Scaling of a Bus Workstations in a LAN? Characteristic hysical Length Number of Connections aximum Bandwidth Interface to Comm. medium Global Order rotection Trust OS comm. abstraction Bus ~ 1 ft fixed fixed memory inf arbitration Virt -> physical total single HW Bus: each level of the system design is grounded in the scaling limits at the layers below and assumptions of close coupling between components 3/3/99 CS258 S99 5 Characteristic Bus LAN hysical Length ~ 1 ft K Number of Connections fixed many aximum Bandwidth fixed??? Interface to Comm. medium memory inf peripheral Global Order arbitration??? rotection Virt -> physical OS Trust total none OS single independent comm. abstraction HW SW No clear limit to physical scaling, little trust, no global order, consensus difficult to achieve. Independent failure and restart 3/3/99 CS258 S99 6 NOW Handout age 1 CS258 S99 1

2 Scalable achines What are the design trade-offs for the spectrum of machines between? specialize or commodity nodes? capability of node-to-network interface supporting programming models? Bandwidth Scalability S S S S Typical switches Bus Crossbar What does scalability mean? avoids inherent design limits on resources bandwidth increases with latency does not cost increases slowly with What fundamentally limits bandwidth? single set of wires ust have many independent wires Connect modules through switches Bus vs Network? ultiplexers 3/3/99 CS258 S99 7 3/3/99 CS258 S99 8 Dancehall Organization Generic Distributed emory Org. Scalable network Scalable network CA Network bandwidth? Bandwidth demand? independent processes? communicating processes? Latency? 3/3/99 CS258 S99 9 Network bandwidth? Bandwidth demand? independent processes? communicating processes? Latency? 3/3/99 CS258 S99 10 Key roperty Large number of independent communication paths between nodes => allow a large number of concurrent transactions using different wires initiated independently no global arbitration effect of a transaction only visible to the nodes involved effects propagated through additional transactions Latency Scaling T(n) = Overhead + Channel + Routing Delay Overhead? Channel (n) = n/b --- BW at bottleneck RoutingDelay(h,n) 3/3/99 CS258 S /3/99 CS258 S99 12 NOW Handout age 2 CS258 S99 2

3 Typical example max distance: log n number of switches: α n log n overhead = 1 us, BW = 64 B/s, 200 ns per hop ipelined T 64 (128) = 1.0 us us + 6 hops * 0.2 us/hop = 4.2 us T 1024 (128) = 1.0 us us + 10 hops * 0.2 us/hop = 5.0 us Store and Forward T 64 sf (128) = 1.0 us + 6 hops * ( ) us/hop = 14.2 us T 64 sf (1024) = 1.0 us + 10 hops * ( ) us/hop = 23 us Cost Scaling cost(p,m) = fixed cost + incremental cost (p,m) Bus Based S? Ratio of processors : memory : network : I/O? arallel efficiency(p) = Speedup() / Costup(p) = Cost(p) / Cost(1) Cost-effective: speedup(p) > costup(p) Is super-linear speedup 3/3/99 CS258 S /3/99 CS258 S99 14 Cost Effective? hysical Scaling Speedup = /(1+ log) Costup = Chip-level integration Board-level System level rocessors 2048 processors: 475 fold speedup at 206x cost 3/3/99 CS258 S /3/99 CS258 S99 16 ncube/2 achine Organization C-5 achine Organization Basic module Diagnostics network Control network Data network 1024 Nodes rocessing partition rocessing Control I/O partition partition processors interface U I-Fetch Operand & decode DA channels Router Hypercube network configuration SARC FU Data Control networks network Execution unit 64-bit integer IEEE floating point Single-chip node SRA NI BUS Vector unit Vector unit Entire machine synchronous at 40 Hz 3/3/99 CS258 S /3/99 CS258 S99 18 NOW Handout age 3 CS258 S99 3

4 System Level Integration Realizing rogramming odels CAD Database Scientific modeling arallel applications General interconnection network formed from 8-port switches ower 2 CU IB S-2 node L 2 emory bus emory 4-way interleaved controller icrochannel bus NIC I/O DA i860 NI ultiprogramming Shared address essage passing Data parallel rogramming models Compilation or library Communication abstraction User/system boundary Operating systems support Communication hardware hysical communication medium Hardware/software boundary 3/3/99 CS258 S /3/99 CS258 S99 20 Network Transaction rimitive Output buffer node Communication network Serialized data packet Input buffer node one-way transfer of information from a source output buffer to a dest. input buffer causes some action at the destination occurrence is not directly visible at source deposit data, state change, reply 3/3/99 CS258 S99 21 Bus Transactions vs Net Transactions Issues: protection check V->?? format wires flexible output buffering reg, FIFO?? media arbitration global local destination naming and routing input buffering limited many source action completion detection 3/3/99 CS258 S99 22 Shared Address Space Abstraction Consistency is challenging A=1; flag=1; while (flag==0); print A; (1) Initiate memory access (2) Address translation Load r [Global address] (4) Request transaction Read request emory emory emory A:0 flag:0->1 (5) Remote memory access Wait Read request emory access Delay 1: A=1 3: load A (6) Reply transaction Read response Read response (a) 2: flag=1 Interconnection network (7) Complete memory access 2 3 fixed format, request/response, simple action (b) 1 Congested path 3/3/99 CS258 S /3/99 CS258 S99 24 NOW Handout age 4 CS258 S99 4

5 Synchronous essage assing Asynch. essage assing: Optimistic (1) Initiate send (2) Address translation on src Send dest, local VA, len Recv src, local VA, len (1) Initiate send (2) Address translation Send ( dest, local VA, len) (4) Send-ready request Send-rdy req (4) Send data (5) Remote check for posted receive (assume success) Wait check (5) Remote check for posted receive; on fail, allocate data buffer Data-xfer req match Allocate buffer (6) Reply transaction Recv-rdy reply (7) Bulk data transfer VA Dest VA or ID Recv src, local VA, len Data-xfer req Storage??? 3/3/99 CS258 S /3/99 CS258 S99 26 Asynch. sg assing: Conservative Active essages (1) Initiate send (2) Address translation on dest (4) Send-ready request (5) Remote check for posted receive (assume fail); record send-ready Send dest, local VA, len Send-rdy req Return and compute check Request handler handler Reply (6) Receive-ready request Recv src, local VA, len (7) Bulk data reply VA Dest VA or ID Recv-rdy req Data-xfer reply User-level analog of network transaction Action is small user function Request/Reply ay also perform memory-to-memory transfer 3/3/99 CS258 S /3/99 CS258 S99 28 Common Challenges Input buffer overflow N-1 queue over-commitment => must slow sources reserve space per source (credit)» when available for reuse? Ack or Higher level Refuse input when full» backpressure in reliable network» tree saturation» deadlock free» what happens to traffic not bound for congested dest? Reserve ack back channel drop packets 3/3/99 CS258 S99 29 Challenges (cont) Fetch Deadlock For network to remain deadlock free, nodes must continue accepting messages, even when cannot source them what if incoming transaction is a request?» Each may generate a response, which cannot be sent!» What happens when internal buffering is full? logically independent request/reply networks physical networks virtual channels with separate input/output queues bound requests and reserve input buffer space K(-1) requests + K responses per node service discipline to avoid fetch deadlock? NACK on input buffer full NACK delivery? 3/3/99 CS258 S99 30 NOW Handout age 5 CS258 S99 5

6 Summary Scalability physical, bandwidth, latency and cost level of integration Realizing rogramming odels network transactions protocols safety» N-1» fetch deadlock Next: Communication Architecture Design Space how much hardware interpretation of the network transaction? 3/3/99 CS258 S99 31 NOW Handout age 6 CS258 S99 6

Outline. Limited Scaling of a Bus

Outline. Limited Scaling of a Bus Outline Scalability physical, bandwidth, latency and cost level of integration Realizing rogramming Models network transactions protocols safety input buffer problem: N-1 fetch deadlock Communication Architecture

More information

Architecture or Parallel Computers CSC / ECE 506

Architecture or Parallel Computers CSC / ECE 506 Architecture or Parallel Computers CSC / ECE 506 Summer 2006 Scalable Programming Models 6/19/2006 Dr Steve Hunter Back to Basics Parallel Architecture = Computer Architecture + Communication Architecture

More information

Scalable Multiprocessors

Scalable Multiprocessors arallel Computer Organization and Design : Lecture 7 er Stenström. 2008, Sally A. ckee 2009 Scalable ultiprocessors What is a scalable design? (7.1) Realizing programming models (7.2) Scalable communication

More information

Scalable Distributed Memory Machines

Scalable Distributed Memory Machines Scalable Distributed Memory Machines Goal: Parallel machines that can be scaled to hundreds or thousands of processors. Design Choices: Custom-designed or commodity nodes? Network scalability. Capability

More information

Under the Hood, Part 1: Implementing Message Passing

Under the Hood, Part 1: Implementing Message Passing Lecture 27: Under the Hood, Part 1: Implementing Message Passing Parallel Computer Architecture and Programming CMU 15-418/15-618, Fall 2017 Today s Theme 2 Message passing model (abstraction) Threads

More information

Learning Curve for Parallel Applications. 500 Fastest Computers

Learning Curve for Parallel Applications. 500 Fastest Computers Learning Curve for arallel Applications ABER molecular dynamics simulation program Starting point was vector code for Cray-1 145 FLO on Cray90, 406 for final version on 128-processor aragon, 891 on 128-processor

More information

[ 7.2.5] Certain challenges arise in realizing SAS or messagepassing programming models. Two of these are input-buffer overflow and fetch deadlock.

[ 7.2.5] Certain challenges arise in realizing SAS or messagepassing programming models. Two of these are input-buffer overflow and fetch deadlock. Buffering roblems [ 7.2.5] Certain challenges arise in realizing SAS or messagepassing programming models. Two of these are input-buffer overflow and fetch deadlock. Input-buffer overflow Suppose a large

More information

Evolution and Convergence of Parallel Architectures

Evolution and Convergence of Parallel Architectures History Evolution and Convergence of arallel Architectures Historically, parallel architectures tied to programming models Divergent architectures, with no predictable pattern of growth. Todd C. owry CS

More information

Snoop-Based Multiprocessor Design III: Case Studies

Snoop-Based Multiprocessor Design III: Case Studies Snoop-Based Multiprocessor Design III: Case Studies Todd C. Mowry CS 41 March, Case Studies of Bus-based Machines SGI Challenge, with Powerpath SUN Enterprise, with Gigaplane Take very different positions

More information

Three basic multiprocessing issues

Three basic multiprocessing issues Three basic multiprocessing issues 1. artitioning. The sequential program must be partitioned into subprogram units or tasks. This is done either by the programmer or by the compiler. 2. Scheduling. Associated

More information

SGI Challenge Overview

SGI Challenge Overview CS/ECE 757: Advanced Computer Architecture II (Parallel Computer Architecture) Symmetric Multiprocessors Part 2 (Case Studies) Copyright 2001 Mark D. Hill University of Wisconsin-Madison Slides are derived

More information

ECE 669 Parallel Computer Architecture

ECE 669 Parallel Computer Architecture ECE 669 arallel Computer Architecture Lecture 2 Architectural erspective Overview Increasingly attractive Economics, technology, architecture, application demand Increasingly central and mainstream arallelism

More information

Limitations of Memory System Performance

Limitations of Memory System Performance Slides taken from arallel Computing latforms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar! " To accompany the text ``Introduction to arallel Computing'', Addison Wesley, 2003. Limitations

More information

Under the Hood, Part 1: Implementing Message Passing

Under the Hood, Part 1: Implementing Message Passing Lecture 27: Under the Hood, Part 1: Implementing Message Passing Parallel Computer Architecture and Programming CMU 15-418/15-618, Today s Theme Message passing model (abstraction) Threads operate within

More information

The Sun Fireplane Interconnect in the Mid- Range Sun Fire Servers

The Sun Fireplane Interconnect in the Mid- Range Sun Fire Servers TAK IT TO TH NTH Alan Charlesworth icrosystems The Fireplane Interconnect in the id- Range Fire Servers Vertical & Horizontal Scaling any CUs in one box Cache-coherent shared memory (S) Usually proprietary

More information

NOW Handout Page 1. Recap: Performance Trade-offs. Shared Memory Multiprocessors. Uniprocessor View. Recap (cont) What is a Multiprocessor?

NOW Handout Page 1. Recap: Performance Trade-offs. Shared Memory Multiprocessors. Uniprocessor View. Recap (cont) What is a Multiprocessor? ecap: erformance Trade-offs Shared ory Multiprocessors CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley rogrammer s View of erformance Speedup < Sequential Work Max (Work + Synch

More information

Parallel Programming Platforms

Parallel Programming Platforms arallel rogramming latforms Ananth Grama Computing Research Institute and Department of Computer Sciences, urdue University ayg@cspurdueedu http://wwwcspurdueedu/people/ayg Reference: Introduction to arallel

More information

CPS104 Computer Organization and Programming Lecture 20: Superscalar processors, Multiprocessors. Robert Wagner

CPS104 Computer Organization and Programming Lecture 20: Superscalar processors, Multiprocessors. Robert Wagner CS104 Computer Organization and rogramming Lecture 20: Superscalar processors, Multiprocessors Robert Wagner Faster and faster rocessors So much to do, so little time... How can we make computers that

More information

Module 17: "Interconnection Networks" Lecture 37: "Introduction to Routers" Interconnection Networks. Fundamentals. Latency and bandwidth

Module 17: Interconnection Networks Lecture 37: Introduction to Routers Interconnection Networks. Fundamentals. Latency and bandwidth Interconnection Networks Fundamentals Latency and bandwidth Router architecture Coherence protocol and routing [From Chapter 10 of Culler, Singh, Gupta] file:///e /parallel_com_arch/lecture37/37_1.htm[6/13/2012

More information

Cray XE6 Performance Workshop

Cray XE6 Performance Workshop Cray XE6 erformance Workshop odern HC Architectures David Henty d.henty@epcc.ed.ac.uk ECC, University of Edinburgh Overview Components History Flynn s Taxonomy SID ID Classification via emory Distributed

More information

Multiprocessor Interconnection Networks

Multiprocessor Interconnection Networks Multiprocessor Interconnection Networks Todd C. Mowry CS 740 November 19, 1998 Topics Network design space Contention Active messages Networks Design Options: Topology Routing Direct vs. Indirect Physical

More information

NOW Handout Page 1. Outline. Networks: Routing and Design. Routing. Routing Mechanism. Routing Mechanism (cont) Properties of Routing Algorithms

NOW Handout Page 1. Outline. Networks: Routing and Design. Routing. Routing Mechanism. Routing Mechanism (cont) Properties of Routing Algorithms Outline Networks: Routing and Design Routing Switch Design Case Studies CS 5, Spring 99 David E. Culler Computer Science Division U.C. Berkeley 3/3/99 CS5 S99 Routing Recall: routing algorithm determines

More information

CS/ECE 757: Advanced Computer Architecture II (Parallel Computer Architecture) Introduction (Chapter 1)

CS/ECE 757: Advanced Computer Architecture II (Parallel Computer Architecture) Introduction (Chapter 1) CS/ECE 757: Advanced Computer Architecture II (arallel Computer Architecture) Introduction (Chapter 1) Copyright 2001 Mark D. Hill University of Wisconsin-Madison Slides are derived from work by Sarita

More information

PARALLEL COMPUTER ARCHITECTURES

PARALLEL COMPUTER ARCHITECTURES 8 ARALLEL COMUTER ARCHITECTURES 1 CU Shared memory (a) (b) Figure 8-1. (a) A multiprocessor with 16 CUs sharing a common memory. (b) An image partitioned into 16 sections, each being analyzed by a different

More information

Chapter 9 Multiprocessors

Chapter 9 Multiprocessors ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University

More information

EECS 122: Introduction to Computer Networks Switch and Router Architectures. Today s Lecture

EECS 122: Introduction to Computer Networks Switch and Router Architectures. Today s Lecture EECS : Introduction to Computer Networks Switch and Router Architectures Computer Science Division Department of Electrical Engineering and Computer Sciences University of California, Berkeley Berkeley,

More information

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed

More information

Convergence of Parallel Architecture

Convergence of Parallel Architecture Parallel Computing Convergence of Parallel Architecture Hwansoo Han History Parallel architectures tied closely to programming models Divergent architectures, with no predictable pattern of growth Uncertainty

More information

Generic Architecture. EECS 122: Introduction to Computer Networks Switch and Router Architectures. Shared Memory (1 st Generation) Today s Lecture

Generic Architecture. EECS 122: Introduction to Computer Networks Switch and Router Architectures. Shared Memory (1 st Generation) Today s Lecture Generic Architecture EECS : Introduction to Computer Networks Switch and Router Architectures Computer Science Division Department of Electrical Engineering and Computer Sciences University of California,

More information

Parallel Programming Models and Architecture

Parallel Programming Models and Architecture Parallel Programming Models and Architecture CS 740 September 18, 2013 Seth Goldstein Carnegie Mellon University History Historically, parallel architectures tied to programming models Divergent architectures,

More information

CS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 9 th, 2011

CS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 9 th, 2011 CS252 Graduate Computer Architecture Lecture 14 Multiprocessor Networks March 9 th, 2011 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252

More information

NOW Handout Page 1. Natural Extensions of Memory System. Distributed Memory Multiprocessors. Fundamental Issues. Fundamental Issue #1: Naming

NOW Handout Page 1. Natural Extensions of Memory System. Distributed Memory Multiprocessors. Fundamental Issues. Fundamental Issue #1: Naming NOW Handout age 1 Natural xtensions of emory ystem 1 witch n cale (nterleaved) First-level 1 n istributed emory ultiprocessors (nterleaved) ain memory nterconnection network 252, pring 2005 avid. uller

More information

Scalable Cache Coherence

Scalable Cache Coherence arallel Computing Scalable Cache Coherence Hwansoo Han Hierarchical Cache Coherence Hierarchies in cache organization Multiple levels of caches on a processor Large scale multiprocessors with hierarchy

More information

CS250 VLSI Systems Design Lecture 9: Patterns for Processing Units and Communication Links

CS250 VLSI Systems Design Lecture 9: Patterns for Processing Units and Communication Links CS250 VLSI Systems Design Lecture 9: Patterns for Processing Units and Communication Links John Wawrzynek, Krste Asanovic, with John Lazzaro and Yunsup Lee (TA) UC Berkeley Fall 2010 Unit-Transaction Level

More information

Approaches to Building Parallel Machines. Shared Memory Architectures. Example Cache Coherence Problem. Shared Cache Architectures

Approaches to Building Parallel Machines. Shared Memory Architectures. Example Cache Coherence Problem. Shared Cache Architectures Approaches to Building arallel achines Switch/Bus n Scale Shared ory Architectures (nterleaved) First-level (nterleaved) ain memory n Arvind Krishnamurthy Fall 2004 (nterleaved) ain memory Shared Cache

More information

Scalable Cache Coherence. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Scalable Cache Coherence. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Scalable Cache Coherence Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hierarchical Cache Coherence Hierarchies in cache organization Multiple levels

More information

Internet II. CS10 : Beauty and Joy of Computing. cs10.berkeley.edu. !!Senior Lecturer SOE Dan Garcia!!! Garcia UCB!

Internet II. CS10 : Beauty and Joy of Computing. cs10.berkeley.edu. !!Senior Lecturer SOE Dan Garcia!!!  Garcia UCB! cs10.berkeley.edu CS10 : Beauty and Joy of Computing Internet II!!Senior Lecturer SOE Dan Garcia!!!www.cs.berkeley.edu/~ddgarcia CS10 L17 Internet II (1)! Why Networks?! Originally sharing I/O devices

More information

Scalable Multiprocessors

Scalable Multiprocessors Scalable Multiprocessors [ 11.1] scalable system is one in which resources can be added to the system without reaching a hard limit. Of course, there may still be economic limits. s the size of the system

More information

TDT Appendix E Interconnection Networks

TDT Appendix E Interconnection Networks TDT 4260 Appendix E Interconnection Networks Review Advantages of a snooping coherency protocol? Disadvantages of a snooping coherency protocol? Advantages of a directory coherency protocol? Disadvantages

More information

Lecture 3. The Network Layer (cont d) Network Layer 1-1

Lecture 3. The Network Layer (cont d) Network Layer 1-1 Lecture 3 The Network Layer (cont d) Network Layer 1-1 Agenda The Network Layer (cont d) What is inside a router? Internet Protocol (IP) IPv4 fragmentation and addressing IP Address Classes and Subnets

More information

4. Networks. in parallel computers. Advances in Computer Architecture

4. Networks. in parallel computers. Advances in Computer Architecture 4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors

More information

Parallel Computing Trends: from MPPs to NoWs

Parallel Computing Trends: from MPPs to NoWs Parallel Computing Trends: from MPPs to NoWs (from Massively Parallel Processors to Networks of Workstations) Fall Research Forum Oct 18th, 1994 Thorsten von Eicken Department of Computer Science Cornell

More information

Chapter 4: network layer. Network service model. Two key network-layer functions. Network layer. Input port functions. Router architecture overview

Chapter 4: network layer. Network service model. Two key network-layer functions. Network layer. Input port functions. Router architecture overview Chapter 4: chapter goals: understand principles behind services service models forwarding versus routing how a router works generalized forwarding instantiation, implementation in the Internet 4- Network

More information

Performance study example ( 5.3) Performance study example

Performance study example ( 5.3) Performance study example erformance study example ( 5.3) Coherence misses: - True sharing misses - Write to a shared block - ead an invalid block - False sharing misses - ead an unmodified word in an invalidated block CI for commercial

More information

Interconnection Network

Interconnection Network Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network

More information

LS Example 5 3 C 5 A 1 D

LS Example 5 3 C 5 A 1 D Lecture 10 LS Example 5 2 B 3 C 5 1 A 1 D 2 3 1 1 E 2 F G Itrn M B Path C Path D Path E Path F Path G Path 1 {A} 2 A-B 5 A-C 1 A-D Inf. Inf. 1 A-G 2 {A,D} 2 A-B 4 A-D-C 1 A-D 2 A-D-E Inf. 1 A-G 3 {A,D,G}

More information

Uniprocessor Computer Architecture Example: Cray T3E

Uniprocessor Computer Architecture Example: Cray T3E Chapter 2: Computer-System Structures MP Example: Intel Pentium Pro Quad Lab 1 is available online Last lecture: why study operating systems? Purpose of this lecture: general knowledge of the structure

More information

Lecture 17: Parallel Architectures and Future Computer Architectures. Shared-Memory Multiprocessors

Lecture 17: Parallel Architectures and Future Computer Architectures. Shared-Memory Multiprocessors Lecture 17: arallel Architectures and Future Computer Architectures rof. Kunle Olukotun EE 282h Fall 98/99 1 Shared-emory ultiprocessors Several processors share one address space» conceptually a shared

More information

Chapter 2: Computer-System Structures. Hmm this looks like a Computer System?

Chapter 2: Computer-System Structures. Hmm this looks like a Computer System? Chapter 2: Computer-System Structures Lab 1 is available online Last lecture: why study operating systems? Purpose of this lecture: general knowledge of the structure of a computer system and understanding

More information

Review: Symmetric Multiprocesors (SMP)

Review: Symmetric Multiprocesors (SMP) CS/ECE 757: Advanced Computer Architecture II (Parallel Computer Architecture) Symmetric Multiprocessors Part 2 Copyright 2005 Mark D. Hill University of Wisconsin-Madison Slides are derived from work

More information

Basic Low Level Concepts

Basic Low Level Concepts Course Outline Basic Low Level Concepts Case Studies Operation through multiple switches: Topologies & Routing v Direct, indirect, regular, irregular Formal models and analysis for deadlock and livelock

More information

Chapter 3. Top Level View of Computer Function and Interconnection. Yonsei University

Chapter 3. Top Level View of Computer Function and Interconnection. Yonsei University Chapter 3 Top Level View of Computer Function and Interconnection Contents Computer Components Computer Function Interconnection Structures Bus Interconnection PCI 3-2 Program Concept Computer components

More information

CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99 CS258 S99 2

CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99 CS258 S99 2 Real Machines Interconnection Network Topology Design Trade-offs CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99

More information

History of Distributed Systems. Joseph Cordina

History of Distributed Systems. Joseph Cordina History of Distributed Systems Joseph Cordina joseph.cordina@um.edu.mt otivation Computation demands were always higher than technological status quo Obvious answer Several computing elements working in

More information

Lecture 3: Flow-Control

Lecture 3: Flow-Control High-Performance On-Chip Interconnects for Emerging SoCs http://tusharkrishna.ece.gatech.edu/teaching/nocs_acaces17/ ACACES Summer School 2017 Lecture 3: Flow-Control Tushar Krishna Assistant Professor

More information

Portland State University ECE 588/688. Directory-Based Cache Coherence Protocols

Portland State University ECE 588/688. Directory-Based Cache Coherence Protocols Portland State University ECE 588/688 Directory-Based Cache Coherence Protocols Copyright by Alaa Alameldeen and Haitham Akkary 2018 Why Directory Protocols? Snooping-based protocols may not scale All

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #8 2/7/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class

More information

CMSC 611: Advanced Computer Architecture

CMSC 611: Advanced Computer Architecture CMSC 611: Advanced Computer Architecture Shared Memory Most slides adapted from David Patterson. Some from Mohomed Younis Interconnection Networks Massively processor networks (MPP) Thousands of nodes

More information

I/O Handling. ECE 650 Systems Programming & Engineering Duke University, Spring Based on Operating Systems Concepts, Silberschatz Chapter 13

I/O Handling. ECE 650 Systems Programming & Engineering Duke University, Spring Based on Operating Systems Concepts, Silberschatz Chapter 13 I/O Handling ECE 650 Systems Programming & Engineering Duke University, Spring 2018 Based on Operating Systems Concepts, Silberschatz Chapter 13 Input/Output (I/O) Typical application flow consists of

More information

Number of processing elements (PEs). Computing power of each element. Amount of physical memory used. Data access, Communication and Synchronization

Number of processing elements (PEs). Computing power of each element. Amount of physical memory used. Data access, Communication and Synchronization Parallel Computer Architecture A parallel computer is a collection of processing elements that cooperate to solve large problems fast Broad issues involved: Resource Allocation: Number of processing elements

More information

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University A.R. Hurson Computer Science and Engineering The Pennsylvania State University 1 Large-scale multiprocessor systems have long held the promise of substantially higher performance than traditional uniprocessor

More information

Scalable Multiprocessors

Scalable Multiprocessors Topics Scalable Multiprocessors Scaling issues Supporting programming models Network interface Interconnection network Considerations in Bluegene/L design 2 Limited Scaling of a Bus Comparing with a LAN

More information

ECE 669 Parallel Computer Architecture

ECE 669 Parallel Computer Architecture ECE 669 Parallel Computer Architecture Lecture 21 Routing Outline Routing Switch Design Flow Control Case Studies Routing Routing algorithm determines which of the possible paths are used as routes how

More information

EE 457 Unit 7b. Main Memory Organization

EE 457 Unit 7b. Main Memory Organization 1 EE 457 Unit 7b Main Memory Organization 2 Motivation Organize main memory to Facilitate byte-addressability while maintaining Efficient fetching of the words in a cache block Low order interleaving (L.O.I)

More information

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer MIMD Overview Intel Paragon XP/S Overview! MIMDs in the 1980s and 1990s! Distributed-memory multicomputers! Intel Paragon XP/S! Thinking Machines CM-5! IBM SP2! Distributed-memory multicomputers with hardware

More information

Lecture 8. Network Layer (cont d) Network Layer 1-1

Lecture 8. Network Layer (cont d) Network Layer 1-1 Lecture 8 Network Layer (cont d) Network Layer 1-1 Agenda The Network Layer (cont d) What is inside a router Internet Protocol (IP) IPv4 fragmentation and addressing IP Address Classes and Subnets Network

More information

Parallel Computer Architecture II

Parallel Computer Architecture II Parallel Computer Architecture II Stefan Lang Interdisciplinary Center for Scientific Computing (IWR) University of Heidelberg INF 368, Room 532 D-692 Heidelberg phone: 622/54-8264 email: Stefan.Lang@iwr.uni-heidelberg.de

More information

CMPE 150/L : Introduction to Computer Networks. Chen Qian Computer Engineering UCSC Baskin Engineering Lecture 11

CMPE 150/L : Introduction to Computer Networks. Chen Qian Computer Engineering UCSC Baskin Engineering Lecture 11 CMPE 150/L : Introduction to Computer Networks Chen Qian Computer Engineering UCSC Baskin Engineering Lecture 11 1 Midterm exam Midterm this Thursday Close book but one-side 8.5"x11" note is allowed (must

More information

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 26: Interconnects James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L26 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today get an overview of parallel

More information

Conventional Computer Architecture. Abstraction

Conventional Computer Architecture. Abstraction Conventional Computer Architecture Conventional = Sequential or Single Processor Single Processor Abstraction Conventional computer architecture has two aspects: 1 The definition of critical abstraction

More information

Cache Coherence: Part II Scalable Approaches

Cache Coherence: Part II Scalable Approaches ache oherence: art II Scalable pproaches Hierarchical ache oherence Todd. Mowry S 74 October 27, 2 (a) 1 2 1 2 (b) 1 Topics Hierarchies Directory rotocols Hierarchies arise in different ways: (a) processor

More information

Lecture notes for CS Chapter 4 11/27/18

Lecture notes for CS Chapter 4 11/27/18 Chapter 5: Thread-Level arallelism art 1 Introduction What is a parallel or multiprocessor system? Why parallel architecture? erformance potential Flynn classification Communication models Architectures

More information

Lecture 9: MIMD Architectures

Lecture 9: MIMD Architectures Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is connected

More information

Concepts of Distributed Systems 2006/2007

Concepts of Distributed Systems 2006/2007 Concepts of Distributed Systems 2006/2007 Introduction & overview Johan Lukkien 1 Introduction & overview Communication Distributed OS & Processes Synchronization Security Consistency & replication Programme

More information

CS 770G - Parallel Algorithms in Scientific Computing Parallel Architectures. May 7, 2001 Lecture 2

CS 770G - Parallel Algorithms in Scientific Computing Parallel Architectures. May 7, 2001 Lecture 2 CS 770G - arallel Algorithms in Scientific Computing arallel Architectures May 7, 2001 Lecture 2 References arallel Computer Architecture: A Hardware / Software Approach Culler, Singh, Gupta, Morgan Kaufmann

More information

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Lecture 12: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) 1 Topologies Internet topologies are not very regular they grew

More information

A Key Theme of CIS 371: Parallelism. CIS 371 Computer Organization and Design. Readings. This Unit: (In-Order) Superscalar Pipelines

A Key Theme of CIS 371: Parallelism. CIS 371 Computer Organization and Design. Readings. This Unit: (In-Order) Superscalar Pipelines A Key Theme of CIS 371: arallelism CIS 371 Computer Organization and Design Unit 10: Superscalar ipelines reviously: pipeline-level parallelism Work on execute of one instruction in parallel with decode

More information

Parallel Architecture Fundamentals

Parallel Architecture Fundamentals arallel Architecture Fundamentals Topics CS 740 September 22, 2003 What is arallel Architecture? Why arallel Architecture? Evolution and Convergence of arallel Architectures Fundamental Design Issues What

More information

A Scalable SAS Machine

A Scalable SAS Machine arallel omputer Organization and Design : Lecture 8 er Stenström. 2008, Sally. ckee 2009 Scalable ache oherence Design principles of scalable cache protocols Overview of design space (8.1) Basic operation

More information

MULTIPROCESSORS. Characteristics of Multiprocessors. Interconnection Structures. Interprocessor Arbitration

MULTIPROCESSORS. Characteristics of Multiprocessors. Interconnection Structures. Interprocessor Arbitration MULTIPROCESSORS Characteristics of Multiprocessors Interconnection Structures Interprocessor Arbitration Interprocessor Communication and Synchronization Cache Coherence 2 Characteristics of Multiprocessors

More information

Parallel Arch. Review

Parallel Arch. Review Parallel Arch. Review Zeljko Zilic McConnell Engineering Building Room 536 Main Points Understanding of the design and engineering of modern parallel computers Technology forces Fundamental architectural

More information

Lecture 4 - Network Layer. Transport Layer. Outline. Introduction. Notes. Notes. Notes. Notes. Networks and Security. Jacob Aae Mikkelsen

Lecture 4 - Network Layer. Transport Layer. Outline. Introduction. Notes. Notes. Notes. Notes. Networks and Security. Jacob Aae Mikkelsen Lecture 4 - Network Layer Networks and Security Jacob Aae Mikkelsen IMADA September 23, 2013 September 23, 2013 1 / 67 Transport Layer Goals understand principles behind network layer services: network

More information

Computer Networks and reference models. 1. List of Problems (so far)

Computer Networks and reference models. 1. List of Problems (so far) Computer s and reference models Chapter 2 1. List of Problems (so far) How to ensure connectivity between users? How to share a wire? How to pass a message through the network? How to build Scalable s?

More information

Department of EECS - University of California at Berkeley EECS122 - Introduction to Communication Networks - Spring 2005 Final: 5/20/2005

Department of EECS - University of California at Berkeley EECS122 - Introduction to Communication Networks - Spring 2005 Final: 5/20/2005 Name: SID: Department of EECS - University of California at Berkeley EECS122 - Introduction to Communication Networks - Spring 2005 Final: 5/20/2005 There are 10 questions in total. Please write your SID

More information

SunFire range of servers

SunFire range of servers TAKE IT TO THE NTH Frederic Vecoven Sun Microsystems SunFire range of servers System Components Fireplane Shared Interconnect Operating Environment Ultra SPARC & compilers Applications & Middleware Clustering

More information

Three parallel-programming models

Three parallel-programming models Three parallel-programming models Shared-memory programming is like using a bulletin board where you can communicate with colleagues. essage-passing is like communicating via e-mail or telephone calls.

More information

Network Layer: Router Architecture, IP Addressing

Network Layer: Router Architecture, IP Addressing Network Layer: Router Architecture, IP Addressing UG3 Computer Communications & Networks (COMN) Mahesh Marina mahesh@ed.ac.uk Slides thanks to Myungjin Lee and copyright of Kurose and Ross Router Architecture

More information

ECE 669 Parallel Computer Architecture

ECE 669 Parallel Computer Architecture ECE 669 Parallel Computer Architecture Lecture 27 Course Wrap Up What is Parallel Architecture? A parallel computer is a collection of processing elements that cooperate to solve large problems fast Some

More information

Chapter 4: network layer

Chapter 4: network layer Chapter 4: network layer chapter goals: understand principles behind network layer services: network layer service models forwarding versus routing how a router works routing (path selection) broadcast,

More information

ECE 697J Advanced Topics in Computer Networks

ECE 697J Advanced Topics in Computer Networks ECE 697J Advanced Topics in Computer Networks Switching Fabrics 10/02/03 Tilman Wolf 1 Router Data Path Last class: Single CPU is not fast enough for processing packets Multiple advanced processors in

More information

Lecture 16: Network Layer Overview, Internet Protocol

Lecture 16: Network Layer Overview, Internet Protocol Lecture 16: Network Layer Overview, Internet Protocol COMP 332, Spring 2018 Victoria Manfredi Acknowledgements: materials adapted from Computer Networking: A Top Down Approach 7 th edition: 1996-2016,

More information

Incorporating DMA into QoS Policies for Maximum Performance in Shared Memory Systems. Scott Marshall and Stephen Twigg

Incorporating DMA into QoS Policies for Maximum Performance in Shared Memory Systems. Scott Marshall and Stephen Twigg Incorporating DMA into QoS Policies for Maximum Performance in Shared Memory Systems Scott Marshall and Stephen Twigg 2 Problems with Shared Memory I/O Fairness Memory bandwidth worthless without memory

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

Multiprocessors II: CC-NUMA DSM. CC-NUMA for Large Systems

Multiprocessors II: CC-NUMA DSM. CC-NUMA for Large Systems Multiprocessors II: CC-NUMA DSM DSM cache coherence the hardware stuff Today s topics: what happens when we lose snooping new issues: global vs. local cache line state enter the directory issues of increasing

More information

Lecture 9: MIMD Architectures

Lecture 9: MIMD Architectures Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction A set of general purpose processors is connected together.

More information

(Advanced) Computer Organization & Architechture. Prof. Dr. Hasan Hüseyin BALIK (3 rd Week)

(Advanced) Computer Organization & Architechture. Prof. Dr. Hasan Hüseyin BALIK (3 rd Week) + (Advanced) Computer Organization & Architechture Prof. Dr. Hasan Hüseyin BALIK (3 rd Week) + Outline 2. The computer system 2.1 A Top-Level View of Computer Function and Interconnection 2.2 Cache Memory

More information

Computer Science 146. Computer Architecture

Computer Science 146. Computer Architecture Computer Science 46 Computer Architecture Spring 24 Harvard University Instructor: Prof dbrooks@eecsharvardedu Lecture 22: More I/O Computer Science 46 Lecture Outline HW5 and Project Questions? Storage

More information

Parallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors?

Parallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors? Parallel Computers CPE 63 Session 20: Multiprocessors Department of Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection of processing

More information

CS Parallel Algorithms in Scientific Computing

CS Parallel Algorithms in Scientific Computing CS 775 - arallel Algorithms in Scientific Computing arallel Architectures January 2, 2004 Lecture 2 References arallel Computer Architecture: A Hardware / Software Approach Culler, Singh, Gupta, Morgan

More information