Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.
- Alyson Wood
- 6 years ago
1 Cluster Networks: Introduction
- Communication has a significant impact on application performance, so interconnection networks play a vital role in cluster systems.
- As usual, the driver is performance: an increase in compute power typically demands a proportional improvement in communication services (lower latency, higher bandwidth).
2 Cluster Networks
Issues with cluster interconnections are similar to those with normal networks:
- Latency and bandwidth
- Topology type (bus, ring, torus, hypercube, etc.)
- Routing
- Direct (point-to-point) or indirect connections
- NIC (Network Interface Card) capabilities
- Physical medium (twisted pair, fibre optic)
- Balance between performance and cost
3 Interconnection Topologies
In standard LANs we have two general structures:
- Shared network (bus)
  - All messages are broadcast: each processor listens to every message.
  - Requires complex access control, e.g. CSMA/CD (Carrier Sense Multiple Access with Collision Detection).
  - Collisions can occur, which requires back-off policies and retransmissions.
  - Suitable when the offered load is low; inappropriate for high-performance applications.
  - Very little reason to use this form of network today.
- Switched network
  - Permits point-to-point communication between sender and receiver.
  - Fast internal transport provides high aggregate bandwidth.
  - Multiple messages can be sent simultaneously.
4 Metrics to evaluate network topology
Useful metrics for switched network topologies:
- Scalability: how the number of switches grows with the number of nodes.
- Degree: the number of links to/from a node; a measure of aggregate bandwidth.
- Diameter: the length of the shortest path between the two most distant nodes; a measure of latency.
- Bisection width: the minimum number of links that must be cut to divide the topology into two independent networks of the same size (+/- one node). Essentially a measure of bottleneck bandwidth: the higher it is, the better the network performs under load.
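The degree and diameter definitions can be checked on small examples. The sketch below is illustrative only: it builds adjacency lists for a linear array and a ring (the helper names `linear_array`, `ring`, `hops_from`, `diameter` and `max_degree` are made up for this example) and measures the diameter by breadth-first search.

```python
from collections import deque

def linear_array(n):
    """Adjacency list of an n-node linear array (no wrap-around)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def ring(n):
    """Adjacency list of an n-node ring (assumes n >= 3)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def hops_from(adj, src):
    """Breadth-first search: hop count from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Longest shortest path over all pairs of nodes."""
    return max(max(hops_from(adj, s).values()) for s in adj)

def max_degree(adj):
    """Largest number of links attached to any single node."""
    return max(len(neigh) for neigh in adj.values())

print(diameter(linear_array(8)), diameter(ring(8)))
```

For 8 nodes this gives a diameter of 7 for the array and 4 for the ring, which is the N-1 versus N/2 behaviour expected of these two topologies.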
5 Interconnection Topologies
Crossbar switch:
- Low latency and high throughput.
- Switch scalability is poor: $O(N^2)$.
- Lots of wiring.
6 Interconnection Topologies
Linear arrays and rings:
- Consider networks with switch scaling costs better than $O(N^2)$.
- In one dimension we have simple linear arrays, which need $O(N)$ switches.
- These can wrap around to make a ring, or 1D torus.
- Latency is high.
- 2D/3D Cartesian applications will perform poorly on this network.
7 [Figure with panels labelled 2D and 3D]
8 Interconnection Topologies
2D meshes:
- Can wrap around as a 2D torus.
- Switch scaling: $O(N)$
- Average degree: 4
- Diameter: $O(2N^{1/2})$
- Bisection width: $O(N^{1/2})$
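To see what the wrap-around links buy, the following sketch compares the diameter of a k x k mesh with that of a k x k torus by brute force over all node pairs. The function names and the choice of k are assumptions made for illustration.

```python
from itertools import product

def mesh_hops(a, b):
    """Hop count between grid nodes a and b without wrap-around links."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def torus_hops(a, b, k):
    """Hop count on a k x k torus: each axis may go either way round."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, k - dx) + min(dy, k - dy)

def diameters(k):
    """Return (mesh diameter, torus diameter) for a k x k grid of nodes."""
    nodes = list(product(range(k), repeat=2))
    mesh = max(mesh_hops(a, b) for a in nodes for b in nodes)
    torus = max(torus_hops(a, b, k) for a in nodes for b in nodes)
    return mesh, torus

print(diameters(8))
```

With k = 8 (N = 64 nodes) the mesh diameter is 14, i.e. 2(k - 1), which is close to $2N^{1/2}$, while the torus diameter is 8, i.e. $N^{1/2}$.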
9 Interconnection Topologies
Hypercubes:
- $K$ dimensions, $N = 2^K$ switches.
- Diameter: $O(K)$.
- Good bisection width: $O(2^{K-1})$.
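A common way to reason about a hypercube is to label each node with a K-bit number, so that neighbours differ in exactly one bit. The sketch below uses that labelling; the dimension-order `route` function is one simple routing choice assumed here for illustration, not something the slides prescribe.

```python
def neighbours(node, k):
    """The k neighbours of a node: flip each of its k address bits in turn."""
    return [node ^ (1 << bit) for bit in range(k)]

def hops(a, b):
    """Minimum hop count: the number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def route(src, dst, k):
    """Dimension-order route: correct differing bits from lowest to highest."""
    path = [src]
    cur = src
    for bit in range(k):
        if (cur ^ dst) & (1 << bit):
            cur ^= 1 << bit
            path.append(cur)
    return path

print(route(0b000, 0b111, 3))  # [0, 1, 3, 7]
```

The two most distant nodes differ in all K bits, so the route between them takes K hops, matching the $O(K)$ diameter.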
10 Interconnection Topologies
Binary tree:
- Scaling: $n = 2^d$ processor nodes (where $d$ is the depth), $2^{d+1} - 1$ switches
- Degree: 3
- Diameter: $O(2d)$
- Bisection width: $O(1)$
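The tree figures can be reproduced with heap-style numbering (root = 1, children of node i are 2i and 2i + 1). This numbering and the helper names are assumptions chosen for the sketch.

```python
def tree_size(d):
    """Return (processor nodes at the leaves, total tree nodes) for depth d."""
    return 2 ** d, 2 ** (d + 1) - 1

def tree_hops(a, b):
    """Hops between tree nodes a and b: climb from the deeper node until they meet."""
    count = 0
    while a != b:
        if a > b:
            a //= 2  # a is at least as deep as b, so move a to its parent
        else:
            b //= 2
        count += 1
    return count

d = 3
first_leaf, last_leaf = 2 ** d, 2 ** (d + 1) - 1
print(tree_size(d), tree_hops(first_leaf, last_leaf))
```

For d = 3 the leftmost and rightmost leaves are 6 hops apart (up to the root and back down), which is the 2d diameter. All of that traffic crosses the root, which is why the bisection width stays at $O(1)$.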
11 Interconnection Topologies
Fat trees:
- Similar in diameter to a binary tree.
- Bisection width (which equates to the bottleneck) is greatly improved by the additional links towards the root.
12 Interconnection Topologies
Summary of topologies:
Topology     Degree            Diameter      Bisection
1D Array     2                 $N-1$         1
1D Ring      2                 $N/2$         2
2D Mesh      4                 $2N^{1/2}$    $N^{1/2}$
2D Torus     4                 $N^{1/2}$     $2N^{1/2}$
Hypercube    $n = \log_2 N$    $n$           $N/2$
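As a quick way to compare the rows for a concrete machine size, the sketch below evaluates the table formulas for a given N. The function name is made up, and N is assumed to be both a perfect square and a power of two (e.g. 16, 64, 256) so that every formula yields a whole number.

```python
import math

def topology_metrics(n_nodes):
    """Evaluate the summary-table formulas: {name: (degree, diameter, bisection)}."""
    root = math.isqrt(n_nodes)
    log2n = n_nodes.bit_length() - 1
    if root * root != n_nodes or 2 ** log2n != n_nodes:
        raise ValueError("n_nodes must be a perfect square and a power of two")
    return {
        "1D array": (2, n_nodes - 1, 1),
        "1D ring": (2, n_nodes // 2, 2),
        "2D mesh": (4, 2 * root, root),
        "2D torus": (4, root, 2 * root),
        "hypercube": (log2n, log2n, n_nodes // 2),
    }

for name, row in topology_metrics(64).items():
    print(f"{name:10s} degree={row[0]:2d} diameter={row[1]:2d} bisection={row[2]:2d}")
```

For N = 64 the hypercube has the smallest diameter (6) and the largest bisection width (32), at the price of a degree that grows with the machine size.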
13 Switching
Operational modes:
- Store-and-forward
  - Each switch receives an entire packet before forwarding it to the next switch; useful in a non-dedicated environment (i.e. a LAN).
  - Buffers are usually finite, so packets may be dropped under heavy load.
  - Imposes a larger in-switch latency.
  - Can detect errors in packets.
- Wormhole routing (also called cut-through switching)
  - The packet is divided into small flits (flow control digits).
  - The switch examines the first flit (the header), which contains the destination address, sets up a circuit and forwards the flit immediately.
  - Subsequent flits of the message are forwarded as they arrive (near wire speed).
  - Reduces latency and buffer overhead.
  - Messaging occurs at a speed close to that of directly connected processors.
  - Less error detection.
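The latency difference between the two modes can be estimated with a simple first-order model. This is a sketch under stated assumptions: propagation delay, switch processing time and contention are ignored, `hops` counts the links a packet traverses, and all parameter values below are hypothetical.

```python
def store_and_forward_latency(packet_bits, hops, bandwidth_bps):
    """Each hop must receive the whole packet before sending it on."""
    return hops * packet_bits / bandwidth_bps

def cut_through_latency(packet_bits, header_bits, hops, bandwidth_bps):
    """Only the header is delayed at each intermediate switch; the rest streams behind it."""
    return packet_bits / bandwidth_bps + (hops - 1) * header_bits / bandwidth_bps

# Hypothetical example: 1000-byte packet, 8-byte header, 4 links at 1 Gbit/s.
sf = store_and_forward_latency(8000, 4, 1e9)
ct = cut_through_latency(8000, 64, 4, 1e9)
print(f"store-and-forward: {sf * 1e6:.3f} us, cut-through: {ct * 1e6:.3f} us")
```

In this model the store-and-forward latency grows with hops x packet size, whereas the cut-through latency is dominated by a single packet transmission time, so the gap widens as paths get longer.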
17 Cluster Network Products
Cluster interconnects include, among others:
- Gigabit Ethernet
- Myrinet
- Quadrics
- InfiniBand
19 [Figure: Interconnects in Top500 list, 11/2009]
20 [Figure: Interconnects in Top500 list, 11/2008]
21 Cluster Network Technologies
Gigabit Ethernet:
- The technology has matured and now offers very good performance at very low cost.
- Latency is moderate: many Ethernet switches are designed for general LANs (store-and-forward), where latency reduction is not necessarily the primary goal (latency is on the order of ms).
- Zero-copy, OS-bypass message passing can be supported with a programmable NIC and direct memory access.
22 Cluster Network Technologies
Myrinet:
- Uses fibre optic cable.
- Uses a fat-tree structure.
- Low latency (7-10 µs) with a peak bandwidth of 4 Gbps.
- Provides zero-copy message passing and can offload packet processing to the NIC.
- Uses cut-through/wormhole switching to reduce latency.
- More expensive than Ethernet.
[Figure: (a) Twisted pair cable in Ethernet; (b) Fibre optic cable]
23 [Figure: Zero copy protocol]
24 Cluster Network Technologies
Quadrics:
- Product of a strategic partnership between Quadrics and Compaq (used in ASCI Q).
- Uses a fat quad-tree topology.
- Very low latency of 2-5 µs; bandwidth is about 2 Gbps.
25 Cluster Network Technologies
InfiniBand:
- Backed by Intel, among others.
- Basic link speed of 2.5 Gb/s.
- Cut-through/wormhole switches are used.
- Latency is about 200 nanoseconds.
26 BlueGene/L No. 1 in Top500 list from Source: IBM
27 BlueGene/L networking
The BlueGene system employs several network types. Central is the torus interconnection network:
- 3D torus with wrap-around.
- Each node connects to six neighbours (bidirectional links).
- Routing is done in hardware.
- Each link runs at 1.4 Gbit/s, giving 1.4 x 6 x 2 = 16.8 Gbit/s aggregate bandwidth per node.
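The six-neighbour structure and the bandwidth arithmetic can be sketched as follows. The torus dimensions used in the example (8 x 8 x 8) are a hypothetical size chosen for illustration, not the real machine's, and the function names are made up.

```python
def torus_neighbours(node, dims):
    """Six neighbours of a node in a 3D torus of size dims, with wrap-around."""
    result = []
    for axis in range(3):
        for step in (1, -1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]
            result.append(tuple(coord))
    return result

def aggregate_bandwidth_gbit(link_gbit=1.4, links=6, directions=2):
    """Per-node aggregate bandwidth: link rate x number of links x both directions."""
    return link_gbit * links * directions

print(torus_neighbours((0, 0, 0), (8, 8, 8)))
print(f"{aggregate_bandwidth_gbit():.1f} Gbit/s")
```

The corner node (0, 0, 0) still has six neighbours because the modulo operation wraps coordinate -1 round to 7; without wrap-around it would have only three.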
28 BlueGene/L
The other three networks:
- Binary combining tree
  - Used for collective/global operations: reductions, sums, products, barriers, etc.
  - Low latency (2 µs)
- Gigabit Ethernet I/O network
  - Supports file I/O
  - An I/O node is responsible for performing I/O operations for 128 processors
- Diagnostic and control network
  - Booting nodes, monitoring processors.
Each chip has all four network interfaces (torus, tree, I/O, diagnostics).
Note that specialised networks are used for different purposes, which is quite different from many other HPC cluster architectures.
29 BlueGene/L
Message passing:
- The BlueGene team devoted a good deal of effort to developing an efficient MPI implementation that reduces latency in the software stack.
- Using the MPICH code base as a starting point, the MPI library was enhanced with respect to the machine architecture, for example by using the combining tree for reductions and broadcasts.
Reading paper: Filtering Failure Logs for a BlueGene/L Prototype
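Why a combining tree suits reductions can be seen in a small simulation. The code below is plain Python, not the BlueGene/L MPI implementation: it combines values pairwise, level by level, and counts the number of levels needed. The function name `tree_reduce` is an assumption for this sketch.

```python
import operator

def tree_reduce(values, op=operator.add):
    """Combine values pairwise up a binary tree; return (result, number of levels)."""
    level = list(values)
    if not level:
        raise ValueError("need at least one value")
    rounds = 0
    while len(level) > 1:
        # Each pair is combined at one tree node; all pairs in a level run in parallel.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd one out is passed up unchanged
        level = nxt
        rounds += 1
    return level[0], rounds

print(tree_reduce(range(1, 9)))  # (36, 3)
```

Eight values are summed in three levels rather than seven sequential steps, so the number of combining stages grows with the logarithm of the node count.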
30 ASCI Q
- The Q supercomputing system at Los Alamos National Laboratory (LANL)
- Product of the Advanced Simulation and Computing (ASCI) program
- Used for simulation and computational modelling
- No. 2 in the Top500 supercomputer list in 2002
31 ASCI Q
- Classical cluster architecture
- SMPs (AlphaServer ES45s from HP) are put in one segment
  - Each with four EV GHz CPUs with 16 MB cache
- The whole system has 3 segments
  - The three segments can operate independently or as a single system
- Aggregate 60 TeraFLOPS capability
- 33 terabytes of memory
- 664 TB of global storage
- Interconnection using the Quadrics switch interconnect (QSNet)
  - High-bandwidth (250 MB/s), low-latency (5 µs) network
- Top500 list:
32 Earth Simulator
- Built by NEC, located in the Earth Simulator Centre in Japan
- Used for running global climate models to evaluate the effects of global warming
- No. 1 from
33 Earth Simulator
- 640 nodes, each with 8 vector processors and 16 GB memory
- Two nodes are installed in one cabinet
- In total: 5120 processors (NEC SX-5), 10 TB of memory, 700 TB of disk storage and 1.6 PB of tape storage
- Computing capacity: 36 TFlop/s
- Networking: crossbar interconnection (very expensive)
  - Bandwidth: 16 GB/s between any two nodes
  - Latency: 5 µs
- Dual-level parallelism: OpenMP in-node, MPI out of node
- Physical installation: the machine resides on the 3rd floor; cables on the 2nd; power generation and cooling on the 1st and ground floors.
More informationInfiniband Fast Interconnect
Infiniband Fast Interconnect Yuan Liu Institute of Information and Mathematical Sciences Massey University May 2009 Abstract Infiniband is the new generation fast interconnect provides bandwidths both
More informationInternetworking is connecting two or more computer networks with some sort of routing device to exchange traffic back and forth, and guide traffic on
CBCN4103 Internetworking is connecting two or more computer networks with some sort of routing device to exchange traffic back and forth, and guide traffic on the correct path across the complete network
More informationResource allocation and utilization in the Blue Gene/L supercomputer
Resource allocation and utilization in the Blue Gene/L supercomputer Tamar Domany, Y Aridor, O Goldshmidt, Y Kliteynik, EShmueli, U Silbershtein IBM Labs in Haifa Agenda Blue Gene/L Background Blue Gene/L
More informationCSCI-1680 Link Layer Wrap-Up Rodrigo Fonseca
CSCI-1680 Link Layer Wrap-Up Rodrigo Fonseca Based partly on lecture notes by David Mazières, Phil Levis, John Jannotti Administrivia Homework I out later today, due next Thursday Today: Link Layer (cont.)
More informationIntroduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano
Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed
More informationParallel Architecture. Sathish Vadhiyar
Parallel Architecture Sathish Vadhiyar Motivations of Parallel Computing Faster execution times From days or months to hours or seconds E.g., climate modelling, bioinformatics Large amount of data dictate
More informationCSCI-1680 Link Layer Wrap-Up Rodrigo Fonseca
CSCI-1680 Link Layer Wrap-Up Rodrigo Fonseca Based partly on lecture notes by David Mazières, Phil Levis, John Jannotti Today: Link Layer (cont.) Framing Reliability Error correction Sliding window Medium
More informationIntel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances
Technology Brief Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances The world
More information