A MIMD Multi Threaded Processor
|
|
- Shon Norman
- 6 years ago
- Views:
Transcription
1 F. Lesser, A MIMD Multi Threaded Processor Falk Lesser V. Angelov, J. de Cuveland, V. Lindenstruth, C. Reichling, R. Schneider, M.W. Schulz Kirchhoff Institute for Physics University Heidelberg, Germany Phone: ti@kip.uni-heidelberg.de (speaker): lesser@kip.uni-heidelberg.de WWW: Outline Introduction Application for for the the MIMD processor Electronics environment The The MIMD processor architecture Summary
2 F. Lesser, 2 High Energy Physics Study of fundamental constituents of matter and forces between them» quarks, electrons, neutrinos, photon, Z 0, W ±, etc. Higher and higher energies are required to delve deeper and deeper into matter i.e. larger, more powerful, more expensive No longer affordable by individual countries
3 CERN Conseil Europeen pour la Recherche Nucleaire located in Geneva area Switzerland high energy and nuclear physics research center Established in 1954» kickstart research in Europe Currently funded by 20 European Countries Users: ~6500 scientists from 500 institutes in ~80 countries Next main research program: Large Hadron Collider Member state Participating non member state Start of operation: 2005, data taking ~20 years F. Lesser, 3
4 F. Lesser, 4 Large Hadron Collider & The Alps 4 Interaction points Geneva International airport ~100m deep Superconducting accelerator 7 TeV p p 5.5 TeV Pb Pb 27km circumference
5 Nucleons In Collision 8000 collisions per second Pb ~ speed of light Each interaction generating > particles in acceptance of detector Task: Find within one specific particle pair within 6 µs out of charged particles Pb ~ speed of light 6 µs to: digitize 1.2 million data 10 MHz/10 Bit process 29 Mbytes form global decision Main goal is to create the Quark-Gluon Plasma, A state of matter existing during the first few microseconds after the big bang F. Lesser, 5
6 F. Lesser, 6 A Particle Detector named ALICE ITS: 2.62 million channels (1 Bit) TPC: channels (10 Bit) TRD: 1.2 million channels (10 Bit) 1,425 million ADCs Digitization rate: 10 MHz/10 Bit Peak data rate: 17,8 TB/sec MIMD processors Computing time of 6 µs Target clock rate: 120 MHz Magnet (0,4 Tesla) Measures particle trajectories, momentum and provides particle identification
7 TRD Electronics Overview to/from next MCM (right within padrow) PS 16 Channels.. 16x. Trigger/ Data out µcpu million million analog analog channels MHz MHz channels channels per per module module sources sources 1 trigger trigger bit bit track track segment segment processing processing on on chamber chamber maximum maximum latency latency 6µs 6µs ~20 ~20 space space points points per pertracklet tracklets (layers) (layers) per per track track PS PS to/from next MCM (left within padrow) preamp Digital chip Signal layers Multi-Chip-Module : 16 Channels Detector : AGND DGND APWR DPWR F. Lesser, 6 Chips (96 Channel) 24 padrows Chamber 18 subdivisions in φ 5 subdiv. in z 7
8 Tracklet Fit Concept Pad-No a b Pipelined Calculation : 20 timebin i MCM m MCM m+1 During During Drift Drift Time: Time: N hitcount hitcount y y i position i position x x i timebin timebin sum sum y y i position position sum i i sum x x i y y i timebin*position timebin*position sum sum y 2 i i y i 2 position² position² sum sum i After After Drift Drift Time: Time: a a intercept intercept b b slope slope a χ χ 2 2 track track quality quality merge merge track track segments segments in in padrow padrow 2 i ( xi) x y x xy 2 i i i i i 2 2 i Nx ( x ) 2 i Nxy ii xy i i b Nx A P+1 10 A P-1 A P A x Ax/Ac LUT 7 LUT yi 6 2 x i y i x i y i x i x i y i x i y i x i 2 y i 2 y i 2 x i y i x i y i y i 2 HC Amplitude Build Ratio F. Lesser, Calculate Position Calculate fit summands Calculate sums Sum Memory 8
9 TRD Trigger Timing and Data Flow Data reduction 29 MByte 4,35 MByte 1,14 MByte 86.4 kbyte 1 Bit Data bandwidth 18 TByte 1,85 TByte Data ship Global Tracking ADC pipeline latency Calculate Σ fit relevant pipeline ADC output Calculate Tracklets Visible TEC drift area Drift time ADC pipeline latency LHC Clock MHz / ns t 0 10 MHz power 120 MHz Highly I/O bound: 17.8 TB/sec Very tight time budget: µs F. Lesser, 9
10 F. Lesser, 10 Next Step: Compute and Combine L22 L20 L13 L12 L11 R02 R03 R10 L02 L03 L * * * Readout Board (RB) 23*2 23*2 φ 0 φ 1 φ 2 φ 3
11 Transistors Computer Architecture Trends Bit-level parallelism -level Thread- level (?) i8080 i8008 i4004 i80286 i8086 R10000 Pentium i80386 R3000 R Performance Fraction of total cycles (%) µproc 60%/yr. (2X/1.5yr) DRAM 6%/yr. (2X/15 yrs) CPU Number of instructions issued Processor-Memory Performance Gap: (grows 50% / year) DRAM F. Culler, Lesser, Patterson, UC Berkeley Time
12 Why MIMD with Shared Memory? 1.5 µs computing 120 MHz 180 clock cycles ~120 clock cycles required to finish one task Four tasks have to be done ~ 480 cycles needed The processor must execute up to four tasks on different data objects Arithmetic operations should be executed in one clock cycle Data has to be shared without overhead Quad Ported SRAM (QPM) Global Register File (GRF) Four tasks Four CPUs Same code Quad port instruction memory saves chip area MIMD F. Lesser, 12
13 F. Lesser, 13 Generic Processor Architecture Sequencer Memory Sequencer Memory Source Bus I/0 Load/ Store Unit ALU Source Bus Register File Write Back Bus Load/ Store Unit Internal RAM Preprocessor Interface I/0 I/0 I/0 Load/ Store Unit Load/ Store Unit Load/ Store Unit Load/ ALU Register Store File Unit Write Back Bus Sequencer Memory Source Bus Load/ ALU Register Store File Unit Write Back Bus Sequencer Memory Source Bus Load/ ALU Register Store File Unit Write Back Bus Internal RAM Internal RAM Internal RAM Sequencer Memory Source Bus I/0 Load/ Store Unit ALU Register File Load/ Store Unit Internal RAM To execute up to four tasks in real time, four CPUs are needed Independent CPUs can't share data or program Write Back Bus
14 MIMD Architecture Shared I-Mem External I/O Sequencer Sequencer Sequencer Memory Projection of Pre-Processor data inside the MIMD Arbitter Load/ Store Unit FIT ALU Sorce Sorce Bus Bus Source Sorce Data Bus Bus PRF PRF 21 PRF PRF 43 GRF Load/ Store Unit Internal RAM PrePro Write Write Back Back Bus Bus Write Write Back Back Data Bus Bus Four Harvard CPUs coupled by multi port instruction memory multi port data memory Global Register File (GRF) Register based interface to Pre-Processor Two stage pipeline (fetch/decode, execute/write back) Architecture can be adopted to any general purpose processor F. Lesser, Four ALUs Shared data via GRF and D-Mem 14
15 F. Lesser, 15 Some Features 16 Bit data word 16 private registers per node 16 global registers common to all nodes 24 Kbytes quad port
16 Set and Format 24 Bit fixed length instruction word 70 instructions in total 22 ALU instructions 26 branch instructions 3 instructions for synchronization 14 Load/Store instructions 4 instructions to handle interrupts Opcode Source 1 Source 2 Destination Opcode Immediate Source 2 Destination Opcode Immediate Destination Opcode Source 1 Immediate Opcode Immediate Branch Opcode Source 1 Branch Opcode Immediate F. Lesser, 16
17 17
18 Quad Port Memory Write data Addresses WE Write data registers Clock WE Address and WE registers Write units Write Data Address Word lines Address decoders Memory Matrix Bit lines Bit cells Read Data Sense amplifiers t w t acc F. Lesser, Read data 18
19 F. Lesser, 19 A Closer Look Inside the MPM Full custom design used for quad port 16 bit data memory and 24 bit Prechargesignal instruction memory Precharge Delivers/receives data to/from 4 CPUs simultaneously Line 1 Max. access time is about 2 ns 4 (0.18 µm) Needed access time 6 ns Precharge 1 Bit 1 Bit Not_bitline1-4 Bitline1-4Not_bitline1-4 Bitline1-4Not_bitline1-4 sel prepower bit not_bit vdd Organized in blocks of 64 lines bit Line width is parametrizable Line Bit 1 Bit Senseamplifiers Senseamplifiers word1 word2 word3 word4 bit vdd gnd word4 word3 word2 word1 not_bit Dout1 Dout2 Dout3 Dout4 Dout1 Dout2 Dout3 Dout4 out Four blocks of the MPM with additional test logic in 0.35 µm process gnd
20 F. Lesser, 20 Synchronization Three instructions for synchronization SEM sets the synchronization mask SYN suspend the PC SYT copies the synchronization register Implementation of flexible synchronization patterns Synchronization implemented as a side effect of access to the GRF SYN Write '0' Synchronization Register Write '0' Synchronization Register Write '0' Synchronization Register Write '0' Synchronization Register GRF Assigned to CPU 1 Write Assigned to CPU 2 Assigned to CPU 3 Assigned to CPU 4
21 F. Lesser, 21 Data I/O Global I/O P1 P2 P3 PN Priority Arbiter CPU 4 CPU 3 Addr Data CPU 2 CPU 1 Direct input from preprocessor via multi ported registers Private I/O to external links System bus for peripheries Arbiter to select a CPU Private I/O Private I/O Private I/O Private I/O Fit Register File
22 22
23 23
24 Summary High Energy Physics presents very interesting challenges in the computer science (high throughput, low latency, low overhead, massively parallel processing) Use of Multi Ported Memories (MPM) enables integration of multiple generic processor cores as MIMD unit MPM as global register file allows zero overhead, asynchronous multi processor communication (semaphores, locks, etc.) MPM operating as buffer memory provides scalable, independent access to shared data structures MPM operating as buffered crossbar switch allows tight integration of I/O for network and I/O processors For further information contact F. Lesser, US/PCT Patend Pending: 6,067,595 24
Low-Power, Networked MIMD Processor for Particle Physics
Low-Power, Networked MIMD Processor for Particle Physics Outline Introduction to High Energy Physics Introduction to Triggers ALICE TRD Trigger Architecture The MIMD Processor Tests and Measurements Summary
More informationPoS(High-pT physics09)036
Triggering on Jets and D 0 in HLT at ALICE 1 University of Bergen Allegaten 55, 5007 Bergen, Norway E-mail: st05886@alf.uib.no The High Level Trigger (HLT) of the ALICE experiment is designed to perform
More informationHigh Level Trigger System for the LHC ALICE Experiment
High Level Trigger System for the LHC ALICE Experiment H Helstrup 1, J Lien 1, V Lindenstruth 2,DRöhrich 3, B Skaali 4, T Steinbeck 2, K Ullaland 3, A Vestbø 3, and A Wiebalck 2 for the ALICE Collaboration
More informationReal-time Analysis with the ALICE High Level Trigger.
Real-time Analysis with the ALICE High Level Trigger C. Loizides 1,3, V.Lindenstruth 2, D.Röhrich 3, B.Skaali 4, T.Steinbeck 2, R. Stock 1, H. TilsnerK.Ullaland 3, A.Vestbø 3 and T.Vik 4 for the ALICE
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationOverview. About CERN 2 / 11
Overview CERN wanted to upgrade the data monitoring system of one of its Large Hadron Collider experiments called ALICE (A La rge Ion Collider Experiment) to ensure the experiment s high efficiency. They
More informationThe Nios II Family of Configurable Soft-core Processors
The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture
More information2000 N + N <100N. When is: Find m to minimize: (N) m. N log 2 C 1. m + C 3 + C 2. ESE534: Computer Organization. Previously. Today.
ESE534: Computer Organization Previously Day 7: February 6, 2012 Memories Arithmetic: addition, subtraction Reuse: pipelining bit-serial (vectorization) Area/Time Tradeoffs Latency and Throughput 1 2 Today
More informationConference The Data Challenges of the LHC. Reda Tafirout, TRIUMF
Conference 2017 The Data Challenges of the LHC Reda Tafirout, TRIUMF Outline LHC Science goals, tools and data Worldwide LHC Computing Grid Collaboration & Scale Key challenges Networking ATLAS experiment
More informationA Configurable Radiation Tolerant Dual-Ported Static RAM macro, designed in a 0.25 µm CMOS technology for applications in the LHC environment.
A Configurable Radiation Tolerant Dual-Ported Static RAM macro, designed in a 0.25 µm CMOS technology for applications in the LHC environment. 8th Workshop on Electronics for LHC Experiments 9-13 Sept.
More informationThe Memory Hierarchy Part I
Chapter 6 The Memory Hierarchy Part I The slides of Part I are taken in large part from V. Heuring & H. Jordan, Computer Systems esign and Architecture 1997. 1 Outline: Memory components: RAM memory cells
More information1. Introduction. Outline
Outline 1. Introduction ALICE computing in Run-1 and Run-2 2. ALICE computing in Run-3 and Run-4 (2021-) 3. Current ALICE O 2 project status 4. T2 site(s) in Japan and network 5. Summary 2 Quark- Gluon
More informationRecap: Machine Organization
ECE232: Hardware Organization and Design Part 14: Hierarchy Chapter 5 (4 th edition), 7 (3 rd edition) http://www.ecs.umass.edu/ece/ece232/ Adapted from Computer Organization and Design, Patterson & Hennessy,
More informationInternal Memory. Computer Architecture. Outline. Memory Hierarchy. Semiconductor Memory Types. Copyright 2000 N. AYDIN. All rights reserved.
Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Internal Memory http://www.yildiz.edu.tr/~naydin 1 2 Outline Semiconductor main memory Random Access Memory
More informationSerial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing
CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.
More informationMemory Hierarchy Technology. The Big Picture: Where are We Now? The Five Classic Components of a Computer
The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Datapath Today s Topics: technologies Technology trends Impact on performance Hierarchy The principle of locality
More informationECE7995 (4) Basics of Memory Hierarchy. [Adapted from Mary Jane Irwin s slides (PSU)]
ECE7995 (4) Basics of Memory Hierarchy [Adapted from Mary Jane Irwin s slides (PSU)] Major Components of a Computer Processor Devices Control Memory Input Datapath Output Performance Processor-Memory Performance
More information2000 N + N <100N. When is: Find m to minimize: (N) m. N log 2 C 1. m + C 3 + C 2. ESE534: Computer Organization. Previously. Today.
ESE534: Computer Organization Previously Day 5: February 1, 2010 Memories Arithmetic: addition, subtraction Reuse: pipelining bit-serial (vectorization) shared datapath elements FSMDs Area/Time Tradeoffs
More informationThe University of Adelaide, School of Computer Science 13 September 2018
Computer Architecture A Quantitative Approach, Sixth Edition Chapter 2 Memory Hierarchy Design 1 Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive per
More informationCS152 Computer Architecture and Engineering Lecture 16: Memory System
CS152 Computer Architecture and Engineering Lecture 16: System March 15, 1995 Dave Patterson (patterson@cs) and Shing Kong (shing.kong@eng.sun.com) Slides available on http://http.cs.berkeley.edu/~patterson
More informationAn introduction to SDRAM and memory controllers. 5kk73
An introduction to SDRAM and memory controllers 5kk73 Presentation Outline (part 1) Introduction to SDRAM Basic SDRAM operation Memory efficiency SDRAM controller architecture Conclusions Followed by part
More informationTopic 21: Memory Technology
Topic 21: Memory Technology COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August 1 Old Stuff Revisited Mercury Delay Line Memory Maurice Wilkes, in 1947,
More informationTopic 21: Memory Technology
Topic 21: Memory Technology COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August 1 Old Stuff Revisited Mercury Delay Line Memory Maurice Wilkes, in 1947,
More informationThe GAP project: GPU applications for High Level Trigger and Medical Imaging
The GAP project: GPU applications for High Level Trigger and Medical Imaging Matteo Bauce 1,2, Andrea Messina 1,2,3, Marco Rescigno 3, Stefano Giagu 1,3, Gianluca Lamanna 4,6, Massimiliano Fiorini 5 1
More informationUnleashing the Power of Embedded DRAM
Copyright 2005 Design And Reuse S.A. All rights reserved. Unleashing the Power of Embedded DRAM by Peter Gillingham, MOSAID Technologies Incorporated Ottawa, Canada Abstract Embedded DRAM technology offers
More informationCERN openlab II. CERN openlab and. Sverre Jarp CERN openlab CTO 16 September 2008
CERN openlab II CERN openlab and Intel: Today and Tomorrow Sverre Jarp CERN openlab CTO 16 September 2008 Overview of CERN 2 CERN is the world's largest particle physics centre What is CERN? Particle physics
More informationDetector Control LHC
Detector Control Systems @ LHC Matthias Richter Department of Physics, University of Oslo IRTG Lecture week Autumn 2012 Oct 18 2012 M. Richter (UiO) DCS @ LHC Oct 09 2012 1 / 39 Detectors in High Energy
More informationCaches. Hiding Memory Access Times
Caches Hiding Memory Access Times PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O N T R O L ALU CTL INSTRUCTION FETCH INSTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMORY
More informationThe Memory Hierarchy & Cache
Removing The Ideal Memory Assumption: The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory
More informationCpE 442. Memory System
CpE 442 Memory System CPE 442 memory.1 Outline of Today s Lecture Recap and Introduction (5 minutes) Memory System: the BIG Picture? (15 minutes) Memory Technology: SRAM and Register File (25 minutes)
More informationTing Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China
CMOS Crossbar Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China OUTLINE Motivations Problems of Designing Large Crossbar Our Approach - Pipelined MUX
More informationVirtualizing a Batch. University Grid Center
Virtualizing a Batch Queuing System at a University Grid Center Volker Büge (1,2), Yves Kemp (1), Günter Quast (1), Oliver Oberst (1), Marcel Kunze (2) (1) University of Karlsruhe (2) Forschungszentrum
More informationMemories: Memory Technology
Memories: Memory Technology Z. Jerry Shi Assistant Professor of Computer Science and Engineering University of Connecticut * Slides adapted from Blumrich&Gschwind/ELE475 03, Peh/ELE475 * Memory Hierarchy
More informationChapter Seven Morgan Kaufmann Publishers
Chapter Seven Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored as a charge on capacitor (must be
More informationBasics DRAM ORGANIZATION. Storage element (capacitor) Data In/Out Buffers. Word Line. Bit Line. Switching element HIGH-SPEED MEMORY SYSTEMS
Basics DRAM ORGANIZATION DRAM Word Line Bit Line Storage element (capacitor) In/Out Buffers Decoder Sense Amps... Bit Lines... Switching element Decoder... Word Lines... Memory Array Page 1 Basics BUS
More informationThe creation of a Tier-1 Data Center for the ALICE experiment in the UNAM. Lukas Nellen ICN-UNAM
The creation of a Tier-1 Data Center for the ALICE experiment in the UNAM Lukas Nellen ICN-UNAM lukas@nucleares.unam.mx 3rd BigData BigNetworks Conference Puerto Vallarta April 23, 2015 Who Am I? ALICE
More informationECE232: Hardware Organization and Design
ECE232: Hardware Organization and Design Lecture 21: Memory Hierarchy Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Ideally, computer memory would be large and fast
More informationarxiv: v1 [physics.ins-det] 11 Jul 2015
GPGPU for track finding in High Energy Physics arxiv:7.374v [physics.ins-det] Jul 5 L Rinaldi, M Belgiovine, R Di Sipio, A Gabrielli, M Negrini, F Semeria, A Sidoti, S A Tupputi 3, M Villa Bologna University
More informationPage 1. Multilevel Memories (Improving performance using a little cash )
Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency
More informationMemory Systems IRAM. Principle of IRAM
Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several
More informationStraw Detectors for the Large Hadron Collider. Dirk Wiedner
Straw Detectors for the Large Hadron Collider 1 Tracking with Straws Bd π π? B-Mesons properties? Charge parity symmetry violation? 2 Tracking with Straws Bd proton LHC Start 2007 π proton 14 TeV π? B-Mesons
More informationEC 513 Computer Architecture
EC 513 Computer Architecture Cache Organization Prof. Michel A. Kinsy The course has 4 modules Module 1 Instruction Set Architecture (ISA) Simple Pipelining and Hazards Module 2 Superscalar Architectures
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationTracking and compression techniques
Tracking and compression techniques for ALICE HLT Anders Strand Vestbø The ALICE experiment at LHC The ALICE High Level Trigger (HLT) Estimated data rate (Central Pb-Pb, TPC only) 200 Hz * 75 MB = ~15
More informationECE 485/585 Microprocessor System Design
Microprocessor System Design Lecture 4: Memory Hierarchy Memory Taxonomy SRAM Basics Memory Organization DRAM Basics Zeshan Chishti Electrical and Computer Engineering Dept Maseeh College of Engineering
More informationSemiconductor Memory Classification. Today. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. CPU Memory Hierarchy.
ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec : April 4, 7 Memory Overview, Memory Core Cells Today! Memory " Classification " ROM Memories " RAM Memory " Architecture " Memory core " SRAM
More informationESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems
ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Lec 26: November 9, 2018 Memory Overview Dynamic OR4! Precharge time?! Driving input " With R 0 /2 inverter! Driving inverter
More informationECE468 Computer Organization and Architecture. OS s Responsibilities
ECE468 Computer Organization and Architecture OS s Responsibilities ECE4680 buses.1 April 5, 2003 Recap: Summary of Bus Options: Option High performance Low cost Bus width Separate address Multiplex address
More informationCS 426 Parallel Computing. Parallel Computing Platforms
CS 426 Parallel Computing Parallel Computing Platforms Ozcan Ozturk http://www.cs.bilkent.edu.tr/~ozturk/cs426/ Slides are adapted from ``Introduction to Parallel Computing'' Topic Overview Implicit Parallelism:
More informationarxiv: v1 [physics.ins-det] 16 Oct 2017
arxiv:1710.05607v1 [physics.ins-det] 16 Oct 2017 The ALICE O 2 common driver for the C-RORC and CRU read-out cards Boeschoten P and Costa F for the ALICE collaboration E-mail: pascal.boeschoten@cern.ch,
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Review: Major Components of a Computer Processor Devices Control Memory Input Datapath Output Secondary Memory (Disk) Main Memory Cache Performance
More informationRT2016 Phase-I Trigger Readout Electronics Upgrade for the ATLAS Liquid-Argon Calorimeters
RT2016 Phase-I Trigger Readout Electronics Upgrade for the ATLAS Liquid-Argon Calorimeters Nicolas Chevillot (LAPP/CNRS-IN2P3) on behalf of the ATLAS Liquid Argon Calorimeter Group 1 Plan Context Front-end
More information! Memory Overview. ! ROM Memories. ! RAM Memory " SRAM " DRAM. ! This is done because we can build. " large, slow memories OR
ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec 2: April 5, 26 Memory Overview, Memory Core Cells Lecture Outline! Memory Overview! ROM Memories! RAM Memory " SRAM " DRAM 2 Memory Overview
More informationCOMPUTER ORGANIZATION AND DESI
COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler
More informationSemiconductor Memory Classification
ESE37: Circuit-Level Modeling, Design, and Optimization for Digital Systems Lec 6: November, 7 Memory Overview Today! Memory " Classification " Architecture " Memory core " Periphery (time permitting)!
More informationLECTURE 10: Improving Memory Access: Direct and Spatial caches
EECS 318 CAD Computer Aided Design LECTURE 10: Improving Memory Access: Direct and Spatial caches Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses
More informationDesign with Microprocessors
Design with Microprocessors Lecture 12 DRAM, DMA Year 3 CS Academic year 2017/2018 1 st semester Lecturer: Radu Danescu The DRAM memory cell X- voltage on Cs; Cs ~ 25fF Write: Cs is charged or discharged
More informationEEC 483 Computer Organization
EEC 483 Computer Organization Chapter 5 Large and Fast: Exploiting Memory Hierarchy Chansu Yu Table of Contents Ch.1 Introduction Ch. 2 Instruction: Machine Language Ch. 3-4 CPU Implementation Ch. 5 Cache
More informationECE468 Computer Organization and Architecture. Memory Hierarchy
ECE468 Computer Organization and Architecture Hierarchy ECE468 memory.1 The Big Picture: Where are We Now? The Five Classic Components of a Computer Processor Control Input Datapath Output Today s Topic:
More informationCSCS CERN videoconference CFD applications
CSCS CERN videoconference CFD applications TS/CV/Detector Cooling - CFD Team CERN June 13 th 2006 Michele Battistin June 2006 CERN & CFD Presentation 1 TOPICS - Some feedback about already existing collaboration
More informationLow-Power SRAM and ROM Memories
Low-Power SRAM and ROM Memories Jean-Marc Masgonty 1, Stefan Cserveny 1, Christian Piguet 1,2 1 CSEM, Neuchâtel, Switzerland 2 LAP-EPFL Lausanne, Switzerland Abstract. Memories are a main concern in low-power
More informationChapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY
Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored
More informationCS 152 Computer Architecture and Engineering. Lecture 6 - Memory
CS 152 Computer Architecture and Engineering Lecture 6 - Memory Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste! http://inst.eecs.berkeley.edu/~cs152!
More informationMARIE: An Introduction to a Simple Computer
MARIE: An Introduction to a Simple Computer 4.2 CPU Basics The computer s CPU fetches, decodes, and executes program instructions. The two principal parts of the CPU are the datapath and the control unit.
More informationThe ALICE TPC Readout Control Unit 10th Workshop on Electronics for LHC and future Experiments September 2004, BOSTON, USA
Carmen González Gutierrez (CERN PH/ED) The ALICE TPC Readout Control Unit 10th Workshop on Electronics for LHC and future Experiments 13 17 September 2004, BOSTON, USA Outline: 9 System overview 9 Readout
More informationChapter 5B. Large and Fast: Exploiting Memory Hierarchy
Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,
More informationChapter 5A. Large and Fast: Exploiting Memory Hierarchy
Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM
More informationCourse Administration
Spring 207 EE 363: Computer Organization Chapter 5: Large and Fast: Exploiting Memory Hierarchy - Avinash Kodi Department of Electrical Engineering & Computer Science Ohio University, Athens, Ohio 4570
More informationWhere Have We Been? Ch. 6 Memory Technology
Where Have We Been? Combinational and Sequential Logic Finite State Machines Computer Architecture Instruction Set Architecture Tracing Instructions at the Register Level Building a CPU Pipelining Where
More informationMark Redekopp, All rights reserved. EE 352 Unit 10. Memory System Overview SRAM vs. DRAM DMA & Endian-ness
EE 352 Unit 10 Memory System Overview SRAM vs. DRAM DMA & Endian-ness The Memory Wall Problem: The Memory Wall Processor speeds have been increasing much faster than memory access speeds (Memory technology
More informationRegisters. Instruction Memory A L U. Data Memory C O N T R O L M U X A D D A D D. Sh L 2 M U X. Sign Ext M U X ALU CTL INSTRUCTION FETCH
PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O T R O L ALU CTL ISTRUCTIO FETCH ISTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMOR ACCESS WRITE BACK A D D A D D A L U
More informationLet!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies
1 Lecture 22 Introduction to Memory Hierarchies Let!s go back to a course goal... At the end of the semester, you should be able to......describe the fundamental components required in a single core of
More informationThe coolest place on earth
The coolest place on earth Large Scale Messaging with ActiveMQ for Particle Accelerators at CERN 2 Overview Examples 30min Introduction to CERN Operation Usage of ActiveMQ 3 About the Speaker Member of
More informationCOMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface COEN-4710 Computer Hardware Lecture 7 Large and Fast: Exploiting Memory Hierarchy (Chapter 5) Cristinel Ababei Marquette University Department
More informationLecture-14 (Memory Hierarchy) CS422-Spring
Lecture-14 (Memory Hierarchy) CS422-Spring 2018 Biswa@CSE-IITK The Ideal World Instruction Supply Pipeline (Instruction execution) Data Supply - Zero-cycle latency - Infinite capacity - Zero cost - Perfect
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to
More informationCPU Pipelining Issues
CPU Pipelining Issues What have you been beating your head against? This pipe stuff makes my head hurt! L17 Pipeline Issues & Memory 1 Pipelining Improve performance by increasing instruction throughput
More informationMemory Hierarchy. ENG3380 Computer Organization and Architecture Cache Memory Part II. Topics. References. Memory Hierarchy
ENG338 Computer Organization and Architecture Part II Winter 217 S. Areibi School of Engineering University of Guelph Hierarchy Topics Hierarchy Locality Motivation Principles Elements of Design: Addresses
More informationMemory Hierarchy. Maurizio Palesi. Maurizio Palesi 1
Memory Hierarchy Maurizio Palesi Maurizio Palesi 1 References John L. Hennessy and David A. Patterson, Computer Architecture a Quantitative Approach, second edition, Morgan Kaufmann Chapter 5 Maurizio
More informationMemory Hierarchy and Caches
Memory Hierarchy and Caches COE 301 / ICS 233 Computer Organization Dr. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals Presentation Outline
More informationThe GTPC Package: Tracking and Analysis Software for GEM TPCs
The GTPC Package: Tracking and Analysis Software for GEM TPCs Linear Collider TPC R&D Meeting LBNL, Berkeley, California (USA) 18-19 October, 003 Steffen Kappler Institut für Experimentelle Kernphysik,
More information,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics
,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics The objectives of this module are to discuss about the need for a hierarchical memory system and also
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #11 2/21/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline Midterm 1:
More informationEEM 486: Computer Architecture. Lecture 9. Memory
EEM 486: Computer Architecture Lecture 9 Memory The Big Picture Designing a Multiple Clock Cycle Datapath Processor Control Memory Input Datapath Output The following slides belong to Prof. Onur Mutlu
More informationCycle Time for Non-pipelined & Pipelined processors
Cycle Time for Non-pipelined & Pipelined processors Fetch Decode Execute Memory Writeback 250ps 350ps 150ps 300ps 200ps For a non-pipelined processor, the clock cycle is the sum of the latencies of all
More informationCPS101 Computer Organization and Programming Lecture 13: The Memory System. Outline of Today s Lecture. The Big Picture: Where are We Now?
cps 14 memory.1 RW Fall 2 CPS11 Computer Organization and Programming Lecture 13 The System Robert Wagner Outline of Today s Lecture System the BIG Picture? Technology Technology DRAM A Real Life Example
More informationISSN: [Bilani* et al.,7(2): February, 2018] Impact Factor: 5.164
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY A REVIEWARTICLE OF SDRAM DESIGN WITH NECESSARY CRITERIA OF DDR CONTROLLER Sushmita Bilani *1 & Mr. Sujeet Mishra 2 *1 M.Tech Student
More informationMemory Hierarchy: Motivation
Memory Hierarchy: Motivation The gap between CPU performance and main memory speed has been widening with higher performance CPUs creating performance bottlenecks for memory access instructions. The memory
More informationIntegrated CMOS sensor technologies for the CLIC tracker
Integrated CMOS sensor technologies for the CLIC tracker Magdalena Munker (CERN, University of Bonn) On behalf of the collaboration International Conference on Technology and Instrumentation in Particle
More information14:332:331. Week 13 Basics of Cache
14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 13 Basics of Cache [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 Week131 Spring 2006
More informationMemory Technology. Chapter 5. Principle of Locality. Chapter 5 Large and Fast: Exploiting Memory Hierarchy 1
COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface Chapter 5 Large and Fast: Exploiting Memory Hierarchy 5 th Edition Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic
More informationThe Processor: Instruction-Level Parallelism
The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy
More information! Memory. " RAM Memory. " Serial Access Memories. ! Cell size accounts for most of memory array size. ! 6T SRAM Cell. " Used in most commercial chips
ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec : April 5, 8 Memory: Periphery circuits Today! Memory " RAM Memory " Architecture " Memory core " SRAM " DRAM " Periphery " Serial Access Memories
More information1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola
1. Microprocessor Architectures 1.1 Intel 1.2 Motorola 1.1 Intel The Early Intel Microprocessors The first microprocessor to appear in the market was the Intel 4004, a 4-bit data bus device. This device
More informationECE 152 Introduction to Computer Architecture
Introduction to Computer Architecture Main Memory and Virtual Memory Copyright 2009 Daniel J. Sorin Duke University Slides are derived from work by Amir Roth (Penn) Spring 2009 1 Where We Are in This Course
More informationThe Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs.
The Hierarchical Memory System The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory Hierarchy:
More informationCSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University
CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand
More informationEE382N (20): Computer Architecture - Parallelism and Locality Fall 2011 Lecture 23 Memory Systems
EE382 (20): Computer Architecture - Parallelism and Locality Fall 2011 Lecture 23 Memory Systems Mattan Erez The University of Texas at Austin EE382: Principles of Computer Architecture, Fall 2011 -- Lecture
More informationIntroduction to Multicore architecture. Tao Zhang Oct. 21, 2010
Introduction to Multicore architecture Tao Zhang Oct. 21, 2010 Overview Part1: General multicore architecture Part2: GPU architecture Part1: General Multicore architecture Uniprocessor Performance (ECint)
More informationChapter 4. MARIE: An Introduction to a Simple Computer. Chapter 4 Objectives. 4.1 Introduction. 4.2 CPU Basics
Chapter 4 Objectives Learn the components common to every modern computer system. Chapter 4 MARIE: An Introduction to a Simple Computer Be able to explain how each component contributes to program execution.
More information