Processor Architecture and Organization. Lecture 17. Multi-core Architectures
- Joan Fleming
Transcription
1 Universidade Federal do Rio Grande do Sul, Instituto de Informática, Programa de Pós-Graduação em Computação. Processor Architecture and Organization, Lecture 17: Multi-core Architectures
2 Motivation
- Future applications will require still more performance
- Power is the current bottleneck
- Performance driven by higher frequency and ILP hits the power wall
- Performance may be obtained by multiple processors running at lower frequencies
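The trade-off on this slide can be illustrated with the textbook CMOS dynamic-power estimate P = a·C·V²·f. The sketch below is not from the slides: it assumes that supply voltage can be lowered in proportion to frequency (an idealization, since real voltages have a floor) and that the workload parallelizes perfectly. The function names and normalized values are hypothetical.

```python
def dynamic_power(capacitance, voltage, frequency, activity=1.0):
    """Textbook CMOS dynamic-power estimate: P = activity * C * V^2 * f."""
    return activity * capacitance * voltage ** 2 * frequency


def multicore_power(n_cores, capacitance, voltage, frequency):
    """Total power of n cores, each running at frequency / n.

    Assumption (not from the slides): voltage scales down by the same
    factor as frequency, and aggregate throughput stays constant.
    """
    scale = 1.0 / n_cores
    per_core = dynamic_power(capacitance, voltage * scale, frequency * scale)
    return n_cores * per_core


single = dynamic_power(1.0, 1.0, 1.0)      # normalized single-core baseline
dual = multicore_power(2, 1.0, 1.0, 1.0)   # two cores at half frequency
print(single, dual)                        # prints: 1.0 0.25
```

Under these idealized assumptions, two half-speed cores deliver the same nominal throughput at a quarter of the dynamic power, which is the intuition behind trading frequency for core count.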
3 Types of parallelism
- Single processor core
  - ILP: Instruction-Level Parallelism
  - VLIW: Very Long Instruction Word
  - SIMD: Single Instruction Multiple Data
  - SMT: Simultaneous Multi-Threading
- Multiprocessing
  - SMP: Symmetrical Multi-Processing
  - CMP: Chip Multi-Processing (usually homogeneous)
  - MPSoC: Multi-Processor SoCs (usually heterogeneous)
4 Parallelism
- Massive parallelism required in the foreseeable future
[Figure: frequency (MHz), Gigaops/s, and operations per cycle]
Source: ITRS Roadmap, 2003
5 Complexity of media applications
[Figure: ops/sample versus sampling rate per second for H.263 QCIF, MPEG1 CIF, and MPEG2 encode and decode, with throughput levels ranging from MOPS to TOPS]
Source: Shen, SIPS 2003
6 Parallelism
[Figure: increasing HW threads per socket (1, 10, 100), from hyper-threading through the multi-core era (scalar and parallel applications) to the many-core era (massively parallel applications)]
Source:
7 General-purpose processing
- Tera-level computing involves three distinct types of workloads, or computing capabilities:
  - Recognition: the ability to recognize patterns and models of interest to a specific user or application scenario
  - Mining: the ability to mine large amounts of real-world data for the patterns or models of interest
  - Synthesis: the ability to synthesize large datasets or a virtual world based on the patterns or models of interest
- Intel foresees a multi-core architecture that is scalable, adaptable, and programmable
Source:
8 General-purpose solutions - IBM
[Figure: IBM Power4]
9 General-purpose solutions - IBM
- IBM Power4
  - 2 cores, f = 1.4 GHz, 174 Mtransistors
  - Single clock over entire die
  - Power = 85 W
  - One core may be turned off
  - Trend: multiple processors on die, bus communication, shared cache
- 4 Power4 chips into single module
  - Chips connected via bit buses
  - Up to 128 MB L3 cache
  - Bus speed = ½ processor speed
  - Total throughput = 35 GB/s
  - Trend: multiple processors on MCM, on-module communication, huge cache
Source: Franza, MPSoC 05
10 General-purpose solutions - Sun
- Sun Ultrasparc IV
  - 2 cores, f = 1.8 GHz
  - Shared 2 MB L2 cache
  - 300 Mtransistors
- Sun Niagara
  - 8 cores
  - 4 threads per core
  - Shared 3 MB L2 cache
  - To be released in
[Figure labels: eight 4-way MT SPARC pipes, crossbar, way banked L2 cache, memory controllers & I/O, I/O shared functions]
Source: Franza, MPSoC 05
11 General-purpose solutions - AMD
- AMD dual-core Opteron
  - 2 cores, f = 1.8 GHz
  - 106 Mtransistors
  - Power = 70 W
  - 2 x 1 MB L2 caches
  - Unshared caches
Source: Franza, MPSoC 05
12 General-purpose solutions - Intel
- Intel Pentium D
  - 2 HT processors on MCM
  - 2 x 1 MB L2 caches, unshared
  - f = 3.2 GHz
  - 230 Mtransistors
- Intel Itanium Montecito
  - 2 VLIW cores, f = 1.5 GHz
  - Power = 100 W
  - 1.72 Btransistors
  - 2 x 12 MB L3 asynchronous caches
  - Multiple clock domains
  - Power management
  - Dynamic voltage and frequency adjustment
Source: Franza, MPSoC 05
13 CMP with shared cache
- Advantages
  - Low communication latency between the cores
  - The interface between the cache and the I/O is used only for off-chip communication
  - The cache can be dynamically allocated among the cores
- Disadvantages
  - Greater complexity
  - Higher cache bandwidth is required
- Examples
  - IBM Power 4/5
  - Sun UltraSPARC-IV+
14 CMP with shared I/O
- Advantages
  - Simpler than the shared-cache model
  - Communication between the cores does not need to go off-chip
  - Disadvantages listed below still apply to cache usage
- Disadvantages
  - Resources are wasted because the caches are not shared
  - The bandwidth between the cache and the bus is shared by on-chip and off-chip traffic
- Examples
  - Intel Itanium 2 (Montecito)
  - AMD Opteron Dual-Core
15 CMP with shared package
- Advantages
  - Requires no modifications to the CPU logic
  - Short design time relative to the other models
- Disadvantages
  - Communication latency between the CPUs
  - Limits the frequency of the interconnect bus
- Examples
  - Intel Pentium D (Smithfield)
  - Intel Pentium D (Presler)
  - Intel Xeon (Dempsey)
16 MPSoC issues
- Heterogeneous vs. homogeneous multi-processing: tradeoff between programmability and efficiency
  - Heterogeneous ISAs
  - DSP processors for media applications
  - Hardwired blocks
  - Configurable processors
  - Heterogeneous memory systems and address spaces
  - Heterogeneous interconnects
- MPSoCs are custom architectures, derived from configurable platforms, driven by standards
  - Standards usually define I/O relationships, not algorithms
17 MPSoC issues
- Programming model and software development tools
- Memory model
  - Heterogeneous memory systems are harder to program
  - Support for real-time constraints and performance
- Communication architecture
  - Support for real-time constraints and performance
- Design methodologies and tools
  - How to configure a platform to meet application constraints?
  - Time-to-market requires support from tools
  - Market for tools is too limited
  - More simulation-oriented (ASIC tools are more synthesis-oriented)
18 Examples of multi-cores for the embedded market
- ST Nomadik
- Cell - IBM / Sony / Toshiba
- ARM11 MPCore
- Toshiba media processor MeP
- NEC MP-211
- Panasonic UniPhier
- Infineon 3G-baseband MPSoC
19 ST Nomadik platform
[Figure labels: Memory, Storage & Connectivity; Peripheral Interfaces; General-purpose CPU (ARM) with cache; System DMA; Embedded Memory; Symmetrical DSPs (three Multi-media DSPs, each with cache); Loosely-coupled Sub-systems (HW1 with DMA, HW2 with DMA); Graphics Acceleration]
Source: Artieri, MPSoC 05
20 Nomadik - MPSoC benefits
- High-computing performance
  - Multiple non-interfering domains of intense activity, each having its own processor, DMA services, and hardware accelerators for data intensive functions
  - Hardware acceleration embedding standard functions
  - Highest and predictable performance through a careful bus and memory hierarchy design
- Low-power
  - Intrinsic low-power sub-systems
  - Fine grain power management at sub-system level
  - Leakage management by switching on & off sub-systems
Source: Artieri, MPSoC 05
21 Nomadik - MPSoC benefits
- Software flexibility
  - General-purpose CPU allows fast porting of new features
  - Performance through optimization on DSP with reasonable effort
  - Full performance at low power using HW functions
- Three levels from simplest to most advanced usage
  - Monolithic general-purpose CPU
  - Monolithic general-purpose CPU, multiple symmetrical DSPs
  - Monolithic general-purpose CPU, multiple symmetrical DSPs, hardware accelerators
Source: Artieri, MPSoC 05
22 Nomadik - Multi-media DSP processor profile
- Short pipeline, high VLIW parallelism efficiency
  - 1 convolution tap per cycle (2 loads + 2 pointer updates + 1 multiplication + 1 MAC)
  - Incremental architectural evolution, no race for frequency
- Floating point unit
  - IEEE754 compliant
  - Division and square root operation
- SIMD support
- Low power
  - Level 0 cache for power saving
  - Low-power instructions
  - Massive gated clock physical implementation
- Programmed only in ANSI C
  - Reduced learning curve and development time
  - Allow seamless DSP architecture evolution
Source: Artieri, MPSoC 05
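To make the "one convolution tap per cycle" figure concrete, here is a minimal sketch of what a single FIR tap involves. It is plain Python rather than DSP code, the function name `fir_output` is hypothetical, and it only models the two loads, two pointer updates, and the multiply-accumulate named on the slide; how those operations map onto the DSP's VLIW issue slots is not described in the source.

```python
def fir_output(samples, coeffs, n):
    """Compute one FIR output y[n] = sum_k coeffs[k] * samples[n - k].

    Requires n >= len(coeffs) - 1 so every sample index is valid.
    """
    acc = 0
    s_ptr = n   # pointer into the sample buffer (walks backwards)
    c_ptr = 0   # pointer into the coefficient buffer (walks forwards)
    for _ in range(len(coeffs)):
        x = samples[s_ptr]   # load 1: input sample
        h = coeffs[c_ptr]    # load 2: filter coefficient
        acc += x * h         # multiply-accumulate for this tap
        s_ptr -= 1           # pointer update 1
        c_ptr += 1           # pointer update 2
    return acc


print(fir_output([1, 2, 3, 4], [1, 0, -1], 3))   # prints: 2
```

On a sequential processor each loop iteration costs several instructions; the slide's claim is that the VLIW DSP issues all of a tap's operations in the same cycle.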
23 Nomadik - Memory hierarchy and bus
- Becomes the main design bottleneck
  - Memory cache hierarchy
  - Bus matrix
- Usage of shared embedded memory to offload bandwidth from external memory
  - Smart caching in embedded memory is key
  - Managed by software
- Hardware controlled L1-cache at sub-system level is sized in accordance with average latency
- A very manageable bottleneck
Source: Artieri, MPSoC 05
24 Nomadik - Memory hierarchy
[Figure: External Mass Memory (SDRAM): bandwidth bottleneck, high latency. Embedded Memory (L2 cache): very high bandwidth, low latency. Sub-systems, each with DMA and L1 cache; System DMA]
Source: Artieri, MPSoC 05
25 Nomadik Software platform
[Figure labels, top to bottom:
- User interface, gaming, MP3 player, telephone, messaging, PIM, browser
- High-level client API
- Multi-media framework; Communication infra-structure (Telephony, Networking); Java
- Operating system core: Symbian, WinCE, Linux (kernel, device drivers, file system, ...)
- Multi-media Accelerators & Audio-video codec (MP3, AAC, Midi, MPEG4, H.264, ...)
- Low-level API (HCL)
- Communication interfaces (UARTs, USB, BT, ...); Security Framework; Peripheral interfaces (LCD, cameras, memory, ...); Power management]
Source: Artieri, MPSoC 05
26 Nomadik - Software overview
[Figure labels: applications; Open OS middleware; drivers; HCL; Nomadik kernel; Component Manager; ARM (OS); DSPs (OS, FW)]
27 Nomadik - Programming model
- Nomadik kernel
  - A set of system services and API on which
    - Open OS drivers are built
    - Sub-system firmware is built
  - Open OS agnostic
  - Provides execution resource abstraction for user applications and firmware
Source: Artieri, MPSoC 05
28 Nomadik - Programming model
- Component = process = service
  - A dynamically downloadable object
- Component Manager
  - A unique gateway to all sub-systems
  - Aware of all sub-system resources state and activity
  - Transparently execute a component on any of the sub-systems
  - Manage the life cycle of a component
    - Create, start, stop, kill component instances
    - Apply policy rules
  - Memory management
    - Image installation
    - Memory allocation
    - Garbage collection
Source: Artieri, MPSoC 05
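The create / start / stop / kill life cycle can be pictured as a small state machine behind a single gateway object. The sketch below is purely illustrative and is not the Nomadik API: the class name, the allowed transitions, and the least-loaded placement policy are all assumptions made for this example.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    STOPPED = "stopped"
    KILLED = "killed"


# Assumed transition rules (hypothetical, not taken from the slides).
ALLOWED = {
    "start": {State.CREATED, State.STOPPED},
    "stop": {State.RUNNING},
    "kill": {State.CREATED, State.RUNNING, State.STOPPED},
}
TARGET = {"start": State.RUNNING, "stop": State.STOPPED, "kill": State.KILLED}


class ComponentManager:
    """Single gateway that places component instances on sub-systems."""

    def __init__(self, subsystems):
        self.load = {name: 0 for name in subsystems}   # instances per sub-system
        self.instances = {}
        self._next_id = 0

    def create(self, component):
        # Assumed policy: place on the least-loaded sub-system,
        # so the caller never names a sub-system explicitly.
        subsystem = min(self.load, key=self.load.get)
        self.load[subsystem] += 1
        cid = self._next_id
        self._next_id += 1
        self.instances[cid] = {"component": component,
                               "subsystem": subsystem,
                               "state": State.CREATED}
        return cid

    def _transition(self, cid, action):
        inst = self.instances[cid]
        if inst["state"] not in ALLOWED[action]:
            raise ValueError(f"cannot {action} from {inst['state'].value}")
        inst["state"] = TARGET[action]
        if action == "kill":
            self.load[inst["subsystem"]] -= 1   # release the resource

    def start(self, cid):
        self._transition(cid, "start")

    def stop(self, cid):
        self._transition(cid, "stop")

    def kill(self, cid):
        self._transition(cid, "kill")


mgr = ComponentManager(["dsp0", "dsp1"])
a = mgr.create("mp3_decoder")
b = mgr.create("aac_decoder")
mgr.start(a)
print(mgr.instances[a]["subsystem"], mgr.instances[b]["subsystem"])   # dsp0 dsp1
```

Keeping placement and state tracking inside one object mirrors the slide's point that the manager is the unique gateway and is aware of every sub-system's resources.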
29 Nomadik - Programming model
- Sub-system OS
  - Real-time micro task scheduler
  - Communication and synchronization services
- A sound execution framework
  - Clear separation between invocation (component manager side) and execution (component instances)
  - Highly scalable and flexible
  - Best use of platform resources
Source: Artieri, MPSoC 05
30 Nomadik - Tool support
- Multiple core approach
  - ARM
    - No a priori: whatever is available from the market for both compilation and debug
  - Multi-media based sub-systems
    - Dedicated and optimized tools for
      - Compilation
      - Simulation and analysis
      - Debug and trace
- Compilation
  - All C-based approach, no assembly code
  - Highly optimized and robust ANSI C compiler
  - DSP extensions matching the ITU/ETSI basic operation package
  - Multi-platform tools
Source: Artieri, MPSoC 05
31 ARM11 MPCore
32 ARM11 MPCore OS support: AMP vs. SMP
- Asymmetric multiprocessing (AMP)
  - Programmer statically allocates tasks
  - Uses a distributed view of memory
  - Synchronization and communication via explicit message passing mechanism
  - Same model as traditionally used in heterogeneous designs
  - Workloads are partitioned and manually offloaded to specific processors
- Symmetric multiprocessing (SMP)
  - OS dynamically allocates tasks to CPUs
  - Programmer uses a shared view of memory
  - Synchronization and communication via common state in shared memory
  - Normally homogeneous CPU arrangement
  - Workloads are partitioned and dynamically shared between any processors
- OS related requirements
  - Cache coherency
  - Generic interrupt controller
  - Watchdog timer per processor
Source: Zivojnovic, MPSoC 05
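The two programming styles can be contrasted with a small analogy in Python threads. This is only a sketch of the models, not ARM11 MPCore code: `amp_style` statically hands work to one dedicated worker and communicates through explicit message queues, while `smp_style` lets a pool schedule tasks dynamically and synchronizes through shared state guarded by a lock. Both function names are hypothetical.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def amp_style(items):
    """AMP analogy: static offload to one worker, explicit message passing."""
    inbox = queue.Queue()    # messages to the offload "processor"
    outbox = queue.Queue()   # results coming back

    def dsp_worker():
        while True:
            msg = inbox.get()
            if msg is None:          # sentinel: no more work
                break
            outbox.put(msg * msg)

    worker = threading.Thread(target=dsp_worker)
    worker.start()
    for item in items:               # the programmer decides who does what
        inbox.put(item)
    inbox.put(None)
    worker.join()
    return sum(outbox.get() for _ in items)


def smp_style(items, n_cpus=4):
    """SMP analogy: dynamic scheduling, shared memory, lock-based sync."""
    total = 0
    lock = threading.Lock()

    def task(item):
        nonlocal total
        with lock:                   # common state in shared memory
            total += item * item

    with ThreadPoolExecutor(max_workers=n_cpus) as pool:
        list(pool.map(task, items))  # the pool decides which worker runs what
    return total


print(amp_style([1, 2, 3]), smp_style([1, 2, 3]))   # prints: 14 14
```

The results are identical; what differs is who decides the placement of work (programmer versus scheduler) and how data is exchanged (messages versus shared state), which is the distinction the slide draws.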
33 Toshiba MeP (Media Processor)
[Figure labels: MeP module; MeP CPU core; Instruction RAM/cache; Data RAM/cache; HW extensions (DSP Unit, UCI Unit, VLIW co-processor, HW engine); Local bus; DMA; bus bridge; Global bus; Configurable processors N; Heterogeneous multiprocessor]
Source: Matsui, MPSoC 05
34 Toshiba MeP (Media Processor)
- Configurable processor MeP-C2 core
  - Base processor: 32-bit RISC, 5-stage pipeline, 350 MHz, 50 Kgates
- Configuration
  - memory size
  - optional instructions
  - bus width (32/64 bits)
  - interrupt (# channels, # levels)
  - debug support unit
- User extensions
  - User Custom Instruction (UCI) Unit: single-cycle ALU instructions
  - DSP unit: multi-cycle ALU instructions
  - VLIW co-processor: 2-way or 3-way
  - up to 10 hardware engines
  - control register extension: up to 4 Kwords
Source: Matsui, MPSoC 05
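One way to read this slide is as a set of constraints on a configuration record. The sketch below captures only the limits stated above; it is not the Toshiba configuration tool, every field name is hypothetical, and it assumes that "4 Kwords" means 4096 words.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MepConfig:
    """Hypothetical record of a few MeP configuration choices."""
    bus_width_bits: int = 32
    vliw_ways: Optional[int] = None   # None means no VLIW co-processor
    hw_engines: int = 0
    ctrl_reg_ext_words: int = 0

    def problems(self) -> List[str]:
        """Return a list of violated limits (empty if the config is valid)."""
        issues = []
        if self.bus_width_bits not in (32, 64):
            issues.append("bus width must be 32 or 64 bits")
        if self.vliw_ways not in (None, 2, 3):
            issues.append("VLIW co-processor must be 2-way or 3-way")
        if not 0 <= self.hw_engines <= 10:
            issues.append("at most 10 hardware engines")
        if not 0 <= self.ctrl_reg_ext_words <= 4096:   # assumes 1 Kword = 1024
            issues.append("control register extension limited to 4 Kwords")
        return issues


ok = MepConfig(bus_width_bits=64, vliw_ways=3, hw_engines=10,
               ctrl_reg_ext_words=4096)
bad = MepConfig(bus_width_bits=48, hw_engines=11)
print(ok.problems())    # prints: []
print(bad.problems())
```

Returning a list of problems rather than raising on the first one makes it easier to report every violated limit of a candidate configuration at once.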
35 Toshiba MeP (Media Processor)
- Example of application: MPSoC
- 4 MeP processors
  - Main control
  - Filter
  - Video processor, with MPEG4 / H.264 codec accelerators
  - Audio DSP, with DSP extension
36 Panasonic UniPhier
- Market: home electronics equipment
  - TV, DVD, cell phones
- DPP encourages future signal processing functions
- DPP is an optional part for cell phones
- Hardware engines are normally ASIC design parts for standardized functions
- Complex functions which are not yet standardized are realized by DPP
[Figure labels: Processing Element Array; Excursion units; Instruction Parallel Processor (IPP); Control Unit; Data Parallel Processor (DPP); Hardware Engine; Fundamental; Extension; Extension]
Source: Nishitani, MPSoC 05
37 NEC MP211
- Market: cell phones
- Current business acceleration
- Different OSs in component processors
- Poor future expandability due to single DSP
[Figure labels: Multi-layer AHB; ARM926 (CPU0); ARM926 (CPU1); ARM926 (CPU1); DSP SPX-K602]
Source: Nishitani, MPSoC 05
38 Massive multi-core
- CISCO CRS-1 Carrier Router System
  - Continuous operation, service flexibility, extended longevity
  - 92 Terabits per second
- Software programmable network processor (SPP)
  - Each SPP processes 40 Gbps
  - Parallel array of 188 Xtensa-based SPP processors
Source: Fu, MPSoC 05
More informationThree basic multiprocessing issues
Three basic multiprocessing issues 1. artitioning. The sequential program must be partitioned into subprogram units or tasks. This is done either by the programmer or by the compiler. 2. Scheduling. Associated
More informationEITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor
EITF20: Computer Architecture Part 5.2.1: IO and MultiProcessor Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration I/O MultiProcessor Summary 2 Virtual memory benifits Using physical memory efficiently
More informationBuilding blocks for 64-bit Systems Development of System IP in ARM
Building blocks for 64-bit Systems Development of System IP in ARM Research seminar @ University of York January 2015 Stuart Kenny stuart.kenny@arm.com 1 2 64-bit Mobile Devices The Mobile Consumer Expects
More informationOctopus: A Multi-core implementation
Octopus: A Multi-core implementation Kalpesh Sheth HPEC 2007, MIT, Lincoln Lab Export of this products is subject to U.S. export controls. Licenses may be required. This material provides up-to-date general
More informationINF5063: Programming heterogeneous multi-core processors Introduction
INF5063: Programming heterogeneous multi-core processors Introduction Håkon Kvale Stensland August 19 th, 2012 INF5063 Overview Course topic and scope Background for the use and parallel processing using
More informationMaster Program (Laurea Magistrale) in Computer Science and Networking. High Performance Computing Systems and Enabling Platforms.
Master Program (Laurea Magistrale) in Computer Science and Networking High Performance Computing Systems and Enabling Platforms Marco Vanneschi Multithreading Contents Main features of explicit multithreading
More informationUsing Industry Standards to Exploit the Advantages and Resolve the Challenges of Multicore Technology
Using Industry Standards to Exploit the Advantages and Resolve the Challenges of Multicore Technology September 19, 2007 Markus Levy, EEMBC and Multicore Association Enabling the Multicore Ecosystem Multicore
More informationLecture 12: EIT090 Computer Architecture
Lecture 12: EIT090 Computer Architecture Anders Ardö EIT Electrical and Information Technology, Lund University December 1, 2009 A. Ardö, EIT Lecture 12: EIT090 Computer Architecture December 1, 2009 1
More informationARM Processors for Embedded Applications
ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or
More informationHow to Write Fast Code , spring th Lecture, Mar. 31 st
How to Write Fast Code 18-645, spring 2008 20 th Lecture, Mar. 31 st Instructor: Markus Püschel TAs: Srinivas Chellappa (Vas) and Frédéric de Mesmay (Fred) Introduction Parallelism: definition Carrying
More informationEvolution of Computers & Microprocessors. Dr. Cahit Karakuş
Evolution of Computers & Microprocessors Dr. Cahit Karakuş Evolution of Computers First generation (1939-1954) - vacuum tube IBM 650, 1954 Evolution of Computers Second generation (1954-1959) - transistor
More informationBuilding High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye
Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink Robert Kaye 1 Agenda Once upon a time ARM designed systems Compute trends Bringing it all together with CoreLink 400
More informationNext Generation Enterprise Solutions from ARM
Next Generation Enterprise Solutions from ARM Ian Forsyth Director Product Marketing Enterprise and Infrastructure Applications Processor Product Line Ian.forsyth@arm.com 1 Enterprise Trends IT is the
More informationOutline Marquette University
COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations
More informationCSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore
CSCI-GA.3033-012 Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Status Quo Previously, CPU vendors
More informationComputer Systems Architecture
Computer Systems Architecture Lecture 23 Mahadevan Gomathisankaran April 27, 2010 04/27/2010 Lecture 23 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student
More informationIntel released new technology call P6P
P6 and IA-64 8086 released on 1978 Pentium release on 1993 8086 has upgrade by Pipeline, Super scalar, Clock frequency, Cache and so on But 8086 has limit, Hard to improve efficiency Intel released new
More informationAdvanced Parallel Programming I
Advanced Parallel Programming I Alexander Leutgeb, RISC Software GmbH RISC Software GmbH Johannes Kepler University Linz 2016 22.09.2016 1 Levels of Parallelism RISC Software GmbH Johannes Kepler University
More informationParallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization
Computer Architecture Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Parallel Processing http://www.yildiz.edu.tr/~naydin 1 2 Outline Multiple Processor
More informationMulti-core Programming Evolution
Multi-core Programming Evolution Based on slides from Intel Software ollege and Multi-ore Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts, Evolution
More informationIntroduction to System-on-Chip
Introduction to System-on-Chip COE838: Systems-on-Chip Design http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University
More informationConvergence of Parallel Architecture
Parallel Computing Convergence of Parallel Architecture Hwansoo Han History Parallel architectures tied closely to programming models Divergent architectures, with no predictable pattern of growth Uncertainty
More informationSMD149 - Operating Systems - Multiprocessing
SMD149 - Operating Systems - Multiprocessing Roland Parviainen December 1, 2005 1 / 55 Overview Introduction Multiprocessor systems Multiprocessor, operating system and memory organizations 2 / 55 Introduction
More informationOverview. SMD149 - Operating Systems - Multiprocessing. Multiprocessing architecture. Introduction SISD. Flynn s taxonomy
Overview SMD149 - Operating Systems - Multiprocessing Roland Parviainen Multiprocessor systems Multiprocessor, operating system and memory organizations December 1, 2005 1/55 2/55 Multiprocessor system
More informationA Comparative Performance Evaluation of Different Application Domains on Server Processor Architectures
A Comparative Performance Evaluation of Different Application Domains on Server Processor Architectures W.M. Roshan Weerasuriya and D.N. Ranasinghe University of Colombo School of Computing A Comparative
More information4. Hardware Platform: Real-Time Requirements
4. Hardware Platform: Real-Time Requirements Contents: 4.1 Evolution of Microprocessor Architecture 4.2 Performance-Increasing Concepts 4.3 Influences on System Architecture 4.4 A Real-Time Hardware Architecture
More informationMicroarchitecture Overview. Performance
Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make
More information