Computer Architecture

1 Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte, M.Sc. Benjamin Betting

2 Basic concepts Thread: Threads are lightweight processes. They consist of several instructions. The threads share a common (virtual) address space and can communicate via this common address space. Task: Tasks are heavyweight processes. Each task has its own address space. Tasks can only communicate via inter-task communication channels such as shared memory, pipes, message queues or sockets. A task can contain several threads.
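The difference between the two communication styles can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names thread_demo and task_demo are hypothetical, a Python thread stands in for a thread, and a child interpreter started with subprocess stands in for a task whose result comes back through a pipe.

```python
import subprocess
import sys
import threading


def thread_demo():
    """Two threads append to one list that lives in their common address space."""
    shared = []
    lock = threading.Lock()  # protects the shared list against concurrent updates

    def worker(tag):
        with lock:
            shared.append(tag)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared)


def task_demo():
    """A child process has its own address space; its result arrives over a pipe."""
    child_code = "print(6 * 7)"
    done = subprocess.run(
        [sys.executable, "-c", child_code],
        capture_output=True,  # stdout of the child is connected to a pipe
        text=True,
        check=True,
    )
    return int(done.stdout.strip())


if __name__ == "__main__":
    print(thread_demo())
    print(task_demo())
```

The threads need no channel at all because they see the same list; the child process cannot see the parent's variables, so the value has to travel through the pipe.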

3 Basic concepts Instruction level parallelism is limited. To further exploit parallel processing, thread or task level parallelism can be used. Two major architectures are known: Multithreaded processors exploit thread level parallelism. Chip multiprocessors (multi-core processors, many-core processors) exploit task level parallelism. Both concepts are also used in combination.

4 Basic concepts In a multithreaded processor, instructions of several threads of the program are candidates for concurrent issuing. This can be done in a classical scalar pipeline to hide the latencies of memory accesses. Here, instructions from several threads can be processed in the different pipeline stages. It can also be combined with a superscalar pipeline to raise the level of possible parallelism from the intra-thread level to the inter-thread level. This is called SMT (Simultaneous Multithreading).

5 Basic concepts Chip multiprocessors combine multiple processor cores on a single chip. Therefore these processors are also called multi-core processors. Today's multi-core processors integrate 2-8 cores on a chip. For future chips with a much larger number of cores (e.g. > 100), the term many-core processor is used. These cores can execute several tasks in parallel. Cores can be homogeneous or heterogeneous. With multithreaded cores, multithreading and chip multiprocessing can be combined.

6 Multithreaded Architectures Multithreaded processor: Supports the execution of multiple threads in hardware. It can store the context information of several threads in separate register sets and execute instructions of different threads at the same time in the processor pipeline. Different stages of the processor pipeline can contain instructions from different threads. This exploits thread level parallelism on the basis of parallelism in time (pipelining).

7 Multithreaded Architectures Goal: Reduction of latencies caused by memory accesses or dependencies. Such latencies can be bridged by switching to another thread. During the latency, instructions from other threads are fed into the pipeline => the processor utilization is raised and the throughput of a load consisting of multiple threads increases (while the throughput of a single thread remains the same). Explicit multithreaded processors: each thread is a real thread of the application program. Implicit multithreaded processors: speculative parallel threads are created dynamically out of a sequential program.

8 Basic multithreading techniques [Figure: instructions of several threads over time (processor cycles) for three cases, with context switches marked.] (a) Single-threaded processor. (b) Cycle-by-cycle interleaving (fine-grain multithreading): the context is switched each clock cycle. (c) Block interleaving (coarse-grain multithreading): instructions of a thread are executed until an event causes a latency. Then the context is switched.
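The two interleaving techniques can be compared with a small scheduling model. The following is a minimal sketch, not a pipeline simulator: the function name simulate, the encoding of a thread as a list of stall cycles, and the assumption that a context switch costs nothing are all choices made for this illustration.

```python
def simulate(threads, policy):
    """Return, per cycle, the index of the thread that issues (None = bubble).

    threads[i] is a list with one entry per instruction of thread i; the entry
    is the number of cycles the thread must wait after issuing that instruction
    (0 = no latency, e.g. 2 = a two-cycle memory latency).
    policy is "fine" (switch every cycle) or "coarse" (switch only on a latency).
    Context switch overhead is assumed to be zero.
    """
    if policy not in ("fine", "coarse"):
        raise ValueError(policy)
    n = len(threads)
    pc = [0] * n          # next instruction of each thread
    ready_at = [0] * n    # first cycle in which each thread may issue again
    current = n - 1 if policy == "fine" else 0
    trace = []
    cycle = 0
    while any(pc[i] < len(threads[i]) for i in range(n)):
        if policy == "fine":
            # round robin: start with the thread after the one that issued last
            order = [(current + 1 + k) % n for k in range(n)]
        else:
            # stay on the current thread as long as it can issue
            order = [(current + k) % n for k in range(n)]
        chosen = None
        for i in order:
            if pc[i] < len(threads[i]) and ready_at[i] <= cycle:
                chosen = i
                break
        if chosen is not None:
            stall = threads[chosen][pc[chosen]]
            pc[chosen] += 1
            ready_at[chosen] = cycle + 1 + stall
            current = chosen
        trace.append(chosen)
        cycle += 1
    return trace


if __name__ == "__main__":
    stalling = [0, 2, 0]   # second instruction causes a two-cycle latency
    steady = [0, 0, 0]
    print(simulate([stalling], "coarse"))          # single thread: bubbles appear
    print(simulate([stalling, steady], "coarse"))  # block interleaving
    print(simulate([stalling, steady], "fine"))    # cycle-by-cycle interleaving
```

In this model the single thread loses two cycles to the latency, while both interleaving policies fill those cycles with instructions of the second thread.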

9 Comparing multithreading to superscalar and VLIW [Figure: issue slots over time (processor cycles) for four processors, with context switches marked in (c) and (d).] (a) Four-way superscalar processor. (b) Four-way VLIW processor. (c) Four-way superscalar processor with cycle-by-cycle interleaving. (d) Four-way VLIW processor with cycle-by-cycle interleaving.

10 Classification of block interleaving techniques
Block interleaving
  static
    Explicit-switch
    Implicit-switch (switch-on-load, switch-on-store, switch-on-branch, ...)
  dynamic
    Switch-on-cache-miss
    Switch-on-signal
    Switch-on-use
    Conditional-switch

11 Simultaneous multithreading (SMT) A simultaneous multithreaded processor is able to issue instructions of multiple threads to multiple execution units in a single clock cycle. This exploits thread level and instruction level parallelism in time and space. [Figure: SMT pipeline with the stages Instruction Fetch, Instruction Decode and Rename, Instruction Window, Issue, Reservation Stations, Execution 1 ... Execution 4, Retire and Write Back.]
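Why issuing from several threads in the same cycle helps can be shown with a deliberately crude issue-slot model. It is a sketch under assumptions: the name cycles_to_finish and its parameters are hypothetical, each thread is reduced to a count of remaining instructions plus a fixed limit on how many it can issue per cycle, and all limits are assumed to be at least 1.

```python
def cycles_to_finish(remaining, ilp_limit, width, smt):
    """Count cycles until all threads have issued all their instructions.

    remaining[t] : instructions thread t still has to issue
    ilp_limit[t] : most instructions thread t can issue per cycle (>= 1)
    width        : issue slots per cycle
    smt          : True  = slots of one cycle may be filled from several threads
                   False = only one thread issues per cycle (round robin)
    """
    remaining = list(remaining)
    n = len(remaining)
    cycles = 0
    turn = 0
    while any(remaining):
        if smt:
            free = width
            for k in range(n):
                t = (turn + k) % n
                take = min(free, ilp_limit[t], remaining[t])
                remaining[t] -= take
                free -= take
        else:
            # pick the next thread in round-robin order that still has work
            for k in range(n):
                t = (turn + k) % n
                if remaining[t]:
                    break
            remaining[t] -= min(width, ilp_limit[t], remaining[t])
        turn = (turn + 1) % n
        cycles += 1
    return cycles


if __name__ == "__main__":
    work, limit = [8, 8], [2, 2]
    print(cycles_to_finish(work, limit, width=4, smt=False))
    print(cycles_to_finish(work, limit, width=4, smt=True))
```

With two threads that can each fill only two of four slots, the one-thread-per-cycle variant leaves half of every cycle empty, while the SMT variant fills the remaining slots from the other thread and needs half as many cycles in this model.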

12 Comparing SMT to chip multiprocessing [Figure: issue slots of several threads over time (processor cycles) for (a) simultaneous multithreading and (b) chip multiprocessing.]

13 Other applications of multithreading The ability to switch contexts quickly opens up further application fields for multithreading. Reduction of energy consumption: Mispredictions in superscalar processors cost energy. Multithreaded processors can execute instructions from other threads instead. Event handling: Helper threads handle special events (e.g. garbage collection). Real-time processing: Allows efficient real-time scheduling policies like LLF or GP.

14 Chip multiprocessing architectures A chip multiprocessor (CMP) combines several processors on a single chip. Today this is also called a multi-core processor, where a core denotes a single processor on the multi-core processor chip. Each core can have the complexity of today's microprocessors and holds its own primary cache for instructions and data. Usually, the cores are organized as memory-coupled multiprocessors with a shared address space. Furthermore, a secondary cache is contained on the chip. For future multi-core processors containing a large number of cores (> 100), the term many-core processor is used.

15 Possible multi-core configurations [Figure: two configurations with four processors each. Shared main memory: every processor has its own primary cache and its own secondary cache; all share the global memory. Shared secondary cache: every processor has its own primary cache; all share one secondary cache and the global memory.]

16 Possible multi-core configurations (2) [Figure: shared primary cache. Four processors share one primary cache, which is backed by the secondary cache and the global memory.]

17 Chip-Multiprocessor / Multi-Core Simulations show the shared secondary cache architecture to be superior to shared primary cache and shared main memory. Therefore, in most cases a large shared secondary cache is implemented on the processor chip. Cache coherency protocols known from symmetric multiprocessor architectures (e.g. the MESI protocol) guarantee correct access to the shared memory cells from inside and outside the processor chip. Today, chip multiprocessing is often combined with simultaneous multithreading. There, each core is an SMT core, giving the advantages of both approaches.
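As a reminder of what a coherency protocol such as MESI tracks, the state changes for a single cache line can be sketched as follows. This is a simplified model under assumptions: the function names read and write are hypothetical, only one line is modelled, and bus transactions and write-backs are reduced to state changes.

```python
# States of one cache line in each core's private cache:
# "M" modified, "E" exclusive, "S" shared, "I" invalid.

def read(states, core):
    """Core reads the line; returns the new list of per-core states."""
    states = list(states)
    if states[core] != "I":
        return states                      # read hit: no state change
    others_hold_copy = False
    for other, state in enumerate(states):
        if other != core and state in ("M", "E", "S"):
            # a modified copy is written back first (not modelled), then shared
            states[other] = "S"
            others_hold_copy = True
    states[core] = "S" if others_hold_copy else "E"
    return states


def write(states, core):
    """Core writes the line: all other copies are invalidated."""
    states = ["I"] * len(states)
    states[core] = "M"
    return states


if __name__ == "__main__":
    line = ["I", "I"]
    line = read(line, 0)
    print(line)   # core 0 holds the only copy
    line = read(line, 1)
    print(line)   # both cores share the line
    line = write(line, 0)
    print(line)   # core 0 modified it, the copy of core 1 is invalid
    line = read(line, 1)
    print(line)   # the line is shared again
```

The point of the sketch is that a write can never leave a stale valid copy in another cache, which is what makes accesses to shared memory cells correct.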

18 An early single-chip multiprocessor proposal: Hydra [Figure: a single chip with four CPUs (CPU 0 to CPU 3), each with a primary I-cache, a primary D-cache and a memory controller; centralized bus arbitration mechanisms; an on-chip secondary cache; an off-chip L3 interface to a cache SRAM array; a Rambus memory interface to the DRAM main memory; a DMA I/O bus interface to the I/O devices.]

19 Multi-Core examples IBM Power5: Symmetric multi-core processor with two 64-bit, two-way SMT cores, each having 64 kbytes instruction cache and 32 kbytes data cache. Both cores share a MByte on-chip secondary cache. The controller for the third-level cache is on chip as well. Four Power5 chips and four L3 cache chips are combined in a multi-chip module.

20 Multi-Core examples IBM Power6: Similar to Power5, but with superscalar in-order execution. Level 1 cache size raised to 64 kbytes for instructions and data on each core. 65 nm process. 5 GHz clock frequency.

21 Multi-Core examples IBM Power7: 4, 6 or 8 cores. Turbo mode deactivates 4 out of 8 cores, but gives the remaining 4 cores access to all memory controllers => improves single-core performance. Each core supports four-way SMT. 45 nm process. 4 GHz clock frequency.

22 Multi-Core examples Intel Core 2 Duo (Wolfdale): 2 processor cores of Intel Core 2 architecture. 32 kbytes data and instruction cache for each core. 6 MBytes L2 cache, shared by both cores. 45 nm process. 3 GHz clock frequency. [Figure: Core 1 and Core 2 connected to the shared L2 cache.]

23 Multi-Core examples [Figure: Microarchitecture of the Intel Core 2 family (a single core). Source: c't 16/2006]

24 Multi-Core examples Intel Core 2 Quad (Yorkfield): 2 Wolfdale dies in a multi-chip module => 4 processor cores of Intel Core 2 architecture. 32 kbytes data and instruction cache for each core. 6 MBytes L2 cache for each die. 45 nm process. 3 GHz clock frequency.

25 Multi-Core examples Intel Core i7-3930k (Sandy Bridge E): 6-core processor (hexa-core). 32 kbytes data and instruction cache for each core. 256 kbytes L2 cache for each core. 15 MBytes common L3 cache. 32 nm process. 3.3 GHz clock frequency.

26 Heterogeneous multi-cores While homogeneous multi-core processors are commonly used for general purpose computing, heterogeneous multi-core processors are seen as a future trend for embedded systems. A first member of this technology is the IBM Cell processor, containing a Power processor (Power Processor Element, PPE) and 8 dependent processors (Synergistic Processing Elements, SPE). PPE: based on the Power architecture, two-way SMT, controls the 8 SPEs. SPE: contains a RISC processor with 128-bit SIMD (multimedia) instructions, a memory flow controller and a bus controller. Originally designed for the Sony PlayStation 3, the Cell processor is now used in various application domains.

27 Cell Processor Die

28 GPUs Heterogeneous many-cores: 1000 and more streaming processor cores for shading. First generation: special purpose hardware for various shading tasks. Second generation: programmable streaming processors for pixel shading, vertex shading, ... Third generation: unified shaders. Example: Radeon R600 GPU.

29 GPUs Another example: NVIDIA GF100. 4 Graphic Processing Clusters (GPC). 768 kbytes L2 cache. 6 memory controllers.

30 GPUs A GPC consists of: Raster Engine (triangle setup, rasterization, Z-management). Polymorph Engine (vertex attribute fetch, tessellation). 4 Streaming Multiprocessors (unified shading: vertex, geometry, raster, texture and pixel processing).

31 GPUs An SM consists of: 32 CUDA cores (Compute Unified Device Architecture), 16 load/store units, 4 special function units (sin, cos, square root calculation, etc.) => GF100 overall: 4 x 4 x 32 = 512 CUDA cores, 4 x 4 x 16 = 256 load/store units, 4 x 4 x 4 = 64 special function units, 4 x 4 = 16 PolyMorph Engines, 4 Raster Engines.
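The totals follow directly from the hierarchy of 4 GPCs with 4 SMs each. A short check of the arithmetic, with variable names that are chosen here for illustration only:

```python
# Per-unit counts taken from the slides on the NVIDIA GF100.
GPCS = 4                 # Graphic Processing Clusters per chip
SMS_PER_GPC = 4          # Streaming Multiprocessors per GPC
CUDA_CORES_PER_SM = 32
LOAD_STORE_PER_SM = 16
SFUS_PER_SM = 4          # special function units

sms = GPCS * SMS_PER_GPC
totals = {
    "streaming multiprocessors": sms,
    "CUDA cores": sms * CUDA_CORES_PER_SM,
    "load/store units": sms * LOAD_STORE_PER_SM,
    "special function units": sms * SFUS_PER_SM,
}

if __name__ == "__main__":
    for name, count in totals.items():
        print(f"{name}: {count}")
```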

32 Multi-Core discussion: performance Due to multithreading in PC and server operating systems, two to four cores significantly increase the processor throughput. Exploiting eight or more cores requires parallel application programs. Hence, software development is challenged to deliver the necessary number of parallel threads, either by parallelizing compilers or by parallel applications. Experience from multiprocessors shows that a moderate number of parallel threads yields a high performance improvement, but this does not scale to a higher degree of parallelism. Beginning with 4 to 8 threads, the performance improvement is dramatically reduced. With 8 cores, some cores will be temporarily idle, except for very compute-intensive applications (signal processing, graphics processing). Furthermore, memory bandwidth can become a bottleneck.
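One common way to reason about this flattening is Amdahl's law, which the slides do not name; it is brought in here only as an illustration. The sketch below assumes that a fraction p of the run time can be parallelized perfectly; the function name amdahl_speedup and the value p = 0.9 are assumptions, not measurements.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core if only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    p = 0.9  # assumed share of the run time that can run in parallel
    for cores in (2, 4, 8, 100):
        print(cores, round(amdahl_speedup(p, cores), 2))
```

Under this assumption, going from 4 to 8 cores adds far less than a factor of two, and even 100 cores stay below a speedup of 10, because the serial part dominates.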

33 Multi-Core discussion: hardware While current multi-core processors use cache-coupled interconnection, future processors might rely on grid structures (network on chip) to improve performance. Adaptive and reconfigurable MPSoCs (Multi-Processor Systems-on-Chip) will gain importance for embedded systems and general purpose computing. Heterogeneous many-core GPUs are state of the art. Reconfigurable cache memories might allow variable connections to different cores. Available input/output bandwidth is still an open problem for throughput-oriented programs.

34 Multi-Core discussion: hardware For data access, transactional memory might be a model for future multi-core processors. Similar to database systems, a memory access is organized as a transaction that is executed either completely or not at all. Hardware support for checkpointing and rollback is necessary. As an advantage, concurrent access is simplified (no locks). Furthermore, fault tolerance and dependability techniques will become more important, as the error probability will increase with decreasing transistor dimensions. On-chip power management will keep the importance it already has today.
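The transaction idea can be imitated in software to show the control flow of checkpoint, speculative update, commit and rollback. This is a sketch of the programming model only, not of hardware transactional memory: the names TVar, atomically and demo are hypothetical, and the internal lock merely stands in for the atomic commit that the hardware would have to provide. The code that uses atomically takes no locks itself.

```python
import threading


class TVar:
    """A memory cell with a version counter that changes on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0


# Stand-in for the atomic commit step that hardware support would provide.
_commit_guard = threading.Lock()


def atomically(tvar, update):
    """Apply update(old) -> new as a transaction: completely or not at all."""
    while True:
        seen_version = tvar.version        # checkpoint
        new_value = update(tvar.value)     # speculative work, not yet visible
        with _commit_guard:
            if tvar.version == seen_version:
                tvar.value = new_value     # commit
                tvar.version += 1
                return new_value
        # conflict: another transaction committed first, so the speculative
        # result is discarded (rollback) and the transaction is retried


def demo(workers=4, increments=1000):
    """Several threads increment one counter without taking locks themselves."""
    counter = TVar(0)

    def work():
        for _ in range(increments):
            atomically(counter, lambda old: old + 1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(demo())
```

No increment is lost, because a transaction whose checkpoint has become stale never commits; it is simply executed again.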

35 Multi-Core discussion: software Currently, operating system concepts known from memory-coupled multiprocessor systems are used. Here, the operating system scheduler assigns independent processes to the available processors. In contrast to these concepts, the closer coupling of the cores in multi-core processors leads to a different ratio of computation to synchronization, which allows the use of more fine-grain parallelism. Parallel computing will become the future standard programming model. Most of the currently existing software is sequential and thus can run on only one core. Programming languages and tools to exploit the fine-grain parallelism of multi-core processors need to be developed. Furthermore, software engineering techniques are needed to allow the development of safe parallel programs.

36 Multi-Core discussion: software Application development for multi-core processors will become one of the main future marketplaces for computer scientists. Today's applications have to be reworked with the goal of exploiting parallelism, gaining performance and increasing comfort. New applications that are currently not realizable due to a lack of processor performance will arise. These are hard to predict. Possible applications must need a high computational performance that is reachable by parallelism. Such applications might come from speech recognition, image recognition, data mining, learning technologies or hardware synthesis.

Parallel Computing: Parallel Architectures Jin, Hai

Parallel Computing: Parallel Architectures Jin, Hai Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year & Semester : III/VI Section : CSE-1 & CSE-2 Subject Code : CS2354 Subject Name : Advanced Computer Architecture Degree & Branch : B.E C.S.E. UNIT-1 1.

More information

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan Course II Parallel Computer Architecture Week 2-3 by Dr. Putu Harry Gunawan www.phg-simulation-laboratory.com Review Review Review Review Review Review Review Review Review Review Review Review Processor

More information

Portland State University ECE 588/688. Graphics Processors

Portland State University ECE 588/688. Graphics Processors Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly

More information

Antonio R. Miele Marco D. Santambrogio

Antonio R. Miele Marco D. Santambrogio Advanced Topics on Heterogeneous System Architectures GPU Politecnico di Milano Seminar Room A. Alario 18 November, 2015 Antonio R. Miele Marco D. Santambrogio Politecnico di Milano 2 Introduction First

More information

Module 18: "TLP on Chip: HT/SMT and CMP" Lecture 39: "Simultaneous Multithreading and Chip-multiprocessing" TLP on Chip: HT/SMT and CMP SMT

Module 18: TLP on Chip: HT/SMT and CMP Lecture 39: Simultaneous Multithreading and Chip-multiprocessing TLP on Chip: HT/SMT and CMP SMT TLP on Chip: HT/SMT and CMP SMT Multi-threading Problems of SMT CMP Why CMP? Moore s law Power consumption? Clustered arch. ABCs of CMP Shared cache design Hierarchical MP file:///e /parallel_com_arch/lecture39/39_1.htm[6/13/2012

More information

COSC 6385 Computer Architecture - Thread Level Parallelism (I)

COSC 6385 Computer Architecture - Thread Level Parallelism (I) COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month

More information

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola 1. Microprocessor Architectures 1.1 Intel 1.2 Motorola 1.1 Intel The Early Intel Microprocessors The first microprocessor to appear in the market was the Intel 4004, a 4-bit data bus device. This device

More information

WHY PARALLEL PROCESSING? (CE-401)

WHY PARALLEL PROCESSING? (CE-401) PARALLEL PROCESSING (CE-401) COURSE INFORMATION 2 + 1 credits (60 marks theory, 40 marks lab) Labs introduced for second time in PP history of SSUET Theory marks breakup: Midterm Exam: 15 marks Assignment:

More information

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 12

More information

Parallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization

Parallel Processing. Computer Architecture. Computer Architecture. Outline. Multiple Processor Organization Computer Architecture Computer Architecture Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr nizamettinaydin@gmail.com Parallel Processing http://www.yildiz.edu.tr/~naydin 1 2 Outline Multiple Processor

More information

Multiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism

Multiple Issue and Static Scheduling. Multiple Issue. MSc Informatics Eng. Beyond Instruction-Level Parallelism Computing Systems & Performance Beyond Instruction-Level Parallelism MSc Informatics Eng. 2012/13 A.J.Proença From ILP to Multithreading and Shared Cache (most slides are borrowed) When exploiting ILP,

More information

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture. Lecture 9: Multiprocessors Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture Lecture 9: Multiprocessors Challenges of Parallel Processing First challenge is % of program inherently

More information

Multithreading: Exploiting Thread-Level Parallelism within a Processor

Multithreading: Exploiting Thread-Level Parallelism within a Processor Multithreading: Exploiting Thread-Level Parallelism within a Processor Instruction-Level Parallelism (ILP): What we ve seen so far Wrap-up on multiple issue machines Beyond ILP Multithreading Advanced

More information

Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor.

Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor. CS 320 Ch. 18 Multicore Computers Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor. Definitions: Hyper-threading Intel's proprietary simultaneous

More information

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Advanced Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Advanced Processor Architecture Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Modern Microprocessors More than just GHz CPU Clock Speed SPECint2000

More information

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING ROEVER ENGINEERING COLLEGE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING 16 MARKS CS 2354 ADVANCE COMPUTER ARCHITECTURE 1. Explain the concepts and challenges of Instruction-Level Parallelism. Define

More information

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010 Introduction to Multicore architecture Tao Zhang Oct. 21, 2010 Overview Part1: General multicore architecture Part2: GPU architecture Part1: General Multicore architecture Uniprocessor Performance (ECint)

More information

Multiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache.

Multiprocessors. Flynn Taxonomy. Classifying Multiprocessors. why would you want a multiprocessor? more is better? Cache Cache Cache. Multiprocessors why would you want a multiprocessor? Multiprocessors and Multithreading more is better? Cache Cache Cache Classifying Multiprocessors Flynn Taxonomy Flynn Taxonomy Interconnection Network

More information

Memory Systems IRAM. Principle of IRAM

Memory Systems IRAM. Principle of IRAM Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several

More information

GRAPHICS PROCESSING UNITS

GRAPHICS PROCESSING UNITS GRAPHICS PROCESSING UNITS Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 4, John L. Hennessy and David A. Patterson, Morgan Kaufmann, 2011

More information

Advanced Processor Architecture

Advanced Processor Architecture Advanced Processor Architecture Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE2030: Introduction to Computer Systems, Spring 2018, Jinkyu Jeong

More information

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 11

More information

CS 426 Parallel Computing. Parallel Computing Platforms

CS 426 Parallel Computing. Parallel Computing Platforms CS 426 Parallel Computing Parallel Computing Platforms Ozcan Ozturk http://www.cs.bilkent.edu.tr/~ozturk/cs426/ Slides are adapted from ``Introduction to Parallel Computing'' Topic Overview Implicit Parallelism:

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 6. Parallel Processors from Client to Cloud

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 6. Parallel Processors from Client to Cloud COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 6 Parallel Processors from Client to Cloud Introduction Goal: connecting multiple computers to get higher performance

More information

Parallel Processing SIMD, Vector and GPU s cont.

Parallel Processing SIMD, Vector and GPU s cont. Parallel Processing SIMD, Vector and GPU s cont. EECS4201 Fall 2016 York University 1 Multithreading First, we start with multithreading Multithreading is used in GPU s 2 1 Thread Level Parallelism ILP

More information

High Performance Computing. University questions with solution

High Performance Computing. University questions with solution High Performance Computing University questions with solution Q1) Explain the basic working principle of VLIW processor. (6 marks) The following points are basic working principle of VLIW processor. The

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 3 Fundamentals in Computer Architecture Computer Architecture Part 3 page 1 of 55 Prof. Dr. Uwe Brinkschulte,

More information

Master Program (Laurea Magistrale) in Computer Science and Networking. High Performance Computing Systems and Enabling Platforms.

Master Program (Laurea Magistrale) in Computer Science and Networking. High Performance Computing Systems and Enabling Platforms. Master Program (Laurea Magistrale) in Computer Science and Networking High Performance Computing Systems and Enabling Platforms Marco Vanneschi Multithreading Contents Main features of explicit multithreading

More information

Multicore Hardware and Parallelism

Multicore Hardware and Parallelism Multicore Hardware and Parallelism Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3

More information
