POWER3: Next Generation 64-bit PowerPC Processor Design

Authors: Mark Papermaster, Robert Dinkjian, Michael Mayfield, Peter Lenk, Bill Ciarfella, Frank O'Connell, Raymond DuPont
High End Processor Design, IBM Server Group Development, Austin, Texas

Abstract

IBM's new POWER3 microprocessor integrates the high-bandwidth and floating point capabilities of its POWER2 architecture predecessor into a fully scalable 64-bit PowerPC* symmetric multiprocessor (SMP) implementation. Based on the PowerPC Architecture*, this microprocessor contains the fundamental design features that are planned for the CPUs in the next three generations of RISC System/6000* systems, targeted at the numeric intensive computing (NIC), high-end analysis, graphics, commercial workstation, and server markets.

This paper provides an overview of how processor microarchitecture, silicon technology, packaging technology, and systems architecture can be leveraged to produce outstanding high-performance computational capabilities. What follows is a description of the processor design point, the execution core, and key features, such as hardware prefetch, that reduce latency to memory.

Design

The objectives of the POWER3 microprocessor were to continue the POWER2 architecture tradition of bringing real solutions to the high compute needs of IBM RISC System/6000 customers, while adding 64-bit addressability, double-word integer operations, and symmetric multiprocessor support in the PowerPC Architecture. To satisfy compute intensive requirements, the POWER3 design contains a highly superscalar core comprising eight execution units, fed by a high bandwidth memory interface that supports four floating point operations per cycle.

The technology strategy of the POWER3 design was to produce a highly sophisticated processor core and memory subsystem in an advanced but well-established technology. The POWER3-II design is the next step, planned to increase frequency by up to 50% by tuning the design and moving into IBM's cutting-edge copper technology, CMOS7S. The POWER3-III design is step three, with plans to increase frequency by as much as 40% over the POWER3-II architecture through more design tuning combined with a move to IBM's newest breakthrough technology, Silicon on Insulator (SOI).

Processor Overview

Figure 1 shows the block diagram of the POWER3 processor, which comprises eight execution units, a 32 KB instruction cache, a 64 KB data cache, and an on-board bus interface unit that controls both the L2 bus interface and the memory bus interface.

Figure 1. POWER3 Block Diagram (blocks shown: FPU1, FPU2, FXU1, FXU2, FXU3, LS1, LS2, branch/dispatch, instruction cache, data cache, memory management, bus interface with L2 control and clock, 1-16 MB L2 cache, 6XX bus)

Two of the three fixed point units (FXUs) execute the bulk of the integer arithmetic instructions in a single cycle. The third unit executes the multi-cycle integer instructions such as multiply and divide. The two floating point units (FPUs) are fully independent, each containing dedicated hardware for square root and divide routines as well as fused multiply-add instruction execution. The FPUs are fully pipelined, with three-cycle latency and single-cycle throughput. Two load/store units provide the data to sustain four floating point operations per cycle. A 16-entry store queue buffer prevents stores from stalling the machine while loads are being performed. Loads are also executed speculatively, improving data throughput.

The branch execution unit employs dynamic branch prediction, with four pending predicted branches supported. The branch target address cache contains 256 entries (two-way associative), and the branch history table has 2048 entries. Instructions are speculatively executed with a unique register renaming scheme that involves a total of 64 virtual rename registers (32 fixed point and 32 floating point) and a total of 40 physical rename registers actually implemented (16 fixed point and 24 floating point).

The on-board bus interface unit contains the interface logic supporting up to 16 MB of L2 cache, 6XX system bus protocols, and dedicated hardware to reduce latency to memory. Containing 15 million transistors, the POWER3 processor die is shown in Figure 2. It is manufactured in IBM's 0.25 micron hybrid CMOS 6S2 technology, with five levels of interconnect metallurgy.

Figure 2. POWER3 processor die photo (labeled regions: instruction cache, IPU, FXU, FPU, IFU, data cache, DCMMU)

System Level Bandwidth

A key challenge of the POWER3 processor was to design a high bandwidth system interface to feed a wide superscalar processor core. Using the high I/O count of IBM's packaging technology, the POWER3 processor was implemented with a separate, independent 16-byte memory bus and 32-byte L2 bus, each with separate address, data, and control lines, achieving 6.4 GBps to the L2 at 200 MHz.

As an example, Figure 3 compares the capability of the POWER3 processor with that of other high-end processors shipping today on the STREAM memory benchmark. This benchmark defines execution to be out of main memory and not the L2 cache. When applications are executed out of the L2 cache, the POWER3 processor will perform even faster.

Figure 3. High Bandwidth Performance (STREAM memory bandwidth in MB/s for systems from DEC, HP, SGI, and Sun and for the POWER3-based 43P Model 260 at 200 MHz; STREAM benchmark uniprocessor data as of 9/98)

High Bandwidth: Data Cache

Figure 4 is a block diagram of the data memory subsystem. The 64 KB data cache is implemented as a Content Addressable Memory (CAM) based cache with a long line size.
The array is set associative and eight-way interleaved (four-way by line and two-way by doubleword). The interleaving of the data cache effectively provides a multiported array function, provided there is no access conflict between the subarray banks.

Figure 4. High Bandwidth Interface (8-byte load and store data paths, D-cache, CRB, bus interface, 6XX bus, private L2 bus)

The bandwidth and concurrency of operations in this data cache are impressive and achieve the goal of maintaining the high throughput of the predecessor POWER1 and POWER2 architecture* processors while adding SMP and 64-bit addressability. The data cache has wide internal busing to perform the following highly parallel operations:

A) Eight-byte read for Load/Store #1
B) Eight-byte read for Load/Store #2
C) Eight-byte write for the Store Queue
D) Cache line write from the Cache Reload Buffer (CRB)
E) 64-byte half line read to the Cache Storeback Buffer (CSB)

The porting and controls of the data cache are such that (assuming no interleave collisions) any four of operations A through E can occur in the same cycle, with operations C and E being the only exclusive ones.

The CRB and the CSB create a pipelined interface with the bus interface unit. This consists of a 32-byte bus that sends data from the bus interface unit to the data cache CRB and a 16-byte bus that sends data from the data cache CSB to the bus interface unit. The data cache was carefully designed not to be a bottleneck to system performance under any conditions.

High Bandwidth: Instruction Cache

Figure 5 shows the instruction cache block diagram. The 32 KB instruction cache is also set associative and two-way interleaved (on a line basis). The interleaving permits a cache write from the CRB to one interleave while an eight-instruction (32-byte) fetch is done to the Instruction Buffers from the other interleave. The instruction cache read has the additional feature of being able to access eight sequential instructions at a time from anywhere within a given line. This allows the instruction cache to send eight sequential instructions to the Instruction Buffer in a single cycle.

Figure 5. Instruction Processing (sequential instructions from the I-cache; Cache Reload Buffer, bus interface, 6XX bus, private L2 bus)

Decode-to-Completion Bandwidth

The high instruction bandwidth from the instruction cache is maintained throughout the instruction processing path of the POWER3 processor, from instruction decode and dispatch to instruction completion. The Instruction Buffer can contain up to 12 instructions, while the Dispatch Buffer can hold up to four instructions. If the Instruction Buffer is empty, the Dispatch Buffer can be loaded directly from the instruction cache. Up to four instructions can be dispatched per cycle. Dispatch is in order to the execution unit queues. Eight instructions can be issued from the execution unit queues to the eight execution units in one cycle. Issue and execution are out of order, with a total of 32 outstanding instructions tracked by the Completion Buffer. Up to four instructions can be completed per cycle from the Completion Buffer.

This instruction processing bandwidth gives the POWER3 processor a very high utilization efficiency, which is reflected in the outstanding performance on the LINPACK 1000x1000 benchmark (TPP). (See the Performance section below.)

Reduced Latency Memory Subsystem

To ensure that potentially needed data and instructions are available to keep the core from stalling, the POWER3 processor designers invested in two key latency reduction techniques.

First, all caches are non-blocking. The instruction cache supports two outstanding misses, and the data cache supports up to four.

Second, the POWER3 processor implements sequential instruction and data access detection algorithms in hardware, which permit the prefetch of cache lines to closer levels of the memory hierarchy. This reduces the negative performance impact of increasing memory latencies, particularly on technical workloads, which often access memory in regular, sequential patterns. The POWER3 processor prefetches up to four separate data streams with a depth of two to four lines for each stream. Compared with the base design without hardware prefetch, the prefetching engine improves sustained performance by more than 2.5X on loops such as those found in double precision A times X plus Y (DAXPY). Programs with these regular, sequential patterns contained within the L2 cache will execute nearly as fast as if the data were contained in the L1 cache. Instructions are prefetched into the L1 cache up to one sequential line ahead of the line currently being accessed on the predicted path.

These architectural features not only enhance performance for the current 200 MHz POWER3 processor, but they also enable higher frequency versions to scale well in performance.

System Implementation

The system interface is designed to allow flexibility in system implementation, from low cost, bus-based systems to more complex switch-based configurations providing greater address and data bandwidth.

The POWER3 processor design supports Modified Exclusive Shared Invalid (MESI) snoop-oriented SMP cache coherence along with remote processor bus protocols for increased throughput and large system topologies. The split transaction bus allows it to achieve up to 90% of available data bandwidth running a DAXPY type workload. This flexibility is possible because of IBM's advanced packaging technology, which allows for the POWER3 processor's 1088 I/O, including 748 signal I/O, to maintain the high bandwidth needed to support high frequency processors.

Figure 6. POWER3 Roadmap (POWER1, POWER2, P2SC, P2SC+, POWER3 at 200 MHz, POWER3-II, and POWER3-III at up to 500 MHz, leading to a 64-bit, SMP-scalable PowerPC Architecture line)

Performance

The POWER3 processor excels in real application performance precisely because its many facilities combine to cover the wide spectrum of demands that characterize technical and commercial computing. Applications may be limited by the rate of computation or by the rate of data delivery to the computational units. They may be primarily fixed point intensive, primarily floating point intensive, or some combination of these characteristics. The POWER3 processor's well balanced design handles these challenges with its eight execution units, wide data paths, non-blocking caches and prefetch engine, and many other features.

Two standard benchmarks show the remarkable performance of the POWER3 processor. On the LINPACK 1000x1000 (TPP) benchmark, the POWER3 processor (200 MHz) runs at 632 MFLOPS per CPU, and on the STREAM benchmark, the POWER3 processor sustains over 1.1 GBps memory bandwidth. The outstanding TPP performance illustrates the ability of the POWER3 processor to sustain close to peak floating point performance, while the STREAM benchmark shows its ability to sustain close to peak memory performance. Its SPECfp95 performance of 30.1 shows a combination of these attributes in running an entire application suite.

Due to its robust floating point performance and high memory bandwidth, the POWER3 processor will also provide outstanding graphics performance. The RS/6000* 43P Model 260 with its POWER GXT3000P* Graphics Accelerator and 200 MHz POWER3 processor will yield an industry leading CDRS (OpenGL) benchmark rating of greater than 215, providing leadership performance in many CAD industry applications (1).

POWER3 processor-based RS/6000 systems will set new standards for application performance in the forthcoming years.
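
The "close to peak" claim for the TPP result can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope illustration using figures quoted in this paper. It assumes that the four floating point operations per cycle come from two FPUs each completing one fused multiply-add (counted as two operations) per cycle, and that the 32-byte L2 bus transfers once per 200 MHz processor cycle. The variable names are illustrative and do not come from the paper.

```python
# Back-of-the-envelope peak rates for a 200 MHz POWER3.
# All inputs are figures quoted in the paper; the interpretation is an assumption:
# 4 flops/cycle = 2 FPUs x 1 fused multiply-add x 2 operations (multiply + add).
CLOCK_HZ = 200e6          # processor clock
FLOPS_PER_CYCLE = 4       # floating point operations per cycle
L2_BUS_BYTES = 32         # width of the private L2 data bus
TPP_MFLOPS = 632.0        # quoted LINPACK 1000x1000 (TPP) result per CPU

peak_mflops = CLOCK_HZ * FLOPS_PER_CYCLE / 1e6   # peak floating point rate
l2_gb_per_s = CLOCK_HZ * L2_BUS_BYTES / 1e9      # peak L2 bandwidth
tpp_efficiency = TPP_MFLOPS / peak_mflops        # fraction of peak sustained on TPP

print(f"peak floating point rate: {peak_mflops:.0f} MFLOPS")
print(f"peak L2 bandwidth: {l2_gb_per_s:.1f} GB/s")
print(f"TPP efficiency: {tpp_efficiency:.0%}")
```

Under these assumptions the peak rate is 800 MFLOPS, so the 632 MFLOPS TPP result is about 79% of peak, and 32 bytes per cycle at 200 MHz reproduces the 6.4 GBps L2 figure quoted in the System Level Bandwidth section.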

Rev the Engine

Figure 6 shows the future roadmap of the POWER3 processor family. The second design point is well along in its implementation in IBM's industry leading CMOS7S process, which provides technology performance gains associated with shrinking channel lengths to 0.18 micron drawn and a reduction in RC delay with the copper interconnect. In addition to mapping technology, the POWER3-II processor is planned to improve commercial performance by adding set associative L2 support and fractional bus modes to support the higher frequencies. The technology map and tuning are planned to rapidly scale the POWER3 processor frequency to 300 to 500 MHz implementations, which are planned to achieve 30+ SPECint95 and 70+ SPECfp95.

Work is already underway to apply IBM's recently announced SOI technology to the POWER roadmap of products. SOI technology is projected to give higher frequencies while at the same time reducing the power requirements.

Summary

In summary, the POWER3 processor is very robust, delivering real performance on real applications for the next generations of RISC System/6000 solutions. It utilizes IBM's superior silicon technology, packaging technology, and microarchitecture and systems expertise to produce systems with outstanding performance in both commercial and technical computing.

Any performance data contained in this document was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements quoted in this paper may have been made on development-level systems. Actual results may vary. Users of this paper should verify the applicable data for their specific environment. All benchmark values are provided AS IS and no warranties or guarantees are expressed or implied by IBM.

LINPACK TPP (Toward Peak Performance): n=1000 is the array size. The results are measured in MFLOPS. LINPACK benchmarks from:

STREAM is a program developed by J. McCalpin of the University of Virginia that measures sustainable memory bandwidth (in MB/s) and the corresponding computation rate for simple vector kernels. The results reported in this paper are the fastest TRIAD results on a uniprocessor machine. STREAM benchmark from:

GPC/OPC results: CDRS-03, DX-03, DRV-04, Light-01, and AWadvs-01 are weighted geometric means of individual viewset metrics. The viewsets were developed by ISVs (Independent Software Vendors) with the assistance of OPC (OpenGL Performance Characterization) member companies. Larger values indicate better performance. CDRS benchmark from:

Biographies

Mark Papermaster is the Manager of High End Processor Development, Robert Dinkjian is a Senior Technical Staff Member, Michael Mayfield is a Senior Technical Staff Member, Peter Lenk is a Senior Engineer, and Raymond DuPont is a Senior Engineer, all in the High End Processor Development Group. Bill Ciarfella is a Senior Engineer and Frank O'Connell is a Senior Engineer in the Processor Performance Group. All authors are members of the IBM Server Group, Austin, Texas.

References

1. The GXT3000P Graphics Accelerator

Notes

*PowerPC, PowerPC Architecture, IBM RISC System/6000, RS/6000, POWER GXT3000P, POWER Architecture, and POWER2 Architecture are trademarks of the IBM Corporation.

IBM may have patents or pending patent applications covering subject matter in this paper. The furnishing of this presentation does not give you any license to these patents. Send license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, 500 Columbus Avenue, Thornwood, NY, USA.

All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Contact your IBM local Branch Office or IBM Authorized Reseller for the full text of a specific Statement of General Direction.

PowerPC TM 970: First in a new family of 64-bit high performance PowerPC processors

PowerPC TM 970: First in a new family of 64-bit high performance PowerPC processors PowerPC TM 970: First in a new family of 64-bit high performance PowerPC processors Peter Sandon Senior PowerPC Processor Architect IBM Microelectronics All information in these materials is subject to

More information

Portland State University ECE 588/688. Cray-1 and Cray T3E

Portland State University ECE 588/688. Cray-1 and Cray T3E Portland State University ECE 588/688 Cray-1 and Cray T3E Copyright by Alaa Alameldeen 2014 Cray-1 A successful Vector processor from the 1970s Vector instructions are examples of SIMD Contains vector

More information

PowerPC 740 and 750

PowerPC 740 and 750 368 floating-point registers. A reorder buffer with 16 elements is used as well to support speculative execution. The register file has 12 ports. Although instructions can be executed out-of-order, in-order

More information

Power 7. Dan Christiani Kyle Wieschowski

Power 7. Dan Christiani Kyle Wieschowski Power 7 Dan Christiani Kyle Wieschowski History 1980-2000 1980 RISC Prototype 1990 POWER1 (Performance Optimization With Enhanced RISC) (1 um) 1993 IBM launches 66MHz POWER2 (.35 um) 1997 POWER2 Super

More information

Portland State University ECE 588/688. IBM Power4 System Microarchitecture

Portland State University ECE 588/688. IBM Power4 System Microarchitecture Portland State University ECE 588/688 IBM Power4 System Microarchitecture Copyright by Alaa Alameldeen 2018 IBM Power4 Design Principles SMP optimization Designed for high-throughput multi-tasking environments

More information

1. PowerPC 970MP Overview

1. PowerPC 970MP Overview 1. The IBM PowerPC 970MP reduced instruction set computer (RISC) microprocessor is an implementation of the PowerPC Architecture. This chapter provides an overview of the features of the 970MP microprocessor

More information

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola 1. Microprocessor Architectures 1.1 Intel 1.2 Motorola 1.1 Intel The Early Intel Microprocessors The first microprocessor to appear in the market was the Intel 4004, a 4-bit data bus device. This device

More information

SAS Enterprise Miner Performance on IBM System p 570. Jan, Hsian-Fen Tsao Brian Porter Harry Seifert. IBM Corporation

SAS Enterprise Miner Performance on IBM System p 570. Jan, Hsian-Fen Tsao Brian Porter Harry Seifert. IBM Corporation SAS Enterprise Miner Performance on IBM System p 570 Jan, 2008 Hsian-Fen Tsao Brian Porter Harry Seifert IBM Corporation Copyright IBM Corporation, 2008. All Rights Reserved. TABLE OF CONTENTS ABSTRACT...3

More information

Inside Intel Core Microarchitecture

Inside Intel Core Microarchitecture White Paper Inside Intel Core Microarchitecture Setting New Standards for Energy-Efficient Performance Ofri Wechsler Intel Fellow, Mobility Group Director, Mobility Microprocessor Architecture Intel Corporation

More information

HP PA-8000 RISC CPU. A High Performance Out-of-Order Processor

HP PA-8000 RISC CPU. A High Performance Out-of-Order Processor The A High Performance Out-of-Order Processor Hot Chips VIII IEEE Computer Society Stanford University August 19, 1996 Hewlett-Packard Company Engineering Systems Lab - Fort Collins, CO - Cupertino, CA

More information

Jim Keller. Digital Equipment Corp. Hudson MA

Jim Keller. Digital Equipment Corp. Hudson MA Jim Keller Digital Equipment Corp. Hudson MA ! Performance - SPECint95 100 50 21264 30 21164 10 1995 1996 1997 1998 1999 2000 2001 CMOS 5 0.5um CMOS 6 0.35um CMOS 7 0.25um "## Continued Performance Leadership

More information

TECHNOLOGY BRIEF. Compaq 8-Way Multiprocessing Architecture EXECUTIVE OVERVIEW CONTENTS

TECHNOLOGY BRIEF. Compaq 8-Way Multiprocessing Architecture EXECUTIVE OVERVIEW CONTENTS TECHNOLOGY BRIEF March 1999 Compaq Computer Corporation ISSD Technology Communications CONTENTS Executive Overview1 Notice2 Introduction 3 8-Way Architecture Overview 3 Processor and I/O Bus Design 4 Processor

More information

MIPS R5000 Microprocessor. Technical Backgrounder. 32 kb I-cache and 32 kb D-cache, each 2-way set associative

MIPS R5000 Microprocessor. Technical Backgrounder. 32 kb I-cache and 32 kb D-cache, each 2-way set associative MIPS R5000 Microprocessor Technical Backgrounder Performance: SPECint95 5.5 SPECfp95 5.5 Instruction Set ISA Compatibility Pipeline Clock System Interface clock Caches TLB Power dissipation: Supply voltage

More information

COSC 6385 Computer Architecture - Thread Level Parallelism (I)

COSC 6385 Computer Architecture - Thread Level Parallelism (I) COSC 6385 Computer Architecture - Thread Level Parallelism (I) Edgar Gabriel Spring 2014 Long-term trend on the number of transistor per integrated circuit Number of transistors double every ~18 month

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

EECS 322 Computer Architecture Superpipline and the Cache

EECS 322 Computer Architecture Superpipline and the Cache EECS 322 Computer Architecture Superpipline and the Cache Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow Summary:

More information

620 Fills Out PowerPC Product Line

620 Fills Out PowerPC Product Line 620 Fills Out PowerPC Product Line New 64-Bit Processor Aimed at Servers, High-End Desktops by Linley Gwennap MICROPROCESSOR BTAC Fetch Branch Double Precision FPU FP Registers Rename Buffer /Tag Predict

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA

The Alpha Microprocessor: Out-of-Order Execution at 600 Mhz. R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA The Alpha 21264 Microprocessor: Out-of-Order ution at 600 Mhz R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA 1 Some Highlights z Continued Alpha performance leadership y 600 Mhz operation in

More information

The Alpha Microprocessor: Out-of-Order Execution at 600 MHz. Some Highlights

The Alpha Microprocessor: Out-of-Order Execution at 600 MHz. Some Highlights The Alpha 21264 Microprocessor: Out-of-Order ution at 600 MHz R. E. Kessler Compaq Computer Corporation Shrewsbury, MA 1 Some Highlights Continued Alpha performance leadership 600 MHz operation in 0.35u

More information

This Material Was All Drawn From Intel Documents

This Material Was All Drawn From Intel Documents This Material Was All Drawn From Intel Documents A ROAD MAP OF INTEL MICROPROCESSORS Hao Sun February 2001 Abstract The exponential growth of both the power and breadth of usage of the computer has made

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

IBM POWER4: a 64-bit Architecture and a new Technology to form Systems

IBM POWER4: a 64-bit Architecture and a new Technology to form Systems IBM POWER4: a 64-bit Architecture and a new Technology to form Systems Rui Daniel Gomes de Macedo Fernandes Departamento de Informática, Universidade do Minho 4710-057 Braga, Portugal ruif@net.sapo.pt

More information

Ron Kalla, Balaram Sinharoy, Joel Tendler IBM Systems Group

Ron Kalla, Balaram Sinharoy, Joel Tendler IBM Systems Group Simultaneous Multi-threading Implementation in POWER5 -- IBM's Next Generation POWER Microprocessor Ron Kalla, Balaram Sinharoy, Joel Tendler IBM Systems Group Outline Motivation Background Threading Fundamentals

More information

Next Generation Technology from Intel Intel Pentium 4 Processor

Next Generation Technology from Intel Intel Pentium 4 Processor Next Generation Technology from Intel Intel Pentium 4 Processor 1 The Intel Pentium 4 Processor Platform Intel s highest performance processor for desktop PCs Targeted at consumer enthusiasts and business

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

Perspectives on the Memory Wall. John D. McCalpin, Ph.D IBM Global Microprocessor Development Austin, TX

Perspectives on the Memory Wall. John D. McCalpin, Ph.D IBM Global Microprocessor Development Austin, TX Perspectives on the Memory Wall John D. McCalpin, Ph.D IBM Global Microprocessor Development Austin, TX The Memory Wall In December, 1994, Bill Wulf and Sally McKee published a short paper: Hitting the

More information

Microelectronics. Moore s Law. Initially, only a few gates or memory cells could be reliably manufactured and packaged together.

Microelectronics. Moore s Law. Initially, only a few gates or memory cells could be reliably manufactured and packaged together. Microelectronics Initially, only a few gates or memory cells could be reliably manufactured and packaged together. These early integrated circuits are referred to as small-scale integration (SSI). As time

More information

Agenda. System Performance Scaling of IBM POWER6 TM Based Servers

Agenda. System Performance Scaling of IBM POWER6 TM Based Servers System Performance Scaling of IBM POWER6 TM Based Servers Jeff Stuecheli Hot Chips 19 August 2007 Agenda Historical background POWER6 TM chip components Interconnect topology Cache Coherence strategies

More information

CSC 631: High-Performance Computer Architecture

CSC 631: High-Performance Computer Architecture CSC 631: High-Performance Computer Architecture Spring 2017 Lecture 10: Memory Part II CSC 631: High-Performance Computer Architecture 1 Two predictable properties of memory references: Temporal Locality:

More information

Evolution of Computers & Microprocessors. Dr. Cahit Karakuş

Evolution of Computers & Microprocessors. Dr. Cahit Karakuş Evolution of Computers & Microprocessors Dr. Cahit Karakuş Evolution of Computers First generation (1939-1954) - vacuum tube IBM 650, 1954 Evolution of Computers Second generation (1954-1959) - transistor

More information

All About the Cell Processor

All About the Cell Processor All About the Cell H. Peter Hofstee, Ph. D. IBM Systems and Technology Group SCEI/Sony Toshiba IBM Design Center Austin, Texas Acknowledgements Cell is the result of a deep partnership between SCEI/Sony,

More information

CS 426 Parallel Computing. Parallel Computing Platforms

CS 426 Parallel Computing. Parallel Computing Platforms CS 426 Parallel Computing Parallel Computing Platforms Ozcan Ozturk http://www.cs.bilkent.edu.tr/~ozturk/cs426/ Slides are adapted from ``Introduction to Parallel Computing'' Topic Overview Implicit Parallelism:

More information

Advanced cache optimizations. ECE 154B Dmitri Strukov

Advanced cache optimizations. ECE 154B Dmitri Strukov Advanced cache optimizations ECE 154B Dmitri Strukov Advanced Cache Optimization 1) Way prediction 2) Victim cache 3) Critical word first and early restart 4) Merging write buffer 5) Nonblocking cache

More information

Digital Leads the Pack with 21164

Digital Leads the Pack with 21164 MICROPROCESSOR REPORT THE INSIDERS GUIDE TO MICROPROCESSOR HARDWARE VOLUME 8 NUMBER 12 SEPTEMBER 12, 1994 Digital Leads the Pack with 21164 First of Next-Generation RISCs Extends Alpha s Performance Lead

More information

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming
