Many-core back to the future. Matt Horsnell ARM Research and Development


1 Many-core back to the future Matt Horsnell ARM Research and Development 1

2 Outline Introduction How we got to multi-core Focus on multi-core Evolution to many-core? Predicting the future microprocessor? Why I believe it's a great time to be a computer architect Wrap-up 2

3 About me MEng Computer Science, 2003 Final year project: multi-core processors (David May) PhD Computer Science, A Chip Multi-Cluster architecture with locality-aware task distribution. Research Associate, Object-based Hardware Transactional Memory. Software Engineer, 2010 Clock-tree synthesis and clock-concurrent optimization. Research and Development Architecture and micro-architecture. 3

4 ARM The leading silicon IP company Leading 32-bit RISC architecture Leading Physical IP ~8Bn ARM chips shipped in 2011 Design and license CPUs, GPUs and associated IP > 920 CPU licences > 275 licensees ~1500 designs/year use ARM physical IP 2400 people world-wide >30 locations LSE (FTSE100) and NASDAQ, > 10Bn *Image from Fast Company's 50 most innovative companies 2011 (#12) 4

5 ARM Business model Technology energy-efficient chips 2-3 years to design a processor 2-3 designs introduced per year Range of design points Range of end-markets Partners select a design License fee to access design 3-4 years integrating into SoC Royalty fee per chip 20+ years reuse 5

6 Ubiquitous Environments: ARM from 1mm³ to 1km³ [Figure: ARM-based systems placed on a scale axis from 1mm³ to 1km³ and a cost axis from 10c to $1000. Labelled examples: University of Michigan 8.75mm³ system (Cortex-M3 near-threshold, 2 x solar cell, 12µAh Li-ion battery, 0.18µm process); NXP Cortex-M0; Fujitsu Calypso Cortex-R4; Samsung Exynos Cortex-A15; ARM Cortex-A57; Samsung Galaxy S3; Google Nexus 10. Image credits: NSF and University of Wisconsin-Madison; University of Michigan.] 6

7 ARM - Partnership Momentum ARM Architecture is the number one by volume ARM is growing into new markets and product categories 920 Licenses, 300 Partners, ~50% Shipping [Chart: cumulative ARM chip shipments, in billions] 7

8 ARM Research and Development Focus on technologies impacting +10 years from now product pipeline is typically 7 years Always interesting and challenging work research ideas become products can make an influence on the whole industry 8

9 The state of micro-architecture, or how we got to MULTI-CORE 9

10 Moore's Law doubling the number of transistors, economically placed on a chip, every 2 years doubling performance every 2 years [Figure: Feature Size by process generation: 1.5 um, 1.0 um, 0.68 um, 0.50 um, 0.35 um, 0.25 um, 0.18 um, 0.13 um, 90 nm, 65 nm, 45 nm, 32 nm. Data points: Intel transistors 108 KHz; Intel Pentium 3.1M transistors 66 MHz; Intel Core i7 1.4B transistors 3GHz] *data sourced from 10

11 Dennard Scaling 1974, Robert Dennard at IBM: MOSFETs continue to function as voltage-controlled switches while all key figures of merit, such as layout density, operating speed, and energy efficiency, improve, provided geometric dimensions, voltages, and doping concentrations are consistently scaled to maintain the same electric field. A 30% dimension shrink gives a 50% area shrink and a 40% performance increase; maintaining a 30% supply voltage drop gives a ~50% power reduction. Net result per generation: 40% faster, 2x transistors, constant power 11
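The per-generation figures on this slide can be checked with a few lines of arithmetic. The sketch below is a minimal illustration, assuming the usual first-order model: capacitance and supply voltage scale with the linear dimension, frequency scales with its inverse, and switching power per transistor is proportional to C * V^2 * f. The function name and dictionary keys are illustrative and do not come from the slides.

```python
def dennard_generation(shrink=0.30):
    """First-order Dennard scaling for one process generation.

    Assumption (not from the slides): capacitance and voltage scale with
    the linear dimension, frequency with its inverse, and switching power
    per transistor is proportional to C * V^2 * f.
    """
    s = 1.0 - shrink                     # linear scale factor (0.7 for a 30% shrink)
    area = s * s                         # area per transistor
    frequency = 1.0 / s                  # switching speed
    capacitance = s
    voltage = s
    power_per_transistor = capacitance * voltage ** 2 * frequency
    transistors_per_area = 1.0 / area
    power_density = power_per_transistor * transistors_per_area
    return {
        "area": area,
        "frequency": frequency,
        "power_per_transistor": power_per_transistor,
        "transistors_per_area": transistors_per_area,
        "power_density": power_density,
    }


if __name__ == "__main__":
    for name, value in dennard_generation().items():
        print(f"{name}: {value:.2f}")
```

For a 30% shrink this gives about 0.49x area, 1.43x frequency, 0.49x power per transistor and an unchanged power density, which lines up with the slide's summary of 40% faster, 2x transistors, constant power.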

12 [Chart: Frequency (MHz)] *data sourced from 12

13 Performance vs. Dennard [Chart: performance against Feature Size (um)] *data sourced from 13

14 Micro-architecture gains [Chart: Increase (X) by technique: On-die cache, pipelining; Super-scalar; OOO-Speculative; Deep pipeline, Replay, Trace-cache; Back to non-deep pipeline] * Borkar et al., The future of the microprocessor, ACM Comms

15 Pollack's rule [Chart: Normalized Performance against Normalized Core Area] *data sourced from 15

16 End of Dennard Scaling
Classic scaling ended at 130nm

Parameter (scale factor = a)   Classic Scaling   Current Scaling
Dimensions                     1/a               1/a
Voltage                        1/a               ~1
Current                        1/a               1/a
Capacitance (A/t)              1/a               >1/a
Power/Circuit (V.I)            1/a^2             1/a
Power Density (VI/A)           1                 a
Delay/Circuit                  1/a               ~1

Innovation needed to keep driving Moore's Law
Materials and Lithography

17 Power [Chart: power density in mW/mm², with Hot Plate and Nuclear reactor reference levels] infeasible to economically dissipate heat beyond 800mW/mm² *data sourced from 17

18 Memory DRAM density increases with Moore's Law Speed increases far slower Typically 100s of cycles for a memory access to DRAM Hidden to date by caches and bandwidth [Chart: CPU Clocks/DRAM Latency] * Borkar et al., The future of the microprocessor, ACM Comms

19 Cache energy concerns and inefficient μ-arch led to more cache for efficiency cache sizes increased slowly decreasing die area given to $ most transistors core μ-arch [Charts: On-die cache (KB) and On-die cache % of total die area, by process generation down to 65nm] * Borkar et al., The future of the microprocessor, ACM Comms

20 Trend Summary Process scaling continues to follow Moore's Law ITRS suggests 7nm in 2024, Intel 5nm ~2020 Power budget remains constant Frequency frequency increases stopped in years of increasing beyond Dennard scaling increased power Micro-architectural techniques ILP speculation increases power practical pipelining limits <FO4 delays increases power Design complexity Performance Memory wall remains so larger caches 20

21 A focus on MULTI-CORE 21

22 Multi-core Performance increases drive the microprocessor industry performance enables new applications performance enables new markets performance enables new form factors Power fundamentally prevents more frequency scaling performance must come from parallelism ILP already exploited in single core microprocessors Must exploit thread-level parallelism* Multiple cores on the same die gave the industry a new road map Moore's Law now applied to doubling cores every 2 years 22

23 MPCore Power Single CPU unused processors turned off 260MHz, consumes ~160mW Dual-CPU (same MHz, same Vt) Same workload single-threaded, concurrency from OS only could lower MHz (V² power saving) Lower power in dual-CPU at same MHz Reduced context switching Increase in cache effectiveness With threaded code, MP offers more performance at lower MHz 23
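The voltage-squared saving mentioned on this slide follows from the standard dynamic power relation P = C * V^2 * f. The sketch below is a rough illustration only: the voltages (1.2V and 1.0V) and the half-frequency operating point are hypothetical values chosen for the example, not figures from the talk, and leakage power is ignored.

```python
def dynamic_power(c_eff, voltage, freq):
    """Switching power of one core: P = C * V^2 * f (leakage ignored)."""
    return c_eff * voltage ** 2 * freq


def relative_dual_core_power(v_single, v_dual, freq_ratio):
    """Power of two cores at a reduced clock, relative to one core at full clock.

    v_single, v_dual and freq_ratio are hypothetical operating points.
    Capacitance and the single-core frequency are normalised to 1.
    """
    single = dynamic_power(1.0, v_single, 1.0)
    dual = 2 * dynamic_power(1.0, v_dual, freq_ratio)
    return dual / single


if __name__ == "__main__":
    # Two cores at half the clock deliver the same aggregate cycles per second;
    # the lower clock is assumed to allow a drop from 1.2V to 1.0V.
    ratio = relative_dual_core_power(v_single=1.2, v_dual=1.0, freq_ratio=0.5)
    print(f"dual-core power relative to single core: {ratio:.2f}")
```

Under these assumed numbers the dual-core configuration uses roughly 0.69x the dynamic power for the same aggregate clock rate, provided the workload actually has two threads to run.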

24 ARM MPCORE ARM roadmap introduced ARM11 MPCore 2003, shipped ARM-Cortex A-class MPCore roadmap: A8, A9, A5, A15 [Diagram: two Quad Cortex-A15 MPCore clusters, each with four A15 cores, Processor Coherency (SCU) and up to 4MB L2 cache, attached over 128-bit AMBA 4 to the CoreLink CCI-400 Cache Coherent Interconnect, together with an MMU-400 System MMU and IO coherent devices] 24

25 Cortex-A15 MPCore Cortex-A15 uses a similar MP model to Cortex-A9/A5 Data side cache coherence maintained by Snoop Control Unit (SCU) Two ACE master ports and optional ACP slave interface supporting coherent data transfers to non-cached external devices. Integrated GIC Interrupt controller and timer-watchdog units Tightly integrated L2 cache [Diagram: four cores, each with APB, ETM v7 Debug, DPU, PFU, DuTLB, IuTLB, DCU, Main TLB, ICU, STB, CP15, BIU and Timer, connected to the GIC (IRQ[n]) and to the Snoop Control Unit with Duplicate Tags, ACP, L2 Cache and ACE/AXI] 25

26 Snoop Control Unit Self-sizing intelligent block to join multiple CPUs together Manages the impact and control over support of the coherence protocol Supports cache-to-cache transfers, monitors for migratory lines Arbitrates multiple CPUs across 1 or 2 load-balanced system AXI buses Manages adaptive power down and interfaces with system power controller Decodes and manages private peripheral access Runs at CPU frequency 26

27 New Capabilities in the Cortex-A15 Full compatibility with the Cortex-A9/A5/A8 Supporting the ARMv7 Architecture Addition of Virtualization Extension (VE) Run multiple OS binary instances simultaneously Isolates multiple work environments and data Supporting Large Physical Addressing Extensions (LPAE) Ability to use up to 1TB of physical memory With AMBA 4 System Coherency (AMBA-ACE) Other cached devices can be coherent with processor Many-core multiprocessor scalability Basis of concurrent big.LITTLE Processing 27

28 Evolution to MANY-CORE? 28

29 Evolution to Many-Core Base theorem Simpler and smaller processor designs require far less energy to accomplish same amount of compute as a more complex and larger processor design. Approximate rule of thumb held within ARM To increase performance 50% you double the power and area cost of the processor design Quickly reaches point of diminishing returns 29
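The rule of thumb on this slide can be turned into a small model to show where the diminishing returns come from. This is a sketch under stated assumptions: the default `gain_per_doubling=1.5` is the slide's "50% more performance for double the power and area"; the function names are illustrative; and the many-core figure assumes a perfectly parallel workload, which real software rarely provides. Pollack's rule, named on an earlier slide, is commonly stated as performance growing with the square root of core area, which corresponds to a gain of about 1.41 per doubling.

```python
import math


def relative_performance(area_ratio, gain_per_doubling=1.5):
    """Single-core performance for a core `area_ratio` times larger.

    Each doubling of area (and power) multiplies performance by
    `gain_per_doubling`; 1.5 is the rule of thumb quoted on the slide.
    """
    return gain_per_doubling ** math.log2(area_ratio)


def compare(area_budget=4.0, gain_per_doubling=1.5):
    """One big core versus many unit-area cores in the same area budget.

    Assumption: the many-core throughput figure requires a perfectly
    parallel workload.
    """
    big_core = relative_performance(area_budget, gain_per_doubling)
    small_cores = area_budget * relative_performance(1.0, gain_per_doubling)
    return big_core, small_cores


if __name__ == "__main__":
    big, many = compare(4.0)
    print(f"one 4x-area core: {big:.2f}x, four 1x-area cores: {many:.2f}x")
    pollack = relative_performance(4.0, gain_per_doubling=math.sqrt(2))
    print(f"same big core under a square-root rule: {pollack:.2f}x")
```

With a 4x area and power budget the single large core reaches 2.25x performance, while four small cores offer up to 4x throughput when the work can be split across them.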

30 Power Micro-architecture trade-offs High switching power at high performance points big core High-leakage power at low performance points ideal high dynamic range core small core Performance 30

31 Strategy Focus: The Thermal Wall SOC sustained power is limited in mobile devices by thermals; 1.5W to 2W with low-cost POP and stacked memories Power 3W without stacked memories Burst for responsiveness (e.g. Browsing) T >= Tjmax, Tskin Responsiveness is a must Complex active management is needed Opportunistic Residency Managed Sustained Power Tj >= Tmax Tj < Tmax Un-managed Max Power (@Tjmax) Sustained performance (e.g. HD Video Record, Gaming) Power Optimised Low End (e.g. , Voice, MP3) Time 31

32 Use Case Typical day for heavy smartphone user 90 mins voice calls 60 mins 30 mins reading web 30 mins watching HW-accelerated video 50 mins playing Angry Birds or similar 90 mins jogging, listening to MP3s and logging GPS coordinates 10 mins video recording/photo capture 7 hrs sleep with music alarm clock OS typically executing ~28 active processes background synchronizing 32

33 Use Case Measurements 33

34 Use Case Conclusion 34

35 Multiprocessing Capable Many-core Benefits 35

36 ARM's big processor Cortex-A15 Processor announced September Core MP configurable Dual-cores shipped Oct 12 Advanced Capabilities Full ARMv7A architecture Thumb-2, TrustZone, VFP, NEON Virtualization, LPAE AMBA 4 ACE Coherency Shipped Oct 12 Samsung Exynos GHz Products Nexus 10 Samsung Chromebook High Performance Up to 1.5GHz for mobile on 28nm 36

37 ARM LITTLE Processor Cortex-A7 Processor announced October Core MP configurable Same Advanced Capabilities Full ARMv7A architecture Thumb-2, TrustZone, VFP, NEON Virtualization, LPAE AMBA 4 ACE Coherency ISA identical to Cortex-A15 Cortex-A7 & Cortex-A15 testchip taped out in Q4 11 Performance efficiency exceeded expectations High Performance Up to 1.2GHz in mobile 37

38 Comparison of big.LITTLE Pipelines Cortex-A7 Pipeline Focused on energy efficiency 8-11 stages, in-order, limited dual-issue Cortex-A15 Pipeline Focused on efficient peak performance 15+ stages, out-of-order, multi-issue 38

39 Size Matters Large silicon area costs: Fewer dies per wafer Higher yield impact from silicon imperfections Higher leakage power whenever power is applied Typically contains more transistors, raising dynamic switching power Not necessarily providing increased performance if gates are required to support architectural complexity rather than instruction execution ARM's LITTLE processor Single Core Cortex-A7 (incl. NEON, FPU, 32kB L1) Device with comparable performance 0.45 mm² in 28nm 39

40 Performance Comparison 40

41 Power Efficiency Comparison 41

42 Extending DVFS DVFS sweep over entire operational voltage range of ARM's first big.LITTLE processor pair 42

43 Software Use Models big.LITTLE switching: one CPU active, switch between A15 and A7 depending on performance requirements big.LITTLE MP: both CPUs can be active, allocate threads that need high performance to A15, allocate threads that don't need high performance but benefit from best energy efficiency to A7 AMBA 4 hardware coherency between A15 and A7 43
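The difference between the two use models can be sketched as two placement policies. This is a toy illustration, not ARM's switcher or any real scheduler: the load metric (a number between 0 and 1), the `up_threshold` value and the function names are all hypothetical.

```python
def switching_model(system_load, up_threshold=0.8):
    """big.LITTLE switching: only one CPU type is active at a time.

    `system_load` and `up_threshold` are hypothetical stand-ins for whatever
    performance requirement the platform software tracks.
    """
    return "A15" if system_load >= up_threshold else "A7"


def mp_model(thread_loads, up_threshold=0.8):
    """big.LITTLE MP: both CPU types can be active; place each thread separately.

    `thread_loads` maps a thread name to its (hypothetical) demand in [0, 1].
    """
    return {
        name: ("A15" if load >= up_threshold else "A7")
        for name, load in thread_loads.items()
    }


if __name__ == "__main__":
    threads = {"browser_render": 0.95, "mail_sync": 0.10, "mp3_decode": 0.20}
    # Switching: the most demanding thread drags every thread onto the big core.
    print(switching_model(max(threads.values())))
    # MP: only the demanding thread runs on the big core.
    print(mp_model(threads))
```

In the switching model a single demanding thread moves all work to the A15, whereas the MP model leaves background threads on the A7; the hardware coherency between the two is what allows threads to be placed on either CPU type.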

44 Predicting the future MICROPROCESSOR 44

45 A word on prediction 1989 Microprocessors circa 2000 IEEE Spectrum, Gelsinger, P., Intel M transistors, 250MHz no. of transistors 20% over, frequency 2x under, performance 4-8x under 1996 The future of micro-processors IEEE Micro, Yu, A., Intel M transistors, 4GHz no. of transistors 2x over, frequency 5% over predicted the power wall without breakthrough voltage scaling understandably missed the trend to mobile computing 2005 The future of micro-processors ACM Queue, Olukotun et al., Stanford. no predictions, just a discussion of CMPs 2011 The future of micro-processors Comms. ACM, Borkar et al., Intel. Moore's law continues μ-arch goes beyond homogeneous parallelism, exploit heterogeneity, exploit custom logic software must be able to take advantage of it Various by we'll have 100, 1000, 10,000, 100,000 cores 45

46 Evolution of Mobile Performance [Figure: successive generations of mobile performance drivers: MHz; Architecture; U-Architecture; MP; Multi-Performance; GPGPU; Heterogeneity*; Domain Specific Off-Load; Cloud; Future?] 46

47 Wireless Baseband Packet Processing Subsystem H.265 Video Dec & Enc Image Proc/ recognition Display Near Future Smartphone Peak: 2.5Gb/s Avg: 400Mb/s Peak: 10Gb/s 5 G wireless Graphics & Composition Subsystem Ext Display 4K 240fps Native Display 1080p 120fps Peak: 1Gb/s Power Manager Applications Subsystem Memory Interface 22MP 4K 13Wh Battery 16GB 100GB/s 512GB 47

48 System scaling

           Today        Near Future   Increase
Cellular   20Mbps       400Mbps       20X
Wifi       300Mbps      10Gbps        30X
Display    720P         4K            17X
Video      720P H.264   4K H          X-102X
Battery    5.7W         13W           2.2X

Notes: Assumes constant complexity - 2-4X more H.265 complexity
Input 20-30x   Compute ??X   Output 17X 68x
How does compute scale in a power and thermal constrained environment?

49 Dark Silicon
Lack of power scaling severely limits the complexity of systems!

Node                                    45nm   22nm               11nm
Year
Area
Peak freq                               1      1.6                2.4
Power                                   1      1                  0.6
Exploitable Si (in 45nm power budget)          (4 x 1)^-1 = 25%   (16 x 0.6)^-1 = 10%

Source: ITRS 2008
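The exploitable-silicon percentages on this slide are a one-line calculation: the fraction of a chip that can be powered within a fixed budget is the inverse of (transistor density gain x relative power per transistor). The sketch below reproduces the two figures; the density gains (4 and 16) and power factors (1 and 0.6) are read from the slide's own expressions, while the function name is illustrative.

```python
def exploitable_fraction(density_gain, relative_power):
    """Fraction of the silicon usable inside the original (45nm) power budget.

    density_gain:   how many times more transistors fit in the same area
    relative_power: power per transistor relative to the 45nm baseline
    """
    return 1.0 / (density_gain * relative_power)


if __name__ == "__main__":
    nodes = {"45nm": (1, 1.0), "22nm": (4, 1.0), "11nm": (16, 0.6)}
    for node, (density, power) in nodes.items():
        print(f"{node}: {exploitable_fraction(density, power):.0%} exploitable")
```

The 11nm case evaluates to about 10.4%, which the slide rounds to 10%.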

50 The Many-core wall Figure 1: Overview of the models and the methodology * Esmaeilzadeh et al., ISCA 11 50

51 The Many-core wall? assumes conservative scaling assumes ITRS scaling * Esmaeilzadeh et al., ISCA 11 Esmaeilzadeh et al. predict the end of multi-core scaling at the 16nm node as early as

52 Increasing Heterogeneity big.LITTLE How far can the specialization of micro-architecture improve energy efficiency within a common instruction set architecture? GP/GPU Exposing the compute capability of GPU through a general purpose language OpenCL available on mobile parts (Samsung Chromebook, Nexus 10) HSA Foundation My biggest question... How can the benefits of homogeneity in a programming environment be maintained with this increasing heterogeneity? 52

53 ASIC vs. General Purpose ASIC efficiency far greater than [Hameed et al., ISCA 10] x more energy efficient 50x more performance Difficult to identify targets in the general case beyond the obvious candidates (audio, video, crypto, packet) what granularity to target? QsCores target multiple general purpose computations [Venkatesh et al., MICRO 11] Integration how to offload, how to handle contention static compile time target, or dynamic raises the software bar even higher 53

54 a word on software Finding thread level parallelism is hard. * Blake et al., ISCA 10 Performance portable ensuring optimal performance becomes exponentially harder does this finally mandate runtimes and virtual machines? 54

55 and there's more Fault tolerance how to design for and overcome hard or soft faults Leakage avoidance system designed for power off software designed to enable more power off Bandwidth feeding more and more cores requires huge off-chip bandwidth power and latency concerns 55

56 Why I believe it's an interesting time to be a COMPUTER ARCHITECT 56

57 Energy proportional computing Reality is a finite (fixed) energy budget need to re-evaluate architecture and implementation 90/10 rule becomes 10x10 special purpose function acceleration Energy proportional computing becomes the goal in the near term multi-core will likely become many-core many-core will certainly be heterogeneous identifying accelerators becomes necessary Software agnostic performance portable, interfaces, parallel code A new era of rapid dynamics within computer architecture solutions will change at a much quicker tempo 57

58 Something cool: transaction elimination Bandwidth to memory = Power Loosely speaking, the power budget for a GPU is ~1W 150pJ to read/write a byte from memory 2x32 LPDDR2 peaks at 4-8GB/s 150pJ/byte * 8GB/s = 1.2W Mali-T604 transaction elimination: compute a checksum/hash for each completed tile write out the tile only if the checksum changes trades off extra compute for a reduction of bandwidth 58
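Both halves of this slide can be sketched in a few lines: the bandwidth-to-power arithmetic, and the idea of skipping a tile write when its checksum has not changed. This is a software illustration only, under stated assumptions: the slide says "checksum/hash" without naming one, so CRC32 from Python's standard `zlib` module is used here as a stand-in, and the `TileWriter` class and its fields are hypothetical names rather than anything from the Mali-T604.

```python
import zlib

ENERGY_PER_BYTE_J = 150e-12  # 150pJ per byte read or written, from the slide


def memory_power_watts(bytes_per_second):
    """Power spent moving data to and from memory at a given bandwidth."""
    return ENERGY_PER_BYTE_J * bytes_per_second


class TileWriter:
    """Toy model of transaction elimination (hypothetical class, not Mali code)."""

    def __init__(self):
        self.signatures = {}    # tile id -> checksum of the last written contents
        self.framebuffer = {}   # tile id -> bytes held in "memory"
        self.bytes_written = 0

    def submit(self, tile_id, tile_bytes):
        """Write the tile only if its checksum differs from the previous frame."""
        signature = zlib.crc32(tile_bytes)  # assumption: CRC32 as the checksum
        if self.signatures.get(tile_id) == signature:
            return False                    # unchanged tile: write eliminated
        self.signatures[tile_id] = signature
        self.framebuffer[tile_id] = tile_bytes
        self.bytes_written += len(tile_bytes)
        return True


if __name__ == "__main__":
    print(f"{memory_power_watts(8e9):.1f} W at 8GB/s")
    writer = TileWriter()
    static_tile = bytes(1024)
    print(writer.submit(0, static_tile))   # first frame: written
    print(writer.submit(0, static_tile))   # second frame, same pixels: skipped
```

One design caveat of any checksum-based scheme is that two different tiles could in principle share a checksum, in which case a changed tile would be skipped; the checksum width sets how unlikely that is.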

59 Memory transaction elimination * 59

60 Conclusion Power constrained micro-architecture is challenging long held design principles Although processes will continue to scale may not be economically sound to increase transistors on a chip unless they can be put to good work The multi-core era is already heterogeneous more heterogeneity not just in the compute over-provisioning of transistors makes accelerators likely Innovation to reduce power at all levels in the micro-architecture compute, interconnect, software, silicon 60

61 Fin Questions? Always looking for good candidates ARM University program Cortex-A programming guide* 61

62 BACK-UP

63 FO4/cycle FO4 delay

Integrating CPU and GPU, The ARM Methodology. Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM

Integrating CPU and GPU, The ARM Methodology. Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM Integrating CPU and GPU, The ARM Methodology Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM The ARM Business Model Global leader in the development of

More information

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye

Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink Robert Kaye 1 Agenda Once upon a time ARM designed systems Compute trends Bringing it all together with CoreLink 400

More information

Each Milliwatt Matters

Each Milliwatt Matters Each Milliwatt Matters Ultra High Efficiency Application Processors Govind Wathan Product Manager, CPG ARM Tech Symposia China 2015 November 2015 Ultra High Efficiency Processors Used in Diverse Markets

More information

Building blocks for 64-bit Systems Development of System IP in ARM

Building blocks for 64-bit Systems Development of System IP in ARM Building blocks for 64-bit Systems Development of System IP in ARM Research seminar @ University of York January 2015 Stuart Kenny stuart.kenny@arm.com 1 2 64-bit Mobile Devices The Mobile Consumer Expects

More information

Multi-Core Microprocessor Chips: Motivation & Challenges

Multi-Core Microprocessor Chips: Motivation & Challenges Multi-Core Microprocessor Chips: Motivation & Challenges Dileep Bhandarkar, Ph. D. Architect at Large DEG Architecture & Planning Digital Enterprise Group Intel Corporation October 2005 Copyright 2005

More information

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers William Stallings Computer Organization and Architecture 8 th Edition Chapter 18 Multicore Computers Hardware Performance Issues Microprocessors have seen an exponential increase in performance Improved

More information

ARM big.little Technology Unleashed An Improved User Experience Delivered

ARM big.little Technology Unleashed An Improved User Experience Delivered ARM big.little Technology Unleashed An Improved User Experience Delivered Govind Wathan Product Specialist Cortex -A Mobile & Consumer CPU Products 1 Agenda Introduction to big.little Technology Benefits

More information

Big.LITTLE Processing with ARM Cortex -A15 & Cortex-A7

Big.LITTLE Processing with ARM Cortex -A15 & Cortex-A7 Big.LITTLE Processing with ARM Cortex -A15 & Cortex-A7 Improving Energy Efficiency in High-Performance Mobile Platforms Peter Greenhalgh, ARM September 2011 This paper presents the rationale and design

More information

Microprocessor Trends and Implications for the Future

Microprocessor Trends and Implications for the Future Microprocessor Trends and Implications for the Future John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 522 Lecture 4 1 September 2016 Context Last two classes: from

More information

Gigascale Integration Design Challenges & Opportunities. Shekhar Borkar Circuit Research, Intel Labs October 24, 2004

Gigascale Integration Design Challenges & Opportunities. Shekhar Borkar Circuit Research, Intel Labs October 24, 2004 Gigascale Integration Design Challenges & Opportunities Shekhar Borkar Circuit Research, Intel Labs October 24, 2004 Outline CMOS technology challenges Technology, circuit and μarchitecture solutions Integration

More information

Exploring System Coherency and Maximizing Performance of Mobile Memory Systems

Exploring System Coherency and Maximizing Performance of Mobile Memory Systems Exploring System Coherency and Maximizing Performance of Mobile Memory Systems Shanghai: William Orme, Strategic Marketing Manager of SSG Beijing & Shenzhen: Mayank Sharma, Product Manager of SSG ARM Tech

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations

More information

Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS

Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS Who am I? Education Master of Technology, NTNU, 2007 PhD, NTNU, 2010. Title: «Managing Shared Resources in Chip Multiprocessor Memory

More information

Multimedia in Mobile Phones. Architectures and Trends Lund

Multimedia in Mobile Phones. Architectures and Trends Lund Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson

More information

3D Graphics in Future Mobile Devices. Steve Steele, ARM

3D Graphics in Future Mobile Devices. Steve Steele, ARM 3D Graphics in Future Mobile Devices Steve Steele, ARM Market Trends Mobile Computing Market Growth Volume in millions Mobile Computing Market Trends 1600 Smart Mobile Device Shipments (Smartphones and

More information

Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving

Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving Cortex-A75 and Cortex- DynamIQ processors Powering applications from mobile to autonomous driving Lionel Belnet Sr. Product Manager Arm Arm Tech Symposia 2017 Agenda Market growth and trends DynamIQ technology

More information

ARM Processors for Embedded Applications

ARM Processors for Embedded Applications ARM Processors for Embedded Applications Roadmap for ARM Processors ARM Architecture Basics ARM Families AMBA Architecture 1 Current ARM Core Families ARM7: Hard cores and Soft cores Cache with MPU or

More information

All Programmable: from Silicon to System

All Programmable: from Silicon to System All Programmable: from Silicon to System Ivo Bolsens, Senior Vice President & CTO Page 1 Moore s Law: The Technology Pipeline Page 2 Industry Debates Variability Page 3 Industry Debates on Cost Page 4

More information

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS Embedded System System Set of components needed to perform a function Hardware + software +. Embedded Main function not computing Usually not autonomous

More information

Growth outside Cell Phone Applications

Growth outside Cell Phone Applications ARM Introduction Growth outside Cell Phone Applications ~1B units shipped into non-mobile applications Embedded segment now accounts for 13% of ARM shipments Automotive, microcontroller and smartcards

More information

Fundamentals of Quantitative Design and Analysis

Fundamentals of Quantitative Design and Analysis Fundamentals of Quantitative Design and Analysis Dr. Jiang Li Adapted from the slides provided by the authors Computer Technology Performance improvements: Improvements in semiconductor technology Feature

More information

Next Generation Enterprise Solutions from ARM

Next Generation Enterprise Solutions from ARM Next Generation Enterprise Solutions from ARM Ian Forsyth Director Product Marketing Enterprise and Infrastructure Applications Processor Product Line Ian.forsyth@arm.com 1 Enterprise Trends IT is the

More information

The ARM Cortex-A9 Processors

The ARM Cortex-A9 Processors The ARM Cortex-A9 Processors This whitepaper describes the details of the latest high performance processor design within the common ARM Cortex applications profile ARM Cortex-A9 MPCore processor: A multicore

More information

Negotiating the Maze Getting the most out of memory systems today and tomorrow. Robert Kaye

Negotiating the Maze Getting the most out of memory systems today and tomorrow. Robert Kaye Negotiating the Maze Getting the most out of memory systems today and tomorrow Robert Kaye 1 System on Chip Memory Systems Systems use external memory Large address space Low cost-per-bit Large interface

More information

Copyright 2016 Xilinx

Copyright 2016 Xilinx Zynq Architecture Zynq Vivado 2015.4 Version This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: Identify the basic building

More information

Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving

Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving Stefan Rosinger Director, Product Management Arm Arm TechCon 2017 Agenda Market growth and trends DynamIQ

More information

SoC Platforms and CPU Cores

SoC Platforms and CPU Cores SoC Platforms and CPU Cores COE838: Systems on Chip Design http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM

IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM I5 AND I7 PROCESSORS Juan M. Cebrián 1 Lasse Natvig 1 Jan Christian Meyer 2 1 Depart. of Computer and Information

More information

Module 18: "TLP on Chip: HT/SMT and CMP" Lecture 39: "Simultaneous Multithreading and Chip-multiprocessing" TLP on Chip: HT/SMT and CMP SMT

Module 18: TLP on Chip: HT/SMT and CMP Lecture 39: Simultaneous Multithreading and Chip-multiprocessing TLP on Chip: HT/SMT and CMP SMT TLP on Chip: HT/SMT and CMP SMT Multi-threading Problems of SMT CMP Why CMP? Moore s law Power consumption? Clustered arch. ABCs of CMP Shared cache design Hierarchical MP file:///e /parallel_com_arch/lecture39/39_1.htm[6/13/2012

More information

Many-Core Computing Era and New Challenges. Nikos Hardavellas, EECS

Many-Core Computing Era and New Challenges. Nikos Hardavellas, EECS Many-Core Computing Era and New Challenges Nikos Hardavellas, EECS Moore s Law Is Alive And Well 90nm 90nm transistor (Intel, 2005) Swine Flu A/H1N1 (CDC) 65nm 2007 45nm 2010 32nm 2013 22nm 2016 16nm 2019

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 22

ECE 571 Advanced Microprocessor-Based Design Lecture 22 ECE 571 Advanced Microprocessor-Based Design Lecture 22 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 19 April 2018 HW#11 will be posted Announcements 1 Reading 1 Exploring DynamIQ

More information

On-chip Networks Enable the Dark Silicon Advantage. Drew Wingard CTO & Co-founder Sonics, Inc.

On-chip Networks Enable the Dark Silicon Advantage. Drew Wingard CTO & Co-founder Sonics, Inc. On-chip Networks Enable the Dark Silicon Advantage Drew Wingard CTO & Co-founder Sonics, Inc. Agenda Sonics history and corporate summary Power challenges in advanced SoCs General power management techniques

More information

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd Optimizing ARM SoC s with Carbon Performance Analysis Kits ARM Technical Symposia, Fall 2014 Andy Ladd Evolving System Requirements Processor Advances big.little Multicore Unicore DSP Cortex -R7 Block

More information

Evolution of Computers & Microprocessors. Dr. Cahit Karakuş
