Helio X20: The First Tri-Gear Mobile SoC with CorePilot 3.0 Technology
1 Helio X20: The First Tri-Gear Mobile SoC with CorePilot 3.0 Technology
Tsung-Yao Lin, g-hsien Lee, Loda Chou, Clavin Peng, Jih-g Hsu, Jia-g Chen, John-CC Chen, Alex Chiou, Artis Chiu, David Lee, Carrie Huang, Kenny Lee, TzuHeng Wang, Wei-Ting Wang, Yenchi Lee, Chi-Hui Wang, Pao-Ching Tseng, Ryan Chen, Kevin Jou
August 2016
2 Agenda
- Tri-Gear Concept
- Challenges
- Key Technologies
  - Tailored CPU cores for gears
  - Enhanced coherent interconnect
  - Hybrid scheduler
  - Holistic gear allocation
  - Adaptive thermal management
- Achievements
- Summary
3 User Behavior Changed
Source: Flurry Analytics

| Scenario | Example application | Task load | Time spent per day (2013) | Time spent per day (2014) | Time spent per day (2015) | Change (2014 to 2015) |
|---|---|---|---|---|---|---|
| Web Browsing | Chrome Browser | Heavy ~ Medium | 20% | 14% | 10% | -4% |
| Gaming | Temple Run 2 | Heavy ~ Light | 32% | 32% | 15% | -17% |
| Social Messaging | Facebook | Medium | 24% | 28% | 31% | +3% |
| Entertainment, Utilities, and others | YouTube, Mail | Medium ~ Light | 24% | 26% | 44% | +18% |

Social messaging, entertainment, and utilities (with medium to light loads) take up to 75% of user time.
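The change column and the 75% figure on this slide can be cross-checked from the time-share values. The following is a minimal sketch, assuming the change column is the 2015 share minus the 2014 share; the dictionary and function names are illustrative and not part of the presentation.

```python
# Time-spent shares (% per day) for 2013, 2014, 2015, transcribed from the slide.
SHARES = {
    "Web Browsing": (20, 14, 10),
    "Gaming": (32, 32, 15),
    "Social Messaging": (24, 28, 31),
    "Entertainment, Utilities, and others": (24, 26, 44),
}


def change_2014_to_2015(shares):
    """Percentage-point change between the 2014 and 2015 columns (assumed definition)."""
    return {name: y2015 - y2014 for name, (_, y2014, y2015) in shares.items()}


def medium_to_light_share_2015(shares):
    """Combined 2015 share of the two scenario rows with medium to light loads."""
    rows = ("Social Messaging", "Entertainment, Utilities, and others")
    return sum(shares[row][2] for row in rows)


if __name__ == "__main__":
    print(change_2014_to_2015(SHARES))
    print(medium_to_light_share_2015(SHARES))
```

Under that assumption the computed changes match the slide's column, and the two medium-to-light rows add up to the quoted 75%.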
4 Task Load Distribution of Scenarios / Energy Consumption of Scenarios
[Figure: stacked bars (0% to 100%) showing the share of heavy, medium, and light load for Web Browsing, Gaming, Social Messaging, Entertainment, Utilities & Others, and Idle]
- Medium load tasks are important across all scenarios (36% ~ 48%)
- Heavy load tasks are still important for specific scenarios
5 The Dual-Gear Dilemma
Task classes: light tasks (always-on, connected), medium tasks, heavy tasks (game, multimedia).
6 The Dual-Gear Dilemma
Executing medium load tasks (sustainable usage) on a dual-gear design:
- On the big gear: wasted energy
- On the smaller gear: cannot meet the performance requirement
7 The Dual-Gear Dilemma
Executing medium load tasks on a medium gear: balance between performance and power.
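8 Introduction to Tri-Gear
Three gears: High Performance, Sustainable Performance, Low Power.
1. New gear introduced for sustainable performance
2. The low-power gear goes for even lower power; the high-performance gear aims for higher performance
3. Reduced power consumption across the entire performance range
[Figure: power versus performance (0% to 100%), comparing the previous design with Tri-Gear]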
9 Challenges of Tri-Gear
Evolving from the previous Dual-Gear design to Tri-Gear requires:
- Revised scheduler
- Tailored processors
- Enhanced coherent interconnect
- Improved thermal sensing and power budgeting
- Improved gear management
Software (SW):
- Scheduler: balance power and performance, right task to right CPU (light task, heavy task)
- Thermal management: maximize thermal performance, prevent overheating
- Power management: minimize power consumption
Hardware (HW): CPU gears and the coherent interconnect. Control flows from SW to HW, and info flows from HW back to SW.
11 Tailored CPU Cores for Three Gears
Medium gear for efficient performance:
- +30% power-efficiency: multi-bit flip-flop optimization, delicate usage of high-leakage LVT cells
- +40% performance vs. the low-power gear: LIB and MEM optimizations
The low-power and high-performance gears extend the power/performance ranges.
Core configuration: 4x A53 at 1.4GHz, 4x A53 at 2.0GHz, 2x A72 at 2.5GHz.
[Figure: energy consumption (0.5X to 2.5X) versus single-thread performance (0X to 3X) for the three gears. Energy and performance scale relative to the highest point of the curve.]
12 Enhanced Coherent Interconnect
- Enhanced from 2 ACE ports to 3 ACE ports
- Increased logic means extra power
- ~50% power reduction by sub-module Fine-Grain Clock Gating (FGCG)
[Figure: Coherent Interconnect Power Comparison between the dual-gear coherent interconnect (2 ACE ports) and the Tri-Gear coherent interconnect (3 ACE ports), each connected to memory; -50% power in the common usage range. Power is relative to 2-gear at 1GB/s.]
13 Hybrid Scheduler
HMP Dual-Gear scheduler:
- Limited to Dual-Gear
- Boot CPU is always on and cannot be migrated (fixed CPU0)
- The gear that hosts CPU0 therefore cannot be powered off
Dual-level HMP scheduler for Tri-Gear?
- Might not be optimal
- Fixed CPU0 limits power saving opportunities
[Figure: Dual-Gear scheduler with fixed CPU0, using SMP (Symmetric Multi-Processing) within each gear and HMP (Heterogeneous Multi-Processing) between gears, next to a candidate Tri-Gear scheduler with fixed CPU0 and two HMP levels]
14 Intelligent Core Activation Technology (ICAT)
- ICAT assigns CPU0 dynamically
- A gear can be powered off by task migration
- 8%~10% CPU power saved for medium load
Fixed CPU0: the gear hosting CPU0 (the booted CPU) is always online.
ICAT (dynamic CPU0): that gear can be offline.
[Figure: CPU power (0.5X to 2.5X) versus Tj (°C) for 1 thread and 2 threads, with and without ICAT. Power is relative to 1 thread with ICAT at 65 °C.]
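The difference between a fixed and a dynamic CPU0 can be illustrated with a toy model. This is only a sketch of the idea on this slide, not MediaTek's implementation: the gear labels, the `Gear` class, and the `powered_gears` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Gear:
    name: str            # hypothetical label such as "low", "medium", "high"
    runnable_tasks: int  # tasks currently placed on this gear


def powered_gears(gears, boot_gear, dynamic_cpu0):
    """Return the sorted names of gears that must stay powered.

    With a fixed CPU0, the gear hosting the boot CPU stays on even when it
    has no work. With dynamic CPU0 assignment, CPU0 follows the work, so only
    gears that actually run tasks stay on.
    """
    on = [g.name for g in gears if g.runnable_tasks > 0]
    if not dynamic_cpu0 and boot_gear not in on:
        on.append(boot_gear)
    if not on:
        # Assumption: when the system is fully idle, one gear still hosts CPU0.
        on.append(boot_gear)
    return sorted(on)


if __name__ == "__main__":
    # A medium-load scenario: all tasks sit on the medium gear.
    gears = [Gear("low", 0), Gear("medium", 2), Gear("high", 0)]
    print(powered_gears(gears, boot_gear="low", dynamic_cpu0=False))
    print(powered_gears(gears, boot_gear="low", dynamic_cpu0=True))
```

In this model the fixed-CPU0 case keeps two gears powered for a medium-load workload, while the dynamic case keeps only the gear doing the work, which is the kind of saving the slide attributes to ICAT.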
15 Asymmetric Multi-Processing (AMP) with ICAT
- AMP: enhanced HMP with dynamic gear operation for power saving
- Packing tasks to the medium gear for sustainable performance
- Packing tasks to the low-power gear for low power
- Task migration with ICAT
[Figure: Tri-Gear scheduler with SMP within each gear, and HMP and AMP (Asymmetric Multi-Processing) between gears]
16 Hybrid Scheduler
Hybrid = SMP + AMP + HMP
- HMP for high performance
- Instant boost technology: quick response to utilize the high-performance gear for urgent or heavy tasks
- Inter-gear task migration:
  - Dynamic threshold control for energy efficiency and responsiveness
  - Thread-group migration strategy to increase cluster (L2 cache) locality
[Figure: power versus performance (0% to 100%) with AMP and HMP regions across the Low Power, Sustainable Performance, and High Performance gears]
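Threshold-based inter-gear migration with an instant-boost shortcut can be sketched as follows. The gear labels, the threshold values, and the `next_gear` function are assumptions made for illustration; the slide does not give the actual thresholds or the migration policy.

```python
GEARS = ["low", "medium", "high"]  # hypothetical labels, ordered by performance


def next_gear(current, load, up=0.8, down=0.3, urgent=False):
    """Pick the gear for a task from its normalized load (0.0 to 1.0).

    `up` and `down` are illustrative migration thresholds. A scheduler with
    dynamic threshold control would adjust them at run time to trade energy
    efficiency against responsiveness.
    """
    if urgent:
        return GEARS[-1]  # instant boost: jump straight to the fastest gear
    i = GEARS.index(current)
    if load > up and i < len(GEARS) - 1:
        return GEARS[i + 1]  # migrate one gear up
    if load < down and i > 0:
        return GEARS[i - 1]  # migrate one gear down
    return current  # between the thresholds: stay on the current gear


if __name__ == "__main__":
    print(next_gear("low", 0.9))
    print(next_gear("medium", 0.5))
    print(next_gear("low", 0.1, urgent=True))
```

Keeping a gap between the up and down thresholds avoids bouncing a task between gears when its load hovers near a single cut-off.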
17 Enhanced Power Management
Previous power management:
- Dynamic Voltage & Frequency Scaling (DVFS) and Hot-Plug drivers consider inputs separately: power budget, performance requests, and system status such as load and Thread Level Parallelism (TLP)
- Big gear on/off controlled by the Hot-Plug driver
Centralized gear allocation:
- A holistic control to handle increased complexity
- Tracking steady states to avoid unnecessary gear migration overhead
- Linking to user-specified performance, normal, and power-saving modes
[Figure: previously, status (thermal, battery...), power budget requests, and performance requests (heavy task, scenario...) feed the CPU DVFS and CPU Hot-Plug drivers separately; with centralized gear allocation, the same inputs feed one block that controls both CPU DVFS and CPU Hot-Plug.]
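One simple way to track steady states is to commit a gear change only after the same request has been seen several times in a row. The sketch below illustrates that idea only; the `SteadyStateTracker` class, the `hold` parameter, and the gear labels are hypothetical and are not taken from the presentation.

```python
class SteadyStateTracker:
    """Commit a gear change only after the same request repeats `hold` times."""

    def __init__(self, initial, hold=3):
        self.active = initial      # gear currently in use
        self.hold = hold           # consecutive identical requests needed to switch
        self._candidate = initial  # gear being considered for a switch
        self._count = 0            # how many times in a row it was requested

    def update(self, requested):
        """Feed one allocation request and return the gear to use."""
        if requested == self.active:
            # The request agrees with the current gear: drop any pending switch.
            self._candidate, self._count = requested, 0
        elif requested == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = requested, 1
        if self._count >= self.hold:
            self.active, self._count = self._candidate, 0
        return self.active


if __name__ == "__main__":
    tracker = SteadyStateTracker("low", hold=3)
    for request in ["high", "low", "high", "high", "high"]:
        print(request, "->", tracker.update(request))
```

A short spike in the requested gear is ignored, so the cost of a gear migration is paid only when the new demand persists.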
18 Adaptive Thermal Management (ATM)
- Power budgeting by both core limit and frequency limit for all CPUs
- Dual-Gear to Tri-Gear: more possible solutions from core / frequency combinations meeting the power target
- 1.5X ~ 3X more possible solutions on core combination alone, depending on TLP
[Figure: power (0X to 2X) versus 2-thread performance (0X to 2X), one panel for Dual-Gear and one for Tri-Gear. Power and performance are relative to the highest point of the curve. Each point in a curve represents a choice of gear / core / freq.]
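Why a third gear enlarges the set of core combinations can be seen with a simple count. This is a sketch under stated assumptions: a dual-gear baseline of 4 + 4 cores and a tri-gear layout of 4 + 4 + 2 cores, with one active core per thread. The function name is illustrative, and the count is not claimed to reproduce the exact 1.5X ~ 3X figure on the slide.

```python
from itertools import product


def core_combinations(gear_sizes, threads):
    """Count ways to spread `threads` active cores across gears.

    gear_sizes: number of cores in each gear, e.g. (4, 4) or (4, 4, 2).
    A combination is a tuple of active-core counts per gear that sums
    to `threads`.
    """
    ranges = [range(size + 1) for size in gear_sizes]
    return sum(1 for combo in product(*ranges) if sum(combo) == threads)


if __name__ == "__main__":
    for threads in (1, 2, 3, 4):
        dual = core_combinations((4, 4), threads)     # assumed dual-gear baseline
        tri = core_combinations((4, 4, 2), threads)   # assumed tri-gear layout
        print(threads, dual, tri, tri / dual)
```

Under these assumptions the ratio is 1.5x for one thread and 2.0x for two threads, and it grows with thread-level parallelism, before any frequency choices are multiplied in.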
19 ATM for More Combinations
Previous power allocation:
- Simple cost function: power efficiency only
- Large search space: chosen solution might not meet actual system requirement
Precise power allocation:
- Comprehensive cost function: power efficiency, system requirement (#core, frequency and power), system overhead
- +10% performance from considering system requirement
- -5 °C max Tj from reducing system overhead: hot-plug vs. DVFS latency
[Figure: power (0X to 3X) versus multi-thread performance (0X to 5X), with the power budget and frequency limit marked, for previous power allocation (large search space) and precise power allocation (reduced search space). Power and performance are relative to the highest point of the curve. Footnotes: Geekbench v3 Multi-core Performance; 1 Heavy + 3 Light tasks.]
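The contrast between the two cost functions can be sketched with a toy selection over candidate operating points. Everything here is an assumption for illustration: the `Candidate` fields, the numbers, the way overhead is weighted, and both function names. The slide names the inputs to the cost function but not its form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    cores: int        # active cores
    freq_ghz: float   # frequency limit
    power: float      # estimated power, arbitrary units
    perf: float       # estimated performance, arbitrary units
    overhead: float   # switching cost, e.g. hot-plug is slower than DVFS


def simple_choice(cands, budget):
    """Power efficiency only: best perf/power among points within the budget."""
    ok = [c for c in cands if c.power <= budget]
    return max(ok, key=lambda c: c.perf / c.power)


def precise_choice(cands, budget, min_cores, min_freq, overhead_weight=1.0):
    """Also enforce the requested core count and frequency, and penalize overhead."""
    ok = [c for c in cands
          if c.power <= budget and c.cores >= min_cores and c.freq_ghz >= min_freq]
    return max(ok, key=lambda c: c.perf / c.power - overhead_weight * c.overhead)


# Made-up operating points, not measurements.
CANDIDATES = [
    Candidate(cores=1, freq_ghz=1.0, power=1.0, perf=1.2, overhead=0.0),
    Candidate(cores=2, freq_ghz=1.4, power=2.5, perf=2.6, overhead=0.1),
    Candidate(cores=2, freq_ghz=2.0, power=3.5, perf=3.4, overhead=0.3),
]

if __name__ == "__main__":
    print(simple_choice(CANDIDATES, budget=4.0))
    print(precise_choice(CANDIDATES, budget=4.0, min_cores=2, min_freq=1.4))
```

With these made-up numbers the efficiency-only choice is the single-core point, which would not satisfy a request for two cores, while the constrained choice picks a two-core point with low switching overhead. Both functions raise `ValueError` if no candidate fits, which a real allocator would need to handle.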
21 Energy Saving from Tri-Gear CPU Architecture
Energy saving from Dual-Gear to Tri-Gear: up to -38% CPU energy measured for scenarios used daily.

| Scenario | CPU energy, Tri-Gear vs. Dual-Gear |
|---|---|
| Video Record+EIS (Utilities) | -35% |
| Web Rollover (Web Browsing) | -38% |
| Burst Photo (Utilities) | -38% |
| Facebook (Social Messaging) | -21% |
| Heavy Loading Game (Gaming) | -12% |

22 CorePilot Technology Evolvement

| Chip | Multi-processing model | Highlights |
|---|---|---|
| MT6592 | SMP (Symmetric Multi-Processing) | Octa-core with SMP |
| MT6595 | HMP (Heterogeneous Multi-Processing) | big.LITTLE HMP, Global Task Scheduling |
| Helio P10 | HC (Heterogeneous Computing) | CPU+GPU Computing, Dynamic Gear Migration for low power |
| Helio X20 | Tri-Gear (Hybrid Tri-Gear Multi-Processing) | Tri-Gear CPU Architecture, 12% ~ 38% CPU energy saving |

CorePilot generations shown: CorePilot 1.0, CorePilot 2.0, CorePilot 3.0.
23 Summary
- Majority of tasks are medium and light loads
- Added a new gear and enhanced the existing gears
- CorePilot 3.0 key technologies:
  - Tailored CPU cores for gears
  - Enhanced coherent interconnect
  - Hybrid scheduler
  - Holistic gear allocation
  - Adaptive thermal management
- Benefit of Tri-Gear: up to 38% CPU energy saving for typical scenarios used daily, over an extended performance range
24 Copyright MediaTek Inc. All rights reserved.
More informationOptimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs
Optimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs Niu Feng Technical Specialist, ARM Tech Symposia 2016 Agenda Introduction Challenges: Optimizing cache coherent subsystem
More informationFast Tridiagonal Solvers on GPU
Fast Tridiagonal Solvers on GPU Yao Zhang John Owens UC Davis Jonathan Cohen NVIDIA GPU Technology Conference 2009 Outline Introduction Algorithms Design algorithms for GPU architecture Performance Bottleneck-based
More informationHOT CHIPS 2014 NVIDIA S DENVER PROCESSOR. Darrell Boggs, CPU Architecture Co-authors: Gary Brown, Bill Rozas, Nathan Tuck, K S Venkatraman
HOT CHIPS 2014 NVIDIA S DENVER PROCESSOR Darrell Boggs, CPU Architecture Co-authors: Gary Brown, Bill Rozas, Nathan Tuck, K S Venkatraman TEGRA K1 with Dual Denver CPUs The First 64-bit Android Kepler-Class
More informationNetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013
NetSpeed ORION: A New Approach to Design On-chip Interconnects August 26 th, 2013 INTERCONNECTS BECOMING INCREASINGLY IMPORTANT Growing number of IP cores Average SoCs today have 100+ IPs Mixing and matching
More informationMaximizing heterogeneous system performance with ARM interconnect and CCIX
Maximizing heterogeneous system performance with ARM interconnect and CCIX Neil Parris, Director of product marketing Systems and software group, ARM Teratec June 2017 Intelligent flexible cloud to enable
More informationA Disseminated Distributed OS for Hardware Resource Disaggregation Yizhou Shan
LegoOS A Disseminated Distributed OS for Hardware Resource Disaggregation Yizhou Shan, Yutong Huang, Yilun Chen, and Yiying Zhang Y 4 1 2 Monolithic Server OS / Hypervisor 3 Problems? 4 cpu mem Resource
More informationPOWER7: IBM's Next Generation Server Processor
POWER7: IBM's Next Generation Server Processor Acknowledgment: This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002 Outline
More informationEfficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems
Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems Ayse K. Coskun Electrical and Computer Engineering Department Boston University http://people.bu.edu/acoskun
More informationAdapted from: TRENDS AND ATTRIBUTES OF HORIZONTAL AND VERTICAL COMPUTING ARCHITECTURES
Adapted from: TRENDS AND ATTRIBUTES OF HORIZONTAL AND VERTICAL COMPUTING ARCHITECTURES Tom Atwood Business Development Manager Sun Microsystems, Inc. Takeaways Understand the technical differences between
More informationExploration of Cache Coherent CPU- FPGA Heterogeneous System
Exploration of Cache Coherent CPU- FPGA Heterogeneous System Wei Zhang Department of Electronic and Computer Engineering Hong Kong University of Science and Technology 1 Outline ointroduction to FPGA-based
More informationAttack Your SoC Power Challenges with Virtual Prototyping
Attack Your SoC Power Challenges with Virtual Prototyping Stefan Thiel Gunnar Braun Accellera Systems Initiative 1 Agenda Part #1: Power-aware Architecture Definition Part #2: Power-aware Software Development
More informationAn Energy-Efficient Near/Sub-Threshold FPGA Interconnect Architecture Using Dynamic Voltage Scaling and Power-Gating
An Energy-Efficient Near/Sub-Threshold FPGA Interconnect Architecture Using Dynamic Voltage Scaling and Power-Gating He Qi, Oluseyi Ayorinde, and Benton H. Calhoun Charles L. Brown Department of Electrical
More informationECE 8823: GPU Architectures. Objectives
ECE 8823: GPU Architectures Introduction 1 Objectives Distinguishing features of GPUs vs. CPUs Major drivers in the evolution of general purpose GPUs (GPGPUs) 2 1 Chapter 1 Chapter 2: 2.2, 2.3 Reading
More informationProfiling and Debugging OpenCL Applications with ARM Development Tools. October 2014
Profiling and Debugging OpenCL Applications with ARM Development Tools October 2014 1 Agenda 1. Introduction to GPU Compute 2. ARM Development Solutions 3. Mali GPU Architecture 4. Using ARM DS-5 Streamline
More information