ISA-Aging (SHRINK: Reducing the ISA Complexity Via Instruction Recycling), accepted for ISCA 2015
1 Authors
Bruno Cardoso Lopes, Rafael Auler, Edson Borin, Luiz Ramos, Rodolfo Azevedo. Institute of Computing, University of Campinas, Brazil.
2 Motivation: ISA aging
x86 code is bigger than RISC (ARM) code.
3 What about other architectures?
[Chart: total number of instructions per ISA revision, for ARM (V4, V4T, V5TE, V6, VFP2, DB, V6T2, V7, HWDIV, FP16, MP, NEON/VFP3, VFP4) and for POWER/PowerPC (from RIOS in 1990 through POWER3 II in 2001 and later POWER generations). The labeled data points range from 299 to 487 instructions.]
4 The x86 instruction set
Intel 8086 family; variable-length format. The operation code (opcode) is the opcode byte(s) plus any other bits needed to uniquely identify an instruction.
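To make "opcode size" concrete, here is a minimal sketch that counts the opcode bytes of a legacy x86 instruction. It skips legacy prefixes and then recognizes the 0F (two-byte map) and 0F 38 / 0F 3A (three-byte map) escapes.

This is an illustrative assumption, not the authors' measurement tool, and it has several deliberate limits:
- It assumes 32-bit encoding, so there are no REX, VEX or EVEX prefixes.
- It does not count mandatory prefixes such as 66h as opcode bytes.
- It ignores ModRM, SIB, displacement and immediate bytes.

```python
# Legacy x86 prefixes: LOCK/REP (F0, F2, F3), segment overrides
# (2E, 36, 3E, 26, 64, 65), operand-size (66) and address-size (67).
LEGACY_PREFIXES = frozenset(
    {0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67}
)


def opcode_length(code: bytes) -> int:
    """Count the opcode bytes (escape bytes included) of the first
    instruction in `code`.

    Assumes 32-bit legacy encoding: no REX, VEX or EVEX prefixes, and the
    3DNow! form (0F 0F with a suffix opcode) is not special-cased.
    Prefixes, ModRM, SIB, displacement and immediates are not counted.
    """
    i = 0
    # Skip legacy prefixes; this sketch does not treat them as opcode bytes.
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        i += 1
    if i >= len(code):
        raise ValueError("no opcode byte found")
    if code[i] != 0x0F:
        return 1  # one-byte opcode map
    if i + 1 >= len(code):
        raise ValueError("truncated 0F-escaped opcode")
    if code[i + 1] in (0x38, 0x3A):
        if i + 2 >= len(code):
            raise ValueError("truncated three-byte opcode")
        return 3  # three-byte maps: 0F 38 xx and 0F 3A xx
    return 2  # two-byte map: 0F xx


print(opcode_length(bytes([0x37])))                          # AAA
print(opcode_length(bytes([0x0F, 0x58, 0xC1])))              # ADDPS xmm0, xmm1
print(opcode_length(bytes([0x66, 0x0F, 0x38, 0x00, 0xC1])))  # PSHUFB xmm0, xmm1
```

The three calls print 1, 2 and 3. The pattern is that the original 8086 instructions fit the one-byte map, while later SIMD extensions are pushed behind one or two escape bytes.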
5 Average instruction opcode size by x86 feature
[Chart: average opcode size for each x86 feature set.] The variable-length format no longer benefits the most frequently used instructions.
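A per-feature average like the one in this chart can be computed by weighting each instruction's opcode size by how often it occurs. The sketch below does exactly that. The feature names and counts are hypothetical values chosen for illustration, not the paper's data.

```python
from collections import defaultdict


def average_opcode_size(mix):
    """Average opcode size (bytes) per feature group, weighted by count.

    `mix` is an iterable of (feature, opcode_bytes, count) tuples, where
    `count` is how often instructions of that size and feature occur.
    """
    total_bytes = defaultdict(int)
    total_count = defaultdict(int)
    for feature, opcode_bytes, count in mix:
        total_bytes[feature] += opcode_bytes * count
        total_count[feature] += count
    return {f: total_bytes[f] / total_count[f] for f in total_count}


# Hypothetical instruction mix, for illustration only (not measured data).
mix = [
    ("x86", 1, 700),
    ("x86", 2, 300),
    ("SSE2", 2, 200),
    ("SSSE3", 3, 100),
]
print(average_opcode_size(mix))
```

A dynamic mix (counts taken from an execution trace) shows which encodings the processor actually fetches. A static mix (counts taken from binaries) relates more directly to code size.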
6 AVX and SSE vs. x87 (SPEC2006 FP)
Modern compilers use SSE or AVX, rather than x87, as the default ISA for floating-point calculations.
7 Solutions?
8 Radical approaches: breaking backward compatibility
1) Reduce all opcodes to 2 bytes.
2) Reduce all opcodes to 1 or 2 bytes.
3) Convert to a RISC-like ISA encoding.
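Approach 2 only pays off if the short encodings go to the instructions that occur most often. The sketch below shows one plausible way to make that assignment; it is not necessarily how the authors built their encoding.

How it works:
- The `one_byte_slots` most frequent instructions receive one-byte opcodes.
- Every other instruction receives a two-byte opcode.
- Each one-byte value left unused acts as an escape into 256 two-byte opcodes.

The counts and the slot budget in the example are hypothetical.

```python
def assign_opcode_sizes(freq, one_byte_slots):
    """Give 1-byte opcodes to the `one_byte_slots` most frequent
    instructions and 2-byte opcodes to all the others.

    Each one-byte value not used as an opcode is assumed to be an escape
    that opens 256 two-byte opcodes.
    """
    if not 0 <= one_byte_slots <= 256:
        raise ValueError("one_byte_slots must be in [0, 256]")
    escapes = 256 - one_byte_slots
    if len(freq) > one_byte_slots + escapes * 256:
        raise ValueError("not enough opcode space")
    # Most frequent first; ties broken by name so the result is deterministic.
    ranked = sorted(freq, key=lambda m: (-freq[m], m))
    return {m: 1 if rank < one_byte_slots else 2 for rank, m in enumerate(ranked)}


def total_opcode_bytes(freq, sizes):
    """Total opcode bytes spent by an instruction mix under `sizes`."""
    return sum(count * sizes[m] for m, count in freq.items())


# Hypothetical static counts, for illustration only.
freq = {"mov": 500, "add": 200, "cmp": 150, "aaa": 1}
sizes = assign_opcode_sizes(freq, one_byte_slots=2)
print(sizes, total_opcode_bytes(freq, sizes))
```

This greedy choice is the same idea behind Huffman-style coding, restricted to two code lengths. Approach 1 corresponds to `one_byte_slots=0`, where every instruction pays two bytes.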
9 Evaluation
[Chart: code size (%) under approaches 1, 2 and 3 for the SPEC2006 FP benchmarks bwaves, cactusadm, calculix, dealii, gamess, GemsFDTD, gromacs, lbm, leslie3d, milc, namd, povray, soplex, sphinx3, tonto, wrf and zeusmp, plus the geometric mean (GeoMean).]
x86 code is bigger than RISC (ARM) code for most programs. The approach 2 encoding shows that a variable-length encoding can produce smaller code than both a RISC encoding and the current x86 encoding.
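Per-benchmark code-size results are ratios, so they are usually summarized with a geometric mean, as in the GeoMean entry of this chart. Here is a minimal sketch of that summary. The benchmark names and byte counts are hypothetical placeholders, not the paper's results.

```python
import math


def geomean_size_change(sizes):
    """Geometric-mean code-size change, in percent, across benchmarks.

    `sizes` maps benchmark -> (baseline_bytes, new_bytes). The geometric
    mean is taken over the ratios new/baseline and then expressed as a
    percentage change relative to the baseline.
    """
    logs = [math.log(new / base) for base, new in sizes.values()]
    ratio = math.exp(sum(logs) / len(logs))
    return (ratio - 1.0) * 100.0


# Hypothetical sizes in bytes (not the paper's data).
sizes = {"bench_a": (1000, 800), "bench_b": (2000, 1250)}
print(f"{geomean_size_change(sizes):.1f}%")
```

Averaging in log space keeps one benchmark with a large ratio from dominating the summary. It also makes the result independent of which encoding is chosen as the baseline.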
10 However...
Breaking x86 backward compatibility is not an option, given the installed software base and the market. What now?
11 Recycling mechanism
12 Recycling mechanism
Remove outdated and unused instructions, and reuse their opcode space to encode new instructions while maintaining backward compatibility.
Benefits: there is room to encode new instructions with fewer bits, which improves program size and cache usage. x86 complexity can also be reduced, opening the market to specific domains, e.g., low-end embedded devices.
13 Two examples
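14 Outdating and recycling
[Figure: timeline of ISA evolution and ISA releases starting in 2010, contrasting ISA releases with CPU and software revisions. An instruction is outdated and industry warns software vendors; in a later release its opcode is recycled. CPU revision A decodes opcode 1h 4h as AAA, and CPU revision B reuses opcode 1h 4h for VADD. SW revision A targets CPU revision A, and SW revision B targets CPU revision B.]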
15 Outdating and recycling (continued)
[Figure: the same ISA-evolution timeline (2010 onward), now with a question mark on SW revision A. What happens when SW revision A, which expects AAA at opcode 1h 4h, runs on CPU revision B, where opcode 1h 4h encodes VADD?]
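16 Revision mismatch trap
[Figure: CPU revision B holds trap mask vectors for software revisions A through Z, and a mask selector picks the vector matching the running software. When SW revision A executes opcode 1h 4h, the CPU consults the revision A vector: entry 0h is N (no trap) and entry 4h is Y (trap).]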
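The trap mask lookup on this slide can be modeled in a few lines. In this toy model, all class and method names are hypothetical:
- Each software revision owns a bitset over opcode slots.
- The mask selector records which revision is running.
- Decoding a slot whose bit is set raises a trap instead of executing the new instruction.

The slides do not say how the real vectors are indexed. Here, slot 0x14 merely stands in for "opcode 1h 4h".

```python
class TrapMaskUnit:
    """Toy model of the revision mismatch trap (all names hypothetical).

    `masks` maps a software revision to an integer bitset over opcode
    slots: bit k set means slot k was recycled after that revision, so
    executing it under that revision must raise a trap.
    """

    def __init__(self, masks):
        self.masks = dict(masks)
        self.selected = None  # revision chosen by the mask selector

    def select(self, revision):
        """Mask selector: pick the vector for the running software."""
        if revision not in self.masks:
            raise KeyError(f"no trap mask for revision {revision!r}")
        self.selected = revision

    def must_trap(self, slot):
        """True if the selected vector marks opcode `slot` as 'Y'."""
        if self.selected is None:
            raise RuntimeError("no software revision selected")
        return (self.masks[self.selected] >> slot) & 1 == 1


# Slot 0x14 stands in for the slides' opcode 1h 4h (a hypothetical mapping).
unit = TrapMaskUnit({"A": 1 << 0x14, "B": 0})
unit.select("A")
print(unit.must_trap(0x14), unit.must_trap(0x00))  # SW revision A: trap, no trap
unit.select("B")
print(unit.must_trap(0x14))                        # SW revision B: native VADD
```

Software built for the newest revision gets an all-zero mask, so it never traps and pays no emulation cost.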
17 Emulation
An old software revision executing on a new processor revision leads to backward-compatibility issues. Solution: a software emulation mechanism driven by CPU-generated traps.
This also allows non-sequential ISA evolution. In disputes over new extensions (XOP, FMA4, ...), vendors could emulate each other's instructions using the trap mechanism.
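As an example of what a trap-driven emulation routine does, the sketch below emulates AAA, the instruction the slides use as the recycled example.

The sketch has three parts:
- The AAA semantics follow the Intel SDM pseudocode.
- A register state is modeled as a hypothetical dictionary.
- A handler table, also hypothetical, maps the old opcode to its routine. It uses AAA's real encoding, 37h; the slides' "1h 4h" is an abstract example slot.

This is an illustration, not the authors' Linux kernel implementation.

```python
def emulate_aaa(state):
    """Emulate AAA (ASCII Adjust After Addition) on a toy register state.

    Follows the Intel SDM pseudocode for AAA (the "AX <- AX + 106H" form).
    `state` holds a 16-bit 'ax' and boolean 'af'/'cf'; OF, SF, ZF and PF
    are undefined after AAA, so they are not modeled.
    """
    ax = state["ax"]
    if (ax & 0x0F) > 9 or state["af"]:
        ax = (ax + 0x106) & 0xFFFF
        state["af"] = state["cf"] = True
    else:
        state["af"] = state["cf"] = False
    state["ax"] = (ax & 0xFF00) | (ax & 0x0F)  # AL <- AL AND 0FH
    return state


# Hypothetical trap-handler table: opcode of the recycled instruction under
# the old revision -> (emulation routine, instruction length in bytes).
# 37h is AAA's real one-byte encoding in 32-bit x86.
EMULATION_TABLE = {0x37: (emulate_aaa, 1)}


def on_revision_trap(opcode, state):
    """Run the emulation routine, then skip the trapped instruction."""
    routine, length = EMULATION_TABLE[opcode]
    routine(state)
    state["ip"] += length
    return state


# ASCII '9' (39h) + '5' (35h) leaves AL = 6Eh; AAA turns it into unpacked BCD 14.
state = {"ax": 0x006E, "af": False, "cf": False, "ip": 0x1000}
on_revision_trap(0x37, state)
print(hex(state["ax"]), state["cf"], hex(state["ip"]))
```

After the trap returns, the instruction pointer has moved past the emulated instruction. The old binary therefore continues as if AAA had executed natively, only slower.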
18 Emulation
Emulation routines must themselves avoid outdated instructions.
Emulation routines: operating system or firmware.
Linker and operating system loader: the executable header is annotated with the software revision.
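The linker/loader side of this slide can be illustrated as follows. The linker records the software revision in the executable header, and the loader reads it back to choose the trap mask before the program runs.

Everything in this sketch is hypothetical and not a real executable format: the header layout (a 4-byte magic `SHRK` followed by one revision letter) and the `trap_masks` table.

```python
import struct

# Hypothetical header layout (not a real executable format): a 4-byte
# magic followed by a 1-byte ISA revision letter written by the linker.
HEADER = struct.Struct("<4sc")
MAGIC = b"SHRK"


def software_revision(image: bytes) -> str:
    """Loader side: read the ISA revision the linker recorded."""
    if len(image) < HEADER.size:
        raise ValueError("image too short for header")
    magic, revision = HEADER.unpack_from(image, 0)
    if magic != MAGIC:
        raise ValueError("executable carries no revision annotation")
    return revision.decode("ascii")


def trap_mask_for(image: bytes, trap_masks: dict) -> int:
    """Pick the trap mask the OS should program before running `image`."""
    return trap_masks[software_revision(image)]


image = MAGIC + b"A" + bytes(16)  # header followed by dummy code bytes
print(software_revision(image), hex(trap_mask_for(image, {"A": 1 << 0x14, "B": 0})))
```

A real design would also have to decide what to do with binaries that carry no annotation. This sketch simply rejects them.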
19 Evaluation
Static and dynamic instruction analysis of Linux and Windows from ...
20 Static analysis
[Chart: number of used instructions per year, for Linux and Windows.]
21 Dynamic analysis
[Chart: fraction of the dynamic trace (y-axis, 95% to 100%) by instruction group (MMX, P6, SSE, SSE2, x87, 16-bit) for Windows 95, Windows 98, Windows XP, Windows Vista, Windows 7, Slackware 3, Ubuntu 4, Ubuntu 8 and Ubuntu 12.]
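The dynamic analysis attributes each executed instruction to an ISA extension and reports each extension's share of the trace. Here is a minimal sketch of that bookkeeping.

The mnemonic-to-extension map is a small hypothetical stand-in for a full x86 opcode table. Anything the map does not cover is counted as "other".

```python
from collections import Counter

# Hypothetical mnemonic -> extension map; a real study would classify by
# the full encoding, since some mnemonics exist in several extensions.
EXTENSION = {
    "mov": "base", "add": "base", "jmp": "base",
    "fld": "x87", "fadd": "x87",
    "emms": "MMX",
    "addps": "SSE", "addpd": "SSE2",
}


def extension_fractions(trace):
    """Fraction of a dynamic instruction trace attributed to each extension."""
    counts = Counter(EXTENSION.get(m, "other") for m in trace)
    total = sum(counts.values())
    return {ext: n / total for ext, n in counts.items()}


trace = ["mov", "add", "mov", "fadd", "addps"]  # hypothetical trace
print(extension_fractions(trace))
```

When an extension's share of every trace is tiny, it is a candidate for outdating. Its instructions can then be emulated at little runtime cost.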
22 Emulation overhead
Experiment: trap mechanism implemented in the Linux kernel. Tolerating a 5% overhead, we can re-encode 40% of the x86 ISA.
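The trade-off between overhead and the share of the ISA that can be emulated can be reasoned about with a simple cost model. This model is an assumption, not the paper's measurement method:
- Every emulated execution adds a fixed `trap_cycles` extra cycles.
- The overhead is therefore `trap_cycles * (sum of emulated counts) / native_cycles`.

Under that uniform cost, taking the rarest instructions first maximizes how many instructions fit under the budget. The counts and cycle figures below are hypothetical.

```python
def emulable_set(dyn_counts, trap_cycles, native_cycles, budget=0.05):
    """Pick the largest set of instructions whose emulation fits the budget.

    Cost model (an assumption): each emulated execution adds a fixed
    `trap_cycles` extra cycles, so the overhead is
    trap_cycles * sum(counts) / native_cycles.
    """
    chosen, spent = [], 0.0
    # Rarest first; with a uniform per-trap cost this maximizes the count.
    for mnemonic, count in sorted(dyn_counts.items(), key=lambda kv: (kv[1], kv[0])):
        extra = trap_cycles * count / native_cycles
        if spent + extra > budget:
            break  # every remaining instruction is at least as frequent
        chosen.append(mnemonic)
        spent += extra
    return chosen, spent


# Hypothetical dynamic counts and cycle figures, for illustration only.
counts = {"aaa": 0, "fsin": 10, "mov": 1_000_000, "cpuid": 50}
chosen, spent = emulable_set(counts, trap_cycles=1000, native_cycles=10_000_000)
print(chosen, f"{spent:.1%}", f"{len(chosen) / len(counts):.0%} of the listed instructions")
```

Instructions that never execute cost nothing and are always selected. The measured trap cost and the long tail of rarely used instructions decide how large the emulable fraction becomes.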
23 How many instructions to emulate?
[Chart: percentage of instructions emulated at runtime for ISA configurations from SSE through SSE3, SSSE3, SSE4.1, SSE4.2+AES+CLMUL and AVX (with 2006 and 2009 marked), evaluated on Windows/Linux workload pairs from win95+slack through win7+ubu12.10.]
24 Runtime overhead
[Chart: runtime overhead (%) for the same ISA configurations (SSE through AVX) and Windows/Linux workload pairs, from win95+slack through win7+ubu12.10.]
25 Instruction decoder
The decoder plus microcode (ucode) ROM occupies from 2% to 17% of the processor area. Removed instructions still need to be decoded (to generate traps), and their encodings are reused. Recent x86 implementations have more than one decoder (up to 3 fast decoders and 1 slow decoder).
26 Decoder critical path improvements
[Chart: critical path improvement (%) for the naive approach and SHRINK, across ISA configurations from 2001 (SSE) to 2012 (AVX).]
27 Decoder area gains
[Chart: decoder area reduction (%) for the naive approach and SHRINK, across ISA configurations from 2001 (SSE) to 2012 (AVX).]
28 Decoder power gains
[Chart: decoder power reduction (%) for the naive approach and SHRINK, across ISA configurations from 2001 (SSE) to 2012 (AVX).]
29 Conclusion
- Static and dynamic analyses show that a large number of x86 instructions are obsolete.
- Recycling mechanism: re-encode instructions without breaking backward compatibility.
- We could emulate 40% of x86 instructions with less than 5% overhead.
- Decoder critical path improved by up to 50%.
- Decoder area reduced by up to 73%.
- ucode ROM reduced by up to 43%.
- Power consumption reduced by up to 70%.
30 Questions?