QEmu TCG Enhancements for Speeding-up the Emulation of SIMD instructions
1 QEmu TCG Enhancements for Speeding-up the Emulation of SIMD instructions. Luc Michel, Nicolas Fournel and Frédéric Pétrot, TIMA Laboratory, System Level Synthesis Group. DATE'11 W8, 18/03/2011.
2 Outline. 1: About QEmu; About SIMD instructions. 2: The intermediate representation; The helpers. 3: Improving Neon instructions translation (A solution to improve the translation; Intermediate representation extension choices). 4: Tests protocol. Luc Michel, Nicolas Fournel and Frédéric Pétrot, QEmu TCG Enhancements for SIMD support.
4 QEmu: a fast and portable dynamic translator. Simulation with QEmu: open-source simulation and virtualization software; dynamic binary translation of the code of a target architecture, to be executed on a host architecture. Precise goal of the present work: accelerate the cross-execution of the Neon instructions.
5 What are SIMD instructions? SIMD: Single Instruction, Multiple Data. The same operation is applied to multiple data elements in parallel, which is very efficient for optimizing some algorithms (parts of media codecs, of radio processing, ...). Data vectors are 64 or 128 bits wide and hold 8-, 16-, 32- or 64-bit elements, depending on the instruction.
6 Example: vadd.i16, taken from the ARM Neon instruction set.
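The slide's illustration of vadd.i16 did not survive transcription, so here is a minimal C sketch of what the 64-bit form of the instruction computes: four independent 16-bit additions that wrap on overflow. The type and function names are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit Neon register seen as four 16-bit lanes. */
typedef struct { uint16_t lane[4]; } vec64_i16;

/* Lane-wise addition, as in "vadd.i16 d0, d1, d2": each lane is added
 * independently and wraps modulo 2^16 (no carry crosses lanes). */
static vec64_i16 vadd_i16(vec64_i16 a, vec64_i16 b)
{
    vec64_i16 r;
    for (int i = 0; i < 4; i++) {
        r.lane[i] = (uint16_t)(a.lane[i] + b.lane[i]);
    }
    return r;
}
```

On real hardware the four additions happen in a single instruction; the loop here only models the result.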
8 The intermediate representation of QEmu. An independent intermediate representation made of micro-operations, e.g. add_i32, mov_i32, or_i32. Two-step translation: (1) target architecture code to micro-operations, (2) micro-operations to host architecture code. Benefit of the intermediate representation: independence between target and host architectures.
9 Binary translation example.
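The example figure is not preserved in this transcription. As a stand-in, the following C sketch shows the two-step scheme on a toy guest instruction "add r0, r1, r2". Everything in it is an assumption made for illustration: the micro-operation encoding, the register numbering and the function names are hypothetical and are not QEmu's actual data structures. In particular, the second step interprets the micro-operations, whereas a real back end emits host machine code.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy micro-operations, named after add_i32 / mov_i32 / or_i32. */
typedef enum { OP_MOV_I32, OP_ADD_I32, OP_OR_I32 } uop_kind;

typedef struct {
    uop_kind kind;
    int dst, src1, src2;          /* indices into a flat value array */
} uop;

/* Guest registers and temporaries share one value array (hypothetical). */
enum { R0, R1, R2, TMP0, TMP1, NUM_VALUES };

/* Step 1 (front end): guest "add r0, r1, r2" becomes micro-operations. */
static size_t translate_guest_add(uop *out)
{
    size_t n = 0;
    out[n++] = (uop){ OP_MOV_I32, TMP0, R1, 0 };
    out[n++] = (uop){ OP_MOV_I32, TMP1, R2, 0 };
    out[n++] = (uop){ OP_ADD_I32, TMP0, TMP0, TMP1 };
    out[n++] = (uop){ OP_MOV_I32, R0, TMP0, 0 };
    return n;
}

/* Step 2 (back end stand-in): evaluate the micro-operations in order. */
static void run_uops(const uop *ops, size_t n, uint32_t *v)
{
    for (size_t i = 0; i < n; i++) {
        const uop *o = &ops[i];
        switch (o->kind) {
        case OP_MOV_I32: v[o->dst] = v[o->src1];              break;
        case OP_ADD_I32: v[o->dst] = v[o->src1] + v[o->src2]; break;
        case OP_OR_I32:  v[o->dst] = v[o->src1] | v[o->src2]; break;
        }
    }
}
```

Because the front end only produces micro-operations and the back end only consumes them, either side can be swapped without touching the other, which is the independence benefit the slide mentions.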
10 Neon instructions translation method: the helpers. Helpers are C functions that each simulate an instruction; they are compiled as a part of QEmu and called when translating the corresponding Neon instruction.
11 Example with a helper.
12 Helpers overhead: function call, adapting the arguments, passing the arguments, getting the result. Multiple calls are needed because each 64-bit/128-bit vector is split into 32-bit parts.
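To make the overhead concrete, here is a hedged C sketch of the helper approach for a 64-bit vadd.i16. The helper name, its signature and the call counter are hypothetical, written for this illustration rather than copied from QEmu. The point it shows is that a 64-bit vector costs two helper calls (a 128-bit one would cost four), each with its own argument preparation and result handling.

```c
#include <stdint.h>

static int helper_calls;   /* hypothetical counter, for illustration only */

/* Hypothetical helper: adds the two 16-bit lanes packed in a 32-bit word. */
static uint32_t helper_add_u16_pair(uint32_t a, uint32_t b)
{
    helper_calls++;
    uint32_t lo = (a + b) & 0x0000FFFFu;           /* low lane, wraps      */
    uint32_t hi = ((a >> 16) + (b >> 16)) << 16;   /* high lane, carry out
                                                      is shifted away      */
    return hi | lo;
}

/* Emulating a 64-bit vadd.i16: the vector is split into 32-bit parts,
 * so the helper is called once per part. */
static uint64_t emulate_vadd_i16_64(uint64_t n, uint64_t m)
{
    uint32_t lo = helper_add_u16_pair((uint32_t)n, (uint32_t)m);
    uint32_t hi = helper_add_u16_pair((uint32_t)(n >> 32),
                                      (uint32_t)(m >> 32));
    return ((uint64_t)hi << 32) | lo;
}
```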
14 A solution to improve the translation. The idea: take advantage of the host SIMD capabilities; add some SIMD micro-operations to the QEmu IR; translate these micro-operations to host SIMD instructions. The practical example of this work: ARM Neon instruction set (target) to Intel x86 MMX/SSE instruction set (host).
15 How to extend the IR. Two ways to extend the QEmu IR: add a micro-operation for each target instruction, or keep a small IR and add only elementary micro-operations. Our choice: try to keep the IR as simple as possible.
16 Examples of mapping between Neon and MMX/SSE: direct mapping between two instructions. The most favorable case: a single micro-operation with the semantics of these two instructions. Example: mapping between vadd.i16 (Neon) and paddw (MMX/SSE).
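The direct mapping can be sketched in C with the SSE2 intrinsic _mm_add_epi16, which compiles to paddw. The function name simd_128_add_i16 is an assumption modelled on the micro-op naming used later in the slides; it is not confirmed by the source. A scalar fallback is included so the sketch also builds on hosts without SSE2.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical micro-operation: lane-wise add of eight 16-bit elements,
 * i.e. the shared semantics of vadd.i16 (128-bit form) and paddw. */
static void simd_128_add_i16(uint16_t dst[8],
                             const uint16_t a[8], const uint16_t b[8])
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* One host instruction (paddw) performs all eight additions. */
    _mm_storeu_si128((__m128i *)dst, _mm_add_epi16(va, vb));
#else
    /* Fallback for non-SSE2 hosts: same result, one lane at a time. */
    for (int i = 0; i < 8; i++) {
        dst[i] = (uint16_t)(a[i] + b[i]);
    }
#endif
}
```

Compared with the helper approach, there is no function call per 32-bit part: one guest instruction becomes one micro-operation and one host instruction.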
17 Examples of mapping between Neon and MMX/SSE: a Neon instruction emits multiple micro-operations. The Neon instruction is not elementary, so it is split into several elementary micro-operations. Example: translating the vsra.u32 (Neon) instruction.
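vsra.u32 is an unsigned shift right followed by an accumulate, so it decomposes naturally into a shift micro-operation and an add micro-operation. The C sketch below models that split for the 128-bit form; the type and the uop_* names are hypothetical and only illustrate the decomposition.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit register seen as four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } vec128_u32;

/* Elementary micro-operation 1: lane-wise logical shift right.
 * A shift by 32 yields 0, matching the immediate range of the
 * instruction (1 to 32) while avoiding undefined behaviour in C. */
static vec128_u32 uop_shr_u32(vec128_u32 a, unsigned imm)
{
    vec128_u32 r;
    for (int i = 0; i < 4; i++) {
        r.lane[i] = (imm >= 32) ? 0u : (a.lane[i] >> imm);
    }
    return r;
}

/* Elementary micro-operation 2: lane-wise wrapping add. */
static vec128_u32 uop_add_i32(vec128_u32 a, vec128_u32 b)
{
    vec128_u32 r;
    for (int i = 0; i < 4; i++) {
        r.lane[i] = a.lane[i] + b.lane[i];
    }
    return r;
}

/* vsra.u32 qd, qm, #imm  ==  qd = qd + (qm >> imm), built from the two
 * elementary micro-operations above. */
static vec128_u32 vsra_u32(vec128_u32 d, vec128_u32 m, unsigned imm)
{
    return uop_add_i32(d, uop_shr_u32(m, imm));
}
```

Keeping the shift and the add as separate micro-operations means both can be reused by other instructions, which fits the choice of a small IR.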
18 Examples of mapping between Neon and MMX/SSE: a micro-operation generates multiple host instructions. There is no equivalent for this micro-operation on the host, so its behavior is reproduced with several host instructions. This is harder to perform with QEmu than the previous case. Example: the simd_128_shl_i8 micro-op emits several host instructions.
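The slides do not list the host instructions that were emitted. One common way to obtain an 8-bit lane shift on MMX/SSE, which offers 16-bit shifts (psllw) but no 8-bit ones, is to shift 16-bit lanes and then mask off the bits that leaked from the low byte into the high byte. The portable C sketch below models that two-step sequence on eight 16-bit words; it is an assumption about a plausible emulation, not the authors' confirmed code, and the function name simply reuses the micro-op name from the slide.

```c
#include <stdint.h>

/* Sketch of an 8-bit lane left shift built from 16-bit operations.
 * v holds 16 bytes packed as eight 16-bit words. */
static void simd_128_shl_i8(uint16_t v[8], unsigned n)
{
    if (n >= 8) {                       /* every bit is shifted out */
        for (int i = 0; i < 8; i++) {
            v[i] = 0;
        }
        return;
    }
    /* Per-byte mask of the bits that legitimately remain after the
     * shift, replicated into both bytes of a word. */
    uint8_t  keep = (uint8_t)(0xFFu << n);
    uint16_t mask = (uint16_t)(keep * 0x0101u);

    for (int i = 0; i < 8; i++) {
        uint16_t shifted = (uint16_t)(v[i] << n);   /* models psllw */
        v[i] = (uint16_t)(shifted & mask);          /* models pand  */
    }
}
```

For example, with n = 4 the word 0xABCD becomes 0xB0D0: each byte is shifted on its own, and the 0xC that a plain 16-bit shift would push into the high byte is masked away.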
20 What kind of tests? Unit tests: ensure correctness of the translation and detect regressions during the development phase. Execution time measurement.
21 Tests environment. Linux in QEmu: a minimalist Linux system, with a cross-compilation toolchain to compile programs for the test system. Real BeagleBoard system: a board embedding an ARM Cortex-A8 CPU with the Neon extension, used to validate our unit tests.
22 Performance tests. The three chosen instructions: vadd.i16, vsra.u16, vshl.u8. For each instruction: assembly functions containing 0% to 100% of this Neon instruction, filled with classical instructions, and executed several times in a loop; total execution time is measured for both the helpers and the mapping strategies.
23 Performance tests results. [Plot: relative execution time (%) compared to helpers, as a function of the proportion of SIMD instructions (%), for vadd.i16, vsra.u16 and vshl.u8.]
24 Take away message. Conclusion: results are very encouraging, but Amdahl's law still rules. What to do next? Extend the implementation to more SIMD instruction sets, probably with the help of automation tools. Call to the QEmu development community: should this approach be promoted into mainstream QEmu?
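The remark about Amdahl's law can be made explicit. As a sketch, assume a fraction $p$ of the helper-based execution time is spent emulating SIMD instructions and that the mapping strategy makes that part $s$ times faster; both symbols are introduced here for illustration and do not appear in the slides.

```latex
% Relative execution time compared to the helper-based version
T_{\mathrm{rel}}(p, s) = (1 - p) + \frac{p}{s}

% Overall speedup, and its bound when the SIMD part becomes free
S(p, s) = \frac{1}{(1 - p) + \dfrac{p}{s}},
\qquad
\lim_{s \to \infty} S(p, s) = \frac{1}{1 - p}
```

With purely illustrative values $p = 0.5$ and $s = 4$ (not measurements from this work), $T_{\mathrm{rel}} = 0.5 + 0.125 = 0.625$ and $S = 1.6$: however fast the SIMD mapping gets, the non-SIMD half of the program caps the overall gain at a factor of 2.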
25 Thanks for your attention. And now ready to answer your questions!
Chapter 4: Threads Silberschatz, Galvin and Gagne 2013 Chapter 4: Threads Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues 4.2 Silberschatz, Galvin
More information17. Instruction Sets: Characteristics and Functions
17. Instruction Sets: Characteristics and Functions Chapter 12 Spring 2016 CS430 - Computer Architecture 1 Introduction Section 12.1, 12.2, and 12.3 pp. 406-418 Computer Designer: Machine instruction set
More informationINF5110 Compiler Construction
INF5110 Compiler Construction Introduction Spring 2016 1 / 33 Outline 1. Introduction Introduction Compiler architecture & phases Bootstrapping and cross-compilation 2 / 33 Outline 1. Introduction Introduction
More informationOPERATING SYSTEM. Chapter 4: Threads
OPERATING SYSTEM Chapter 4: Threads Chapter 4: Threads Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues Operating System Examples Objectives To
More informationLLVM, Clang and Embedded Linux Systems. Bruno Cardoso Lopes University of Campinas
LLVM, Clang and Embedded Linux Systems Bruno Cardoso Lopes University of Campinas What s LLVM? What s LLVM Compiler infrastructure Frontend (clang) IR Optimizer Backends JIT Tools Assembler Disassembler
More information5. ARM 기반모니터프로그램사용. Embedded Processors. DE1-SoC 보드 (IntelFPGA) Application Processors. Development of the ARM Architecture.
Embedded Processors 5. ARM 기반모니터프로그램사용 DE1-SoC 보드 (IntelFPGA) 2 Application Processors Development of the ARM Architecture v4 v5 v6 v7 Halfword and signed halfword / byte support System mode Thumb instruction
More informationA Trustworthy Monadic Formalization of the ARMv7 Instruction Set Architecture. Anthony Fox and Magnus O. Myreen University of Cambridge
A Trustworthy Monadic Formalization of the ARMv7 Instruction Set Architecture Anthony Fox and Magnus O. Myreen University of Cambridge Background Instruction set architectures play an important role in
More informationChapters 3. ARM Assembly. Embedded Systems with ARM Cortext-M. Updated: Wednesday, February 7, 2018
Chapters 3 ARM Assembly Embedded Systems with ARM Cortext-M Updated: Wednesday, February 7, 2018 Programming languages - Categories Interpreted based on the machine Less complex, not as efficient Efficient,
More informationAndroid System Development Training 4-day session
Android System Development Training 4-day session Title Android System Development Training Overview Understanding the Android Internals Understanding the Android Build System Customizing Android for a
More informationINTRODUCTION TO MATLAB PARALLEL COMPUTING TOOLBOX
INTRODUCTION TO MATLAB PARALLEL COMPUTING TOOLBOX Keith Ma ---------------------------------------- keithma@bu.edu Research Computing Services ----------- help@rcs.bu.edu Boston University ----------------------------------------------------
More informationCompiler Structure. Lexical. Scanning/ Screening. Analysis. Syntax. Parsing. Analysis. Semantic. Context Analysis. Analysis.
Compiler Structure Source Program Text Phases of Compilation Compilation process is partitioned into a series of four distinct subproblems called phases, each with a separate well-defined translation task
More informationThe Changing Face of Edge Compute
The Changing Face of Edge Compute 2018 Arm Limited Alvin Yang Nov 2018 Market trends acceleration of technology deployment 26 years 4 years 100 billion chips shipped 100 billion chips shipped 1 Trillion
More information