QEmu TCG Enhancements for Speeding-up the Emulation of SIMD instructions

1 QEmu TCG Enhancements for Speeding-up the Emulation of SIMD instructions
Luc Michel, Nicolas Fournel and Frédéric Pétrot
TIMA Laboratory, System Level Synthesis Group
DATE'11 W8, 18/03/2011

2 Outline
1 About QEmu, About SIMD instructions
2 The intermediate representation, The helpers
3 Improving Neon instructions translation: A solution to improve the translation, Intermediate representation extension choices
4 Tests protocol
Luc Michel, Nicolas Fournel and Frédéric Pétrot QEmu TCG Enhancements for SIMD support 2 / 25

4 QEmu: a fast and portable dynamic translator
Simulation with QEmu
- Open-source simulation and virtualization software.
- Dynamic binary translation of the code of a target architecture, to be executed on a host architecture.
Precise goal of the present work
- Accelerate the cross-execution of the Neon instructions.

5 What are SIMD instructions?
SIMD instructions: Single Instruction, Multiple Data
- The same operation is applied to multiple data elements in parallel.
- Very efficient for optimizing some algorithms: parts of media codecs, of radio processing, ...
- 64-bit or 128-bit data vectors.
- 8-, 16-, 32- or 64-bit elements, depending on the instruction.

6 Example: vadd.i16
(Figure: vadd.i16, taken from the ARM Neon instruction set.)
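
The lane-wise behaviour of vadd.i16 can be sketched in plain C. This is only an illustrative model, not QEmu code: the type vec64_i16 and the function vadd_i16 are names assumed here, and the sketch covers the 64-bit form (four 16-bit lanes).

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit Neon register seen as four 16-bit lanes. */
typedef struct {
    uint16_t lane[4];
} vec64_i16;

/* Lane-wise addition: every lane wraps modulo 2^16 and no carry crosses
 * into the neighbouring lane. */
vec64_i16 vadd_i16(vec64_i16 a, vec64_i16 b)
{
    vec64_i16 r;
    for (int i = 0; i < 4; i++) {
        r.lane[i] = (uint16_t)(a.lane[i] + b.lane[i]);
    }
    return r;
}
```

The point of the sketch is the absence of carry propagation between lanes, which is what distinguishes a SIMD addition from one 64-bit scalar addition.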

8 The intermediate representation of QEmu
The intermediate representation of QEmu
- An independent intermediate representation consisting of micro-operations, e.g. add_i32, mov_i32, or_i32.
Two-step translation
1 Target architecture code -> micro-operations,
2 Micro-operations -> host architecture code.
Intermediate representation benefits
- Independence between target and host architectures.

9 Binary translation example
(Figure: binary translation example.)

10 Neon instructions translation method: the helpers
The helpers
- C functions, each simulating an instruction.
- Compiled as a part of QEmu.
- A call to the helper is emitted when translating the corresponding Neon instruction.

11 Example with a helper
(Figure: example with a helper.)

12 Helpers overhead
Helpers overhead
- Function call,
- Adapting the arguments,
- Passing the arguments,
- Getting the result.
Multiple calls are needed because each 64-bit/128-bit vector is split into 32-bit parts.
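
The cost described on this slide can be made concrete with a small sketch. It is not QEmu's actual helper code: helper_add_u16_pair, vec128 and vadd_i16_via_helper are hypothetical names, chosen only to show that a 128-bit lane-wise addition handled in 32-bit parts requires one call per part.

```c
#include <stdint.h>

/* Hypothetical helper: adds the two 16-bit lanes packed in one 32-bit word. */
uint32_t helper_add_u16_pair(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000FFFFu;          /* low lane, carry dropped */
    uint32_t hi = ((a >> 16) + (b >> 16)) << 16;  /* high lane, carry shifted out */
    return hi | lo;
}

/* Hypothetical model of a 128-bit vector stored as four 32-bit parts. */
typedef struct {
    uint32_t part[4];
} vec128;

/* One 128-bit addition costs four helper calls; *calls counts them. */
vec128 vadd_i16_via_helper(vec128 a, vec128 b, unsigned *calls)
{
    vec128 r;
    for (int i = 0; i < 4; i++) {
        r.part[i] = helper_add_u16_pair(a.part[i], b.part[i]);
        (*calls)++;
    }
    return r;
}
```

Each of the four calls pays the call, argument and result overhead listed above, which is the overhead the rest of the talk tries to remove.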

14 A solution to improve the translation
The idea
- Be able to take advantage of the host SIMD capabilities.
- Add some SIMD micro-operations to the QEmu IR.
- Translate these micro-operations to host SIMD instructions.
The practical example of this work
- ARM Neon instruction set -> Intel x86 MMX/SSE instruction set.

15 How to extend the IR
Choose how to extend the QEmu IR
- Add a micro-operation for each target instruction, or
- Keep a small IR and add only elementary micro-operations.
Our choice
- Try to keep the IR as simple as possible.

16 Examples of mapping between Neon and MMX/SSE
Direct mapping between two instructions
- The most favorable case.
- One micro-operation with the semantics of these two instructions.
(Figure: mapping between vadd.i16 (Neon) and paddw (MMX/SSE).)
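
As a rough illustration of the direct case, the sketch below adds eight 16-bit lanes with the SSE2 intrinsic _mm_add_epi16, which compilers normally lower to paddw. The function name simd_128_add_i16 is an assumption made for this example, not a name taken from the slides, and a scalar fallback is included so the code also builds on hosts without SSE2.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical 128-bit "add i16" operation: dst[i] = a[i] + b[i] for
 * eight 16-bit lanes, each wrapping modulo 2^16. */
void simd_128_add_i16(uint16_t dst[8], const uint16_t a[8], const uint16_t b[8])
{
#if defined(__SSE2__)
    /* One host SIMD addition covers all eight lanes. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_add_epi16(va, vb));
#else
    /* Portable fallback with the same lane-wise semantics. */
    for (int i = 0; i < 8; i++) {
        dst[i] = (uint16_t)(a[i] + b[i]);
    }
#endif
}
```

Compared with the helper approach, the whole vector is processed by a single host instruction instead of several function calls on 32-bit parts.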

17 Examples of mapping between Neon and MMX/SSE
A Neon instruction emits multiple micro-operations
- The Neon instruction is not elementary.
- It is split into several elementary micro-operations.
(Figure: translating the vsra.u32 (Neon) instruction.)
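
A shift-right-and-accumulate such as vsra.u32 can be decomposed into a logical shift right followed by an addition. The sketch below models that split on four 32-bit lanes. The names uop_shr_i32, uop_add_i32 and vec128_i32 are hypothetical, and the exact micro-operations chosen by the authors are not given in the transcription.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit vector seen as four 32-bit lanes. */
typedef struct {
    uint32_t lane[4];
} vec128_i32;

/* Elementary operation 1: logical shift right of each lane.
 * A shift by the full lane width yields 0 and is handled explicitly,
 * since shifting a uint32_t by 32 is undefined in C. */
vec128_i32 uop_shr_i32(vec128_i32 v, unsigned imm)
{
    assert(imm >= 1 && imm <= 32);
    vec128_i32 r;
    for (int i = 0; i < 4; i++) {
        r.lane[i] = (imm == 32) ? 0u : (v.lane[i] >> imm);
    }
    return r;
}

/* Elementary operation 2: lane-wise addition modulo 2^32. */
vec128_i32 uop_add_i32(vec128_i32 a, vec128_i32 b)
{
    vec128_i32 r;
    for (int i = 0; i < 4; i++) {
        r.lane[i] = a.lane[i] + b.lane[i];
    }
    return r;
}

/* The composite instruction: acc + (src >> imm), lane by lane. */
vec128_i32 vsra_u32(vec128_i32 acc, vec128_i32 src, unsigned imm)
{
    return uop_add_i32(acc, uop_shr_i32(src, imm));
}
```

Keeping the two steps as separate elementary operations means each one can be reused by other target instructions, which fits the choice of a small IR.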

18 Examples of mapping between Neon and MMX/SSE
A micro-operation generates multiple host instructions
- No equivalent for this micro-operation on the host.
- The micro-operation behavior is reproduced with several host instructions.
- Harder to perform with QEmu than the previous case.
(Figure: the simd_128_shl_i8 micro-op emits several host instructions.)
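
SSE provides shifts on 16-bit lanes but no shift on 8-bit lanes, so a byte-wise shift left has to be built from several host instructions. One plausible sequence, shown here as an assumption rather than as the authors' actual code generation, shifts the 16-bit lanes and then masks out the bits that leaked from each low byte into its neighbour. The function name follows the micro-op named on the slide, and the scalar branch is a fallback for hosts without SSE2.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Shift each of the sixteen 8-bit lanes left by n bits (0 <= n < 8). */
void simd_128_shl_i8(uint8_t dst[16], const uint8_t src[16], unsigned n)
{
    assert(n < 8);
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    /* Step 1: shift the eight 16-bit lanes; bits of each low byte
     * spill into the low n bits of the byte above it. */
    __m128i shifted = _mm_sll_epi16(v, _mm_cvtsi32_si128((int)n));
    /* Step 2: clear the low n bits of every byte to discard the spill. */
    __m128i mask = _mm_set1_epi8((char)(uint8_t)(0xFFu << n));
    _mm_storeu_si128((__m128i *)dst, _mm_and_si128(shifted, mask));
#else
    /* Portable fallback with the same lane-wise semantics. */
    for (int i = 0; i < 16; i++) {
        dst[i] = (uint8_t)(src[i] << n);
    }
#endif
}
```

With the input byte 0x81 and n = 1, the unmasked 16-bit shift would leave 0x03 in the upper byte of each pair; the mask restores the expected 0x02.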

20 What kind of tests?
Unit tests
- Ensure correctness of the translation,
- Detect regressions during the development phase.
Execution time.

21 Tests environment
Linux in QEmu
- Minimalist Linux system,
- Cross-compilation toolchain to compile some programs for the test system.
Real BeagleBoard system
- Board embedding an ARM Cortex-A8 CPU with the Neon extension,
- Used to validate our unit tests.

22 Performance tests
The three chosen instructions
- vadd.i16, vsra.u16, vshl.u8.
For each instruction
- Assembly functions containing 0% to 100% of this Neon instruction,
- Filled with classical instructions,
- Executed several times in a loop,
- Total execution time measured for the helpers and mapping strategies.

23 Performance tests results
(Plot: relative execution time (%) compared to helpers, against the proportion of SIMD instructions (%); curves for vadd.i16, vsra.u16, vshl.u)

24 Take away message
Conclusion
- Results are very encouraging, but Amdahl's law still rules.
What to do next?
- Extend the implementation to more SIMD instruction sets,
- Probably with the help of automation tools.
Call to the QEmu development community
- Should this approach be promoted into mainstream QEmu?
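
The reference to Amdahl's law can be illustrated with a small calculation. The slides give no formula, so the function below is only the standard statement of the law: p is assumed to be the fraction of the original execution time spent in the accelerated SIMD emulation, and s the speedup obtained on that fraction. The percentage of SIMD instructions used in the tests is at best a proxy for p.

```c
/* Relative execution time after accelerating a fraction p (0..1) of the
 * original run time by a factor s (> 0), according to Amdahl's law.
 * The overall speedup is the inverse of the returned value. */
double relative_time(double p, double s)
{
    return (1.0 - p) + p / s;
}
```

For instance, accelerating half of the run time by a factor of 2 gives a relative time of 0.75, so the gain shrinks quickly when SIMD code is only a small share of the program.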

25 Thanks for your attention
And now ready to answer your questions!


More information

Describe The Addressing Modes Using Proper Instruction Format

Describe The Addressing Modes Using Proper Instruction Format Describe The Addressing Modes Using Proper Instruction Format Many of the measurements are presented using a small set of benchmarks, B _RISC_ takes advantage of the similarities to describe eight instruction

More information

Building Ultra-Low Power Wearable SoCs

Building Ultra-Low Power Wearable SoCs Building Ultra-Low Power Wearable SoCs 1 Wearable noun An item that can be worn adjective Easy to wear, suitable for wearing 2 Wearable Opportunity: Fastest Growing Market Segment Projected Growth from

More information

Throughput Exploration and Optimization of a Consumer Camera Interface for a Reconfigurable Platform

Throughput Exploration and Optimization of a Consumer Camera Interface for a Reconfigurable Platform Throughput Exploration and Optimization of a Consumer Camera Interface for a Reconfigurable Platform By: Floris Driessen (f.c.driessen@student.tue.nl) Introduction 1 Video applications on embedded platforms

More information

Unleashing the benefits of GPU Computing with ARM Mali TM Practical applications and use-cases. Steve Steele, ARM

Unleashing the benefits of GPU Computing with ARM Mali TM Practical applications and use-cases. Steve Steele, ARM Unleashing the benefits of GPU Computing with ARM Mali TM Practical applications and use-cases Steve Steele, ARM 1 Today s Computational Challenges Trends Growing display sizes and resolutions, richer

More information

ECE 598 Advanced Operating Systems Lecture 4

ECE 598 Advanced Operating Systems Lecture 4 ECE 598 Advanced Operating Systems Lecture 4 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 28 January 2016 Announcements HW#1 was due HW#2 was posted, will be tricky Let me know

More information

Code Generation for QEMU-SystemC Cosimulation from SysML

Code Generation for QEMU-SystemC Cosimulation from SysML Code Generation for QEMU- Cosimulation from SysML Da He, Fabian Mischkalla, Wolfgang Mueller University of Paderborn/C-Lab, Fuerstenallee 11, 33102 Paderborn, Germany {dahe, fabianm, wolfgang}@c-lab.de

More information

HKG18-TR08: Upstreaming SVE in QEMU. Alex Bennée and Richard Henderson

HKG18-TR08: Upstreaming SVE in QEMU. Alex Bennée and Richard Henderson HKG18-TR08: Upstreaming SVE in QEMU Alex Bennée and Richard Henderson Contents Introductions The QEMU Project Development Process Upstreaming Criteria SVE Work Who we are What QEMU is Native Vectors for

More information

CARE: the Comprehensive Archiver for Reproducible Execution

CARE: the Comprehensive Archiver for Reproducible Execution Introducing CARE CARE: the Comprehensive Archiver for Reproducible Execution STMicroelectronics, Grenoble, France TRUST 14 June 12th, 2014 Outline Introducing CARE 1 Introducing CARE 2 Typical use cases

More information

Introduction to Symbolic Execution

Introduction to Symbolic Execution Introduction to Symbolic Execution Classic Symbolic Execution 1 Problem 1: Infinite execution path Problem 2: Unsolvable formulas 2 Problem 3: symbolic modeling External function calls and system calls

More information

Take GPU Processing Power Beyond Graphics with Mali GPU Computing

Take GPU Processing Power Beyond Graphics with Mali GPU Computing Take GPU Processing Power Beyond Graphics with Mali GPU Computing Roberto Mijat Visual Computing Marketing Manager August 2012 Introduction Modern processor and SoC architectures endorse parallelism as

More information

An In Depth Look at VOLK

An In Depth Look at VOLK An In Depth Look at VOLK The Vector-Optimize Library of Kernels Nathan West U.S. Naval Research Laboratory 26 August 2015 (U) 26 August 2015 1 / 19 A brief look at VOLK organization VOLK is a sub-project

More information

Software Ecosystem for Arm-based HPC

Software Ecosystem for Arm-based HPC Software Ecosystem for Arm-based HPC CUG 2018 - Stockholm Florent.Lebeau@arm.com Ecosystem for HPC List of components needed: Linux OS availability Compilers Libraries Job schedulers Debuggers Profilers

More information

McSema: Static Translation of X86 Instructions to LLVM

McSema: Static Translation of X86 Instructions to LLVM McSema: Static Translation of X86 Instructions to LLVM ARTEM DINABURG, ARTEM@TRAILOFBITS.COM ANDREW RUEF, ANDREW@TRAILOFBITS.COM About Us Artem Security Researcher blog.dinaburg.org Andrew PhD Student,

More information

QuartzV: Bringing Quality of Time to Virtual Machines

QuartzV: Bringing Quality of Time to Virtual Machines QuartzV: Bringing Quality of Time to Virtual Machines Sandeep D souza and Raj Rajkumar Carnegie Mellon University IEEE RTAS @ CPS Week 2018 1 A Shared Notion of Time Coordinated Actions Ordering of Events

More information

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE Haibo Xie, Ph.D. Chief HSA Evangelist AMD China OUTLINE: The Challenges with Computing Today Introducing Heterogeneous System Architecture (HSA)

More information

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100 GATE- 2016-17 Postal Correspondence 1 Compiler Design Computer Science & Information Technology (CS) 20 Rank under AIR 100 Postal Correspondence Examination Oriented Theory, Practice Set Key concepts,

More information

Computer Organization & Assembly Language Programming (CSE 2312)

Computer Organization & Assembly Language Programming (CSE 2312) Computer Organization & Assembly Language Programming (CSE 2312) Lecture 1 Taylor Johnson Outline Administration Course Objectives Computer Organization Overview August 21, 2014 CSE2312, Fall 2014 2 Administration

More information

Chapter 4: Threads. Chapter 4: Threads. Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues

Chapter 4: Threads. Chapter 4: Threads. Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues Chapter 4: Threads Silberschatz, Galvin and Gagne 2013 Chapter 4: Threads Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues 4.2 Silberschatz, Galvin

More information

17. Instruction Sets: Characteristics and Functions

17. Instruction Sets: Characteristics and Functions 17. Instruction Sets: Characteristics and Functions Chapter 12 Spring 2016 CS430 - Computer Architecture 1 Introduction Section 12.1, 12.2, and 12.3 pp. 406-418 Computer Designer: Machine instruction set

More information

INF5110 Compiler Construction

INF5110 Compiler Construction INF5110 Compiler Construction Introduction Spring 2016 1 / 33 Outline 1. Introduction Introduction Compiler architecture & phases Bootstrapping and cross-compilation 2 / 33 Outline 1. Introduction Introduction

More information

OPERATING SYSTEM. Chapter 4: Threads

OPERATING SYSTEM. Chapter 4: Threads OPERATING SYSTEM Chapter 4: Threads Chapter 4: Threads Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues Operating System Examples Objectives To

More information

LLVM, Clang and Embedded Linux Systems. Bruno Cardoso Lopes University of Campinas

LLVM, Clang and Embedded Linux Systems. Bruno Cardoso Lopes University of Campinas LLVM, Clang and Embedded Linux Systems Bruno Cardoso Lopes University of Campinas What s LLVM? What s LLVM Compiler infrastructure Frontend (clang) IR Optimizer Backends JIT Tools Assembler Disassembler

More information

5. ARM 기반모니터프로그램사용. Embedded Processors. DE1-SoC 보드 (IntelFPGA) Application Processors. Development of the ARM Architecture.

5. ARM 기반모니터프로그램사용. Embedded Processors. DE1-SoC 보드 (IntelFPGA) Application Processors. Development of the ARM Architecture. Embedded Processors 5. ARM 기반모니터프로그램사용 DE1-SoC 보드 (IntelFPGA) 2 Application Processors Development of the ARM Architecture v4 v5 v6 v7 Halfword and signed halfword / byte support System mode Thumb instruction

More information

A Trustworthy Monadic Formalization of the ARMv7 Instruction Set Architecture. Anthony Fox and Magnus O. Myreen University of Cambridge

A Trustworthy Monadic Formalization of the ARMv7 Instruction Set Architecture. Anthony Fox and Magnus O. Myreen University of Cambridge A Trustworthy Monadic Formalization of the ARMv7 Instruction Set Architecture Anthony Fox and Magnus O. Myreen University of Cambridge Background Instruction set architectures play an important role in

More information

Chapters 3. ARM Assembly. Embedded Systems with ARM Cortext-M. Updated: Wednesday, February 7, 2018

Chapters 3. ARM Assembly. Embedded Systems with ARM Cortext-M. Updated: Wednesday, February 7, 2018 Chapters 3 ARM Assembly Embedded Systems with ARM Cortext-M Updated: Wednesday, February 7, 2018 Programming languages - Categories Interpreted based on the machine Less complex, not as efficient Efficient,

More information

Android System Development Training 4-day session

Android System Development Training 4-day session Android System Development Training 4-day session Title Android System Development Training Overview Understanding the Android Internals Understanding the Android Build System Customizing Android for a

More information

INTRODUCTION TO MATLAB PARALLEL COMPUTING TOOLBOX

INTRODUCTION TO MATLAB PARALLEL COMPUTING TOOLBOX INTRODUCTION TO MATLAB PARALLEL COMPUTING TOOLBOX Keith Ma ---------------------------------------- keithma@bu.edu Research Computing Services ----------- help@rcs.bu.edu Boston University ----------------------------------------------------

More information

Compiler Structure. Lexical. Scanning/ Screening. Analysis. Syntax. Parsing. Analysis. Semantic. Context Analysis. Analysis.

Compiler Structure. Lexical. Scanning/ Screening. Analysis. Syntax. Parsing. Analysis. Semantic. Context Analysis. Analysis. Compiler Structure Source Program Text Phases of Compilation Compilation process is partitioned into a series of four distinct subproblems called phases, each with a separate well-defined translation task

More information

The Changing Face of Edge Compute

The Changing Face of Edge Compute The Changing Face of Edge Compute 2018 Arm Limited Alvin Yang Nov 2018 Market trends acceleration of technology deployment 26 years 4 years 100 billion chips shipped 100 billion chips shipped 1 Trillion

More information