Re-constructing High-Level Information for Language-specific Binary Re-optimization
|
|
- Jacob Baldwin
- 5 years ago
- Views:
Transcription
1 CGO 2016 Re-constructing High-Level Information for Language-specific Binary Re-optimization Toshihiko Koju IBM Research - Tokyo Reid Copeland IBM Canada Motohiro Kawahito IBM Research - Tokyo Moriyoshi Ohara IBM Research - Tokyo 1
2 Summary Language-specific binary optimizer can achieve competitive performance relative to the state-of-the-art source compiler. source code latest compiler re-compiled binary earlier compiler Achieved +40.1% by our binary optimizer vs % by the latest source compiler. original binary binary optimizer optimized binary language-specific contextual info. 2
3 Outline! Motivation & Goal.! Our Approach. Four Major Optimizations.! Performance Evaluation.! Conclusions. 3
4 Compilation Technologies are becoming Increasingly Important! Recent processors are equipped with specialized HW units. e.g. SIMD, FPGA, GPU, DFP (Decimal Floating-Point)! Re-compilation with the latest compilation technologies is typically required to exploit them. single-thread perf. of CPU downlevel processor with specialized HW units w/o specialized HW units CPU generation re-compilation Specialized HW units complement diminishing single-thread performance improvements. latest processor DFP SIMD FPGA GPU 4 existing application latest compiler re-compiled application
5 Compiler Migration is Often Challenging for Large Enterprise Customers! Adapting to changes in the following can be a significant undertaking: language, compiler option, build environment, runtime environment downlevel processor changes in language behavior, compiler options, build environment, runtime environment compiler migration re-compilation latest processor DFP SIMD FPGA GPU existing application latest compiler re-compiled application 5
6 Ease Migration by Binary Optimization! We have developed a production-level binary optimizer for mainframe COBOL applications. IBM Automatic Binary Optimizer for z/os.! The binary optimizer utilizes the same optimization engine of the latest source code COBOL compiler. Source code re-compilation: compiler migration changes in language behavior, compiler options, build environment, runtime environment source code Binary re-optimization: existing binary 6 latest compiler Provide strong compatibility with the current environment. binary optimizer re-compiled binary optimized binary
7 Traditional Approach does not Work! Naive translation from machine instructions to IR (intermediate Representation) degraded the performance. This is typically for binary translators based on general compiler engines. original binary naive translation w/o optimizations naive translation with optimizations 7 higher is better Relative Performance 180.0% 160.0% 140.0% 120.0% 100.0% 80.0% 60.0% 40.0% 20.0% 0.0% 155.2% 100.0% 90.0% 62.3% cob42.opt binopt.o0 binopt.o1 cob5.dev source code re-compilation
8 Our Goal! Achieve competitive performance relative to the state-of-theart source code compiler. original binary naive translation w/o optimizations naive translation with optimizations 8 higher is better Relative Performance 180.0% 160.0% 140.0% 120.0% 100.0% 80.0% 60.0% 40.0% 20.0% 0.0% 155.2% fill in the gap 100.0% 90.0% 62.3% cob42.opt binopt.o0 binopt.o1 cob5.dev source code re-compilation our results
9 Outline! Motivation & Goal.! Our Approach. Four Major Optimizations.! Performance Evaluation.! Conclusions. 9
10 We Need High-level Information to Optimize Binaries! Compiler optimizations require "HLI" (high-level information). e.g. variables, data types, scopes, constants, library semantics! Only limited HLI is provided by the naive translation. Resulting in limited optimization opportunities. poor HLI (high-level information): variables data types scopes constants library semantics existing binary naive IR generator IR optimization engine optimized binary limited optimization opportunities 10
11 Our Approach: Language-Specific Binary Optimization! General binary optimizers typically do not assume source languages which results in better applicability. e.g. Dynamo [Bala00], DynamoRIO [Bruening03], DBT86/RASP [Hertzberg11]! Our binary optimizer assumes the source language to reconstruct HLI by using the "contextual information". e.g. data structure, register usage, instruction pattern General binary optimizer: No assumption on source languages. general binary binary optimizer poor HLI optimized binary limited optimizations Our binary optimizer: Assuming the source language. COBOL binary binary optimizer optimized binary 11 contextual information rich HLI sophisticated optimizations
12 Re-construction of the Key HLI machine instructions Variable location inspection of usage: location instruction pattern abstract interpretation: data structure register usage use-define analysis: EFA data structure liveness analysis: location data structure EFA (Effective Address) scope generation of aliasing information: location scope inspection of literal pool: EFA data structure determination of routine: EFA data structure inspection of instruction pattern: variable instruction pattern constant value runtime routine analyze parameter: semantics of routine parameter information data type aliasing original operation 12
13 Abstract Interpretation: Analysis for EFA! Take advantage of knowledge about data structures and register usages of COBOL. Machine instructions: COBOL Data structures & register usage: Abstract interpretation (a) Load R2,(R9+0x200) (b) Load R3,(R2+0x100) (c) Load R4,(R2+0x200) (d) Load R5,(R2+R4) R9 control block R13 stack heap (a) EFA=control block+off_heap R2=&heap (b) EFA=heap+0x100 R3=unknown (c) EFA=heap+0x200 R4=unknown (d) EFA=heap+unknown R5=unknown R12 label table literal pool 13 Without the contextual information, we cannot tell EFA which is a basis of many other HLI.
14 Four Major Optimizations! There are four significant optimizations which are newly introduced to the latest source code compiler. 1. Type Reduction of Decimal Type. 2. Strength Reduction of Edited Moves. 3. Skipping Truncations of Binary Numbers. 4. Inlining Efficient Version of Routines. 14 higher is better original binary 180.0% 160.0% 140.0% 120.0% 100.0% 80.0% 60.0% 40.0% 20.0% 0.0% four optimizations Relative Performance 100.0% naive translation w/o optimizations 62.3% 90.0% 155.2% cob42.opt binopt.o0 binopt.o1 cob5.dev naive translation with optimizations source code re-compilation
15 Optimization 1: Type Reduction of Decimal Type (1/2)! Convert decimal types to decimal floating-point (DFP) types. Computation on decimal type: memory-to-memory Computation on DFP type: register-to-register Representation of integer number 87: binary integer type " 87 zoned decimal type " 0xF8F7 packed decimal type " 0x087F Business applications perform a lot of decimal computations. Simple Example: COBOL: ZonedDecimal A, B; A = A + B HLI about variables is necessary to enable the type reduction. Original instruction: ZonedToPacked (R13+272, 5 bytes), (R2+0, 8 bytes) ByteOR (R13+276), 0xf ZonedToPacked (R13+280, 5 bytes), (R2+8, 8 bytes) ByteOR (R13+284),0xf PackedAdd (R13+272, 5bytes), (R13+280, 5 bytes) PackedToZoned (R2+0, 8 bytes), (R13+272, 5 bytes) 15
16 Optimization 1: Type Reduction of Decimal Type (2/2) Original instruction: ZonedToPacked (R13+272, 5 bytes), (R2+0, 8 bytes) ByteOR (R13+276), 0xf ZonedToPacked (R13+280, 5 bytes), (R2+8, 8 bytes) ByteOR (R13+284),0xf PackedAdd (R13+272, 5bytes), (R13+280, 5 bytes) PackedToZoned (R2+0, 8 bytes), (R13+272, 5 bytes) 16 EFA information use-def analysis liveness analysis Use-Def Analysis: *D=DEF, U=Use ZonedToPacked D ByteOR UD ZonedToPacked D ByteOR UD PackedAdd PackedToZoned UD U U Variables: TMP1 TMP2 A B U D U Optimized instruction: ZonedToDFP FPR1,*(A) ZonedToDFP FPR2,*(B) DFPAdd FPR1,FPR2 DFPToZoned FPR1,*(A) Type Reduction ZonedToPacked TMP1, A PackedSetSign TMP1, plus ZonedToPacked TMP2, B PackedSetSign TMP2, plus PackedAdd TMP1, TMP2 PackedToZoned TMP1, A Re-constructed HLI about variables
17 Optimization 2: Strength Reduction of Edited Moves! Generate a specialized instruction sequence for a string format operation in COBOL. Simple Example: COBOL: * FOO: string of the format MOVE 123 TO FOO. " FOO=0123 Original instruction: EDIT control-bytes,numeric EDIT is a millicode instruction which interprets each byte of the control-bytes. Re-construct HLI about the original operation from control-bytes. Optimized instruction: ConvToZoned TMP, numeric CopyZoned FOO, TMP 17
18 Optimization 3: Skipping Truncations of Binary Numbers! Generate guard code to skip unnecessary truncation of binary integer numbers. Simple Example: COBOL: * FOO: binary number with 4 digits. FOO=FOO Original code: TMP=FOO+1000 FOO=TMP%10000 Divide instruction is unconditionally executed to suppress overflows. Overflow is rare. Re-construct HLI about the original operation: Quot is not in use. Dividend is a constant of Pow10. Optimized code: TMP=FOO+1000 if abs(tmp) >= TMP=TMP%10000 FOO=TMP 18
19 Optimization 4: Inlining Efficient Version of Routines! Replace a runtime routine with the equivalent code which exploits a more efficient algorithm and newer instructions. Simple Example: COBOL: * FOO, BAR: large packed decimal numbers (e.g. 18 digits). FOO=FOO/BAR Original instruction: CALL LargeDivide Runtime path length is more than 100 instructions. Exploit new DFP instructions. Re-construct HLI about the original operation: Semantics of the runtime routine. Parameter information. Optimized instruction: PackedToDFP FPR1,FOO PackedToDFP FPR2,BAR DFPDivide FPR1,FPR2 DFPRound FPR1 DFPToPacked FPR1,FOO 19
20 Outline! Motivation & Goal.! Our Approach. Four Major Optimizations.! Performance Evaluation.! Conclusions. 20
21 Experimental Environment! Machine: 5.5 GHz zec12 running z/os.! Benchmark: IBM Internal benchmark suite for the development of the COBOL compilers.! Binary optimizer: Development version of IBM Automatic Binary Optimizer (2015).! Original source code compiler: IBM Enterprise COBOL V4R2 (2009).! Latest source code compiler: Development version of IBM Enterprise COBOL V5 (2015). 21 cevalnd amort6 Decimal Benchmarks accoun*ng computa*ons. mortgage payment comparisons and amor*za*on computa*ons (external FP). djpcom85 record processing. seqpr ixseq upda*ng sequen*al file records. upda*ng indexed VSAM file records. telco invupd amort2 atm06 Non- Decimal Benchmarks telecom system. upda*ng sales master records. mortgage payment comparisons and amor*za*on computa*ons (long FP). automa*c teller machine transac*ons.
22 Performance Impact of Each Optimization a. O0 (no optimization) " -37.7% b. O1 (without HLI) " -10.0% c. b. + Skipping truncation " -1.4% d. c. + Type reduction " +25.4% e. d. + Optimization of Edited move " +36.7% f. e. + inlining routines (full) " +40.1% Improved performance from -10.0% to +40.1% by re-constructing HLI.! The baseline (100%) is the performance of the original binaries. Original binaries are compiled with the highest optimization-level of the old compiler. decimal benchmarks non-decimal benchmarks 300.0%$ higher is better 22 Rela?ve$Performance$ 250.0%$ 200.0%$ 150.0%$ 100.0%$ 50.0%$ 0.0%$ cevalnd$ amort6$ djpcom85$ seqpr$ ixseq$ telco$ invupd$ amort2$ atm06$ geo.$mean$ a.$ b.$ c.$ d.$ e.$ f.$ g.$
23 Performance Comparisons with the Re-compilation f. e. + inlining routines (full) " +40.1% g. latest source code compiler " +55.2% Good competitiveness of the binary optimizer relative to the state-of-theart source code compiler.! The baseline (100%) is the performance of the original binaries. Original binaries are compiled with the highest optimization-level of the old compiler. decimal benchmarks non-decimal benchmarks 300.0%$ higher is better 23 Rela?ve$Performance$ 250.0%$ 200.0%$ 150.0%$ 100.0%$ 50.0%$ 0.0%$ cevalnd$ amort6$ djpcom85$ seqpr$ ixseq$ telco$ invupd$ amort2$ atm06$ geo.$mean$ a.$ b.$ c.$ d.$ e.$ f.$ g.$
24 Performance Comparisons with the Re-compilation f. e. + inlining routines (full) " +40.1% g. latest source code compiler " +55.2% Can get even closer by: Enabling loop optimizations. Extend range of analysis of runtime routines.! The baseline (100%) is the performance of the original binaries. Original binaries are compiled with the highest optimization-level of the old compiler. decimal benchmarks non-decimal benchmarks 300.0%$ higher is better 24 Rela?ve$Performance$ 250.0%$ 200.0%$ 150.0%$ 100.0%$ 50.0%$ 0.0%$ cevalnd$ amort6$ djpcom85$ seqpr$ ixseq$ telco$ invupd$ amort2$ atm06$ geo.$mean$ a.$ b.$ c.$ d.$ e.$ f.$ g.$
25 Outline! Motivation & Goal.! Our Approach. Four Major Optimizations.! Performance Evaluation.! Conclusions. 25
26 Conclusion Language-specific binary optimizer can achieve competitive performance relative to the state-of-the-art source compiler. source code latest compiler re-compiled binary earlier compiler Achieved +40.1% by our binary optimizer vs % by the latest source compiler. original binary binary optimizer optimized binary language-specific contextual info. 26
27 Questions? 27
A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler
A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler Hiroshi Inoue, Hiroshige Hayashizaki, Peng Wu and Toshio Nakatani IBM Research Tokyo IBM Research T.J. Watson Research Center April
More informationTrace-based JIT Compilation
Trace-based JIT Compilation Hiroshi Inoue, IBM Research - Tokyo 1 Trace JIT vs. Method JIT https://twitter.com/yukihiro_matz/status/533775624486133762 2 Background: Trace-based Compilation Using a Trace,
More informationLecture 9 Dynamic Compilation
Lecture 9 Dynamic Compilation I. Motivation & Background II. Overview III. Compilation Policy IV. Partial Method Compilation V. Partial Dead Code Elimination VI. Escape Analysis VII. Results Partial Method
More informationMake Your C/C++ and PL/I Code FLY With the Right Compiler Options
Make Your C/C++ and PL/I Code FLY With the Right Compiler Options Visda Vokhshoori/Peter Elderon IBM Corporation Session 13790 Insert Custom Session QR if Desired. WHAT does good application performance
More informationReducing CPU Usage for Critical Applications with IBM s Cutting-Edge COBOL Offerings
Reducing CPU Usage for Critical Applications with IBM s Cutting-Edge COBOL Offerings Roland Koo, Offering Manager, COBOL, ABO and Node.js on z/os DevOps for IBM Z Virtual Conference July 25 27 Disclaimer
More informationIBM. Data Sheet. Enterprise COBOL for z/os. Version 6.2
Enterprise COBOL for z/os IBM Data Sheet Version 6.2 Enterprise COBOL for z/os IBM Data Sheet Version 6.2 Third edition (January 2018) This edition applies to Version 6 Release 2 of IBM Enterprise COBOL
More informationCOBOL performance: Myths and Realities
COBOL performance: Myths and Realities Speaker Name: Tom Ross Speaker Company: IBM Date of Presentation: August 10, 2011 Session Number: 9655 Agenda Performance of COBOL compilers - myths and realities
More informationSoftware Implementation of Decimal Floating-Point
0 Software Implementation of Decimal Floating-Point John Harrison johnh@ichips.intel.com JNAO, 1st June 2006 1 Decimal and binary arithmetic Some early computers like ENIAC used decimal. But nowadays:
More informationM1 Computers and Data
M1 Computers and Data Module Outline Architecture vs. Organization. Computer system and its submodules. Concept of frequency. Processor performance equation. Representation of information characters, signed
More informationEnterprise COBOL V6.1: What s New? Tom Ross Captain COBOL February 29
Enterprise COBOL V6.1: What s New? Tom Ross Captain COBOL February 29 What new features are in Enterprise COBOL V6? Improved compiler capacity to allow compilation and optimization of very large COBOL
More informationExercise: Using Numbers
Exercise: Using Numbers Problem: You are a spy going into an evil party to find the super-secret code phrase (made up of letters and spaces), which you will immediately send via text message to your team
More informationMultiple Choice Type Questions
Techno India Batanagar Computer Science and Engineering Model Questions Subject Name: Computer Architecture Subject Code: CS 403 Multiple Choice Type Questions 1. SIMD represents an organization that.
More informationS16150: What s New in COBOL Version 5 since GA
S16150: What s New in COBOL Version 5 since GA Tom Ross IBM Aug 4, 2014 1 Title: What's new in COBOL v5 since GA Refresher about COBOL V5 requirements Service updates Improved compatibility New Function
More informationOutline EEL 5764 Graduate Computer Architecture. Chapter 3 Limits to ILP and Simultaneous Multithreading. Overcoming Limits - What do we need??
Outline EEL 7 Graduate Computer Architecture Chapter 3 Limits to ILP and Simultaneous Multithreading! Limits to ILP! Thread Level Parallelism! Multithreading! Simultaneous Multithreading Ann Gordon-Ross
More informationField Analysis. Last time Exploit encapsulation to improve memory system performance
Field Analysis Last time Exploit encapsulation to improve memory system performance This time Exploit encapsulation to simplify analysis Two uses of field analysis Escape analysis Object inlining April
More informationRed Fox: An Execution Environment for Relational Query Processing on GPUs
Red Fox: An Execution Environment for Relational Query Processing on GPUs Haicheng Wu 1, Gregory Diamos 2, Tim Sheard 3, Molham Aref 4, Sean Baxter 2, Michael Garland 2, Sudhakar Yalamanchili 1 1. Georgia
More information55:132/22C:160, HPCA Spring 2011
55:132/22C:160, HPCA Spring 2011 Second Lecture Slide Set Instruction Set Architecture Instruction Set Architecture ISA, the boundary between software and hardware Specifies the logical machine that is
More informationIBM. Performance Tuning Guide. Enterprise COBOL for z/os. Version 6.1
Enterprise COBOL for z/os IBM Performance Tuning Guide Version 6.1 Enterprise COBOL for z/os IBM Performance Tuning Guide Version 6.1 Note Before using this information and the product it supports, be
More informationSemantic actions for expressions
Semantic actions for expressions Semantic actions Semantic actions are routines called as productions (or parts of productions) are recognized Actions work together to build up intermediate representations
More informationIntroduction. L25: Modern Compiler Design
Introduction L25: Modern Compiler Design Course Aims Understand the performance characteristics of modern processors Be familiar with strategies for optimising dynamic dispatch for languages like JavaScript
More informationNovember IBM XL C/C++ Compilers Insights on Improving Your Application
November 2010 IBM XL C/C++ Compilers Insights on Improving Your Application Page 1 Table of Contents Purpose of this document...2 Overview...2 Performance...2 Figure 1:...3 Figure 2:...4 Exploiting the
More informationProcessors, Performance, and Profiling
Processors, Performance, and Profiling Architecture 101: 5-Stage Pipeline Fetch Decode Execute Memory Write-Back Registers PC FP ALU Memory Architecture 101 1. Fetch instruction from memory. 2. Decode
More informationWhy Study Assembly Language?
Why Study Assembly Language? This depends on the decade in which you studied assembly language. 1940 s You cannot study assembly language. It does not exist yet. 1950 s You study assembly language because,
More informationS Coding in COBOL for optimum performance
S16613 - Coding in COBOL for optimum performance Tom Ross IBM March 4, 2015 Insert Custom Session QR if Desired. Title: Coding in COBOL for optimum performance Compiler options Dealing with data types
More informationCrusoe Reference. What is Binary Translation. What is so hard about it? Thinking Outside the Box The Transmeta Crusoe Processor
Crusoe Reference Thinking Outside the Box The Transmeta Crusoe Processor 55:132/22C:160 High Performance Computer Architecture The Technology Behind Crusoe Processors--Low-power -Compatible Processors
More informationCMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)
CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) Limits to ILP Conflicting studies of amount of ILP Benchmarks» vectorized Fortran FP vs. integer
More informationName, Scope, and Binding. Outline [1]
Name, Scope, and Binding In Text: Chapter 3 Outline [1] Variable Binding Storage bindings and lifetime Type bindings Type Checking Scope Lifetime vs. Scope Referencing Environments N. Meng, S. Arthur 2
More informationSIMD Exploitation in (JIT) Compilers
SIMD Exploitation in (JIT) Compilers Hiroshi Inoue, IBM Research - Tokyo 1 What s SIMD? Single Instruction Multiple Data Same operations applied for multiple elements in a vector register input 1 A0 input
More informationAn Overview to Compiler Design. 2008/2/14 \course\cpeg421-08s\topic-1a.ppt 1
An Overview to Compiler Design 2008/2/14 \course\cpeg421-08s\topic-1a.ppt 1 Outline An Overview of Compiler Structure Front End Middle End Back End 2008/2/14 \course\cpeg421-08s\topic-1a.ppt 2 Reading
More informationAdaptive SMT Control for More Responsive Web Applications
Adaptive SMT Control for More Responsive Web Applications Hiroshi Inoue and Toshio Nakatani IBM Research Tokyo University of Tokyo Oct 27, 2014 IISWC @ Raleigh, NC, USA Response time matters! Peak throughput
More informationAccelerating Ruby with LLVM
Accelerating Ruby with LLVM Evan Phoenix Oct 2, 2009 RUBY RUBY Strongly, dynamically typed RUBY Unified Model RUBY Everything is an object RUBY 3.class # => Fixnum RUBY Every code context is equal RUBY
More informationSupporting the new IBM z13 mainframe and its SIMD vector unit
Supporting the new IBM z13 mainframe and its SIMD vector unit Dr. Ulrich Weigand Senior Technical Staff Member GNU/Linux Compilers & Toolchain Date: Apr 13, 2015 2015 IBM Corporation Agenda IBM z13 Vector
More informationIBM. Performance Tuning Guide. Enterprise COBOL for z/os. Version 6.2 SC
Enterprise COBOL for z/os IBM Performance Tuning Guide Version 6.2 SC27-9202-00 Enterprise COBOL for z/os IBM Performance Tuning Guide Version 6.2 SC27-9202-00 Note Before using this information and the
More informationMain Memory Organization
Main Memory Organization Bit Smallest piece of memory Stands for binary digit Has values 0 (off) or 1 (on) Byte Is 8 consecu>ve bits Word Usually 4 consecu>ve bytes Has an address 8 bits 0 1 1 0 0 1 1
More informationSE352b: Roadmap. SE352b Software Engineering Design Tools. W3: Programming Paradigms
SE352b Software Engineering Design Tools W3: Programming Paradigms Feb. 3, 2005 SE352b, ECE,UWO, Hamada Ghenniwa SE352b: Roadmap CASE Tools: Introduction System Programming Tools Programming Paradigms
More informationNotos: Efficient Emulation of Wireless Sensor Networks with Binary-to-Source Translation
Schützenbahn 70 45127 Essen, Germany Notos: Efficient Emulation of Wireless Sensor Networks with Binary-to-Source Translation Robert Sauter, Sascha Jungen, Richard Figura, and Pedro José Marrón, Germany
More informationCompiling Java For High Performance on Servers
Compiling Java For High Performance on Servers Ken Kennedy Center for Research on Parallel Computation Rice University Goal: Achieve high performance without sacrificing language compatibility and portability.
More informationCS 61C: Great Ideas in Computer Architecture Performance and Floating-Point Arithmetic
CS 61C: Great Ideas in Computer Architecture Performance and Floating-Point Arithmetic Instructors: Nick Weaver & John Wawrzynek http://inst.eecs.berkeley.edu/~cs61c/sp18 3/16/18 Spring 2018 Lecture #17
More informationChapter 5 Names, Binding, Type Checking and Scopes
Chapter 5 Names, Binding, Type Checking and Scopes Names - We discuss all user-defined names here - Design issues for names: -Maximum length? - Are connector characters allowed? - Are names case sensitive?
More informationRed Fox: An Execution Environment for Relational Query Processing on GPUs
Red Fox: An Execution Environment for Relational Query Processing on GPUs Georgia Institute of Technology: Haicheng Wu, Ifrah Saeed, Sudhakar Yalamanchili LogicBlox Inc.: Daniel Zinn, Martin Bravenboer,
More informationCOMPUTER ORGANIZATION & ARCHITECTURE
COMPUTER ORGANIZATION & ARCHITECTURE Instructions Sets Architecture Lesson 5a 1 What are Instruction Sets The complete collection of instructions that are understood by a CPU Can be considered as a functional
More informationIntroduction to Multicore Programming
Introduction to Multicore Programming Minsoo Ryu Department of Computer Science and Engineering 2 1 Multithreaded Programming 2 Automatic Parallelization and OpenMP 3 GPGPU 2 Multithreaded Programming
More informationDivide: Paper & Pencil
Divide: Paper & Pencil 1001 Quotient Divisor 1000 1001010 Dividend -1000 10 101 1010 1000 10 Remainder See how big a number can be subtracted, creating quotient bit on each step Binary => 1 * divisor or
More informationBuffer overflow prevention, and other attacks
Buffer prevention, and other attacks Comp Sci 3600 Security Outline 1 2 Two approaches to buffer defense Aim to harden programs to resist attacks in new programs Run time Aim to detect and abort attacks
More informationOutline. What is Performance? Restating Performance Equation Time = Seconds. CPU Performance Factors
CS 61C: Great Ideas in Computer Architecture Performance and Floating-Point Arithmetic Instructors: Krste Asanović & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/fa17 Outline Defining Performance
More informationCS 61C: Great Ideas in Computer Architecture Performance and Floating-Point Arithmetic
CS 61C: Great Ideas in Computer Architecture Performance and Floating-Point Arithmetic Instructors: Krste Asanović & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/fa17 10/24/17 Fall 2017-- Lecture
More informationWhy is the CPU Time For a Job so Variable?
Why is the CPU Time For a Job so Variable? Cheryl Watson, Frank Kyne Watson & Walker, Inc. www.watsonwalker.com technical@watsonwalker.com August 5, 2014, Session 15836 Insert Custom Session QR if Desired.
More informationThe Present and Future of Large Memory in DB2
The Present and Future of Large Memory in DB2 John B. Tobler Senior Technical Staff Member DB2 for z/os, IBM Michael Schultz Advisory Software Engineer DB2 for z/os, IBM Monday August 12, 2013 3:00PM -
More informationGPU Floating Point Features
CSE 591: GPU Programming Floating Point Considerations Klaus Mueller Computer Science Department Stony Brook University Objective To understand the fundamentals of floating-point representation To know
More informationPhase-based Adaptive Recompilation in a JVM
Phase-based Adaptive Recompilation in a JVM Dayong Gu Clark Verbrugge Sable Research Group, School of Computer Science McGill University, Montréal, Canada {dgu1, clump}@cs.mcgill.ca April 7, 2008 Sable
More informationWhat is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573
What is a compiler? Xiaokang Qiu Purdue University ECE 573 August 21, 2017 What is a compiler? What is a compiler? Traditionally: Program that analyzes and translates from a high level language (e.g.,
More informationG Programming Languages - Fall 2012
G22.2110-003 Programming Languages - Fall 2012 Lecture 4 Thomas Wies New York University Review Last week Control Structures Selection Loops Adding Invariants Outline Subprograms Calling Sequences Parameter
More informationChapter 5. Names, Bindings, and Scopes
Chapter 5 Names, Bindings, and Scopes Chapter 5 Topics Introduction Names Variables The Concept of Binding Scope Scope and Lifetime Referencing Environments Named Constants 1-2 Introduction Imperative
More informationEnterprise COBOL. B Batch Compilation...7:14-15 BINARY... 9:41 BINARY (COMP or COMP-4)...9:39-40 Bit Manipulation Routines... 7:45
A Accessing XML Documents...4:8-9 Addressing: 24 versus 31 Bit... 6:3 AIXBLD... 9:20 AMODE... 6:4 ARITH - EXTEND or COMPAT... 9:4 Assignment... 2:10 Automatic Date Recognition... 8:4 AWO or NOAWO... 9:5
More information231 Spring Final Exam Name:
231 Spring 2010 -- Final Exam Name: No calculators. Matching. Indicate the letter of the best description. (1 pt. each) 1. b address 2. d object code 3. g condition code 4. i byte 5. k ASCII 6. m local
More informationIBM Education Assistance for z/os V2R2
IBM Education Assistance for z/os V2R2 Item: RSM Scalability Element/Component: Real Storage Manager Material current as of May 2015 IBM Presentation Template Full Version Agenda Trademarks Presentation
More informationElevating Application Performance with Latest IBM COBOL Offerings. Tom Ross Captain COBOL March 9, 20017
Elevating Application Performance with Latest IBM COBOL Offerings Tom Ross Captain COBOL March 9, 20017 Agenda Why the need to stay current with compiler technology? COBOL V6.1 ABO V1.2 Benefits of Using
More informationChapter 2. Instruction Set Principles and Examples. In-Cheol Park Dept. of EE, KAIST
Chapter 2. Instruction Set Principles and Examples In-Cheol Park Dept. of EE, KAIST Stack architecture( 0-address ) operands are on the top of the stack Accumulator architecture( 1-address ) one operand
More informationIBM Research - Tokyo 数理 計算科学特論 C プログラミング言語処理系の最先端実装技術. Trace Compilation IBM Corporation
数理 計算科学特論 C プログラミング言語処理系の最先端実装技術 Trace Compilation Trace JIT vs. Method JIT https://twitter.com/yukihiro_matz/status/533775624486133762 2 Background: Trace-based Compilation Using a Trace, a hot path identified
More informationYETI. GraduallY Extensible Trace Interpreter VEE Mathew Zaleski, Angela Demke Brown (University of Toronto) Kevin Stoodley (IBM Toronto)
YETI GraduallY Extensible Trace Interpreter Mathew Zaleski, Angela Demke Brown (University of Toronto) Kevin Stoodley (IBM Toronto) VEE 2007 1 Goal Create a VM that is more easily extended with a just
More informationCOBOL for AIX, Version 4.1
software Application development for today s changing marketplace COBOL for AIX, Version 4.1 To remain competitive, you need a complete business strategy to help you modernize, integrate, and manage existing
More informationOperational Semantics. One-Slide Summary. Lecture Outline
Operational Semantics #1 One-Slide Summary Operational semantics are a precise way of specifying how to evaluate a program. A formal semantics tells you what each expression means. Meaning depends on context:
More informationLecture Outline. COOL operational semantics. Operational Semantics of Cool. Motivation. Notation. The rules. Evaluation Rules So Far.
Lecture Outline Operational Semantics of Cool COOL operational semantics Motivation Adapted from Lectures by Profs. Alex Aiken and George Necula (UCB) Notation The rules CS781(Prasad) L24CG 1 CS781(Prasad)
More informationThe latest IBM Z COBOL compiler: Enterprise COBOL V6.2! Tom Ross Captain COBOL SHARE Providence August 7,2017
The latest IBM Z COBOL compiler: Enterprise COBOL V6.2! Tom Ross Captain COBOL SHARE Providence August 7,2017 1 COBOL V6.2? YES! The 4 th release of the new generation of IBM Z COBOL compilers Announced:
More informationOperational Semantics of Cool
Operational Semantics of Cool Key Concepts semantics: the meaning of a program, what does program do? how the code is executed? operational semantics: high level code generation steps of calculating values
More informationWritten Homework 3. Floating-Point Example (1/2)
Written Homework 3 Assigned on Tuesday, Feb 19 Due Time: 11:59pm, Feb 26 on Tuesday Problems: 3.22, 3.23, 3.24, 3.41, 3.43 Note: You have 1 week to work on homework 3. 3 Floating-Point Example (1/2) Q:
More informationRISC Architecture Ch 12
RISC Architecture Ch 12 Some History Instruction Usage Characteristics Large Register Files Register Allocation Optimization RISC vs. CISC 18 Original Ideas Behind CISC (Complex Instruction Set Comp.)
More informationOperating system integrated energy aware scratchpad allocation strategies for multiprocess applications
University of Dortmund Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications Robert Pyka * Christoph Faßbach * Manish Verma + Heiko Falk * Peter Marwedel
More informationSurvey. Motivation 29.5 / 40 class is required
Survey Motivation 29.5 / 40 class is required Concerns 6 / 40 not good at examination That s why we have 3 examinations 6 / 40 this class sounds difficult 8 / 40 understand the instructor Want class to
More informationJAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder
JAVA PERFORMANCE PR SW2 S18 Dr. Prähofer DI Leopoldseder OUTLINE 1. What is performance? 1. Benchmarking 2. What is Java performance? 1. Interpreter vs JIT 3. Tools to measure performance 4. Memory Performance
More informationG Programming Languages Spring 2010 Lecture 4. Robert Grimm, New York University
G22.2110-001 Programming Languages Spring 2010 Lecture 4 Robert Grimm, New York University 1 Review Last week Control Structures Selection Loops 2 Outline Subprograms Calling Sequences Parameter Passing
More informationCS 415 Midterm Exam Spring 2002
CS 415 Midterm Exam Spring 2002 Name KEY Email Address Student ID # Pledge: This exam is closed note, closed book. Good Luck! Score Fortran Algol 60 Compilation Names, Bindings, Scope Functional Programming
More informationLecture 3: C Programm
0 3 E CS 1 Lecture 3: C Programm ing Reading Quiz Note the intimidating red border! 2 A variable is: A. an area in memory that is reserved at run time to hold a value of particular type B. an area in memory
More informationSIGNED AND UNSIGNED SYSTEMS
EE 357 Unit 1 Fixed Point Systems and Arithmetic Learning Objectives Understand the size and systems used by the underlying HW when a variable is declared in a SW program Understand and be able to find
More informationThread Tailor Dynamically Weaving Threads Together for Efficient, Adaptive Parallel Applications
Thread Tailor Dynamically Weaving Threads Together for Efficient, Adaptive Parallel Applications Janghaeng Lee, Haicheng Wu, Madhumitha Ravichandran, Nathan Clark Motivation Hardware Trends Put more cores
More informationLECTURE 19. Subroutines and Parameter Passing
LECTURE 19 Subroutines and Parameter Passing ABSTRACTION Recall: Abstraction is the process by which we can hide larger or more complex code fragments behind a simple name. Data abstraction: hide data
More informationItem A The first line contains basic information about the dump including its code, transaction identifier and dump identifier. The Symptom string
1 2 Item A The first line contains basic information about the dump including its code, transaction identifier and dump identifier. The Symptom string contains information which is normally used to perform
More informationzcobol System Programmer s Guide v1.5.06
zcobol System Programmer s Guide v1.5.06 Automated Software Tools Corporation. zc390 Translator COBOL Language Verb Macros COMPUTE Statement Example zcobol Target Source Language Generation Macros ZC390LIB
More informationTransparent Pointer Compression for Linked Data Structures
Transparent Pointer Compression for Linked Data Structures lattner@cs.uiuc.edu Vikram Adve vadve@cs.uiuc.edu June 12, 2005 MSP 2005 http://llvm.cs.uiuc.edu llvm.cs.uiuc.edu/ Growth of 64-bit computing
More informationCS 61C: Great Ideas in Computer Architecture Performance and Floating Point Arithmetic
CS 61C: Great Ideas in Computer Architecture Performance and Floating Point Arithmetic Instructors: Bernhard Boser & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/ 10/25/16 Fall 2016 -- Lecture #17
More informationCIS 110: Introduction to Computer Programming
CIS 110: Introduction to Computer Programming Lecture 3 Express Yourself ( 2.1) 9/16/2011 CIS 110 (11fa) - University of Pennsylvania 1 Outline 1. Data representation and types 2. Expressions 9/16/2011
More informationIBM. Data Sheet. Enterprise COBOL for z/os. Version 6.2
Enterprise COBOL for z/os IBM Data Sheet Version 6.2 Enterprise COBOL for z/os IBM Data Sheet Version 6.2 November 2018 This edition applies to Version 6 Release 2 of IBM Enterprise COBOL for z/os (program
More informationECE 468, Fall Midterm 2
ECE 468, Fall 08. Midterm INSTRUCTIONS (read carefully) Fill in your name and PUID. NAME: PUID: Please sign the following: I affirm that the answers given on this test are mine and mine alone. I did not
More informationgpucc: An Open-Source GPGPU Compiler
gpucc: An Open-Source GPGPU Compiler Jingyue Wu, Artem Belevich, Eli Bendersky, Mark Heffernan, Chris Leary, Jacques Pienaar, Bjarke Roune, Rob Springer, Xuetian Weng, Robert Hundt One-Slide Overview Motivation
More informationOutline. L9: Project Discussion and Floating Point Issues. Project Parts (Total = 50%) Project Proposal (due 3/8) 2/13/12.
Outline L9: Project Discussion and Floating Point Issues Discussion of semester projects Floating point Mostly single precision until recent architectures Accuracy What s fast and what s not Reading: Ch
More informationgpucc: An Open-Source GPGPU Compiler
gpucc: An Open-Source GPGPU Compiler Jingyue Wu, Artem Belevich, Eli Bendersky, Mark Heffernan, Chris Leary, Jacques Pienaar, Bjarke Roune, Rob Springer, Xuetian Weng, Robert Hundt One-Slide Overview Motivation
More informationC Programming. Course Outline. C Programming. Code: MBD101. Duration: 10 Hours. Prerequisites:
C Programming Code: MBD101 Duration: 10 Hours Prerequisites: You are a computer science Professional/ graduate student You can execute Linux/UNIX commands You know how to use a text-editing tool You should
More informationMulti-threaded Queries. Intra-Query Parallelism in LLVM
Multi-threaded Queries Intra-Query Parallelism in LLVM Multithreaded Queries Intra-Query Parallelism in LLVM Yang Liu Tianqi Wu Hao Li Interpreted vs Compiled (LLVM) Interpreted vs Compiled (LLVM) Interpreted
More informationPerformance Profiling
Performance Profiling Minsoo Ryu Real-Time Computing and Communications Lab. Hanyang University msryu@hanyang.ac.kr Outline History Understanding Profiling Understanding Performance Understanding Performance
More informationMethod-Level Phase Behavior in Java Workloads
Method-Level Phase Behavior in Java Workloads Andy Georges, Dries Buytaert, Lieven Eeckhout and Koen De Bosschere Ghent University Presented by Bruno Dufour dufour@cs.rutgers.edu Rutgers University DCS
More informationComputer (Literacy) Skills. Number representations and memory. Lubomír Bulej KDSS MFF UK
Computer (Literacy Skills Number representations and memory Lubomír Bulej KDSS MFF UK Number representations? What for? Recall: computer works with binary numbers Groups of zeroes and ones 8 bits (byte,
More informationOn a 64-bit CPU. Size/Range vary by CPU model and Word size.
On a 64-bit CPU. Size/Range vary by CPU model and Word size. unsigned short x; //range 0 to 65553 signed short x; //range ± 32767 short x; //assumed signed There are (usually) no unsigned floats or doubles.
More informationNumber Systems and Computer Arithmetic
Number Systems and Computer Arithmetic Counting to four billion two fingers at a time What do all those bits mean now? bits (011011011100010...01) instruction R-format I-format... integer data number text
More informationMore about Binary 9/6/2016
More about Binary 9/6/2016 Unsigned vs. Two s Complement 8-bit example: 1 1 0 0 0 0 1 1 2 7 +2 6 + 2 1 +2 0 = 128+64+2+1 = 195-2 7 +2 6 + 2 1 +2 0 = -128+64+2+1 = -61 Why does two s complement work this
More informationJava on System z. IBM J9 virtual machine
Performance, Agility, and Cost Savings: IBM SDK5 and SDK6 on System z by Marcel Mitran, Levon Stepanian, and Theresa Tai Java on System z Java is a platform-agnostic (compile-once-run-anywhere), agile,
More informationThe Potentials and Challenges of Trace Compilation:
Peng Wu IBM Research January 26, 2011 The Potentials and Challenges of Trace Compilation: Lessons learned from building a trace-jit on top of J9 JVM (Joint work with Hiroshige Hayashizaki and Hiroshi Inoue,
More informationCode Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University
Code Generation Frédéric Haziza Department of Computer Systems Uppsala University Spring 2008 Operating Systems Process Management Memory Management Storage Management Compilers Compiling
More informationPrinciples of Programming Languages. Lecture Outline
Principles of Programming Languages CS 492 Lecture 1 Based on Notes by William Albritton 1 Lecture Outline Reasons for studying concepts of programming languages Programming domains Language evaluation
More informationWhere We Are. Lexical Analysis. Syntax Analysis. IR Generation. IR Optimization. Code Generation. Machine Code. Optimization.
Where We Are Source Code Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Machine Code Where We Are Source Code Lexical Analysis Syntax Analysis
More informationUniversity of Wisconsin-Madison
Evolving RPC for Active Storage Muthian Sivathanu Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau University of Wisconsin-Madison Architecture of the future Everything is active Cheaper, faster processing
More information