Re-constructing High-Level Information for Language-specific Binary Re-optimization

Size: px
Start display at page:

Download "Re-constructing High-Level Information for Language-specific Binary Re-optimization"

Transcription

1 CGO 2016 Re-constructing High-Level Information for Language-specific Binary Re-optimization Toshihiko Koju IBM Research - Tokyo Reid Copeland IBM Canada Motohiro Kawahito IBM Research - Tokyo Moriyoshi Ohara IBM Research - Tokyo 1

2 Summary Language-specific binary optimizer can achieve competitive performance relative to the state-of-the-art source compiler. source code latest compiler re-compiled binary earlier compiler Achieved +40.1% by our binary optimizer vs % by the latest source compiler. original binary binary optimizer optimized binary language-specific contextual info. 2

3 Outline! Motivation & Goal.! Our Approach. Four Major Optimizations.! Performance Evaluation.! Conclusions. 3

4 Compilation Technologies are becoming Increasingly Important! Recent processors are equipped with specialized HW units. e.g. SIMD, FPGA, GPU, DFP (Decimal Floating-Point)! Re-compilation with the latest compilation technologies is typically required to exploit them. single-thread perf. of CPU downlevel processor with specialized HW units w/o specialized HW units CPU generation re-compilation Specialized HW units complement diminishing single-thread performance improvements. latest processor DFP SIMD FPGA GPU 4 existing application latest compiler re-compiled application

5 Compiler Migration is Often Challenging for Large Enterprise Customers! Adapting to changes in the following can be a significant undertaking: language, compiler option, build environment, runtime environment downlevel processor changes in language behavior, compiler options, build environment, runtime environment compiler migration re-compilation latest processor DFP SIMD FPGA GPU existing application latest compiler re-compiled application 5

6 Ease Migration by Binary Optimization! We have developed a production-level binary optimizer for mainframe COBOL applications. IBM Automatic Binary Optimizer for z/os.! The binary optimizer utilizes the same optimization engine of the latest source code COBOL compiler. Source code re-compilation: compiler migration changes in language behavior, compiler options, build environment, runtime environment source code Binary re-optimization: existing binary 6 latest compiler Provide strong compatibility with the current environment. binary optimizer re-compiled binary optimized binary

7 Traditional Approach does not Work! Naive translation from machine instructions to IR (intermediate Representation) degraded the performance. This is typically for binary translators based on general compiler engines. original binary naive translation w/o optimizations naive translation with optimizations 7 higher is better Relative Performance 180.0% 160.0% 140.0% 120.0% 100.0% 80.0% 60.0% 40.0% 20.0% 0.0% 155.2% 100.0% 90.0% 62.3% cob42.opt binopt.o0 binopt.o1 cob5.dev source code re-compilation

8 Our Goal! Achieve competitive performance relative to the state-of-theart source code compiler. original binary naive translation w/o optimizations naive translation with optimizations 8 higher is better Relative Performance 180.0% 160.0% 140.0% 120.0% 100.0% 80.0% 60.0% 40.0% 20.0% 0.0% 155.2% fill in the gap 100.0% 90.0% 62.3% cob42.opt binopt.o0 binopt.o1 cob5.dev source code re-compilation our results

9 Outline! Motivation & Goal.! Our Approach. Four Major Optimizations.! Performance Evaluation.! Conclusions. 9

10 We Need High-level Information to Optimize Binaries! Compiler optimizations require "HLI" (high-level information). e.g. variables, data types, scopes, constants, library semantics! Only limited HLI is provided by the naive translation. Resulting in limited optimization opportunities. poor HLI (high-level information): variables data types scopes constants library semantics existing binary naive IR generator IR optimization engine optimized binary limited optimization opportunities 10

11 Our Approach: Language-Specific Binary Optimization! General binary optimizers typically do not assume source languages which results in better applicability. e.g. Dynamo [Bala00], DynamoRIO [Bruening03], DBT86/RASP [Hertzberg11]! Our binary optimizer assumes the source language to reconstruct HLI by using the "contextual information". e.g. data structure, register usage, instruction pattern General binary optimizer: No assumption on source languages. general binary binary optimizer poor HLI optimized binary limited optimizations Our binary optimizer: Assuming the source language. COBOL binary binary optimizer optimized binary 11 contextual information rich HLI sophisticated optimizations

12 Re-construction of the Key HLI machine instructions Variable location inspection of usage: location instruction pattern abstract interpretation: data structure register usage use-define analysis: EFA data structure liveness analysis: location data structure EFA (Effective Address) scope generation of aliasing information: location scope inspection of literal pool: EFA data structure determination of routine: EFA data structure inspection of instruction pattern: variable instruction pattern constant value runtime routine analyze parameter: semantics of routine parameter information data type aliasing original operation 12

13 Abstract Interpretation: Analysis for EFA! Take advantage of knowledge about data structures and register usages of COBOL. Machine instructions: COBOL Data structures & register usage: Abstract interpretation (a) Load R2,(R9+0x200) (b) Load R3,(R2+0x100) (c) Load R4,(R2+0x200) (d) Load R5,(R2+R4) R9 control block R13 stack heap (a) EFA=control block+off_heap R2=&heap (b) EFA=heap+0x100 R3=unknown (c) EFA=heap+0x200 R4=unknown (d) EFA=heap+unknown R5=unknown R12 label table literal pool 13 Without the contextual information, we cannot tell EFA which is a basis of many other HLI.

14 Four Major Optimizations! There are four significant optimizations which are newly introduced to the latest source code compiler. 1. Type Reduction of Decimal Type. 2. Strength Reduction of Edited Moves. 3. Skipping Truncations of Binary Numbers. 4. Inlining Efficient Version of Routines. 14 higher is better original binary 180.0% 160.0% 140.0% 120.0% 100.0% 80.0% 60.0% 40.0% 20.0% 0.0% four optimizations Relative Performance 100.0% naive translation w/o optimizations 62.3% 90.0% 155.2% cob42.opt binopt.o0 binopt.o1 cob5.dev naive translation with optimizations source code re-compilation

15 Optimization 1: Type Reduction of Decimal Type (1/2)! Convert decimal types to decimal floating-point (DFP) types. Computation on decimal type: memory-to-memory Computation on DFP type: register-to-register Representation of integer number 87: binary integer type " 87 zoned decimal type " 0xF8F7 packed decimal type " 0x087F Business applications perform a lot of decimal computations. Simple Example: COBOL: ZonedDecimal A, B; A = A + B HLI about variables is necessary to enable the type reduction. Original instruction: ZonedToPacked (R13+272, 5 bytes), (R2+0, 8 bytes) ByteOR (R13+276), 0xf ZonedToPacked (R13+280, 5 bytes), (R2+8, 8 bytes) ByteOR (R13+284),0xf PackedAdd (R13+272, 5bytes), (R13+280, 5 bytes) PackedToZoned (R2+0, 8 bytes), (R13+272, 5 bytes) 15

16 Optimization 1: Type Reduction of Decimal Type (2/2) Original instruction: ZonedToPacked (R13+272, 5 bytes), (R2+0, 8 bytes) ByteOR (R13+276), 0xf ZonedToPacked (R13+280, 5 bytes), (R2+8, 8 bytes) ByteOR (R13+284),0xf PackedAdd (R13+272, 5bytes), (R13+280, 5 bytes) PackedToZoned (R2+0, 8 bytes), (R13+272, 5 bytes) 16 EFA information use-def analysis liveness analysis Use-Def Analysis: *D=DEF, U=Use ZonedToPacked D ByteOR UD ZonedToPacked D ByteOR UD PackedAdd PackedToZoned UD U U Variables: TMP1 TMP2 A B U D U Optimized instruction: ZonedToDFP FPR1,*(A) ZonedToDFP FPR2,*(B) DFPAdd FPR1,FPR2 DFPToZoned FPR1,*(A) Type Reduction ZonedToPacked TMP1, A PackedSetSign TMP1, plus ZonedToPacked TMP2, B PackedSetSign TMP2, plus PackedAdd TMP1, TMP2 PackedToZoned TMP1, A Re-constructed HLI about variables

17 Optimization 2: Strength Reduction of Edited Moves! Generate a specialized instruction sequence for a string format operation in COBOL. Simple Example: COBOL: * FOO: string of the format MOVE 123 TO FOO. " FOO=0123 Original instruction: EDIT control-bytes,numeric EDIT is a millicode instruction which interprets each byte of the control-bytes. Re-construct HLI about the original operation from control-bytes. Optimized instruction: ConvToZoned TMP, numeric CopyZoned FOO, TMP 17

18 Optimization 3: Skipping Truncations of Binary Numbers! Generate guard code to skip unnecessary truncation of binary integer numbers. Simple Example: COBOL: * FOO: binary number with 4 digits. FOO=FOO Original code: TMP=FOO+1000 FOO=TMP%10000 Divide instruction is unconditionally executed to suppress overflows. Overflow is rare. Re-construct HLI about the original operation: Quot is not in use. Dividend is a constant of Pow10. Optimized code: TMP=FOO+1000 if abs(tmp) >= TMP=TMP%10000 FOO=TMP 18

19 Optimization 4: Inlining Efficient Version of Routines! Replace a runtime routine with the equivalent code which exploits a more efficient algorithm and newer instructions. Simple Example: COBOL: * FOO, BAR: large packed decimal numbers (e.g. 18 digits). FOO=FOO/BAR Original instruction: CALL LargeDivide Runtime path length is more than 100 instructions. Exploit new DFP instructions. Re-construct HLI about the original operation: Semantics of the runtime routine. Parameter information. Optimized instruction: PackedToDFP FPR1,FOO PackedToDFP FPR2,BAR DFPDivide FPR1,FPR2 DFPRound FPR1 DFPToPacked FPR1,FOO 19

20 Outline! Motivation & Goal.! Our Approach. Four Major Optimizations.! Performance Evaluation.! Conclusions. 20

21 Experimental Environment! Machine: 5.5 GHz zec12 running z/os.! Benchmark: IBM Internal benchmark suite for the development of the COBOL compilers.! Binary optimizer: Development version of IBM Automatic Binary Optimizer (2015).! Original source code compiler: IBM Enterprise COBOL V4R2 (2009).! Latest source code compiler: Development version of IBM Enterprise COBOL V5 (2015). 21 cevalnd amort6 Decimal Benchmarks accoun*ng computa*ons. mortgage payment comparisons and amor*za*on computa*ons (external FP). djpcom85 record processing. seqpr ixseq upda*ng sequen*al file records. upda*ng indexed VSAM file records. telco invupd amort2 atm06 Non- Decimal Benchmarks telecom system. upda*ng sales master records. mortgage payment comparisons and amor*za*on computa*ons (long FP). automa*c teller machine transac*ons.

22 Performance Impact of Each Optimization a. O0 (no optimization) " -37.7% b. O1 (without HLI) " -10.0% c. b. + Skipping truncation " -1.4% d. c. + Type reduction " +25.4% e. d. + Optimization of Edited move " +36.7% f. e. + inlining routines (full) " +40.1% Improved performance from -10.0% to +40.1% by re-constructing HLI.! The baseline (100%) is the performance of the original binaries. Original binaries are compiled with the highest optimization-level of the old compiler. decimal benchmarks non-decimal benchmarks 300.0%$ higher is better 22 Rela?ve$Performance$ 250.0%$ 200.0%$ 150.0%$ 100.0%$ 50.0%$ 0.0%$ cevalnd$ amort6$ djpcom85$ seqpr$ ixseq$ telco$ invupd$ amort2$ atm06$ geo.$mean$ a.$ b.$ c.$ d.$ e.$ f.$ g.$

23 Performance Comparisons with the Re-compilation f. e. + inlining routines (full) " +40.1% g. latest source code compiler " +55.2% Good competitiveness of the binary optimizer relative to the state-of-theart source code compiler.! The baseline (100%) is the performance of the original binaries. Original binaries are compiled with the highest optimization-level of the old compiler. decimal benchmarks non-decimal benchmarks 300.0%$ higher is better 23 Rela?ve$Performance$ 250.0%$ 200.0%$ 150.0%$ 100.0%$ 50.0%$ 0.0%$ cevalnd$ amort6$ djpcom85$ seqpr$ ixseq$ telco$ invupd$ amort2$ atm06$ geo.$mean$ a.$ b.$ c.$ d.$ e.$ f.$ g.$

24 Performance Comparisons with the Re-compilation f. e. + inlining routines (full) " +40.1% g. latest source code compiler " +55.2% Can get even closer by: Enabling loop optimizations. Extend range of analysis of runtime routines.! The baseline (100%) is the performance of the original binaries. Original binaries are compiled with the highest optimization-level of the old compiler. decimal benchmarks non-decimal benchmarks 300.0%$ higher is better 24 Rela?ve$Performance$ 250.0%$ 200.0%$ 150.0%$ 100.0%$ 50.0%$ 0.0%$ cevalnd$ amort6$ djpcom85$ seqpr$ ixseq$ telco$ invupd$ amort2$ atm06$ geo.$mean$ a.$ b.$ c.$ d.$ e.$ f.$ g.$

25 Outline! Motivation & Goal.! Our Approach. Four Major Optimizations.! Performance Evaluation.! Conclusions. 25

26 Conclusion Language-specific binary optimizer can achieve competitive performance relative to the state-of-the-art source compiler. source code latest compiler re-compiled binary earlier compiler Achieved +40.1% by our binary optimizer vs % by the latest source compiler. original binary binary optimizer optimized binary language-specific contextual info. 26

27 Questions? 27

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler Hiroshi Inoue, Hiroshige Hayashizaki, Peng Wu and Toshio Nakatani IBM Research Tokyo IBM Research T.J. Watson Research Center April

More information

Trace-based JIT Compilation

Trace-based JIT Compilation Trace-based JIT Compilation Hiroshi Inoue, IBM Research - Tokyo 1 Trace JIT vs. Method JIT https://twitter.com/yukihiro_matz/status/533775624486133762 2 Background: Trace-based Compilation Using a Trace,

More information

Lecture 9 Dynamic Compilation

Lecture 9 Dynamic Compilation Lecture 9 Dynamic Compilation I. Motivation & Background II. Overview III. Compilation Policy IV. Partial Method Compilation V. Partial Dead Code Elimination VI. Escape Analysis VII. Results Partial Method

More information

Make Your C/C++ and PL/I Code FLY With the Right Compiler Options

Make Your C/C++ and PL/I Code FLY With the Right Compiler Options Make Your C/C++ and PL/I Code FLY With the Right Compiler Options Visda Vokhshoori/Peter Elderon IBM Corporation Session 13790 Insert Custom Session QR if Desired. WHAT does good application performance

More information

Reducing CPU Usage for Critical Applications with IBM s Cutting-Edge COBOL Offerings

Reducing CPU Usage for Critical Applications with IBM s Cutting-Edge COBOL Offerings Reducing CPU Usage for Critical Applications with IBM s Cutting-Edge COBOL Offerings Roland Koo, Offering Manager, COBOL, ABO and Node.js on z/os DevOps for IBM Z Virtual Conference July 25 27 Disclaimer

More information

IBM. Data Sheet. Enterprise COBOL for z/os. Version 6.2

IBM. Data Sheet. Enterprise COBOL for z/os. Version 6.2 Enterprise COBOL for z/os IBM Data Sheet Version 6.2 Enterprise COBOL for z/os IBM Data Sheet Version 6.2 Third edition (January 2018) This edition applies to Version 6 Release 2 of IBM Enterprise COBOL

More information

COBOL performance: Myths and Realities

COBOL performance: Myths and Realities COBOL performance: Myths and Realities Speaker Name: Tom Ross Speaker Company: IBM Date of Presentation: August 10, 2011 Session Number: 9655 Agenda Performance of COBOL compilers - myths and realities

More information

Software Implementation of Decimal Floating-Point

Software Implementation of Decimal Floating-Point 0 Software Implementation of Decimal Floating-Point John Harrison johnh@ichips.intel.com JNAO, 1st June 2006 1 Decimal and binary arithmetic Some early computers like ENIAC used decimal. But nowadays:

More information

M1 Computers and Data

M1 Computers and Data M1 Computers and Data Module Outline Architecture vs. Organization. Computer system and its submodules. Concept of frequency. Processor performance equation. Representation of information characters, signed

More information

Enterprise COBOL V6.1: What s New? Tom Ross Captain COBOL February 29

Enterprise COBOL V6.1: What s New? Tom Ross Captain COBOL February 29 Enterprise COBOL V6.1: What s New? Tom Ross Captain COBOL February 29 What new features are in Enterprise COBOL V6? Improved compiler capacity to allow compilation and optimization of very large COBOL

More information

Exercise: Using Numbers

Exercise: Using Numbers Exercise: Using Numbers Problem: You are a spy going into an evil party to find the super-secret code phrase (made up of letters and spaces), which you will immediately send via text message to your team

More information

Multiple Choice Type Questions

Multiple Choice Type Questions Techno India Batanagar Computer Science and Engineering Model Questions Subject Name: Computer Architecture Subject Code: CS 403 Multiple Choice Type Questions 1. SIMD represents an organization that.

More information

S16150: What s New in COBOL Version 5 since GA

S16150: What s New in COBOL Version 5 since GA S16150: What s New in COBOL Version 5 since GA Tom Ross IBM Aug 4, 2014 1 Title: What's new in COBOL v5 since GA Refresher about COBOL V5 requirements Service updates Improved compatibility New Function

More information

Outline EEL 5764 Graduate Computer Architecture. Chapter 3 Limits to ILP and Simultaneous Multithreading. Overcoming Limits - What do we need??

Outline EEL 5764 Graduate Computer Architecture. Chapter 3 Limits to ILP and Simultaneous Multithreading. Overcoming Limits - What do we need?? Outline EEL 7 Graduate Computer Architecture Chapter 3 Limits to ILP and Simultaneous Multithreading! Limits to ILP! Thread Level Parallelism! Multithreading! Simultaneous Multithreading Ann Gordon-Ross

More information

Field Analysis. Last time Exploit encapsulation to improve memory system performance

Field Analysis. Last time Exploit encapsulation to improve memory system performance Field Analysis Last time Exploit encapsulation to improve memory system performance This time Exploit encapsulation to simplify analysis Two uses of field analysis Escape analysis Object inlining April

More information

Red Fox: An Execution Environment for Relational Query Processing on GPUs

Red Fox: An Execution Environment for Relational Query Processing on GPUs Red Fox: An Execution Environment for Relational Query Processing on GPUs Haicheng Wu 1, Gregory Diamos 2, Tim Sheard 3, Molham Aref 4, Sean Baxter 2, Michael Garland 2, Sudhakar Yalamanchili 1 1. Georgia

More information

55:132/22C:160, HPCA Spring 2011

55:132/22C:160, HPCA Spring 2011 55:132/22C:160, HPCA Spring 2011 Second Lecture Slide Set Instruction Set Architecture Instruction Set Architecture ISA, the boundary between software and hardware Specifies the logical machine that is

More information

IBM. Performance Tuning Guide. Enterprise COBOL for z/os. Version 6.1

IBM. Performance Tuning Guide. Enterprise COBOL for z/os. Version 6.1 Enterprise COBOL for z/os IBM Performance Tuning Guide Version 6.1 Enterprise COBOL for z/os IBM Performance Tuning Guide Version 6.1 Note Before using this information and the product it supports, be

More information

Semantic actions for expressions

Semantic actions for expressions Semantic actions for expressions Semantic actions Semantic actions are routines called as productions (or parts of productions) are recognized Actions work together to build up intermediate representations

More information

Introduction. L25: Modern Compiler Design

Introduction. L25: Modern Compiler Design Introduction L25: Modern Compiler Design Course Aims Understand the performance characteristics of modern processors Be familiar with strategies for optimising dynamic dispatch for languages like JavaScript

More information

November IBM XL C/C++ Compilers Insights on Improving Your Application

November IBM XL C/C++ Compilers Insights on Improving Your Application November 2010 IBM XL C/C++ Compilers Insights on Improving Your Application Page 1 Table of Contents Purpose of this document...2 Overview...2 Performance...2 Figure 1:...3 Figure 2:...4 Exploiting the

More information

Processors, Performance, and Profiling

Processors, Performance, and Profiling Processors, Performance, and Profiling Architecture 101: 5-Stage Pipeline Fetch Decode Execute Memory Write-Back Registers PC FP ALU Memory Architecture 101 1. Fetch instruction from memory. 2. Decode

More information

Why Study Assembly Language?

Why Study Assembly Language? Why Study Assembly Language? This depends on the decade in which you studied assembly language. 1940 s You cannot study assembly language. It does not exist yet. 1950 s You study assembly language because,

More information

S Coding in COBOL for optimum performance

S Coding in COBOL for optimum performance S16613 - Coding in COBOL for optimum performance Tom Ross IBM March 4, 2015 Insert Custom Session QR if Desired. Title: Coding in COBOL for optimum performance Compiler options Dealing with data types

More information

Crusoe Reference. What is Binary Translation. What is so hard about it? Thinking Outside the Box The Transmeta Crusoe Processor

Crusoe Reference. What is Binary Translation. What is so hard about it? Thinking Outside the Box The Transmeta Crusoe Processor Crusoe Reference Thinking Outside the Box The Transmeta Crusoe Processor 55:132/22C:160 High Performance Computer Architecture The Technology Behind Crusoe Processors--Low-power -Compatible Processors

More information

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) Limits to ILP Conflicting studies of amount of ILP Benchmarks» vectorized Fortran FP vs. integer

More information

Name, Scope, and Binding. Outline [1]

Name, Scope, and Binding. Outline [1] Name, Scope, and Binding In Text: Chapter 3 Outline [1] Variable Binding Storage bindings and lifetime Type bindings Type Checking Scope Lifetime vs. Scope Referencing Environments N. Meng, S. Arthur 2

More information

SIMD Exploitation in (JIT) Compilers

SIMD Exploitation in (JIT) Compilers SIMD Exploitation in (JIT) Compilers Hiroshi Inoue, IBM Research - Tokyo 1 What s SIMD? Single Instruction Multiple Data Same operations applied for multiple elements in a vector register input 1 A0 input

More information

An Overview to Compiler Design. 2008/2/14 \course\cpeg421-08s\topic-1a.ppt 1

An Overview to Compiler Design. 2008/2/14 \course\cpeg421-08s\topic-1a.ppt 1 An Overview to Compiler Design 2008/2/14 \course\cpeg421-08s\topic-1a.ppt 1 Outline An Overview of Compiler Structure Front End Middle End Back End 2008/2/14 \course\cpeg421-08s\topic-1a.ppt 2 Reading

More information

Adaptive SMT Control for More Responsive Web Applications

Adaptive SMT Control for More Responsive Web Applications Adaptive SMT Control for More Responsive Web Applications Hiroshi Inoue and Toshio Nakatani IBM Research Tokyo University of Tokyo Oct 27, 2014 IISWC @ Raleigh, NC, USA Response time matters! Peak throughput

More information

Accelerating Ruby with LLVM

Accelerating Ruby with LLVM Accelerating Ruby with LLVM Evan Phoenix Oct 2, 2009 RUBY RUBY Strongly, dynamically typed RUBY Unified Model RUBY Everything is an object RUBY 3.class # => Fixnum RUBY Every code context is equal RUBY

More information

Supporting the new IBM z13 mainframe and its SIMD vector unit

Supporting the new IBM z13 mainframe and its SIMD vector unit Supporting the new IBM z13 mainframe and its SIMD vector unit Dr. Ulrich Weigand Senior Technical Staff Member GNU/Linux Compilers & Toolchain Date: Apr 13, 2015 2015 IBM Corporation Agenda IBM z13 Vector

More information

IBM. Performance Tuning Guide. Enterprise COBOL for z/os. Version 6.2 SC

IBM. Performance Tuning Guide. Enterprise COBOL for z/os. Version 6.2 SC Enterprise COBOL for z/os IBM Performance Tuning Guide Version 6.2 SC27-9202-00 Enterprise COBOL for z/os IBM Performance Tuning Guide Version 6.2 SC27-9202-00 Note Before using this information and the

More information

Main Memory Organization

Main Memory Organization Main Memory Organization Bit Smallest piece of memory Stands for binary digit Has values 0 (off) or 1 (on) Byte Is 8 consecu>ve bits Word Usually 4 consecu>ve bytes Has an address 8 bits 0 1 1 0 0 1 1

More information

SE352b: Roadmap. SE352b Software Engineering Design Tools. W3: Programming Paradigms

SE352b: Roadmap. SE352b Software Engineering Design Tools. W3: Programming Paradigms SE352b Software Engineering Design Tools W3: Programming Paradigms Feb. 3, 2005 SE352b, ECE,UWO, Hamada Ghenniwa SE352b: Roadmap CASE Tools: Introduction System Programming Tools Programming Paradigms

More information

Notos: Efficient Emulation of Wireless Sensor Networks with Binary-to-Source Translation

Notos: Efficient Emulation of Wireless Sensor Networks with Binary-to-Source Translation Schützenbahn 70 45127 Essen, Germany Notos: Efficient Emulation of Wireless Sensor Networks with Binary-to-Source Translation Robert Sauter, Sascha Jungen, Richard Figura, and Pedro José Marrón, Germany

More information

Compiling Java For High Performance on Servers

Compiling Java For High Performance on Servers Compiling Java For High Performance on Servers Ken Kennedy Center for Research on Parallel Computation Rice University Goal: Achieve high performance without sacrificing language compatibility and portability.

More information

CS 61C: Great Ideas in Computer Architecture Performance and Floating-Point Arithmetic

CS 61C: Great Ideas in Computer Architecture Performance and Floating-Point Arithmetic CS 61C: Great Ideas in Computer Architecture Performance and Floating-Point Arithmetic Instructors: Nick Weaver & John Wawrzynek http://inst.eecs.berkeley.edu/~cs61c/sp18 3/16/18 Spring 2018 Lecture #17

More information

Chapter 5 Names, Binding, Type Checking and Scopes

Chapter 5 Names, Binding, Type Checking and Scopes Chapter 5 Names, Binding, Type Checking and Scopes Names - We discuss all user-defined names here - Design issues for names: -Maximum length? - Are connector characters allowed? - Are names case sensitive?

More information

Red Fox: An Execution Environment for Relational Query Processing on GPUs

Red Fox: An Execution Environment for Relational Query Processing on GPUs Red Fox: An Execution Environment for Relational Query Processing on GPUs Georgia Institute of Technology: Haicheng Wu, Ifrah Saeed, Sudhakar Yalamanchili LogicBlox Inc.: Daniel Zinn, Martin Bravenboer,

More information

COMPUTER ORGANIZATION & ARCHITECTURE

COMPUTER ORGANIZATION & ARCHITECTURE COMPUTER ORGANIZATION & ARCHITECTURE Instructions Sets Architecture Lesson 5a 1 What are Instruction Sets The complete collection of instructions that are understood by a CPU Can be considered as a functional

More information

Introduction to Multicore Programming

Introduction to Multicore Programming Introduction to Multicore Programming Minsoo Ryu Department of Computer Science and Engineering 2 1 Multithreaded Programming 2 Automatic Parallelization and OpenMP 3 GPGPU 2 Multithreaded Programming

More information

Divide: Paper & Pencil

Divide: Paper & Pencil Divide: Paper & Pencil 1001 Quotient Divisor 1000 1001010 Dividend -1000 10 101 1010 1000 10 Remainder See how big a number can be subtracted, creating quotient bit on each step Binary => 1 * divisor or

More information

Buffer overflow prevention, and other attacks

Buffer overflow prevention, and other attacks Buffer prevention, and other attacks Comp Sci 3600 Security Outline 1 2 Two approaches to buffer defense Aim to harden programs to resist attacks in new programs Run time Aim to detect and abort attacks

More information

Outline. What is Performance? Restating Performance Equation Time = Seconds. CPU Performance Factors

Outline. What is Performance? Restating Performance Equation Time = Seconds. CPU Performance Factors CS 61C: Great Ideas in Computer Architecture Performance and Floating-Point Arithmetic Instructors: Krste Asanović & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/fa17 Outline Defining Performance

More information

CS 61C: Great Ideas in Computer Architecture Performance and Floating-Point Arithmetic

CS 61C: Great Ideas in Computer Architecture Performance and Floating-Point Arithmetic CS 61C: Great Ideas in Computer Architecture Performance and Floating-Point Arithmetic Instructors: Krste Asanović & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/fa17 10/24/17 Fall 2017-- Lecture

More information

Why is the CPU Time For a Job so Variable?

Why is the CPU Time For a Job so Variable? Why is the CPU Time For a Job so Variable? Cheryl Watson, Frank Kyne Watson & Walker, Inc. www.watsonwalker.com technical@watsonwalker.com August 5, 2014, Session 15836 Insert Custom Session QR if Desired.

More information

The Present and Future of Large Memory in DB2

The Present and Future of Large Memory in DB2 The Present and Future of Large Memory in DB2 John B. Tobler Senior Technical Staff Member DB2 for z/os, IBM Michael Schultz Advisory Software Engineer DB2 for z/os, IBM Monday August 12, 2013 3:00PM -

More information

GPU Floating Point Features

GPU Floating Point Features CSE 591: GPU Programming Floating Point Considerations Klaus Mueller Computer Science Department Stony Brook University Objective To understand the fundamentals of floating-point representation To know

More information

Phase-based Adaptive Recompilation in a JVM

Phase-based Adaptive Recompilation in a JVM Phase-based Adaptive Recompilation in a JVM Dayong Gu Clark Verbrugge Sable Research Group, School of Computer Science McGill University, Montréal, Canada {dgu1, clump}@cs.mcgill.ca April 7, 2008 Sable

More information

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573 What is a compiler? Xiaokang Qiu Purdue University ECE 573 August 21, 2017 What is a compiler? What is a compiler? Traditionally: Program that analyzes and translates from a high level language (e.g.,

More information

G Programming Languages - Fall 2012

G Programming Languages - Fall 2012 G22.2110-003 Programming Languages - Fall 2012 Lecture 4 Thomas Wies New York University Review Last week Control Structures Selection Loops Adding Invariants Outline Subprograms Calling Sequences Parameter

More information

Chapter 5. Names, Bindings, and Scopes

Chapter 5. Names, Bindings, and Scopes Chapter 5 Names, Bindings, and Scopes Chapter 5 Topics Introduction Names Variables The Concept of Binding Scope Scope and Lifetime Referencing Environments Named Constants 1-2 Introduction Imperative

More information

Enterprise COBOL. B Batch Compilation...7:14-15 BINARY... 9:41 BINARY (COMP or COMP-4)...9:39-40 Bit Manipulation Routines... 7:45

Enterprise COBOL. B Batch Compilation...7:14-15 BINARY... 9:41 BINARY (COMP or COMP-4)...9:39-40 Bit Manipulation Routines... 7:45 A Accessing XML Documents...4:8-9 Addressing: 24 versus 31 Bit... 6:3 AIXBLD... 9:20 AMODE... 6:4 ARITH - EXTEND or COMPAT... 9:4 Assignment... 2:10 Automatic Date Recognition... 8:4 AWO or NOAWO... 9:5

More information

231 Spring Final Exam Name:

231 Spring Final Exam Name: 231 Spring 2010 -- Final Exam Name: No calculators. Matching. Indicate the letter of the best description. (1 pt. each) 1. b address 2. d object code 3. g condition code 4. i byte 5. k ASCII 6. m local

More information

IBM Education Assistance for z/os V2R2

IBM Education Assistance for z/os V2R2 IBM Education Assistance for z/os V2R2 Item: RSM Scalability Element/Component: Real Storage Manager Material current as of May 2015 IBM Presentation Template Full Version Agenda Trademarks Presentation

More information

Elevating Application Performance with Latest IBM COBOL Offerings. Tom Ross Captain COBOL March 9, 20017

Elevating Application Performance with Latest IBM COBOL Offerings. Tom Ross Captain COBOL March 9, 20017 Elevating Application Performance with Latest IBM COBOL Offerings Tom Ross Captain COBOL March 9, 20017 Agenda Why the need to stay current with compiler technology? COBOL V6.1 ABO V1.2 Benefits of Using

More information

Chapter 2. Instruction Set Principles and Examples. In-Cheol Park Dept. of EE, KAIST

Chapter 2. Instruction Set Principles and Examples. In-Cheol Park Dept. of EE, KAIST Chapter 2. Instruction Set Principles and Examples In-Cheol Park Dept. of EE, KAIST Stack architecture( 0-address ) operands are on the top of the stack Accumulator architecture( 1-address ) one operand

More information

IBM Research - Tokyo 数理 計算科学特論 C プログラミング言語処理系の最先端実装技術. Trace Compilation IBM Corporation

IBM Research - Tokyo 数理 計算科学特論 C プログラミング言語処理系の最先端実装技術. Trace Compilation IBM Corporation 数理 計算科学特論 C プログラミング言語処理系の最先端実装技術 Trace Compilation Trace JIT vs. Method JIT https://twitter.com/yukihiro_matz/status/533775624486133762 2 Background: Trace-based Compilation Using a Trace, a hot path identified

More information

YETI. GraduallY Extensible Trace Interpreter VEE Mathew Zaleski, Angela Demke Brown (University of Toronto) Kevin Stoodley (IBM Toronto)

YETI. GraduallY Extensible Trace Interpreter VEE Mathew Zaleski, Angela Demke Brown (University of Toronto) Kevin Stoodley (IBM Toronto) YETI GraduallY Extensible Trace Interpreter Mathew Zaleski, Angela Demke Brown (University of Toronto) Kevin Stoodley (IBM Toronto) VEE 2007 1 Goal Create a VM that is more easily extended with a just

More information

COBOL for AIX, Version 4.1

COBOL for AIX, Version 4.1 software Application development for today s changing marketplace COBOL for AIX, Version 4.1 To remain competitive, you need a complete business strategy to help you modernize, integrate, and manage existing

More information

Operational Semantics. One-Slide Summary. Lecture Outline

Operational Semantics. One-Slide Summary. Lecture Outline Operational Semantics #1 One-Slide Summary Operational semantics are a precise way of specifying how to evaluate a program. A formal semantics tells you what each expression means. Meaning depends on context:

More information

Lecture Outline. COOL operational semantics. Operational Semantics of Cool. Motivation. Notation. The rules. Evaluation Rules So Far.

Lecture Outline. COOL operational semantics. Operational Semantics of Cool. Motivation. Notation. The rules. Evaluation Rules So Far. Lecture Outline Operational Semantics of Cool COOL operational semantics Motivation Adapted from Lectures by Profs. Alex Aiken and George Necula (UCB) Notation The rules CS781(Prasad) L24CG 1 CS781(Prasad)

More information

The latest IBM Z COBOL compiler: Enterprise COBOL V6.2! Tom Ross Captain COBOL SHARE Providence August 7,2017

The latest IBM Z COBOL compiler: Enterprise COBOL V6.2! Tom Ross Captain COBOL SHARE Providence August 7,2017 The latest IBM Z COBOL compiler: Enterprise COBOL V6.2! Tom Ross Captain COBOL SHARE Providence August 7,2017 1 COBOL V6.2? YES! The 4 th release of the new generation of IBM Z COBOL compilers Announced:

More information

Operational Semantics of Cool

Operational Semantics of Cool Operational Semantics of Cool Key Concepts semantics: the meaning of a program, what does program do? how the code is executed? operational semantics: high level code generation steps of calculating values

More information

Written Homework 3. Floating-Point Example (1/2)

Written Homework 3. Floating-Point Example (1/2) Written Homework 3 Assigned on Tuesday, Feb 19 Due Time: 11:59pm, Feb 26 on Tuesday Problems: 3.22, 3.23, 3.24, 3.41, 3.43 Note: You have 1 week to work on homework 3. 3 Floating-Point Example (1/2) Q:

More information

RISC Architecture Ch 12

RISC Architecture Ch 12 RISC Architecture Ch 12 Some History Instruction Usage Characteristics Large Register Files Register Allocation Optimization RISC vs. CISC 18 Original Ideas Behind CISC (Complex Instruction Set Comp.)

More information

Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications

Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications University of Dortmund Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications Robert Pyka * Christoph Faßbach * Manish Verma + Heiko Falk * Peter Marwedel

More information

Survey. Motivation 29.5 / 40 class is required

Survey. Motivation 29.5 / 40 class is required Survey Motivation 29.5 / 40 class is required Concerns 6 / 40 not good at examination That s why we have 3 examinations 6 / 40 this class sounds difficult 8 / 40 understand the instructor Want class to

More information

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder JAVA PERFORMANCE PR SW2 S18 Dr. Prähofer DI Leopoldseder OUTLINE 1. What is performance? 1. Benchmarking 2. What is Java performance? 1. Interpreter vs JIT 3. Tools to measure performance 4. Memory Performance

More information

G Programming Languages Spring 2010 Lecture 4. Robert Grimm, New York University

G Programming Languages Spring 2010 Lecture 4. Robert Grimm, New York University G22.2110-001 Programming Languages Spring 2010 Lecture 4 Robert Grimm, New York University 1 Review Last week Control Structures Selection Loops 2 Outline Subprograms Calling Sequences Parameter Passing

More information

CS 415 Midterm Exam Spring 2002

CS 415 Midterm Exam Spring 2002 CS 415 Midterm Exam Spring 2002 Name KEY Email Address Student ID # Pledge: This exam is closed note, closed book. Good Luck! Score Fortran Algol 60 Compilation Names, Bindings, Scope Functional Programming

More information

Lecture 3: C Programm

Lecture 3: C Programm 0 3 E CS 1 Lecture 3: C Programm ing Reading Quiz Note the intimidating red border! 2 A variable is: A. an area in memory that is reserved at run time to hold a value of particular type B. an area in memory

More information

SIGNED AND UNSIGNED SYSTEMS

SIGNED AND UNSIGNED SYSTEMS EE 357 Unit 1 Fixed Point Systems and Arithmetic Learning Objectives Understand the size and systems used by the underlying HW when a variable is declared in a SW program Understand and be able to find

More information

Thread Tailor Dynamically Weaving Threads Together for Efficient, Adaptive Parallel Applications

Thread Tailor Dynamically Weaving Threads Together for Efficient, Adaptive Parallel Applications Thread Tailor Dynamically Weaving Threads Together for Efficient, Adaptive Parallel Applications Janghaeng Lee, Haicheng Wu, Madhumitha Ravichandran, Nathan Clark Motivation Hardware Trends Put more cores

More information

LECTURE 19. Subroutines and Parameter Passing

LECTURE 19. Subroutines and Parameter Passing LECTURE 19 Subroutines and Parameter Passing ABSTRACTION Recall: Abstraction is the process by which we can hide larger or more complex code fragments behind a simple name. Data abstraction: hide data

More information

Item A The first line contains basic information about the dump including its code, transaction identifier and dump identifier. The Symptom string

Item A The first line contains basic information about the dump including its code, transaction identifier and dump identifier. The Symptom string 1 2 Item A The first line contains basic information about the dump including its code, transaction identifier and dump identifier. The Symptom string contains information which is normally used to perform

More information

zcobol System Programmer s Guide v1.5.06

zcobol System Programmer s Guide v1.5.06 zcobol System Programmer s Guide v1.5.06 Automated Software Tools Corporation. zc390 Translator COBOL Language Verb Macros COMPUTE Statement Example zcobol Target Source Language Generation Macros ZC390LIB

More information

Transparent Pointer Compression for Linked Data Structures

Transparent Pointer Compression for Linked Data Structures Transparent Pointer Compression for Linked Data Structures lattner@cs.uiuc.edu Vikram Adve vadve@cs.uiuc.edu June 12, 2005 MSP 2005 http://llvm.cs.uiuc.edu llvm.cs.uiuc.edu/ Growth of 64-bit computing

More information

CS 61C: Great Ideas in Computer Architecture Performance and Floating Point Arithmetic

CS 61C: Great Ideas in Computer Architecture Performance and Floating Point Arithmetic CS 61C: Great Ideas in Computer Architecture Performance and Floating Point Arithmetic Instructors: Bernhard Boser & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/ 10/25/16 Fall 2016 -- Lecture #17

More information

CIS 110: Introduction to Computer Programming

CIS 110: Introduction to Computer Programming CIS 110: Introduction to Computer Programming Lecture 3 Express Yourself ( 2.1) 9/16/2011 CIS 110 (11fa) - University of Pennsylvania 1 Outline 1. Data representation and types 2. Expressions 9/16/2011

More information

IBM. Data Sheet. Enterprise COBOL for z/os. Version 6.2

IBM. Data Sheet. Enterprise COBOL for z/os. Version 6.2 Enterprise COBOL for z/os IBM Data Sheet Version 6.2 Enterprise COBOL for z/os IBM Data Sheet Version 6.2 November 2018 This edition applies to Version 6 Release 2 of IBM Enterprise COBOL for z/os (program

More information

ECE 468, Fall Midterm 2

ECE 468, Fall Midterm 2 ECE 468, Fall 08. Midterm INSTRUCTIONS (read carefully) Fill in your name and PUID. NAME: PUID: Please sign the following: I affirm that the answers given on this test are mine and mine alone. I did not

More information

gpucc: An Open-Source GPGPU Compiler

gpucc: An Open-Source GPGPU Compiler gpucc: An Open-Source GPGPU Compiler Jingyue Wu, Artem Belevich, Eli Bendersky, Mark Heffernan, Chris Leary, Jacques Pienaar, Bjarke Roune, Rob Springer, Xuetian Weng, Robert Hundt One-Slide Overview Motivation

More information

Outline. L9: Project Discussion and Floating Point Issues. Project Parts (Total = 50%) Project Proposal (due 3/8) 2/13/12.

Outline. L9: Project Discussion and Floating Point Issues. Project Parts (Total = 50%) Project Proposal (due 3/8) 2/13/12. Outline L9: Project Discussion and Floating Point Issues Discussion of semester projects Floating point Mostly single precision until recent architectures Accuracy What s fast and what s not Reading: Ch

More information

gpucc: An Open-Source GPGPU Compiler

gpucc: An Open-Source GPGPU Compiler gpucc: An Open-Source GPGPU Compiler Jingyue Wu, Artem Belevich, Eli Bendersky, Mark Heffernan, Chris Leary, Jacques Pienaar, Bjarke Roune, Rob Springer, Xuetian Weng, Robert Hundt One-Slide Overview Motivation

More information

C Programming. Course Outline. C Programming. Code: MBD101. Duration: 10 Hours. Prerequisites:

C Programming. Course Outline. C Programming. Code: MBD101. Duration: 10 Hours. Prerequisites: C Programming Code: MBD101 Duration: 10 Hours Prerequisites: You are a computer science Professional/ graduate student You can execute Linux/UNIX commands You know how to use a text-editing tool You should

More information

Multi-threaded Queries. Intra-Query Parallelism in LLVM

Multi-threaded Queries. Intra-Query Parallelism in LLVM Multi-threaded Queries Intra-Query Parallelism in LLVM Multithreaded Queries Intra-Query Parallelism in LLVM Yang Liu Tianqi Wu Hao Li Interpreted vs Compiled (LLVM) Interpreted vs Compiled (LLVM) Interpreted

More information

Performance Profiling

Performance Profiling Performance Profiling Minsoo Ryu Real-Time Computing and Communications Lab. Hanyang University msryu@hanyang.ac.kr Outline History Understanding Profiling Understanding Performance Understanding Performance

More information

Method-Level Phase Behavior in Java Workloads

Method-Level Phase Behavior in Java Workloads Method-Level Phase Behavior in Java Workloads Andy Georges, Dries Buytaert, Lieven Eeckhout and Koen De Bosschere Ghent University Presented by Bruno Dufour dufour@cs.rutgers.edu Rutgers University DCS

More information

Computer (Literacy) Skills. Number representations and memory. Lubomír Bulej KDSS MFF UK

Computer (Literacy) Skills. Number representations and memory. Lubomír Bulej KDSS MFF UK Computer (Literacy Skills Number representations and memory Lubomír Bulej KDSS MFF UK Number representations? What for? Recall: computer works with binary numbers Groups of zeroes and ones 8 bits (byte,

More information

On a 64-bit CPU. Size/Range vary by CPU model and Word size.

On a 64-bit CPU. Size/Range vary by CPU model and Word size. On a 64-bit CPU. Size/Range vary by CPU model and Word size. unsigned short x; //range 0 to 65553 signed short x; //range ± 32767 short x; //assumed signed There are (usually) no unsigned floats or doubles.

More information

Number Systems and Computer Arithmetic

Number Systems and Computer Arithmetic Number Systems and Computer Arithmetic Counting to four billion two fingers at a time What do all those bits mean now? bits (011011011100010...01) instruction R-format I-format... integer data number text

More information

More about Binary 9/6/2016

More about Binary 9/6/2016 More about Binary 9/6/2016 Unsigned vs. Two s Complement 8-bit example: 1 1 0 0 0 0 1 1 2 7 +2 6 + 2 1 +2 0 = 128+64+2+1 = 195-2 7 +2 6 + 2 1 +2 0 = -128+64+2+1 = -61 Why does two s complement work this

More information

Java on System z. IBM J9 virtual machine

Java on System z. IBM J9 virtual machine Performance, Agility, and Cost Savings: IBM SDK5 and SDK6 on System z by Marcel Mitran, Levon Stepanian, and Theresa Tai Java on System z Java is a platform-agnostic (compile-once-run-anywhere), agile,

More information

The Potentials and Challenges of Trace Compilation:

The Potentials and Challenges of Trace Compilation: Peng Wu IBM Research January 26, 2011 The Potentials and Challenges of Trace Compilation: Lessons learned from building a trace-jit on top of J9 JVM (Joint work with Hiroshige Hayashizaki and Hiroshi Inoue,

More information

Code Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University

Code Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University Code Generation Frédéric Haziza Department of Computer Systems Uppsala University Spring 2008 Operating Systems Process Management Memory Management Storage Management Compilers Compiling

More information

Principles of Programming Languages. Lecture Outline

Principles of Programming Languages. Lecture Outline Principles of Programming Languages CS 492 Lecture 1 Based on Notes by William Albritton 1 Lecture Outline Reasons for studying concepts of programming languages Programming domains Language evaluation

More information

Where We Are. Lexical Analysis. Syntax Analysis. IR Generation. IR Optimization. Code Generation. Machine Code. Optimization.

Where We Are. Lexical Analysis. Syntax Analysis. IR Generation. IR Optimization. Code Generation. Machine Code. Optimization. Where We Are Source Code Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Machine Code Where We Are Source Code Lexical Analysis Syntax Analysis

More information

University of Wisconsin-Madison

University of Wisconsin-Madison Evolving RPC for Active Storage Muthian Sivathanu Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau University of Wisconsin-Madison Architecture of the future Everything is active Cheaper, faster processing

More information