Delft-Java Link Translation Buffer


John Glossner (1,2) and Stamatis Vassiliadis (2)
1 Lucent / Bell Labs, Advanced DSP Architecture and Compiler Research, Allentown, PA, glossner@lucent.com
2 Delft University of Technology, CARDIT (Computer Architecture and Digital Technique), Delft, The Netherlands, {glossner,stamatis}@cardit.et.tudelft.nl

Overview
- Java Properties
- S/W Bytecode Execution
- H/W Bytecode Execution
- Delft-Java Engine
- Java Hardware Support
- Java Dynamic Instruction Translation
- Link Translation Buffer
- Preliminary Results
- Conclusions

Java Properties
- Object-oriented programming language
  - Inheritance and polymorphism supported
  - Programmer-supplied parallelism (threads)
- Dynamically linked
  - Resolves C++'s fragile base class problem but imposes performance constraints on class access
  - The entire set of objects in the system is not required at compile time
- Strongly typed
  - Statically determinable type state enables simple on-the-fly translation of bytecodes into efficient machine code [Gos95]
- Compiled to a platform-independent virtual machine

S/W Bytecode Execution
- Interpretation
  - Software emulates the Java Virtual Machine
  - Cross-platform, but poses performance issues
- Just-in-time (JIT) compilation
  - Translates bytecode to native code just prior to execution
  - 5-10x performance improvement over interpretation
  - Compiled code is resident only for the current program invocation
- Native compilation
  - Native code is generated directly from the Java source
  - Best performance, but contrary to "write once, run anywhere"
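Software emulation means every bytecode pays a fetch, decode, and dispatch cost. The sketch below is a minimal switch-dispatch interpreter that makes this concrete. The opcode values are the standard JVM encodings for these few int instructions. The function name `interpret` and the restriction to this opcode subset are illustrative assumptions; objects, calls, branches, and verification are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Minimal switch-dispatch interpreter for a handful of JVM int opcodes.
// Code is assumed well-formed (a JVM verifier would rule out stack underflow).
int32_t interpret(const std::vector<uint8_t>& code, std::vector<int32_t> locals) {
    std::vector<int32_t> stack;  // operand stack
    std::size_t pc = 0;
    while (pc < code.size()) {
        const uint8_t op = code[pc++];  // fetch + decode on every instruction
        switch (op) {
            case 0x03: case 0x04: case 0x05: case 0x06: case 0x07: case 0x08:
                stack.push_back(op - 0x03);                        // iconst_0 .. iconst_5
                break;
            case 0x10:                                             // bipush <signed byte>
                stack.push_back(static_cast<int8_t>(code.at(pc++)));
                break;
            case 0x1a: case 0x1b: case 0x1c: case 0x1d:
                stack.push_back(locals.at(op - 0x1a));             // iload_0 .. iload_3
                break;
            case 0x3b: case 0x3c: case 0x3d: case 0x3e:
                locals.at(op - 0x3b) = stack.back();               // istore_0 .. istore_3
                stack.pop_back();
                break;
            case 0x60: case 0x68: {                                // iadd, imul
                const uint32_t b = static_cast<uint32_t>(stack.back()); stack.pop_back();
                const uint32_t a = static_cast<uint32_t>(stack.back()); stack.pop_back();
                // Unsigned arithmetic wraps modulo 2^32, like JVM int arithmetic.
                stack.push_back(static_cast<int32_t>(op == 0x60 ? a + b : a * b));
                break;
            }
            case 0xac:                                             // ireturn
                return stack.back();
            default:
                throw std::runtime_error("opcode not supported by this sketch");
        }
    }
    throw std::runtime_error("no ireturn reached");
}
```

Consider `return (a + b) * 2;` with `a` and `b` in locals 0 and 1. It compiles to iload_0, iload_1, iadd, iconst_2, imul, ireturn, that is `{0x1a, 0x1b, 0x60, 0x05, 0x68, 0xac}`. Interpreting it with locals `{3, 4}` yields 14, at the cost of six trips through the dispatch loop. A JIT removes exactly this per-instruction overhead.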

S/W Bytecode Execution (2)
- Off-line compilation
  - The program is distributed in bytecode form and translated into native machine code (stored on disk) prior to execution
  - Additional (time-consuming) optimizations may be performed
  - Bytecode contains nearly the same amount of information as the Java source code
- Hybrid/dynamic compilation
  - Highly optimized JIT integrated into the run-time environment
  - Interpreted code is profiled during execution
  - Detected hotspots are dynamically compiled
  - Improvements of up to 140x over interpretation and 13x over JIT have been reported
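The "profile, then compile hotspots" step of hybrid compilation is often driven by simple counters. The sketch below assumes a per-method invocation counter and a fixed threshold. The class name `HotspotDetector` and the threshold value are hypothetical. Production VMs such as HotSpot also count loop back-edges and use tiered thresholds.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical counter-based hotspot detector: interpret a method until its
// invocation count reaches a threshold, then hand it to the compiler once.
class HotspotDetector {
public:
    explicit HotspotDetector(uint32_t threshold) : threshold_(threshold) {
        assert(threshold > 0);
    }

    // Called by the interpreter on every invocation. Returns true exactly
    // once per method: on the call that makes it "hot".
    bool onInvoke(const std::string& method) {
        if (compiled_.count(method) != 0) return false;  // already native code
        if (++counts_[method] >= threshold_) {
            compiled_.insert(method);                     // stand-in for "compile it"
            return true;
        }
        return false;
    }

    bool isCompiled(const std::string& method) const {
        return compiled_.count(method) != 0;
    }

private:
    uint32_t threshold_;
    std::unordered_map<std::string, uint32_t> counts_;   // profile data
    std::unordered_set<std::string> compiled_;           // methods with native code
};
```

Cold methods never pay the compilation cost. Hot methods pay it once, which is how a hybrid system can beat a JIT that compiles everything up front.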

H/W Bytecode Execution
Sun picoJava
- Direct execution
- Stack cache implemented with registers; automatic stack spill/fill
- Acceleration: instruction folding
- Instruction and data cache: global L1
- Extended bytecodes
- Complex instructions trap
- Contiguous stack frame
Delft-Java
- Dynamic translation to RISC instructions
- Indirect register access; run-time register allocation
- Acceleration: compounding instruction issue, multiple thread units, Link Translation Buffer
- Instruction and data cache: global L1 cache; per-thread L0 instruction, stack, and local variable caches
- Superset of instructions (with BEX)
- Complex instructions trap
- Contiguous stack frame
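Translating stack bytecodes to RISC instructions is possible because the operand-stack depth at each bytecode is statically known. Each push and pop can therefore be renamed to a fixed register. The sketch assumes a simple static mapping: local `i` lives in `r<i>`, and stack slot `d` lives in `r<16 + d>`. The base 16, the textual instruction format, and the function names are hypothetical. Delft-Java itself uses indirect register access and run-time allocation, which this does not model.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical register assignment: local variable i -> r<i>,
// operand-stack slot d -> r<kStackBase + d>.
constexpr int kStackBase = 16;

std::string reg(int n) { return "r" + std::to_string(n); }

// Rewrite each stack bytecode as a register-to-register instruction,
// tracking the (statically known) operand-stack depth.
std::vector<std::string> translate(const std::vector<uint8_t>& code) {
    std::vector<std::string> out;
    int depth = 0;  // operand-stack depth before the current bytecode
    for (uint8_t op : code) {
        if (op >= 0x1a && op <= 0x1d) {            // iload_n: push local n
            out.push_back("mov " + reg(kStackBase + depth) + ", " + reg(op - 0x1a));
            ++depth;
        } else if (op >= 0x3b && op <= 0x3e) {     // istore_n: pop into local n
            if (depth < 1) throw std::runtime_error("stack underflow");
            --depth;
            out.push_back("mov " + reg(op - 0x3b) + ", " + reg(kStackBase + depth));
        } else if (op == 0x60) {                   // iadd: pop two, push the sum
            if (depth < 2) throw std::runtime_error("stack underflow");
            --depth;
            const std::string dst = reg(kStackBase + depth - 1);
            out.push_back("add " + dst + ", " + dst + ", " + reg(kStackBase + depth));
        } else {
            throw std::runtime_error("opcode not handled in this sketch");
        }
    }
    return out;
}
```

For `c = a + b` (iload_0, iload_1, iadd, istore_2), this produces four RISC-style instructions. A real translator would also coalesce the moves, for example into a single `add r2, r0, r1`.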

Delft-Java Engine
- RISC-style architecture
  - 32-bit instructions
  - Multiple register files
- Concurrent multithreaded organization
  - Multiple hardware thread units
  - Multiple instruction issue per thread
- Indirect register access
- Supervisory instructions
  - Branch Java View (bex)
- Integer and floating point
  - 8-, 16-, 32-, and 64-bit signed and unsigned integers
  - IEEE-754 floating point
- Multimedia instructions
  - SIMD parallelism
- DSP arithmetic extensions
  - Saturation logic
  - Rounding modes
- 32-bit address space
  - Base + offset + displacement addressing
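"Saturation logic" distinguishes the DSP extensions from Java's own int arithmetic, which wraps on overflow. The sketch below is a generic saturating signed 16-bit add, not the Delft-Java instruction definition. The function name `add_sat16` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Saturating signed 16-bit addition: out-of-range results clamp to the
// type's limits instead of wrapping around.
int16_t add_sat16(int16_t a, int16_t b) {
    const int32_t sum = static_cast<int32_t>(a) + static_cast<int32_t>(b);  // exact in 32 bits
    if (sum > std::numeric_limits<int16_t>::max()) return std::numeric_limits<int16_t>::max();
    if (sum < std::numeric_limits<int16_t>::min()) return std::numeric_limits<int16_t>::min();
    return static_cast<int16_t>(sum);
}
```

In audio or image code, clamping to the limits is usually the desired behavior. Wrapping from a large positive value to a large negative one produces audible or visible artifacts.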

Java Hardware Support
- Transparent extraction of parallelism
  - Multiple concurrent thread units
- Dynamic Java instruction translation
  - Register file caches the stack, with indirect access
  - JVM reserved instruction used for BEX
- Link Translation Buffer for dynamic linking
  - Associates a caller's object reference and constant pool entry ID with a linked object invocation
- Logical controller for non-supported translations
  - Thin interpretive layer and Java run-time
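A register file can cache the top of the operand stack if it is addressed indirectly through a pointer rather than by fixed register names. The sketch models this as a circular register window. Entries spill to memory when the window is full and are refilled on demand, similar to picoJava's automatic spill/fill. The window size, the class name `RegisterStackCache`, and the one-entry-at-a-time spill policy are assumptions, not the Delft-Java design.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Top of the operand stack held in NRegs registers, addressed indirectly via
// (bottom_ + count_) % NRegs. Older entries spill to memory and refill on pop.
template <std::size_t NRegs>
class RegisterStackCache {
    static_assert(NRegs > 0, "need at least one register");
public:
    void push(int32_t v) {
        if (count_ == NRegs) spillOldest();
        regs_[(bottom_ + count_) % NRegs] = v;  // register index computed, not encoded
        ++count_;
    }
    int32_t pop() {
        if (count_ == 0) fillOne();
        --count_;
        return regs_[(bottom_ + count_) % NRegs];
    }
    std::size_t spills() const { return spills_; }
    std::size_t fills() const { return fills_; }

private:
    void spillOldest() {                         // write oldest cached entry to memory
        memory_.push_back(regs_[bottom_]);
        bottom_ = (bottom_ + 1) % NRegs;
        --count_;
        ++spills_;
    }
    void fillOne() {                             // bring back the most recently spilled entry
        assert(!memory_.empty() && "operand stack underflow");
        bottom_ = (bottom_ + NRegs - 1) % NRegs;
        regs_[bottom_] = memory_.back();
        memory_.pop_back();
        ++count_;
        ++fills_;
    }

    std::array<int32_t, NRegs> regs_{};
    std::size_t bottom_ = 0, count_ = 0, spills_ = 0, fills_ = 0;
    std::vector<int32_t> memory_;                // spilled part of the stack
};
```

As long as the stack stays within the window, pushes and pops never touch memory. Deep stacks pay one spill or fill per overflowing entry.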

Delft-Java Organization

Link Translation Buffer
- A global cache for dynamically resolved names
- Associates a caller's method invocation with the (previously) resolved method address
- The first time an object method is invoked, the controller resolves the constant pool name
- Subsequent invoke instructions with the same signature are executed from the LTB cache
- JVM invoke instructions retain high-level information, which makes the use of a cache possible

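C++ vs. Java Invocation

C++
```cpp
class MyClass {
public:
    virtual ~MyClass() = default;
    virtual void instancemethod() {}
};

class MySubClass : public MyClass {
public:
    void instancemethod() override {}
};

int main() {
    MyClass mc;
    MySubClass sub;
    MyClass& msc = sub;     // a reference keeps the dynamic type (a copy would slice it)
    mc.instancemethod();    // calls MyClass::instancemethod
    msc.instancemethod();   // dispatched through the vtable to MySubClass::instancemethod
    return 0;
}
```

Java
```java
// MyClass.java
public class MyClass {
    public void instancemethod() { }
}

// MySubClass.java
public class MySubClass extends MyClass {
    public void instancemethod() { }
}

// Test.java
class Test {
    public static void main(String[] args) {
        MyClass mc = new MyClass();
        MyClass msc = new MySubClass();
        mc.instancemethod();   // resolves to MyClass.instancemethod
        msc.instancemethod();  // same call site form, resolves to MySubClass.instancemethod
    }
}
```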

C++ Virtual Tables
[Figure: the C++ MyClass/MySubClass example beside its virtual tables in memory; the MyClass and MySubClass vtables (slots 0-3) each contain an instancem() entry.]
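What the C++ compiler typically generates for this example can be modeled by hand. Each object starts with a pointer to its class's table of function pointers. A virtual call is then an indexed load plus an indirect call, with the slot index fixed at compile time and no name lookup at run time. Real vtable layout is compiler- and ABI-specific. The names `VTable`, `Object`, and `callInstanceMethod` are illustrative.

```cpp
#include <cassert>
#include <string>

// Hand-built model of a single-slot vtable for MyClass/MySubClass.
struct Object;
using Method = std::string (*)(const Object&);

struct VTable { Method slots[1]; };     // slot 0: instancemethod
struct Object { const VTable* vptr; };  // every object points at its class's table

std::string myClassImpl(const Object&)    { return "MyClass.instancemethod"; }
std::string mySubClassImpl(const Object&) { return "MySubClass.instancemethod"; }

const VTable kMyClassVtbl    = {{ myClassImpl }};
const VTable kMySubClassVtbl = {{ mySubClassImpl }};  // the override replaces slot 0

constexpr int kInstanceMethodSlot = 0;  // chosen by the compiler, not at run time

// msc.instancemethod() compiles to roughly this: load vptr, index, call.
std::string callInstanceMethod(const Object& obj) {
    return obj.vptr->slots[kInstanceMethodSlot](obj);
}
```

This only works because the whole class hierarchy is fixed when the call site is compiled. That is the property Java's dynamic linking gives up.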

Java Method Dispatch
The Java example compiles to the following bytecodes for method void main(java.lang.String[]):

  Line  Addr  Instruction
   1     0    new #3 <Class MyClass>
   2     3    dup
   3     4    invokenonvirtual #7 <Method MyClass.<init>()V>
   4     7    astore_1
   5     8    new #4 <Class MySubClass>
   6    11    dup
   7    12    invokenonvirtual #6 <Method MySubClass.<init>()V>
   8    15    astore_2
   9    16    aload_1
  10    17    invokevirtual #5 <Method MyClass.instancemethod()V>
  11    20    aload_2
  12    21    invokevirtual #5 <Method MyClass.instancemethod()V>
  13    24    return

- Lines 10 and 12 appear to invoke the same method (constant pool entry #5)
- They are disambiguated by the aload instructions in lines 9 and 11
  - Line 4 stored a MyClass object reference in LV[1]
  - Line 8 stored a MySubClass object reference in LV[2]
- This functions much like a C++ virtual table; a JVM may decide to build a vtable (or method dispatch table) dynamically at run time
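Because both invokevirtual instructions name the same constant pool entry, the target must be found from the receiver's run-time class. The sketch below is a simplified version of that lookup. It starts at the receiver's class and walks up the superclass chain until a class declares the named method. The `RtClass` structure and the code labels are hypothetical. The JVM specification's actual selection rules also cover access control, interfaces, and error cases.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Toy run-time class model: name, superclass, and declared methods keyed by
// name + descriptor (e.g. "instancemethod()V").
struct RtClass {
    std::string name;
    const RtClass* super;
    std::map<std::string, std::string> declared;  // signature -> code label
};

// Simplified invokevirtual selection: search the receiver's class first,
// then each superclass in turn.
std::string resolveVirtual(const RtClass* receiverClass, const std::string& sig) {
    for (const RtClass* c = receiverClass; c != nullptr; c = c->super) {
        auto it = c->declared.find(sig);
        if (it != c->declared.end()) return it->second;
    }
    throw std::runtime_error("method not found: " + sig);
}
```

Doing this search by name on every call is what makes late binding expensive. Caching its result per (run-time class, constant pool entry) is the opportunity a dispatch table, or an LTB, exploits.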

LTB Acceleration
- The constant pool contains the name of the method to be invoked, stored as a string (much like a symbol table)
- The JVM searches for the method based on the run-time type and returns the address of the method being invoked
- The resolved address can then be associated with the run-time type and the constant pool offset
- An LTB accelerates late binding by storing this association in a special fast-access memory
- If the invocation signature is found in the LTB, the invocation address is returned quickly; otherwise, the request is forwarded to the control unit
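The hit/miss flow above can be sketched as a map from (run-time type, constant pool offset) to a resolved address. On a miss, the slow resolution is delegated to a "controller" callback and the result is remembered. The key encoding, the `Resolver` callback, and all names are assumptions for illustration. The buffer here is unbounded, unlike a hardware LTB with a fixed number of entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Invocation signature: run-time type plus the constant pool entry named
// by the invoke instruction. runtimeType is a hypothetical class ID.
struct LtbKey {
    uint32_t runtimeType;
    uint16_t cpoolOffset;
    bool operator==(const LtbKey& o) const {
        return runtimeType == o.runtimeType && cpoolOffset == o.cpoolOffset;
    }
};
struct LtbKeyHash {
    std::size_t operator()(const LtbKey& k) const {
        return std::hash<uint64_t>{}((uint64_t{k.runtimeType} << 16) | k.cpoolOffset);
    }
};

// Slow path: the control unit searches for the method by name.
using Resolver = std::function<uint32_t(const LtbKey&)>;

class LinkTranslationBuffer {
public:
    explicit LinkTranslationBuffer(Resolver controller) : controller_(std::move(controller)) {}

    uint32_t invokeAddress(const LtbKey& key) {
        auto it = table_.find(key);
        if (it != table_.end()) { ++hits; return it->second; }  // fast path
        ++misses;
        const uint32_t addr = controller_(key);  // forward request to control unit
        table_.emplace(key, addr);               // remember the association
        return addr;
    }
    unsigned hits = 0, misses = 0;

private:
    Resolver controller_;
    std::unordered_map<LtbKey, uint32_t, LtbKeyHash> table_;
};
```

With the Java example, the two invokevirtual #5 instructions produce different keys (MyClass vs. MySubClass receivers). Each is resolved once by the controller, and later executions hit.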

LTB Organization
Each LTB line holds:
- Caller's reference (32-bit)
- Constant pool entry (16-bit)
- Callee's object reference (32-bit)
- LV[0] (32-bit)
- CP[0] (32-bit)
- Other
Given a caller's ID and a callee's ID, a method or field can be associated with a (previously resolved) physical address.
Other possible optimizations:
- The per-frame local variables (16-bit indices in the JVM) are mapped to a starting 32-bit physical address
- The per-class constant pool locations (16-bit indices in the JVM) are mapped to a starting 32-bit physical address
- Synchronization locks
- Garbage collection reference counts
- Field data cache
- Extended instructions may lock, free, or flush an LTB line
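One LTB line can be written down as a struct using the field widths listed above. The `valid`/`locked` bits, the 4-byte local-variable slot size used to turn a 16-bit index into an address, and the flush policy (locked lines survive a flush) are all assumptions made for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One LTB line with the field widths from the organization slide.
// valid/locked are hypothetical control bits.
struct LtbLine {
    uint32_t callerRef;   // caller's reference (32-bit)
    uint16_t cpoolEntry;  // constant pool entry (16-bit)
    uint32_t calleeRef;   // callee's object reference (32-bit)
    uint32_t lvBase;      // LV[0]: physical address of local variable 0
    uint32_t cpBase;      // CP[0]: physical address of constant pool entry 0
    bool valid = false;
    bool locked = false;  // a locked line is kept on flush
};

// Map a 16-bit JVM local-variable index to a 32-bit physical address,
// assuming 4-byte slots laid out contiguously from LV[0].
uint32_t localAddress(const LtbLine& line, uint16_t index) {
    return line.lvBase + 4u * index;
}

// Flush: invalidate every line that has not been locked.
template <std::size_t N>
void flushUnlocked(std::array<LtbLine, N>& ltb) {
    for (auto& line : ltb)
        if (!line.locked) line.valid = false;
}
```

Storing LV[0] and CP[0] once per line turns every local-variable or constant-pool access in the invoked method into a base-plus-offset address. This matches the engine's base + offset + displacement addressing.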

LTB Performance
Assumptions:
- Unit-latency execution, except the LTB: more than 100 cycles on first access, 1 cycle if the entry is in the LTB
- Perfect branch prediction
- Perfect caches (except the LTB)
- Single thread, single in-order issue
- Random replacement algorithm
- Fully associative LTB
Preliminary:
- Synthetic benchmarks used
- The C++ model is not a compliant JVM
  - JVM APIs not supported
  - GC and I/O not implemented

Workload characteristics:
  Workload  Objects created  Dynamic instructions (%)  Ideal speedup
  WL1       2048             40%                       1.67
  WL2       32               10%                       1.11
  WL3       512              20%                       1.25
  WL4       1024             30%                       1.43

[Figure: Available Speedup: application speedup (1.0 to 1.5) versus percentage of method invocations (0% to 30%), with curves labeled Ideal, 10x, 6x, 5x, 4x, 3x, 2x, 1.5x, and 1.1x.]
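The Ideal Speedup column is consistent with Amdahl's law. Take f as the percentage in the table and assume that fraction of execution becomes free: 1/(1 - f) gives 1.67, 1.11, 1.25, and 1.43 for WL1 to WL4. Reading the 10x, 6x, ..., 1.1x curve labels as a finite speedup s of that fraction is an assumption on our part. The function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law: fraction f of execution is accelerated by factor s.
double applicationSpeedup(double f, double s) {
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}

// Limit as s grows without bound: the accelerated fraction costs nothing.
double idealSpeedup(double f) {
    assert(f >= 0.0 && f < 1.0);
    return 1.0 / (1.0 - f);
}
```

For WL4 (f = 0.30), a 10x faster invocation path gives 1/(0.70 + 0.03), about 1.37, below the ideal 1.43. The remaining 70% of execution bounds the gain no matter how fast the LTB is.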

LTB Performance
[Figure: Application speedup (0.90 to 1.70) for workloads WL1-WL4 with 16-, 32-, 64-, 128-, 256-, 512-, and 1024-entry LTBs and the ideal case.]
[Figure: LTB miss rate (0% to 100%) for workloads WL1-WL4 with the same LTB sizes and the ideal case.]
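Miss-rate curves like these can be produced by replaying an invocation trace through a fully associative buffer with random replacement, matching the stated assumptions. The sketch assumes invocation signatures are packed into one 64-bit tag. `makeKey`, the class name, and the seeded `std::mt19937` replacement choice are hypothetical, and no trace or results from the study are reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical packing of (caller reference, constant pool entry) into one tag.
uint64_t makeKey(uint32_t callerRef, uint16_t cpoolEntry) {
    return (static_cast<uint64_t>(callerRef) << 16) | cpoolEntry;
}

// Fully associative buffer with random replacement.
class FullyAssociativeLtb {
public:
    FullyAssociativeLtb(std::size_t entries, uint32_t seed)
        : capacity_(entries), rng_(seed) { assert(entries > 0); }

    // Returns true on a hit. On a miss the tag is installed, evicting a
    // randomly chosen line if the buffer is full.
    bool access(uint64_t key) {
        for (uint64_t tag : lines_)          // associative search: compare every tag
            if (tag == key) { ++hits_; return true; }
        ++misses_;
        if (lines_.size() < capacity_) {
            lines_.push_back(key);
        } else {
            std::uniform_int_distribution<std::size_t> pick(0, capacity_ - 1);
            lines_[pick(rng_)] = key;
        }
        return false;
    }

    double missRate() const {
        const uint64_t total = hits_ + misses_;
        return total == 0 ? 0.0 : static_cast<double>(misses_) / static_cast<double>(total);
    }
    uint64_t misses() const { return misses_; }

private:
    std::size_t capacity_;
    std::mt19937 rng_;
    std::vector<uint64_t> lines_;
    uint64_t hits_ = 0, misses_ = 0;
};
```

Sweeping the entry count from 16 to 1024 over the same trace gives one miss-rate point per size. Once the buffer holds every distinct signature, only compulsory misses remain, so the curve flattens.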

Conclusions
- LTBs may improve Java performance by 1.1x to 1.5x on synthetic benchmarks
- Dynamic method invocation supported in the ISA
  - Provides architecturally transparent acceleration (LTB)
- For a typical ISA, a sequence of instructions is produced
  - The high-level operation (dynamic link invocation) is lost
  - Little possibility for run-time acceleration
Future Work
- Extend the model to a compliant JVM (non-synthetic benchmarks)
- Characterize the use of an instruction address as the caller's ID
- Characterize performance versus LTB associativity