Writing a Sony PlayStation Emulator Using Java Technology

Similar documents
CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

Java On Steroids: Sun s High-Performance Java Implementation. History

Compiling Techniques

Virtual Machine Design

Bytecode Manipulation Techniques for Dynamic Applications for the Java Virtual Machine

Hardware Emulation and Virtual Machines

Crash Course in Java. Why Java? Java notes for C++ programmers. Network Programming in Java is very different than in C/C++

Lecture 9 Dynamic Compilation

Processor design - MIPS

IMPLEMENTING PARSERS AND STATE MACHINES IN JAVA. Terence Parr University of San Francisco Java VM Language Summit 2009

Architecture II. Computer Systems Laboratory Sungkyunkwan University

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME)

CS 61C: Great Ideas in Computer Architecture. MIPS Instruction Formats

For our next chapter, we will discuss the emulation process which is an integral part of virtual machines.

5/17/2012. Recap from Last Time. CSE 2021: Computer Organization. The RISC Philosophy. Levels of Programming. Stored Program Computers

Recap from Last Time. CSE 2021: Computer Organization. Levels of Programming. The RISC Philosophy 5/19/2011

Java: framework overview and in-the-small features

CSE 2021: Computer Organization

Machine Language Instructions Introduction. Instructions Words of a language understood by machine. Instruction set Vocabulary of the machine

CSc 520 Principles of Programming Languages

Chapter 2. Instructions: Language of the Computer. Adapted by Paulo Lopes

MIPS R-format Instructions. Representing Instructions. Hexadecimal. R-format Example. MIPS I-format Example. MIPS I-format Instructions

23/02/15. Compile, execute, debug. Advanced Programming THE JAVA PLATFORM

PC I/O. May 7, Howard Huang 1

Advanced processor designs

Under the Compiler's Hood: Supercharge Your PLAYSTATION 3 (PS3 ) Code. Understanding your compiler is the key to success in the gaming world.

High-Level Language VMs

Real Time: Understanding the Trade-offs Between Determinism and Throughput

ecture 12 From Code to Program: CALL (Compiling, Assembling, Linking, and Loading) Friedland and Weaver Computer Science 61C Spring 2017

Notes of the course - Advanced Programming. Barbara Russo

Test Patterns in Java

Thomas Polzer Institut für Technische Informatik

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: MIPS Instruction Set Architecture

CSc 453. Compilers and Systems Software. 18 : Interpreters. Department of Computer Science University of Arizona. Kinds of Interpreters

Git. all meaningful operations can be expressed in terms of the rebase command. -Linus Torvalds, 2015

Instruction Set Architecture

Techniques described here for one can sometimes be used for the other.

Advanced Computer Architecture

Instruction Set Principles and Examples. Appendix B

CISC 662 Graduate Computer Architecture. Lecture 4 - ISA MIPS ISA. In a CPU. (vonneumann) Processor Organization

a) Do exercise (5th Edition Patterson & Hennessy). Note: Branches are calculated in the execution stage.

CS64 Week 5 Lecture 1. Kyle Dewey

CS577 Modern Language Processors. Spring 2018 Lecture Interpreters

Lectures 3-4: MIPS instructions

Slide Set 4. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng

CISC 662 Graduate Computer Architecture. Lecture 4 - ISA

LECTURE 2: INSTRUCTIONS

CS 61c: Great Ideas in Computer Architecture

New Compiler Optimizations in the Java HotSpot Virtual Machine

Welcome to the session...

Topic Notes: MIPS Instruction Set Architecture

COMPUTER ORGANIZATION AND DESI

Lecture 4: MIPS Instruction Set

Branch Addressing. Jump Addressing. Target Addressing Example. The University of Adelaide, School of Computer Science 28 September 2015

Darek Mihocka, Emulators.com Stanislav Shwartsman, Intel Corp. June

MemoryLint. Petr Nejedlý, Radim Kubacki SUN Microsystems, BOF-9066

Traps, Exceptions, System Calls, & Privileged Mode

Operating Systems: Virtual Machines & Exceptions

Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler

COMP163. Introduction to Computer Programming. Introduction and Overview of the Hardware

Computer Architecture

Lec 13: Linking and Memory. Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University. Announcements

CENG3420 Lecture 03 Review

A hybrid approach to application instrumentation

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation

Just-In-Time Compilers & Runtime Optimizers

Emulation 2. G. Lettieri. 15 Oct. 2014

Kampala August, Agner Fog

Name: Login: cs61c- Decimal Ternary 5 12 three three

L19 Pipelined CPU I 1. Where are the registers? Study Chapter 6 of Text. Pipelined CPUs. Comp 411 Fall /07/07

CSc 553. Principles of Compilation. 2 : Interpreters. Department of Computer Science University of Arizona

Java and Software Design

CS150 Project Final Report

Lecture #7: Implementing Mutual Exclusion

The MIPS Instruction Set Architecture

Introduction to the Performance Analyzer For PlayStation 2. Kirk Bender & Geoff Audy Developer Support Engineers Sony Computer Entertainment America

Anne Bracy CS 3410 Computer Science Cornell University. See P&H Chapter: , , Appendix B

Chapter. Focus of the Course. Object-Oriented Software Development. program design, implementation, and testing

Machine Instructions - II. Hwansoo Han

CPSC 213. Introduction to Computer Systems. The Operating System. Unit 2e

Very short answer questions. "True" and "False" are considered short answers.

School of Computer Science CPS109 Course Notes 5 Alexander Ferworn Updated Fall 15

YETI. GraduallY Extensible Trace Interpreter VEE Mathew Zaleski, Angela Demke Brown (University of Toronto) Kevin Stoodley (IBM Toronto)

CSc 453 Interpreters & Interpretation

AUTOMATED HEAPDUMP ANALYSIS FOR DEVELOPERS, TESTERS, AND SUPPORT EMPLOYEES

Computer Components. Software{ User Programs. Operating System. Hardware

Virtual Machines and Dynamic Translation: Implementing ISAs in Software

Lecture 2. Instructions: Language of the Computer (Chapter 2 of the textbook)

Topics Power tends to corrupt; absolute power corrupts absolutely. Computer Organization CS Data Representation

2B 52 AB CA 3E A1 +29 A B C. CS120 Fall 2018 Final Prep and super secret quiz 9

Pipelined CPUs. Study Chapter 4 of Text. Where are the registers?

Evolution of Virtual Machine Technologies for Portability and Application Capture. Bob Vandette Java Hotspot VM Engineering Sept 2004

Running class Timing on Java HotSpot VM, 1

OS Structures. ICS332 Operating Systems

ECE 498 Linux Assembly Language Lecture 1

Emulation. Michael Jantz

I-Format Instructions (3/4) Define fields of the following number of bits each: = 32 bits

Virtual Machines. Virtual Machines

CENG3420 L03: Instruction Set Architecture

Transcription:

Writing a Sony PlayStation Emulator Using Java Technology Graham Sanderson Matt Howitt Lombardi Software www.lombardisoftware.com TS-5547 2006 JavaOne SM Conference Session TS-5547

High Performance Java Technology I ll share some of the performance tricks I used to implement a fast PlayStation emulator in the Java programming language, and talk about some cool Java technology stuff I got to mess with in the process 2006 JavaOne SM Conference Session TS-5547 2

Agenda Introduction Enough Already, Let s See It! Performance Tricks Cool Stuff Java HotSpot VM for R3000? Q&A and Another Demo 2006 JavaOne SM Conference Session TS-5547 3

Agenda Introduction Enough Already, Let s See It! Performance Tricks Cool Stuff Java HotSpot VM for R3000? Q&A and Another Demo 2006 JavaOne SM Conference Session TS-5547 4

Technical Requirements Sony PlayStation Specs 32 bit RISC CPU @ 33.9MHz Geometry co-processor 500k lighted triangles per sec Graphics co-processor @ 33.9MHz 360000 triangles per sec Thousands of 2d sprites with rotation/scaling Alpha transparency and Gouraud shading Resolutions up to 640x512 at 30fps Decompression co-processor for video 24 channel sound @ 44kHz 2006 JavaOne SM Conference Session TS-5547 5

Architecture Goals Object oriented Machine should be assembled from loosely-coupled component classes representing: Physical hardware Processor instructions Memory mapped code/data Internal emulator components e.g. byte code generators Written entirely in the Java language If it physically can be done entirely in the Java language, then do so Implemented with clear maintainable code 2006 JavaOne SM Conference Session TS-5547 6

How Did It Turn Out? Emulation machine is assembled from arbitrary components Except: address space and some execution flow internals Uses well known connection points Entirely Java technnology (you do need to use a Java Native Interface (JNI) based media component to run directly off CD) Code is clear(ish) 2006 JavaOne SM Conference Session TS-5547 7

DEMO The Emulator in Action! 2006 JavaOne SM Conference Session TS-5547 8

Agenda Introduction Enough Already, Let s See It! Performance Tricks Cool Stuff Java HotSpot VM for R3000? Q&A and Another Demo 2006 JavaOne SM Conference Session TS-5547 9

Handler Functions // Instruction decoding in the interpreter: // Calling different handler functions // for different op-codes // simplified instruction interface interface Instruction { public void execute(int opcode); class CPU { // simplified interpreter loop public void execute() { while (true) { int opcode = memory[ip++]; instructions[opcode&0x3f].execute(opcode); 2006 JavaOne SM Conference Session TS-5547 10

Handler Functions // Memory Mapped I/O // similar, but address range is big and sparsely filled // Possible Solution 1: Map Handler handler = (Handler)handlerMap.get(new Integer(address)); handler.write(data); // Possible Solution 2: Switch with non static method switch (address) { case 20: handler20.write(data); break; case 100: handler100.write(data); break; // Possible Solution 3: Switch with static method switch (address) { case 20: Handler20.write(data); break; case 100: Handler100.write(data); break; 2006 JavaOne SM Conference Session TS-5547 11

Handler Functions Test Map/Object Call Array/Interface Call J2SE 1.4.2 Client Java SE 6 Client (Beta) Java SE 6 Server (Beta) Array/Object Call Switch/Object Call Switch/Static Call Binary Tree/Static Call Source: Average of 5 runs after warm up on my Windows XP laptop 0 50 100 150 Million Calls Per Second 2006 JavaOne SM Conference Session TS-5547 12

Handler Functions Summary Picked method for fastest execution Binary Tree of if statements Calls to a static member function of a particular implementation class Still needed run time configurability Instructions/Handlers registered in Array/Map during start up Utility class uses BCEL to build optimal method to dispatch calls 2006 JavaOne SM Conference Session TS-5547 13

Costly IFs // Generic class which handles 4 possible // rendering combinations public class Renderer { public void render( boolean alpha, boolean paletted) { // simplified pixel loop for_all_pixels { int color; if (!paletted) color = texture[src]; else color = palette[texture[src]&0xf]; if (alpha) color = alpha*color + (1-alpha)*background; screen[dest] = color; 2006 JavaOne SM Conference Session TS-5547 14

Costly IFs // Specialized class which handles just one combination // no-alpha, no-palette public class NoAlphaNoPaletteRenderer { public void render() { // simplified pixel loop for_all_pixels { screen[dest] = texture[src]; 2006 JavaOne SM Conference Session TS-5547 15

Costly IFs // Using JAVAC to specialize the class for us public class NoAlphaNoPaletteRenderer { public static final boolean alpha = false; public static final boolean paletted = false; public void render() { // simplified pixel loop for_all_pixels { int color; if (!paletted) color = texture[src]; else color = palette[texture[src]&0xf]; if (alpha) color = alpha*color + (1-alpha)*background; screen[dest] = color; 2006 JavaOne SM Conference Session TS-5547 16

Costly IFs // Another generic class which handles all 4 combinations public class TemplateRenderer { public static final boolean alpha = isalpha(); public static final boolean paletted = ispalette(); public void render() { // simplified pixel loop for_all_pixels { int color; if (!paletted) color = texture[src]; else color = palette[texture[src]&0xf]; if (alpha) color = alpha*color + (1-alpha)*background; screen[dest] = color; 2006 JavaOne SM Conference Session TS-5547 17

Costly IFs // as compiled by HotSpot if alpha and palette // are set to false during static initialization public class TemplateRenderer { public static final boolean alpha = getfalse(); public static final boolean paletted = getfalse(); public void render() { // simplified pixel loop for_all_pixels { int color; if (!paletted) color = texture[src]; else color = palette[texture[src]&0xf]; if (alpha) color = alpha*color + (1-alpha)*background; screen[dest] = color; 2006 JavaOne SM Conference Session TS-5547 18

Costly IFs Using code generation I clone and rename my generic class, and change the bytecode for the static member initializers Without using code generation Set static final variables based on immutable configuration properties Factory alternate implementations of the same class in separate class loaders Or 2006 JavaOne SM Conference Session TS-5547 19

Costly IFs // hybrid case! public static final boolean xismutable = ; public static final boolean xinitialvalue = ; boolean xvalue = xinitialvalue; public boolean getx () { return xismutable? xvalue : xinitialvalue; if xismutable == true, simplifies to xvalue; if xismutable == false, simplifies to xinitialvalue, and hence statically either true or false, In either case it will likely be inlined 2006 JavaOne SM Conference Session TS-5547 20

1ms Timer Resolution I want to Have a main CPU thread Have a separate background thread for asynchronous hardware Which means I need to Measure time to 1ms accuracy Add callbacks at arbitrary but accurate frequencies But how entirely in the Java language? On some platforms System.currentTimeMillis() has poor resolution And Thread.sleep()? 2006 JavaOne SM Conference Session TS-5547 21

Poor Man s 1ms Resolution Timer // TimeKeeper thread running at Thread.MAX_PRIORITY while (true) { // repeated calls to Thread.sleep(1) actually // keep the delay accurate on Windows! Thread.sleep(1); synchronized (this) { // provide rough estimate of time (can be behind) time++; // schedule notification to event worker thread // running at Thread.NORM_PRIORITY+2 if (time>nextscheduledeventtime) { notify(); 2006 JavaOne SM Conference Session TS-5547 22

Agenda Introduction Enough Already, Let s See It! Performance Tricks Cool Stuff Java HotSpot VM for R3000? Q&A and Another Demo 2006 JavaOne SM Conference Session TS-5547 23

Two-Stage Compiler Converts units of R3000 code into Java classes 1 st stage Used in preference to interpreter Simple translation of code Gathers data to help second stage 2 nd stage Used for hot methods Does flow analysis and constant propagation Uses information from the first stage for some key optimizations 2006 JavaOne SM Conference Session TS-5547 24

What Is a Code Unit? Starts at the target address of any call, or any jump to a dynamic address Includes all instructions which can be determined to be reached; i.e. it stops at a branch to another dynamic address. It turns out that this is often a single C function 2006 JavaOne SM Conference Session TS-5547 25

Stage 1 Unit Class public class _1XXXXXXXX implements Executable { // holder of runtime state for this code unit public static CodeUnit unit; // static method to execute the unit public static int s(int retaddr, boolean jump){ // 1) forward to stage 2 if this is hot if (unit.usestage2()) return _2XXXXXXXX.s(retAddr, jump); // 2) simple state machine if (unit.count>0) unit.count--; else unit.countcomplete(); // 3) implementation of R3000 code (omitted) 2006 JavaOne SM Conference Session TS-5547 26

Code Units Call Each-other Directly 0x80103020 addiu r2, r0, #4 ; load register 2 with 4 0x80103024 jal 0x80104088 ; call function at 80104088 ; saving return address in r31 0x80103028 nop ; delay slot 0x8010302c ; next instruction public class _180103020 implements Executable { public static int s(int retaddr, boolean jump) { // preceding code omitted Compiler.reg_2 = 4; Compiler.reg_31 = 0x8010302c; _180104088.s(0x8010302c,false); // following code omitted 2006 JavaOne SM Conference Session TS-5547 27

R3000 Calls Are Java Language Calls public class _180103020 implements Executable { * @param retaddr the expected return address of * the current R3000 frame * @param jump true we re here by jump not call * @return the next execution address public static int s(int retaddr, boolean jump) { // (omitted all but the last instruction) // code for jr r31 (basically return ) int target = Compiler.reg_31; while (true) { if (target == retaddr jump) return target; else target = Compiler.jump( target, retaddr); 2006 JavaOne SM Conference Session TS-5547 28

Compiler Architecture CPU execution thread (normal priority) Runs interpreter loop, calls into stage 1 classes for any JAL ClassLoader does stage 1 compilation as necessary Schedules for background stage 2 compilation any code units which have become hot Background compilation thread 1 (low priority) Does stage 1 compilation Background compilation thread 2 (low priority) Does stage 2 compilation Stage 1 compilation Includes scheduling for background compilation any referenced (by JAL) but currently missing stage 1 classes 2006 JavaOne SM Conference Session TS-5547 29

Stage 2 Unit Class // 80131000 lui r2, 0x8001 ; load r2 with 0x80010000 // 80131004 lw r2, r2[0x1234] ; load r2 from 0x80011234 public class _280131000 implements Executable { public static int s(int retaddr, boolean jump) { if (replaced) return _380131000(retAddr, jump); // (stage 1 version is) // Compiler.reg_2 = 0x80010000 // Compiler.reg_2 = // AddressSpace.read32(Compiler.reg_2); Compiler.reg_2 = AddressSpace.ram[0x11234/4]; // remaining R3000 code omitted 2006 JavaOne SM Conference Session TS-5547 30

ArrayIndexOutOfBoundsException Is Our Friend! // 80131004 lw r2, r6[0] public class _2XXXXXXXX implements Executable { public static int s(int retaddr, boolean jump) { if (replaced) return _3XXXXXXXX(retAddr, jump); // (stage 1 version is) // AddressSpace.tagRead(0x80131004,Compiler.reg_6); // Compiler.reg_2 = // AddressSpace.read32(Compiler.reg_6); Compiler.reg_2 = AddressSpace.ram[(Compiler.reg_6&RAM_MASK)/4]; // remaining R3000 code omitted 2006 JavaOne SM Conference Session TS-5547 31

Other Interesting Tidbits Oops R3000 code in RAM is over-writable! We have to throw away our class loader on instruction cache flush There are bugs in the R3000 code too! The compiler is just a component too you can replace it if you like We detect and avoid busy-wait, so we have time for our background threads, and don t hog the CPU 2006 JavaOne SM Conference Session TS-5547 32

Q&A and Another Demo 2006 JavaOne SM Conference Session TS-5547 33

Summary Java technology is fast enough to run a PlayStation emulator on modern hardware Byte-code generation is cool, but you can do a bunch of stuff without it You don t have to sacrifice code maintainability I plan to open source, so people can start adding stuff (e.g., Java 3D API), SPU rewrite with latest Java Sound API, etc. 2006 JavaOne SM Conference Session TS-5547 34

Writing a Sony PlayStation Emulator Using Java Technology Graham Sanderson Matt Howitt Lombardi Software www.lombardisoftware.com TS-5547 2006 JavaOne SM Conference Session TS-5547