A Type System for Object Initialization In the Java TM Bytecode Language

Similar documents
A.java class A f void f() f... g g - Java - - class file Compiler > B.class network class file A.class Java Virtual Machine Loa

A Type System for Java Bytecode Subroutines

Homework 6. Reading. Problems. Handout 7 CS242: Autumn November

void f() { try { something(); } finally { done(); } } Figure 1: A method using a try-finally statement. system developed by Stata and Abadi [SA98a]. E

Java byte code verification

Parsing Scheme (+ (* 2 3) 1) * 1

1 Introduction One of the contributions of Java is in its bytecode verier, which checks type safety of bytecode for JVM (Java Virtual Machine) prior t

A type system for JVM threads

Deep type inference for mobile functions

Problems of Bytecode Verification

Run-time Program Management. Hwansoo Han

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1

Project 3 Due October 21, 2015, 11:59:59pm

Type-Based Information Flow Analysis for Low-Level Languages

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

Finding References in Java Stacks

Static Program Analysis

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

SOFTWARE ARCHITECTURE 7. JAVA VIRTUAL MACHINE

Run-time Environments. Lecture 13. Prof. Alex Aiken Original Slides (Modified by Prof. Vijay Ganesh) Lecture 13

Integrated Java Bytecode Verification

Towards Array Bound Check Elimination in Java TM Virtual Machine Language

Course Overview. PART I: overview material. PART II: inside a compiler. PART III: conclusion

Java Internals. Frank Yellin Tim Lindholm JavaSoft

Hardware-Supported Pointer Detection for common Garbage Collections

Supplementary Notes on Exceptions

Verifying a Compiler for Java Threads

CS4215 Programming Language Implementation

Formal languages and computation models

JVML Instruction Set. How to get more than 256 local variables! Method Calls. Example. Method Calls

Recursively Enumerable Languages, Turing Machines, and Decidability

G Programming Languages - Fall 2012

The Turing Machine. Unsolvable Problems. Undecidability. The Church-Turing Thesis (1936) Decision Problem. Decision Problems

A Static Type System for JVM Access Control

Informatica 3 Syntax and Semantics

Programming Language Concepts Scoping. Janyl Jumadinova January 31, 2017

Intermediate Code Generation

On a New Method for Dataow Analysis of Java Virtual Machine Subroutines Masami Hagiya and Akihiko Tozawa Department of Information Science, Graduate S

JAM 16: The Instruction Set & Sample Programs

Chapter 13: Reference. Why reference Typing Evaluation Store Typings Safety Notes

CSc 453 Interpreters & Interpretation

Intro to semantics; Small-step semantics Lecture 1 Tuesday, January 29, 2013

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2016

CS-XXX: Graduate Programming Languages. Lecture 9 Simply Typed Lambda Calculus. Dan Grossman 2012

Foundations. Yu Zhang. Acknowledgement: modified from Stanford CS242

Systems I. Machine-Level Programming V: Procedures

JVM. What This Topic is About. Course Overview. Recap: Interpretive Compilers. Abstract Machines. Abstract Machines. Class Files and Class File Format

(Not Quite) Minijava

CS 242. Fundamentals. Reading: See last slide

Types. Type checking. Why Do We Need Type Systems? Types and Operations. What is a type? Consensus

Single-pass Static Semantic Check for Efficient Translation in YAPL

Where We Are. Lexical Analysis. Syntax Analysis. IR Generation. IR Optimization. Code Generation. Machine Code. Optimization.

Constrained Types and their Expressiveness

CS558 Programming Languages Winter 2018 Lecture 4a. Andrew Tolmach Portland State University

Interaction of JVM with x86, Sparc and MIPS

Programming Languages Third Edition. Chapter 10 Control II Procedures and Environments

Operational Semantics of Cool

Course Administration

Announcements. Scope, Function Calls and Storage Management. Block Structured Languages. Topics. Examples. Simplified Machine Model.

Summer 2003 Lecture 14 07/02/03

A Note on the Succinctness of Descriptions of Deterministic Languages

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

Goal of lecture. Object-oriented Programming. Context of discussion. Message of lecture

Part VI. Imperative Functional Programming

INF3110 Programming Languages Runtime Organization part II

CS558 Programming Languages

Type Checking and Type Equality

Motivation was to facilitate development of systems software, especially OS development.

CSc 520 final exam Wednesday 13 December 2000 TIME = 2 hours

Lecture Notes on Program Equivalence

High-Level Language VMs

Today's Topics. CISC 458 Winter J.R. Cordy

Untyped Memory in the Java Virtual Machine

Compiling Techniques

Code Generation. The Main Idea of Today s Lecture. We can emit stack-machine-style code for expressions via recursion. Lecture Outline.

A New Type System for JVM Lock Primitives

CIS 341 Final Examination 4 May 2017

CS2110 Fall 2011 Lecture 25. Under the Hood: The Java Virtual Machine, Part II

We can emit stack-machine-style code for expressions via recursion

Java Class Loading and Bytecode Verification

CSCE 314 Programming Languages

Parameter passing. Parameter: A variable in a procedure that represents some other data from the procedure that invoked the given procedure.

Wednesday, October 15, 14. Functions

G Programming Languages Spring 2010 Lecture 4. Robert Grimm, New York University

CSC 2400: Computing Systems. X86 Assembly: Function Calls"

Plan for Today. Safe Programming Languages. What is a secure programming language?

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design

Today. Instance Method Dispatch. Instance Method Dispatch. Instance Method Dispatch 11/29/11. today. last time

Semantic Analysis and Type Checking

On Meaning Preservation of a Calculus of Records

Computing Science 114 Solutions to Midterm Examination Tuesday October 19, In Questions 1 20, Circle EXACTLY ONE choice as the best answer

Operational Semantics. One-Slide Summary. Lecture Outline

CS 2210 Programming Project (Part IV)

Dependant Type Systems (saying what you are) Data Abstraction (hiding what you are) Review. Dependent Types. Dependent Type Notation

Chapter 1. Introduction

Binary Code Analysis: Concepts and Perspectives

CS 415 Midterm Exam Spring 2002

Procedure and Object- Oriented Abstraction

Midterm II CS164, Spring 2006

Functions in C. Lecture Topics. Lecture materials. Homework. Machine problem. Announcements. ECE 190 Lecture 16 March 9, 2011

Transcription:

Electronic Notes in Theoretical Computer Science 10 (1998) URL: http://www.elsevier.nl/locate/entcs/volume10.html 7 pages A Type System for Object Initialization In the Java TM Bytecode Language Stephen N. Freund John C. Mitchell Department of Computer Science Stanford University Stanford, CA 94305-9045 {freunds, mitchell}@cs.stanford.edu Abstract In the standard Java implementation, a Java language program is compiled to Java bytecode and this bytecode is then interpreted by the Java Virtual Machine. Since bytecode may be written by hand, or corrupted during network transmission, the Java Virtual Machine contains a bytecode verifier that performs a number of consistency checks before code is interpreted. However, there is no formal specification of the verifier. As one-step towards such a specification, we develop a precise specification of a subset of the bytecode language dealing with object creation and initialization. For this subset, we prove that for every Java bytecode program that satisfies our typing constraints, every object is initialized before it is used. 1 Introduction The Java programming language is a statically-typed general-purpose programming language with an implementation architecture that is designed to facilitate transmission of compiled code across a network. In the standard implementation, a Java language program is compiled to Java bytecode and this bytecode is then interpreted by the Java Virtual Machine. The intermediate bytecode language, which we refer to as JVML, is a typed, machineindependent form of assembly language with some low-level instructions that reflect specific high-level Java source language constructs. For example, classes are a basic notion in JVML. Since bytecode may be written by hand, or corrupted during network transmission, the Java Virtual Machine contains a bytecode verifier that performs a number of consistency checks before code is interpreted. This protects the receiver from certain security risks and various forms of attack. In this paper, we develop a specification for a fragment of the bytecode language that includes object creation (allocation of memory) and initialization. This work is based on a prior study of the bytecodes for local subroutine c 1998 Published by Elsevier Science B. V.

call and return [2]. We prove soundness of the type system by a traditional method using operational semantics. It follows from the soundness theorem that any bytecode program that passes the static checks will initialize every object before it is used. 2 Object Initialization As in many other object-oriented languages, the Java implementation creates new objects in two steps. The first step is to allocate space for the object. This usually requires some environment specific operation to obtain an appropriate region of memory. In the second step, user-defined code is executed to initialize the object. In Java, the user initialization code is provided by a constructor defined in the class of the object. Only after both of these steps are completed can a method be invoked on an object. In the Java source language, allocation and initialization are combined into a single statement. This is illustrated in the following code fragment: Point p = new Point(3); p.print(); Since every Java object is created by a statement like the one in the first line here, it does not seem difficult to prevent Java source language programs from invoking methods on objects that have not been initialized. It is much more difficult to recognize initialization-before-use in bytecode. This can be seen by looking at the five lines of bytecode that are produced by compiling the two lines of source code above: 1: new #1 <Class Point> 2: dup 3: iconst_3 4: invokespecial #4 <Method Point(int)> 5: invokevirtual #5 <Method void Print()> The most striking difference is that memory allocation (line 1) is separated from the constructor invocation (line 4) by two lines of code. The first intervening line, dup, duplicates the pointer to the uninitialized object. This line is needed due to the calling convention of the Java Virtual Machine. The second line, iconst 3, pushes the constructor argument 3 onto the stack. Since pointers may be duplicated, as above, and there may be more than one uninitialized object present at any time, some form of aliasing analysis must be used. Sun s Java Virtual Machine Specification [1] describes the alias analysis used by the Sun s JDK verifier. For each line of the bytecode program, some status information is recorded for every local variable and stack location. When a location points to an object that is not known to be initialized in all executions reaching this statement, the status will include not only the property uninitialized, but also the line number on which the uninitialized object would have been created. As references are duplicated on the stack 2

and stored and loaded in the local variables, the analysis also duplicates these line numbers. All references having the same line number are assumed to refer to the same object. When an object is initialized, all pointers that refer to objects created at the same line number are set to initialized. In other words, all references to objects of a certain type are partitioned into equivalence classes according to what is statically known about each reference. Since aliasing is irrelevant for objects that have been initialized, it is not necessary to track aliasing once a reference leads to an initialized object. Since our approach is type based, the status information associated with each reference for the alias analysis is recorded as part of its type. 3 JVML i This section describes the JVML i language, a subset of JVML encompassing basic constructs and object initialization. The run-time environment for JVML i consists only of an operand stack and a finite set of local variables. A JVML i program will be a map from Addr to instructions drawn from the following list: instruction ::= push 0 inc pop if L store x load x new σ init σ use σ halt where x is a local variable name, σ is an object type, and L is an address of another instruction in the program. We refer the reader to Stata and Abadi for a description of those instructions not defined below: new σ: allocates a new, uninitialized object of type σ. init σ: initializes a previously uninitialized object of type σ. This represents calling the constructor of an object. use σ: performs an operation on an initialized object of type σ. This corresponds to several operations in JVML, including method invocation (invokevirtual), accessing an instance field (putfield/getfield), etc. 4 Operational and Static Semantics 4.1 Values and Types JVML i types are generated by the grammar: τ ::= Int σ ˆσ i Top where σ T, the set of valid object types. The type ˆσ i will be given to values resulting from allocating an object of type σ on line i of a program. The type Int will be used for the type of integers. One final type is Top. Any value of 3

P[pc] = new σ â Aˆσpc,Unused(â, f, s) P pc, f, s pc + 1, f, â s P[pc] = init σ â Aˆσj a A σ,unused(a, f, s) P pc, f, â s pc + 1, [a/â]f, [a/â]s P[pc] = use σ a A σ P pc, f, a s pc + 1, f, s Fig. 1. JVML i operational semantics. any type also has type Top. This type will represent unusable values in our static analysis. Each object and uninitialized object type has a corresponding infinite set of values which can be distinguished from values of any other type. For any object type σ, this set of values is A σ. Likewise, there is a set of values Aˆσ i for all uninitialized object types ˆσ i. The basic type rules for values are: v is a value v : Top n is an integer n : Int a A τ a : τ a Aˆτ i a : ˆτ i where τ T and i Addr. We also extend values and types to sequences: ǫ : ǫ a : τ s : α a s : τ α 4.2 Operational Semantics The bytecode interpreter for JVML i instructions is modeled using the standard framework of operational semantics. Each instruction is characterized by a transformation of machine states, where a machine state is a tuple of the form pc, f, s, which has the following meaning: pc is a program counter, indicating the address of the instruction that is about to be executed. f is a total map from Var, the set of local variables, to values. s is a stack of values representing the operand stack. Each bytecode instruction yields one or more rules in the operational semantics. These rules use the judgment P pc, f, s pc, f, s to indicate that a program P in state pc, f, s can move to state pc, f, s in one step. The one-step operational semantics for the instructions we have added to study object initialization are shown in Figure 1. 4

(new) P[i] = new σ σ T F i+1 = F i S i+1 = ˆσ i S i ˆσ i S i y Dom(F i ). F i [y] ˆσ i i + 1 Dom(P) F, S, i P (use) (init) P[i] = use σ σ T F i+1 = F i S i = σ S i+1 i + 1 Dom(P) F, S, i P P[i] = init σ σ T F i+1 = [σ/ˆσ j ]F i S i = ˆσ j α, j Dom(P) S i+1 = [σ/ˆσ j ]α i + 1 Dom(P) F, S, i P Fig. 2. Static semantics. The rules for object initialization use the additional judgment Unused, defined by (unused) a s y Var. f[y] a Unused(a, f, s) This will allow the virtual machine to pick a value that is currently not used by the program. The operational rules not presented here are discussed in [2]. These rules have been designed so that a step cannot be made from an illegal state, such as being at a pop statement when there is an empty stack. When a new object is created, a currently unused value of an uninitialized object type is placed on the stack. The type of that value is determined by the object type named in the instruction and the line number of the instruction. When the value for an uninitialized object is initialized by an init instruction, all occurrences of that value are replaced by a value corresponding to an initialized object. 4.3 Static Semantics A program P is well typed if there exist F and S such that F, S P. F is a map from Addr to functions mapping local variables to types. As described in [2], elements in a map over Addr are accessed as F i instead of F[i]. Thus, F i [y] is the type of local variable y at line i of a program. Likewise, S is a map from Addr to stack types such that S i is the type of the operand stack at location i of the program. The judgment which allows us to conclude 5

that a program P is well typed by F and S is F 1 = F Top S 1 = ǫ i Dom(P). F, S, i P (wt prog) F, S P where F Top is a function mapping all variables in Var to Top. The first two lines of (wt prog) constrain the initial conditions for the program s execution. The third line requires that each instruction in the program is well typed according to local judgments for each instruction. The rules for those instructions dealing with object initialization are presented in Figure 2. 4.4 Soundness We prove that any well-typed program will not get stuck due to an error. This notion is captured by our soundness theorem. Theorem 4.1 (Soundness) Given P, F, and S such that F, S P: pc, f 0, f, s. ( P 1, f0, ǫ pc, f, s pc, f, s. P pc, f, s pc, f, s P[pc] = halt In the initial state, the first instruction in the program is about to be executed, the operand stack is empty, and f 0 may map the local variables to any values. 5 Discussion Given the need to guarantee type safety for mobile Java code, developing correct type checking and analysis techniques for JVML is crucial. However, there is not an existing specification which fully captures how Java bytecodes must be type checked. We have built on the previous work of Stata and Abadi to develop such a specification by formulating a sound type system for a fairly complex subset of JVML. Although our model is still rather abstract, it has already proved effective as a foundation for examining both JVML and existing bytecode verifiers. Even without a complete object model or notion of an object heap, we have been able to study initialization, and also several extensions to the work presented here. In fact, a previously unpublished bug in Sun s verifier implementation was found as a result of the analysis performed while studying the soundness proofs for JVML i extended with subroutines. Acknowledgement Thanks to Martín Abadi and Raymie Stata (DEC SRC) for their assistance on this project. Also, we thank Frank Yellin and Sheng Liang for several 6 )

interesting discussions. References [1] Tim Lindholm and Frank Yellin. The Java Virtual Machine Specification. Addison-Wesley, 1996. [2] Raymie Stata and Martín Abadi. A type system for Java bytecode subroutines. In Proc. 25th ACM Symposium on Principles of Programming Languages, January 1998. 7