
Validation of Stack Effects in Java Bytecode

Jaanus Poial
Institute of Computer Science, University of Tartu, Estonia
e-mail: jaanus@cs.ut.ee
February 21, 1997

Abstract

The Java language is widely used in the networked world to distribute platform-independent pieces of software in the form of bytecode for the Java Virtual Machine (JVM). Java compilers put a lot of emphasis on early checking for possible problems, runtime checking, and eliminating error-prone situations. On the other hand, independent validation of Java bytecode "at the receiver's end" is still a relevant issue for several reasons: security, native code generation and optimisation, etc. This paper is an attempt to apply the stack effect calculus (originally designed by the author for the Forth programming language) to JVM operand stack manipulations, using the interface descriptions of JVM instructions. This approach may be used to validate bytecode against illegal stack manipulations by analysing the code instead of executing it. It is also possible to apply this theory to the bytecode compiler itself, to check that it generates no programs that are illegal with respect to operand stack usage. Stack effects determine some syntactic rules for stack machine programs. The last section of the paper investigates, for one particular case, the relationship between stack effects and syntactic equations (general rewriting rules).

1 Stack effects

The Java virtual machine [JVM95] is an abstract device for running Java bytecode, the new, more or less "common" form of distributing software over the net. This code may be interpreted by a JVM engine or converted to native machine code, in which case the programs run considerably faster. The quality of the bytecode itself is also very important if portability to a wide range of computers is considered. The JVM is a stack machine, and it is quite natural to apply general techniques of compilation and optimisation for stack machines to the bytecode.
There are many differences between Java and earlier approaches (e.g. UNCOL, p-code, DIANA, the Forth language and machine, etc.) but also many aspects in common. In this paper we concentrate on parameter passing through the operand stack of the JVM using the so-called stack effect calculus. Stack effect checking is a part of bytecode validation that may help in debugging the compiler or other tools, in bytecode-level optimisation, in preprocessing programs that come from untrusted sources, etc. The main goal of the stack effect calculus is static type checking of stack machine programs (and of the tools that generate such programs). Types, subtypes, "wildcard" types and rules for calculating the resulting stack effects of different constructs have been introduced in [Poi90], [Poi91], [StK93] and [Poi94].

Let us define the basic notation and list some results. We assume that stack effects (type signatures) form a polycyclic monoid (introduced in [NiP70]). This holds for simple signatures (uniquely defined stack effects without "wildcards" and subtyping [Poi90]). It also holds, with essential restrictions, for compound signatures (multiple stack effects [Poi91]). The set of multiple stack effects in general is not even an inverse semigroup, but it always contains a subset that is a polycyclic monoid. For simplicity we use the set of simple signatures everywhere, because it represents some common properties of all inverse semigroups ([ClP67] introduces the basic theory of semigroups).

Let types be denoted by $a, b, c, \ldots$ In real programs these can be int, long, float, double, object, etc.; we do not need any concrete interpretation yet. Let $T$ be the set of types. We will use $\alpha, \beta, \gamma, \ldots$ for type lists. These are finite sequences of types in which the rightmost element corresponds to the top of the stack. The set of such lists is $T^*$. A type clash appears when some stack operation finds an input argument on the stack that has an unexpected (incompatible) type. We will use the symbol $\emptyset$ for the type clash.

A stack effect is a pair of input parameter types and output parameter types. We also consider the type clash as a stack effect. The set of stack effects is defined as follows:

$$S = (T^* \times T^*) \cup \{\emptyset\}$$

We use $s, t, u, \ldots$ for stack effects, as well as $(\alpha \to \beta)$ for the pair $(\alpha, \beta) \in T^* \times T^*$. Sometimes we use indices to express inputs and outputs: $s = (s_1 \to s_2)$, where $s_1, s_2 \in T^*$.

The composition (multiplication) of stack effects is defined as follows:

$$\forall s \in S:\quad s\,\emptyset = \emptyset\,s = \emptyset$$

$$\forall s_1, s_2, t_1, t_2, \alpha, \beta \in T^*:\quad (s_1 \to s_2)(\alpha s_2 \to t_2) = (\alpha s_1 \to t_2), \qquad (s_1 \to \beta t_1)(t_1 \to t_2) = (s_1 \to \beta t_2)$$

In all other cases the result is $\emptyset$. In words: if the second effect consumes more items than the first one produced, the extra items $\alpha$ become additional inputs of the product; if the first effect produced more items than the second one consumes, the surplus $\beta$ stays below the outputs of the second. The element $1 = (\to)$ is a unity for this operation. We now have an algebraic structure that is isomorphic to the polycyclic monoid: it has a unity $1$, a null element $\emptyset$ and an associative multiplication.
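The composition rules can be tried out with a small Python sketch. It is an illustration under stated assumptions, not the author's implementation: every type is assumed to be a single character, so a type list is a Python string whose last character is the top of the stack, and `None` stands for the type clash $\emptyset$. The names `compose`, `product` and `ONE` are hypothetical; the products listed in Example 1 serve as test cases.

```python
from functools import reduce
from typing import Optional, Tuple

# Assumptions of this sketch: every type is a single character, so a type list
# in T* is a str whose last character is the top of the stack. A stack effect
# is a pair (inputs, outputs); None stands for the type clash (null element).
Effect = Optional[Tuple[str, str]]

ONE: Effect = ("", "")  # the unity 1 = (->)


def compose(s: Effect, t: Effect) -> Effect:
    """Product s t: the effect of running s first and then t."""
    if s is None or t is None:
        return None                      # the clash absorbs every product
    s1, s2 = s
    t1, t2 = t
    if t1.endswith(s2):                  # t1 = alpha s2: t needs extra items
        alpha = t1[:len(t1) - len(s2)]
        return (alpha + s1, t2)          # (alpha s1 -> t2)
    if s2.endswith(t1):                  # s2 = beta t1: s leaves extra items
        beta = s2[:len(s2) - len(t1)]
        return (s1, beta + t2)           # (s1 -> beta t2)
    return None                          # tops disagree: type clash


def product(*effects: Effect) -> Effect:
    """Left-to-right product of several effects; the empty product is 1."""
    return reduce(compose, effects, ONE)


if __name__ == "__main__":
    print(product(("a", "bc"), ("c", "de")))  # ('a', 'bde')
    print(product(("a", "b"), ("bc", "d")))   # None
```

Checking the suffix relation with `endswith` works here only because each type is one character; with multi-character type names a type list would have to be a tuple instead of a string.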
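Example 1. Let us calculate some products using the definitions above:

$$
\begin{aligned}
(a \to bc)(c \to de) &= (a \to bde)\\
(ab \to c)(dc \to e) &= (dab \to e)\\
(a \to b)(bc \to d) &= \emptyset\\
(\to ab)(ab \to) &= 1\\
(a \to b)(cb \to d) &= (ca \to d)\\
(ca \to cb)(cb \to d) &= (ca \to d)
\end{aligned}
$$

Let us define the inverse element of any $s \in S$ in the following way:

$$s = \emptyset \Rightarrow s^{-1} = \emptyset, \text{ i.e. } \emptyset^{-1} = \emptyset; \qquad s = (s_1 \to s_2) \Rightarrow s^{-1} = (s_2 \to s_1), \text{ i.e. } (s_1 \to s_2)^{-1} = (s_2 \to s_1).$$

This definition introduces a unique inverse element for each stack effect and allows us to define the partial order relation as follows:

$$\emptyset \le s \text{ for any } s \in S, \qquad (\gamma\alpha \to \gamma\beta) \le (\alpha \to \beta) \text{ for all } \alpha, \beta, \gamma \in T^*.$$

In words, an effect is below another one if it does the same work while carrying some extra items $\gamma$ untouched at the bottom of the stack.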

This partial order is equivalent to the classical definition $s \le t \Leftrightarrow s t^{-1} = s s^{-1}$, and for non-zero effects the following equivalence holds:

$$(\to s_1)(t_1 \to t_2)(s_2 \to) = 1 \;\Leftrightarrow\; (s_1 \to s_2) \le (t_1 \to t_2)$$

All idempotents of $S$, i.e. elements $u$ for which $u = uu$, form a commutative subsemigroup of $S$ with unity and null element. Non-zero idempotents have the form $(\alpha \to \alpha)$, where $\alpha \in T^*$.

Having these basic definitions, we can return to the question of multiple stack effects (compound signatures). A subset $M \subseteq 2^S$, where $2^S$ denotes the powerset (set of all subsets) of $S$, is an inverse semigroup iff
1) $\sigma, \tau \in M \Rightarrow \sigma\tau \in M$ ($M$ is a subsemigroup),
2) $\forall \sigma \in M\ \exists \tau \in M:\ \sigma\tau\sigma = \sigma$ (all elements are regular),
3) $\mu, \nu \in M,\ \mu = \mu\mu,\ \nu = \nu\nu \Rightarrow \mu\nu = \nu\mu$ (all idempotents commute).
It is not only a question of how to define $M$, but also of how to define the multiplication (and addition).

2 Validation of stack machine programs

Let $\Sigma$ be a set of stack operations. We can build programs by writing sequences of stack operations (let us ignore control transfer instructions for now). The set of all "programs" (including those which make no sense) is $\Sigma^*$. Each operation $p \in \Sigma$ has a given stack effect $\mathit{sig}(p) \in S$. The mapping $\mathit{sig}: \Sigma^* \to S$ is defined as a homomorphism: $\mathit{sig}(\text{empty program}) = 1$, $\mathit{sig}(pq) = \mathit{sig}(p)\,\mathit{sig}(q)$. Now it is possible to calculate the stack effect of a given program simply by multiplying the stack effects of its parts (notice that we need associativity and the homomorphism property to do this). The set $\Sigma$ and the homomorphism $\mathit{sig}$ determine a language of valid programs (programs without a type clash):

$$\mathit{Valid}(\Sigma, \mathit{sig}) = \{\omega \in \Sigma^* : \mathit{sig}(\omega) \ne \emptyset\}$$

In some cases a subset of valid programs without input and output parameters is considered:

$$\mathit{Closed}(\Sigma, \mathit{sig}) = \{\omega \in \Sigma^* : \mathit{sig}(\omega) = 1\}$$

Obviously, the empty program belongs to $\mathit{Closed}(\Sigma, \mathit{sig}) \subseteq \mathit{Valid}(\Sigma, \mathit{sig})$.

There are different ways to treat the correctness of programs with regard to stack effects:
Exact matching: the calculated effect has to be equal to the desired one. If the desired effect is $1$ (no inputs, no outputs), we have a closed program.
"Operational matching": the desired effect may be less (in the sense of our partial order relation) than the calculated one. We see from the calculations that the program operates correctly but does not use all the (bottom) elements on the stack.
Validity: we have no idea about the desired effect; we only want the program to be valid.

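The three correctness criteria can be sketched as predicates. This Python fragment is illustrative only and assumes, as a simplification, that each type is one character (a type list is a string with the top of the stack on the right) and that `None` stands for the clash $\emptyset$; the helper names `compose`, `inverse`, `leq` and `check` are hypothetical. The test values are those of Example 2, and the tests also check the classical characterisation $s \le t \Leftrightarrow st^{-1} = ss^{-1}$ on a small sample.

```python
from typing import Optional, Tuple

Effect = Optional[Tuple[str, str]]  # (inputs, outputs); None = type clash
ONE: Effect = ("", "")


def compose(s: Effect, t: Effect) -> Effect:
    """Product s t (run s, then t), following the composition rules."""
    if s is None or t is None:
        return None
    (s1, s2), (t1, t2) = s, t
    if t1.endswith(s2):
        return (t1[:len(t1) - len(s2)] + s1, t2)
    if s2.endswith(t1):
        return (s1, s2[:len(s2) - len(t1)] + t2)
    return None


def inverse(s: Effect) -> Effect:
    """(s1 -> s2)^-1 = (s2 -> s1); the clash is its own inverse."""
    return None if s is None else (s[1], s[0])


def leq(s: Effect, t: Effect) -> bool:
    """s <= t iff s is the clash or s = (gamma t1 -> gamma t2) for some gamma."""
    if s is None:
        return True
    if t is None:
        return False
    (s1, s2), (t1, t2) = s, t
    if not (s1.endswith(t1) and s2.endswith(t2)):
        return False
    # the extra bottom items gamma must be the same on both sides
    return s1[:len(s1) - len(t1)] == s2[:len(s2) - len(t2)]


def check(calculated: Effect, desired: Effect) -> str:
    """Strongest criterion met by a program with the given calculated effect."""
    if calculated is None:
        return "invalid"        # the program itself contains a type clash
    if calculated == desired:
        return "exact"
    if leq(desired, calculated):
        return "operational"    # correct, but bottom items stay untouched
    return "valid only"


if __name__ == "__main__":
    p = ("bc", "bd")            # sig(p) from Example 2
    for desired in [("bc", "bd"), ("abc", "abd"), ("c", "d")]:
        print(desired, check(p, desired))
```

Exact matching implies operational matching, so `check` reports only the strongest criterion that holds.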
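Example 2. If the calculation gives $\mathit{sig}(p) = (bc \to bd)$, then $p$ is type correct w.r.t. $(bc \to bd)$ in the sense of all the definitions. $p$ is not type correct w.r.t. $(abc \to abd)$ in the sense of the first definition, but it satisfies the second one, because $(abc \to abd) \le (bc \to bd)$: $p$ simply does not use the element $a$. $p$ is not type correct w.r.t. $(c \to d)$, but it is still valid. The reason is that we cannot put $p$ into a context where, for example, $ac$ is on the top of the stack: this leads to a type clash between $a$ (in the context) and $b$ (actually used by $p$). At the same time $(c \to d)$ works well in this context, so we would not notice the error.

If the programs are generated by a context-free grammar, then it is possible to guarantee their type correctness by checking the grammar (see [Poi90] for the details). We need to bind an inequality to each grammar rule and to solve the resulting system of inequalities in $S$.

Example 3. Let us define $\Sigma$ and $\mathit{sig}$ as follows:

$$
\begin{array}{ll}
\mathit{sig}(\mathit{iconst}) = (\to i) & \mathit{sig}(\mathit{fconst}) = (\to f)\\
\mathit{sig}(\mathit{fnewarray}) = (i \to a) & \mathit{sig}(\mathit{astore}) = (a \to)\\
\mathit{sig}(\mathit{aload}) = (\to a) & \mathit{sig}(\mathit{fstore}) = (f \to)\\
\mathit{sig}(\mathit{fload}) = (\to f) & \mathit{sig}(\mathit{iadd}) = (i\,i \to i)\\
\mathit{sig}(\mathit{fadd}) = (f\,f \to f) & \mathit{sig}(\mathit{fastore}) = (a\,i\,f \to)\\
\mathit{sig}(\mathit{faload}) = (a\,i \to f) & \mathit{sig}(\mathit{arraylength}) = (a \to i)
\end{array}
$$

The grammar on the left produces the system of inequalities on the right:

$$
\begin{array}{ll}
\langle S\rangle \to \langle \mathit{Stms}\rangle & 1 \le S,\ \ S \le \mathit{Stms}\\
\langle \mathit{Stms}\rangle \to \langle \mathit{Stm}\rangle & \mathit{Stms} \le \mathit{Stm}\\
\langle \mathit{Stms}\rangle \to \langle \mathit{Stms}\rangle\,\langle \mathit{Stm}\rangle & \mathit{Stms} \le \mathit{Stms}\;\mathit{Stm}\\
\langle \mathit{Stm}\rangle \to \langle A\rangle\ \mathit{astore} & \mathit{Stm} \le A\,(a \to)\\
\langle \mathit{Stm}\rangle \to \langle A\rangle\,\langle I\rangle\,\langle F\rangle\ \mathit{fastore} & \mathit{Stm} \le A\,I\,F\,(a\,i\,f \to)\\
\langle \mathit{Stm}\rangle \to \langle F\rangle\ \mathit{fstore} & \mathit{Stm} \le F\,(f \to)\\
\langle A\rangle \to \langle I\rangle\ \mathit{fnewarray} & A \le I\,(i \to a)\\
\langle A\rangle \to \mathit{aload} & A \le (\to a)\\
\langle I\rangle \to \langle \mathit{Ia}\rangle & I \le \mathit{Ia}\\
\langle I\rangle \to \langle I\rangle\,\langle \mathit{Ia}\rangle\ \mathit{iadd} & I \le I\,\mathit{Ia}\,(i\,i \to i)\\
\langle \mathit{Ia}\rangle \to \mathit{iconst} & \mathit{Ia} \le (\to i)\\
\langle \mathit{Ia}\rangle \to \langle A\rangle\ \mathit{arraylength} & \mathit{Ia} \le A\,(a \to i)\\
\langle \mathit{Ia}\rangle \to \langle I\rangle & \mathit{Ia} \le I\\
\langle F\rangle \to \langle \mathit{Fa}\rangle & F \le \mathit{Fa}\\
\langle F\rangle \to \langle F\rangle\,\langle \mathit{Fa}\rangle\ \mathit{fadd} & F \le F\,\mathit{Fa}\,(f\,f \to f)\\
\langle \mathit{Fa}\rangle \to \mathit{fconst} & \mathit{Fa} \le (\to f)\\
\langle \mathit{Fa}\rangle \to \langle A\rangle\,\langle I\rangle\ \mathit{faload} & \mathit{Fa} \le A\,I\,(a\,i \to f)\\
\langle \mathit{Fa}\rangle \to \mathit{fload} & \mathit{Fa} \le (\to f)\\
\langle \mathit{Fa}\rangle \to \langle F\rangle & \mathit{Fa} \le F
\end{array}
$$

This grammar is the output grammar of a syntax-directed translation scheme whose input grammar is a normal reduced grammar with all the "syntactic sugar".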
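One way to double-check a proposed solution of such a system is to plug it into every inequality mechanically. The Python sketch below does this for Example 3 under the simplifying assumption that the types $a$, $i$, $f$ are single characters (a type list is a string with the top of the stack on the right) and that `None` stands for the clash. The table `SIG` copies the signatures of Example 3 and `VALUES` holds the solution stated in the text; the encodings `RULES`, `VALUES` and the helper names are hypothetical, and a rule $X \to Y_1 \ldots Y_n$ is read as the inequality $X \le Y_1 \cdots Y_n$.

```python
from functools import reduce
from typing import Dict, List, Optional, Tuple

Effect = Optional[Tuple[str, str]]  # (inputs, outputs); None = type clash
ONE: Effect = ("", "")

# Signatures of Example 3 (a = array reference, i = int, f = float).
SIG: Dict[str, Effect] = {
    "iconst": ("", "i"), "fconst": ("", "f"), "fnewarray": ("i", "a"),
    "astore": ("a", ""), "aload": ("", "a"), "fstore": ("f", ""),
    "fload": ("", "f"), "iadd": ("ii", "i"), "fadd": ("ff", "f"),
    "fastore": ("aif", ""), "faload": ("ai", "f"), "arraylength": ("a", "i"),
}

# Candidate values for the nonterminals (the solution stated in the text).
VALUES: Dict[str, Effect] = {
    "S": ONE, "Stms": ONE, "Stm": ONE, "A": ("", "a"),
    "I": ("", "i"), "Ia": ("", "i"), "F": ("", "f"), "Fa": ("", "f"),
}

# Rule X -> Y1 ... Yn is encoded as (X, [Y1, ..., Yn]).
Rule = Tuple[str, List[str]]
RULES: List[Rule] = [
    ("S", ["Stms"]),
    ("Stms", ["Stm"]), ("Stms", ["Stms", "Stm"]),
    ("Stm", ["A", "astore"]), ("Stm", ["A", "I", "F", "fastore"]),
    ("Stm", ["F", "fstore"]),
    ("A", ["I", "fnewarray"]), ("A", ["aload"]),
    ("I", ["Ia"]), ("I", ["I", "Ia", "iadd"]),
    ("Ia", ["iconst"]), ("Ia", ["A", "arraylength"]), ("Ia", ["I"]),
    ("F", ["Fa"]), ("F", ["F", "Fa", "fadd"]),
    ("Fa", ["fconst"]), ("Fa", ["A", "I", "faload"]), ("Fa", ["fload"]),
    ("Fa", ["F"]),
]


def compose(s: Effect, t: Effect) -> Effect:
    """Product s t (run s, then t)."""
    if s is None or t is None:
        return None
    (s1, s2), (t1, t2) = s, t
    if t1.endswith(s2):
        return (t1[:len(t1) - len(s2)] + s1, t2)
    if s2.endswith(t1):
        return (s1, s2[:len(s2) - len(t1)] + t2)
    return None


def leq(s: Effect, t: Effect) -> bool:
    """s <= t iff s is the clash or s = (gamma t1 -> gamma t2)."""
    if s is None:
        return True
    if t is None:
        return False
    (s1, s2), (t1, t2) = s, t
    return (s1.endswith(t1) and s2.endswith(t2)
            and s1[:len(s1) - len(t1)] == s2[:len(s2) - len(t2)])


def violated(rules: List[Rule], values: Dict[str, Effect]) -> List[Rule]:
    """Rules whose inequality X <= Y1 ... Yn fails under the given values."""
    def value(sym: str) -> Effect:
        return values[sym] if sym in values else SIG[sym]
    bad = []
    for lhs, rhs in rules:
        rhs_effect = reduce(compose, (value(x) for x in rhs), ONE)
        if not leq(values[lhs], rhs_effect):
            bad.append((lhs, rhs))
    return bad


if __name__ == "__main__":
    print(leq(ONE, VALUES["S"]), violated(RULES, VALUES))  # True []
```

Changing one value, for instance setting $A = (\to i)$, makes the rule $\langle \mathit{Stm}\rangle \to \langle A\rangle\ \mathit{astore}$ fail, which is the kind of error such a check is meant to catch.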

The system has the solution $S = \mathit{Stms} = \mathit{Stm} = (\to) = 1$, $A = (\to a)$, $I = \mathit{Ia} = (\to i)$ and $F = \mathit{Fa} = (\to f)$. The grammar produces only closed programs, but not all closed programs. For example, aload iconst fnewarray astore iconst fconst fastore is a closed program that this grammar does not generate: the statement iconst fnewarray astore is executed while the reference pushed by aload is still on the stack, and no rule allows that. We do not need a concrete interpretation of stack operations in this paper, but for the JVM the previous program may look like

    aload_0
    iconst_5
    newarray <T_FLOAT>
    astore_1
    iconst_1
    fconst_2
    fastore

3 Stack effects and syntactic equations

We have already defined two non-empty languages, $\mathit{Valid}(\Sigma, \mathit{sig})$ and $\mathit{Closed}(\Sigma, \mathit{sig})$. Formally this is enough to define a language, but we need to be convinced of the usefulness of such a definition. On the one hand, we can calculate the stack effect of a program to decide whether it belongs to the language (no more than a semigroup is needed to do this). On the other hand, we need many algebraic properties of the polycyclic monoid to transform the "stack effect syntax notation" into some other form. That is why we still assume that stack effects form at least an inverse semigroup. In this work we are not interested in the (very important) questions of concrete type systems, rules for control transfer, etc. Our interest is concentrated on binding the stack effect calculus to methods of syntax description. We will show that, in particular cases, the stack effect calculus and general rewriting rules (syntactic equations) are equivalent.

Example 4. Let us take $\Sigma$ and $\mathit{sig}$ of Example 3 and the language $\mathit{Closed}(\Sigma, \mathit{sig})$. We will show that the following syntactic equations define exactly $\mathit{Closed}(\Sigma, \mathit{sig})$ (the grammar above defined only a subset of the language).
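(1) empty program = aload astore
(2) empty program = aload iconst fconst fastore
(3) empty program = fload fstore
(4) aload = iconst fnewarray
(5) fnewarray = fnewarray astore aload
(6) iadd = fnewarray astore fnewarray astore iconst
(7) arraylength = astore iconst
(8) fadd = fstore fstore fload
(9) fconst = fload
(10) faload = fconst fastore fconst

This set of equations is not unique; we could choose different ones. The first part is obvious: we can easily check that the equations between stack effects hold for the given $\mathit{sig}$. To prove that the languages coincide we do some more: we forget about $\mathit{sig}$ and examine the equations. If, proceeding from the equations, we get the same $\mathit{sig}$ (up to isomorphism), then the goal is reached ($\mathit{sig}$ determines the equations and the equations determine the same $\mathit{sig}$).

Let us try to solve the following system in $S$, where $m, p, q, r, s, t, u, v, w, x, y, z$ are the unknown stack effects of iconst, fconst, fnewarray, astore, aload, fstore, fload, iadd, fadd, fastore, faload and arraylength respectively:

$$
\begin{aligned}
(1)\quad (\to) &= (s_1 \to s_2)(r_1 \to r_2)\\
(2)\quad (\to) &= (s_1 \to s_2)(m_1 \to m_2)(p_1 \to p_2)(x_1 \to x_2)\\
(3)\quad (\to) &= (u_1 \to u_2)(t_1 \to t_2)\\
(4)\quad (s_1 \to s_2) &= (m_1 \to m_2)(q_1 \to q_2)\\
(5)\quad (q_1 \to q_2) &= (q_1 \to q_2)(r_1 \to r_2)(s_1 \to s_2)\\
(6)\quad (v_1 \to v_2) &= (q_1 \to q_2)(r_1 \to r_2)(q_1 \to q_2)(r_1 \to r_2)(m_1 \to m_2)\\
(7)\quad (z_1 \to z_2) &= (r_1 \to r_2)(m_1 \to m_2)\\
(8)\quad (w_1 \to w_2) &= (t_1 \to t_2)(t_1 \to t_2)(u_1 \to u_2)\\
(9)\quad (p_1 \to p_2) &= (u_1 \to u_2)\\
(10)\quad (y_1 \to y_2) &= (p_1 \to p_2)(x_1 \to x_2)(p_1 \to p_2)
\end{aligned}
$$

The first equation gives $s_1 = \varepsilon$, $r_2 = \varepsilon$ and $r_1 = s_2$, where $\varepsilon$ denotes the empty sequence. Let us define $\alpha = r_1\ (= s_2)$ for future use. Now $s = (\to \alpha)$ and $r = (\alpha \to)$. Since $rs = (\alpha \to \alpha)$, the fifth equation reads $q = q\,(\alpha \to \alpha)$, which allows us to deduce that $q_2 = \beta\alpha$ for some $\beta \in T^*$. The fourth equation can now be expressed as $(\to \alpha) = (m_1 \to m_2)(q_1 \to \beta\alpha)$, which gives us $m_1 = \varepsilon$, $\gamma\beta\alpha = \alpha$ for some $\gamma \in T^*$, and $m_2 = \gamma q_1$. Consequently $\gamma = \beta = \varepsilon$, and we can define $\delta = m_2 = q_1$. Now $m = (\to \delta)$ and $q = (\delta \to \alpha)$. Equations (6) and (7) give $v = (\delta\delta \to \delta)$ and $z = (\alpha \to \delta)$.

The third equation is analogous to (1): we have $u_1 = t_2 = \varepsilon$, and we introduce a new sequence $\varphi = u_2 = t_1$. Now $u = (\to \varphi)$ and $t = (\varphi \to)$. Again, equations (8) and (9) allow direct calculation: $w = (\varphi\varphi \to \varphi)$ and $p = (\to \varphi)$.

Now it is time to use the second equation. Since $smp = (\to \alpha\delta\varphi)$, the only $x$ with $smp\,x = 1$ is $x = (smp)^{-1} = p^{-1} m^{-1} s^{-1}$, which gives us $x = (\alpha\delta\varphi \to)$. Finally, from (10) we get $y = (\alpha\delta \to \varphi)$.

Let us sum up the result:

$$
\begin{array}{ll}
m = \mathit{sig}(\mathit{iconst}) = (\to \delta) & s = \mathit{sig}(\mathit{aload}) = (\to \alpha)\\
p = \mathit{sig}(\mathit{fconst}) = (\to \varphi) & t = \mathit{sig}(\mathit{fstore}) = (\varphi \to)\\
q = \mathit{sig}(\mathit{fnewarray}) = (\delta \to \alpha) & u = \mathit{sig}(\mathit{fload}) = (\to \varphi)\\
r = \mathit{sig}(\mathit{astore}) = (\alpha \to) & v = \mathit{sig}(\mathit{iadd}) = (\delta\delta \to \delta)\\
w = \mathit{sig}(\mathit{fadd}) = (\varphi\varphi \to \varphi) & x = \mathit{sig}(\mathit{fastore}) = (\alpha\delta\varphi \to)\\
y = \mathit{sig}(\mathit{faload}) = (\alpha\delta \to \varphi) & z = \mathit{sig}(\mathit{arraylength}) = (\alpha \to \delta)
\end{array}
$$

We still have three independent variables $\alpha, \delta, \varphi \in T^*$ here, and we have used all the equations. The result is isomorphic to the definitions in Example 3 (take $\alpha = a$, $\delta = i$, $\varphi = f$). As we have learned from this example, syntactic equations may determine stack effects just as stack effects determine the syntax.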
At the same time it is not obvious how to choose such equations.
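Both directions of Example 4 lend themselves to a mechanical spot check. The Python sketch below is an illustration, not part of the original method; it assumes each type is one character (a type list is a string with the top of the stack on the right) and uses `None` for the clash. It checks that the ten equations hold for the signatures of Example 3, that the closed program aload iconst fnewarray astore iconst fconst fastore has effect $1$, and that the general solution with three free type lists (written `alpha`, `delta` and `phi` in the code) satisfies all ten equations for every choice from a small sample of lists. A finite sample is a sanity check, not a proof. The names `SIG3`, `EQUATIONS`, `sig_of`, `all_hold` and `general_solution` are hypothetical.

```python
from functools import reduce
from itertools import product
from typing import Dict, List, Optional, Tuple

Effect = Optional[Tuple[str, str]]  # (inputs, outputs); None = type clash
ONE: Effect = ("", "")

# Signatures of Example 3 (a = array reference, i = int, f = float).
SIG3: Dict[str, Effect] = {
    "iconst": ("", "i"), "fconst": ("", "f"), "fnewarray": ("i", "a"),
    "astore": ("a", ""), "aload": ("", "a"), "fstore": ("f", ""),
    "fload": ("", "f"), "iadd": ("ii", "i"), "fadd": ("ff", "f"),
    "fastore": ("aif", ""), "faload": ("ai", "f"), "arraylength": ("a", "i"),
}

# Equations (1)-(10): (left-hand program, right-hand program).
EQUATIONS: List[Tuple[List[str], List[str]]] = [
    ([], ["aload", "astore"]),                                            # (1)
    ([], ["aload", "iconst", "fconst", "fastore"]),                       # (2)
    ([], ["fload", "fstore"]),                                            # (3)
    (["aload"], ["iconst", "fnewarray"]),                                 # (4)
    (["fnewarray"], ["fnewarray", "astore", "aload"]),                    # (5)
    (["iadd"], ["fnewarray", "astore", "fnewarray", "astore", "iconst"]),  # (6)
    (["arraylength"], ["astore", "iconst"]),                              # (7)
    (["fadd"], ["fstore", "fstore", "fload"]),                            # (8)
    (["fconst"], ["fload"]),                                              # (9)
    (["faload"], ["fconst", "fastore", "fconst"]),                        # (10)
]


def compose(s: Effect, t: Effect) -> Effect:
    """Product s t (run s, then t)."""
    if s is None or t is None:
        return None
    (s1, s2), (t1, t2) = s, t
    if t1.endswith(s2):
        return (t1[:len(t1) - len(s2)] + s1, t2)
    if s2.endswith(t1):
        return (s1, s2[:len(s2) - len(t1)] + t2)
    return None


def sig_of(program: List[str], sig: Dict[str, Effect]) -> Effect:
    """Homomorphic extension of sig: the empty program maps to 1."""
    return reduce(compose, (sig[op] for op in program), ONE)


def all_hold(sig: Dict[str, Effect]) -> bool:
    """True if every equation has equal stack effects on both sides."""
    return all(sig_of(lhs, sig) == sig_of(rhs, sig) for lhs, rhs in EQUATIONS)


def general_solution(alpha: str, delta: str, phi: str) -> Dict[str, Effect]:
    """The solution derived from the equations, with three free type lists."""
    return {
        "iconst": ("", delta), "fconst": ("", phi), "fnewarray": (delta, alpha),
        "astore": (alpha, ""), "aload": ("", alpha), "fstore": (phi, ""),
        "fload": ("", phi), "iadd": (delta + delta, delta),
        "fadd": (phi + phi, phi), "fastore": (alpha + delta + phi, ""),
        "faload": (alpha + delta, phi), "arraylength": (alpha, delta),
    }


if __name__ == "__main__":
    short = ["", "a", "i", "ai", "ia"]
    print(all_hold(SIG3))                                    # True
    print(all(all_hold(general_solution(a, d, p))
              for a, d, p in product(short, repeat=3)))      # True
```

Example 3 is recovered as `general_solution("a", "i", "f")`.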

Literature

1. [ClP67] Clifford A.H., Preston G.B. The Algebraic Theory of Semigroups. Rhode Island, 1967.
2. [JVM95] The Java Virtual Machine Specification. Sun Microsystems, March 15, 1995, 74 pp.
3. [NiP70] Nivat M., Perrot J.F. Une généralisation du monoïde bicyclique. C. R. Acad. Sci. Paris, 271A, 1970, 824-827.
4. [Poi90] Poial J. Algebraic Specifications of Stack-effects for Forth Programs. 1990 FORML Conference Proceedings, EuroFORML'90 Conference, Oct 12-14, 1990, Ampfield, Nr Romsey, Hampshire, UK. Forth Interest Group, Inc., San Jose, USA, 1991, 282-290.
5. [Poi91] Poial J. Multiple Stack-effects of Forth Programs. 1991 FORML Conference Proceedings, euroFORML'91 Conference, Oct 11-13, 1991, Marianske Lazne, Czechoslovakia. Forth Interest Group, Inc., Oakland, USA, 1992, 400-406.
6. [Poi94] Poial J. Forth and Formal Language Theory. EuroForth'94, Nov 4-6, 1994, Winchester, UK, 1994, 47-52.
7. [StK93] Stoddart B., Knaggs P. Type Inference in Stack Based Languages. Formal Aspects of Computing, BCS, 1993, 5, 289-298.