Recovering Erlang AST from BEAM bytecode

Size: px
Start display at page:

Download "Recovering Erlang AST from BEAM bytecode"

Transcription

1 Recovering Erlang AST from BEAM bytecode Dániel Lukács and Melinda Tóth Eötvös Loránd University HU, Faculty of Informatics This research has been supported by the Hungarian Government through the New National Excellence Program of the Ministry of Human Capacities.

2 Motivation Problem statement lib8.beam lib8.beam lib9.beam lib9.beam lib2.erl lib2.erl lib3.erl lib7.beam lib.erl lib6.beam lib3.erl lib7.beam lib.erl lib6.beam mymodule.erl lib5.beam mymodule.erl lib5.beam lib4.erl lib4.erl lib4.beam lib4.beam

3 Goal Problem statement Enable the RefactorErl static analysis system for Erlang to recover information from source dependencies stored as Erlang BEAM bytecode.

4 RefactorErl Problem statement RefactorErl Static analysis framework for Erlang 26, ELTE Faculty of Informatics, Department of Programming Languages And Compilers Functionality: Semantic queries Refactorings Dependency analysis Code metrics, investigation, duplicated code analysis, and more. Web interface, GUI, CLI

5 RefactorErl Problem statement %% Rectangular function rect (T) when abs ( T) < ; abs (T) =:= -> ; rect (_) ->. { function, rect,, 6}. { label,5}. {line,[{ location," example. erl ",}]}. { func_info,{ atom, example },{atom,rect },}. { label,6}. { gc_bif,abs,{f,7},,[{x,}],{x,}}. {test, is_ge,{f,8},[{x,},{ integer,}]}. { label,7}. { gc_bif,abs,{f,9},,[{x,}],{x,}}. {test, is_eq_exact,{f,9},[{x,},{ integer,}]}. { label,8}. {move,{ integer,},{x,}}. return. { label,9}. {move,{ integer,},{x,}}. return.

6 Machine code vs. BEAM Problem statement Machine code Low-level (CPU) Heavily optimized code Complicated loop constructs Dynamic memory allocation Indirect addressing Stack frames Manual process control No metainformation about modules and functions BEAM bytecode High-level (VM) Simplified code Loops always correspond to specific Erlang constructs Managed memory allocation with garbage collection Registers, index displacement Special registers for local variables Process scheduling managed by VM Stores metainformation about modules and functions

7 Goal (revision) Problem statement Enable the RefactorErl static analysis system for Erlang to recover information from source dependencies stored as Erlang BEAM bytecode. Approach: Represent recovered BEAM semantical information in Erlang syntax. Ready for RefactorErl Results can be compared with the original Existing decompilation techniques can be used

8 What is the problem? Problem statement -module(event_handler). -export([handle_event/]). handle_event(event) -> case Event of ok -> done; {message, _} -> done; _ -> unknown_event end. lib8.beam lib3.erl lib2.erl lib.erl lib9.beam lib7.beam lib6.beam {module, event_handler}. %% version = {exports, [{handle_event,},{module_info,},{module_info,}]}. {attributes, []}. {labels, }. {function, handle_event,, 2}. {label,}. {line,[{location,"event_handler.erl",4}]}. {func_info,{atom,event_handler},{atom,handle_event},}. {label,2}. {test,is_tuple,{f,3},[{x,}]}. {test,test_arity,{f,5},[{x,},2]}. {get_tuple_element,{x,},,{x,}}. {test,is_eq_exact,{f,5},[{x,},{atom,message}]}. {jump,{f,4}}. {label,3}. {test,is_eq_exact,{f,5},[{x,},{atom,ok}]}. {label,4}. {move,{atom,done},{x,}}. return. {label,5}. {move,{atom,unknown_event},{x,}}. return. mymodule.erl lib5.beam lib4.erl lib4.beam

9 Is it starightforward? NO! Problem statement -module(event_handler). -export([handle_event/]). handle_event(event) -> case Event of ok -> done; {message, _} -> done; _ -> unknown_event end. {entry,{event_handler,handle_event,}} procedure handle_event [{x,}] at 2: 4 2: 5 if not is_tuple [{x,}] then goto 3; 6 if not test_arity [{x,},2] then goto 5; 7 {x,} := {x,}[]; 8 if not is_eq_exact [{x,},{atom,message}] then goto 5; 3: if not is_eq_exact [{x,},{atom,ok}] then goto 5; lib8.beam lib9.beam lib2.erl lib7.beam 5 5: 6 {x,} := {atom,unknown_event}; 7 return {x,}; 2 4: 3 {x,} := {atom,done}; 4 return {x,}; lib6.beam lib3.erl lib.erl {exit,{event_handler,handle_event,}} mymodule.erl lib5.beam lib4.erl lib4.beam

10 Decompiler workflow Disassembled BEAM Intermediate representation (IR) Control flow graph (CFG) Static single assignment (SSA) Structural analysis Extended A-normal form (ANF) Erlang syntax tree

11 Disassambled BEAM {module, event_handler}. %% version = {exports, [{handle_event,},{module_info,},{module_info,}]}. {attributes, []}. {labels, }. Disassemblers shipped with Erlang/OTP: beam disasm module erlc -S {function, handle_event,, 2}. {label,}. {line,[{location,"event_handler.erl",4}]}. {func_info,{atom,event_handler},{atom,handle_event},}. {label,2}. {test,is_tuple,{f,3},[{x,}]}. {test,test_arity,{f,5},[{x,},2]}. {get_tuple_element,{x,},,{x,}}. {test,is_eq_exact,{f,5},[{x,},{atom,message}]}. {jump,{f,4}}. {label,3}. {test,is_eq_exact,{f,5},[{x,},{atom,ok}]}. {label,4}. {move,{atom,done},{x,}}. return. {label,5}. {move,{atom,unknown_event},{x,}}. return.

12 Intermediate Representation Disassembled BEAM Intermediate representation (IR) Control flow graph (CFG) Static single assignment (SSA) Structural analysis Extended A-normal form (ANF) Explicit syntax Abstraction Internal model Erlang syntax tree {move,{atom,done},{x,}}. {x,} := {atom,done}; {call_ext_only,,{extfunc,erlang,get_module_info,}}. {x,} := {atom,event_handler}; {x,} := call erlang:get_module_info/ [{x,}]; procedure handle_event [{x,}] at 2: : throw {function_clause,{atom,event_handler},{atom,handle_event},}; 2: if not is_tuple [{x,}] then goto 3; if not test_arity [{x,},2] then goto 5; {x,} := {x,}[]; if not is_eq_exact [{x,},{atom,message}] then goto 5; goto 4; 3: if not is_eq_exact [{x,},{atom,ok}] then goto 5; 4: {x,} := {atom,done}; return {x,}; 5: {x,} := {atom,unknown_event}; return {x,};

13 Intermediate Representation An internal model of BEAM semantics. Some BEAM instructions has implicit semantics. Making it explicit reduces possibility for implementation errors. Abstraction layer BEAM semantics is not formally documented BEAM semantics may change

14 Control Flow Graph (CFG) Disassembled BEAM Intermediate representation (IR) Control flow graph (CFG) Static single assignment (SSA) Structural analysis Extended A-normal form (ANF) {entry,{event_handler,handle_event,}} Erlang syntax tree procedure handle_event [{x,}] at 2: Equivalent graph representation of the program flow Consists of blocks (nodes) and flows (edges). Base of further analyses Dominator Static Single Assignment Structuring Simple transformations 4 2: 5 if not is_tuple [{x,}] then goto 3; 6 if not test_arity [{x,},2] then goto 5; 7 {x,} := {x,}[]; 3: 8 if not is_eq_exact [{x,},{atom,message}] then goto 5; if not is_eq_exact [{x,},{atom,ok}] then goto 5; 5 5: 2 4: 6 {x,} := {atom,unknown_event}; 3 {x,} := {atom,done}; 7 return {x,}; 4 return {x,}; {exit,{event_handler,handle_event,}}

15 Static Single Assignment (SSA) Disassembled BEAM Intermediate representation (IR) Control flow graph (CFG) Static single assignment (SSA) Structural analysis Extended A-normal form (ANF) Erlang syntax tree Each symbol is defined only once Calculated based on dominator information SSA Functional representation x := 2 x := f(x)... x# := 2 x#2 := x#3 := φ(x#, x#2) f(x#3)... Cytron, Ron; Ferrante, Jeanne; Rosen, Barry K.; Wegman, Mark N. & Zadeck, F. Kenneth (99), Efficiently computing static single assignment form and the control dependence graph, ACM Transactions on Programming Languages and Systems.

16 Structuring 2 Disassembled BEAM Intermediate representation (IR) Control flow graph (CFG) Static single assignment (SSA) Structural analysis Extended A-normal form (ANF) Erlang syntax tree Identifying high level control structures Region: one entry and exit point Nested regions 2 Cifuentes C. (996), Structuring decompiled graphs. In: Gyimóthy T. (eds) Compiler Construction. CC 996. Lecture Notes in Computer Science, vol 6. Springer, Berlin, Heidelberg

17 Structuring 3 {entry,{event_handler,handle_event,}} procedure handle_event [{x,}] at 2: 4 2: 5 if not is_tuple [{x,}] then goto 3; 6 if not test_arity [{x,},2] then goto 5; 7 {x,} := {x,}[]; 3: 8 if not is_eq_exact [{x,},{atom,message}] then goto 5; if not is_eq_exact [{x,},{atom,ok}] then goto 5; 5 5: 2 4: 6 {x,} := {atom,unknown_event}; 3 {x,} := {atom,done}; 7 return {x,}; 4 return {x,}; Unstructured control flow goto No decomposition Pattern based analysis {exit,{event_handler,handle_event,}} 3 Cifuentes C. (996), Structuring decompiled graphs. In: Gyimóthy T. (eds) Compiler Construction. CC 996. Lecture Notes in Computer Science, vol 6. Springer, Berlin, Heidelberg

18 Structuring 5 Graph Rewriting (TGR) 4 Formal semantics and provable properties Expressive, visualisable tmo_clause: Var var4 recv_tmout (x) x: Block Var var3 Var var4 RecvHead recv_tmo_edge: RecvTmoutClause Var tmo_amount wait_edge: Edge Var var Success true wait_back: Var var4 y: Block Var var9 Var var {test,is_timeout_bit_not_set,[]} tmo_edge: Edge Var var2 Success false 4 Ehrig, Hartmut and Pfender, Michael and Schneider, Hans-Jürgen. Graph-grammars: An algebraic approach, Switching and Automata Theory, 973. SWAT 8. IEEE Conference Record of 4th Annual Symposium on, pp. 67-8, 973, IEEE

19 Structuring 6 Erlang branching structures bool_andalso (left_op) left_op: Block Var lpars Var left_body Var ltest Block Var lpars Var left_body AndExpr Var Var right_body Var rtest left2right: Edge Var var2 Success true clause_group (x) x: Var var68 e: Edge Var var57 If Var e_test e3: Edge [] If AndExpr y: Var e_test Var var69 Var e2_test {entry,{event_handler,handle_event,}} procedure handle_event [{x,}] at 2: 4 2: 5 if not is_tuple [{x,}] then goto 3; 6 if not test_arity [{x,},2] then goto 5; 7 {x,} := {x,}[]; 3: 8 if not is_eq_exact [{x,},{atom,message}] then goto 5; if not is_eq_exact [{x,},{atom,ok}] then goto 5; new_left2other: Edge [] Success true right_op: Block Var var9 Var right_body Var rtest left2join: Edge Var var2 Success false e2: Edge Var var58 If Var e2_test 5 5: 2 4: 6 {x,} := {atom,unknown_event}; 3 {x,} := {atom,done}; 7 return {x,}; 4 return {x,}; right2other: right2join: Edge Edge Var var22 Var var23 Success true Success false z: Var var67 {exit,{event_handler,handle_event,}} other_branch: Var var46 join_branch: Var var45 6 Cifuentes C. (996), Structuring decompiled graphs. In: Gyimóthy T. (eds) Compiler Construction. CC 996. Lecture Notes in Computer Science, vol 6. Springer, Berlin, Heidelberg

20 Structuring 7 Context-analysis tmo_clause: Var var4 recv_tmout (x) x: Block Var var3 Var var4 RecvHead recv_tmo_edge: RecvTmoutClause Var tmo_amount wait_edge: Edge Var var Success true wait_back: Var var4 y: Block Var var9 Var var {test,is_timeout_bit_not_set,[]} tmo_edge: Edge Var var2 Success false {entry,{event_handler,handle_event,}} procedure handle_event [{x,}] at 2: 4 2: 5 if not is_tuple [{x,}] then goto 3; 6 if not test_arity [{x,},2] then goto 5; 7 {x,} := {x,}[]; 3: 8 if not is_eq_exact [{x,},{atom,message}] then goto 5; if not is_eq_exact [{x,},{atom,ok}] then goto 5; 5 5: 2 4: 6 {x,} := {atom,unknown_event}; 3 {x,} := {atom,done}; 7 return {x,}; 4 return {x,}; {exit,{event_handler,handle_event,}} 7 Cifuentes C. (996), Structuring decompiled graphs. In: Gyimóthy T. (eds) Compiler Construction. CC 996. Lecture Notes in Computer Science, vol 6. Springer, Berlin, Heidelberg

21 Structuring 8 Identifying regions continuation_edge (head) e2: pdom follow: Var var4 e: e3: dom continuation head: Var var3 if A then (f X) else (f X2) let X3 = if A then else in (f X3) X X2 8 Cifuentes C. (996), Structuring decompiled graphs. In: Gyimóthy T. (eds) Compiler Construction. CC 996. Lecture Notes in Computer Science, vol 6. Springer, Berlin, Heidelberg

22 Structuring 9 {entry,{event_handler,handle_event,}} {entry,{event_handler,handle_event,}} procedure handle_event [{x,}] at 2: 4 2: 5 if not is_tuple [{x,}] then goto 3; 6 if not test_arity [{x,},2] then goto 5; [] {defun_expr,handle_event,[{var,{x,},}],block_end} {cont,step} {upon, {andalso_expr, {test,is_tuple,[{var,{x,},}]}, {andalso_expr, {test,test_arity,[{var,{x,},},2]}, {let_expr, [] {bind_expr,{var,{x,},},{indexed,{var,{x,},},}}, {test,is_eq_exact,[{var,{x,},},{atom,message}]}, block_end []}}}} clause_group {upon,{test,is_tuple,[{var,{x,},}]}} else {upon,{test,is_eq_exact,[{var,{x,},},{atom,ok}]}} 7 {x,} := {x,}[]; 8 if not is_eq_exact [{x,},{atom,message}] then goto 5; 5 5: 6 {x,} := {atom,unknown_event}; 7 return {x,}; 3: if not is_eq_exact [{x,},{atom,ok}] then goto 5; 2 4: 3 {x,} := {atom,done}; 4 return {x,}; [{var,{x,},2}] {let_expr,{bind_expr,{var,{x,},2},{atom,done}},{var,{x,},2},[]} {return,{var,{x,},2}} [{var,{x,},4}] {let_expr,{bind_expr,{var,{x,},4},{atom,unknown_event}},{var,{x,},4},[]} {return,{var,{x,},4}} {exit,{event_handler,handle_event,}} [{{x,},2},{{x,},2}] [{var,{x,},3},{var,{x,},3}] [{{x,},4},{{x,},4}] 9 Cifuentes C. (996), Structuring decompiled graphs. In: Gyimóthy T. (eds) Compiler Construction. CC 996. Lecture Notes in Computer Science, vol 6. Springer, Berlin, Heidelberg

23 A-Normal Form Functional-style program representation Translates out of SSA FUN handle_event [x#] = IF WHEN ANDALSO {test,is_tuple,[x#]} ANDALSO {test,test_arity,[x#,2]} LET x# = {indexed,x#,} IN {test,is_eq_exact,[x#,{atom,message}]} -> LET x#2 = {atom,done} IN x#2 WHEN {test,is_eq_exact,[x#,{atom,ok}]} -> LET x#2 = {atom,done} IN x#2 WHEN true -> LET x#4 = {atom,unknown_event} IN x#4 Manuel M.T. Chakravarty, Gabriele Keller, and Patryk Zadarnowski (24), A Functional Perspective on SSA Optimisation Algorithms. Electronic Notes in Theoretical Computer Science, Volume 82, Issue 2, April 24, Pages

24 Code generation Disassembled BEAM Intermediate representation (IR) Control flow graph (CFG) Static single assignment (SSA) Structural analysis Extended A-normal form (ANF) -module(event_handler). -export([handle_event/]). Erlang syntax tree handle_event(x_) -> if (is_tuple(x_) andalso (size(x_) =:= 2 andalso element(, X_) =:= message)) -> X_2 = done, X_2; X_ =:= ok -> X_2 = done, X_2; true -> X_4 = unknown_event, X_4 end. -module(event_handler). -export([handle_event/]). handle_event(event) -> case Event of ok -> done; {message, _} -> done; _ -> unknown_event end.

25 DEMO Conclusions DEMO

26 DEMO Conclusions Module Funs Blocks Total GR Referl Others mnesia event s 22.8 s 27.2 s.734 s mnesia text s s 43.8 s.84 s mnesia index s s 2.8 s.928 s

27 Future Notes Conclusions Pre-structuring List-comprehension Fun-expression Full Erlang coverage Catch patterns Binaries Extended language support (Elixir)

Semantics of an Intermediate Language for Program Transformation

Semantics of an Intermediate Language for Program Transformation Semantics of an Intermediate Language for Program Transformation Sigurd Schneider Advisors: Sebastian Hack, Gert Smolka Student Session at POPL 2013 Saarland University Graduate School of Computer Science

More information

SSA, CPS, ANF: THREE APPROACHES

SSA, CPS, ANF: THREE APPROACHES SSA, CPS, ANF: THREE APPROACHES TO COMPILING FUNCTIONAL LANGUAGES PATRYK ZADARNOWSKI PATRYKZ@CSE.UNSW.EDU.AU PATRYKZ@CSE.UNSW.EDU.AU 1 INTRODUCTION Features: Intermediate Representation: A programming

More information

Static Single Information from a Functional Perspective

Static Single Information from a Functional Perspective Chapter 1 Static Single Information from a Functional Perspective Jeremy Singer 1 Abstract: Static single information form is a natural extension of the well-known static single assignment form. It is

More information

A Functional Perspective on SSA Optimisation Algorithms

A Functional Perspective on SSA Optimisation Algorithms COCV 03 Preliminary Version A Functional Perspective on SSA Optimisation Algorithms Manuel M. T. Chakravarty 1, Gabriele Keller 1 and Patryk Zadarnowski 1 School of Computer Science and Engineering University

More information

Lecture Notes on Static Single Assignment Form

Lecture Notes on Static Single Assignment Form Lecture Notes on Static Single Assignment Form 15-411: Compiler Design Frank Pfenning Lecture 6 September 12, 2013 1 Introduction In abstract machine code of the kind we have discussed so far, a variable

More information

Intermediate representations of functional programming languages for software quality control

Intermediate representations of functional programming languages for software quality control Intermediate representations of functional programming languages for software quality control Melinda Tóth, Gordana Rakić Eötvös Loránd University Faculty of Informatics University of Novi Sad Chair of

More information

Control Flow Analysis

Control Flow Analysis Control Flow Analysis Last time Undergraduate compilers in a day Today Assignment 0 due Control-flow analysis Building basic blocks Building control-flow graphs Loops January 28, 2015 Control Flow Analysis

More information

Intermediate Representations

Intermediate Representations Most of the material in this lecture comes from Chapter 5 of EaC2 Intermediate Representations Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP

More information

Detecting and Visualising Process Relationships in Erlang

Detecting and Visualising Process Relationships in Erlang Procedia Computer Science Volume 29, 2014, Pages 1525 1534 ICCS 2014. 14th International Conference on Computational Science Melinda Tóth and István Bozó Eötvös Loránd University, Budapest, Hungary {tothmelinda,

More information

Lecture Notes on Static Single Assignment Form

Lecture Notes on Static Single Assignment Form Lecture Notes on Static Single Assignment Form 15-411: Compiler Design Frank Pfenning, Rob Simmons Lecture 10 October 1, 2015 1 Introduction In abstract machine code of the kind we have discussed so far,

More information

Static Single Information from a Functional Perspective

Static Single Information from a Functional Perspective Chapter 5 Static Single Information from a Functional Perspective Jeremy Singer 1 Abstract Static single information form is a natural extension of the well-known static single assignment form. It is a

More information

Fall Compiler Principles Lecture 6: Intermediate Representation. Roman Manevich Ben-Gurion University of the Negev

Fall Compiler Principles Lecture 6: Intermediate Representation. Roman Manevich Ben-Gurion University of the Negev Fall 2015-2016 Compiler Principles Lecture 6: Intermediate Representation Roman Manevich Ben-Gurion University of the Negev Tentative syllabus Front End Intermediate Representation Optimizations Code Generation

More information

opt. front end produce an intermediate representation optimizer transforms the code in IR form into an back end transforms the code in IR form into

opt. front end produce an intermediate representation optimizer transforms the code in IR form into an back end transforms the code in IR form into Intermediate representations source code front end ir opt. ir back end target code front end produce an intermediate representation (IR) for the program. optimizer transforms the code in IR form into an

More information

Intermediate representation

Intermediate representation Intermediate representation Goals: encode knowledge about the program facilitate analysis facilitate retargeting facilitate optimization scanning parsing HIR semantic analysis HIR intermediate code gen.

More information

CS 406/534 Compiler Construction Putting It All Together

CS 406/534 Compiler Construction Putting It All Together CS 406/534 Compiler Construction Putting It All Together Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy

More information

Intermediate Representation (IR)

Intermediate Representation (IR) Intermediate Representation (IR) Components and Design Goals for an IR IR encodes all knowledge the compiler has derived about source program. Simple compiler structure source code More typical compiler

More information

Three-address code (TAC) TDT4205 Lecture 16

Three-address code (TAC) TDT4205 Lecture 16 1 Three-address code (TAC) TDT4205 Lecture 16 2 On our way toward the bottom We have a gap to bridge: Words Grammar Source program Semantics Program should do the same thing on all of these, but they re

More information

Fall Compiler Principles Lecture 5: Intermediate Representation. Roman Manevich Ben-Gurion University of the Negev

Fall Compiler Principles Lecture 5: Intermediate Representation. Roman Manevich Ben-Gurion University of the Negev Fall 2016-2017 Compiler Principles Lecture 5: Intermediate Representation Roman Manevich Ben-Gurion University of the Negev Tentative syllabus Front End Intermediate Representation Optimizations Code Generation

More information

Data structures for optimizing programs with explicit parallelism

Data structures for optimizing programs with explicit parallelism Oregon Health & Science University OHSU Digital Commons CSETech March 1991 Data structures for optimizing programs with explicit parallelism Michael Wolfe Harini Srinivasan Follow this and additional works

More information

Intermediate Code Generation

Intermediate Code Generation Intermediate Code Generation In the analysis-synthesis model of a compiler, the front end analyzes a source program and creates an intermediate representation, from which the back end generates target

More information

6. Intermediate Representation

6. Intermediate Representation 6. Intermediate Representation Oscar Nierstrasz Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes. http://www.cs.ucla.edu/~palsberg/

More information

An Overview of GCC Architecture (source: wikipedia) Control-Flow Analysis and Loop Detection

An Overview of GCC Architecture (source: wikipedia) Control-Flow Analysis and Loop Detection An Overview of GCC Architecture (source: wikipedia) CS553 Lecture Control-Flow, Dominators, Loop Detection, and SSA Control-Flow Analysis and Loop Detection Last time Lattice-theoretic framework for data-flow

More information

G Programming Languages - Fall 2012

G Programming Languages - Fall 2012 G22.2110-003 Programming Languages - Fall 2012 Lecture 3 Thomas Wies New York University Review Last week Names and Bindings Lifetimes and Allocation Garbage Collection Scope Outline Control Flow Sequencing

More information

Control-Flow Analysis

Control-Flow Analysis Control-Flow Analysis Dragon book [Ch. 8, Section 8.4; Ch. 9, Section 9.6] Compilers: Principles, Techniques, and Tools, 2 nd ed. by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jerey D. Ullman on reserve

More information

Static rules of variable scoping in Erlang

Static rules of variable scoping in Erlang Proceedings of the 7 th International Conference on Applied Informatics Eger, Hungary, January 28 31, 2007. Vol. 2. pp. 137 145. Static rules of variable scoping in Erlang László Lövei, Zoltán Horváth,

More information

Binary Code Analysis: Concepts and Perspectives

Binary Code Analysis: Concepts and Perspectives Binary Code Analysis: Concepts and Perspectives Emmanuel Fleury LaBRI, Université de Bordeaux, France May 12, 2016 E. Fleury (LaBRI, France) Binary Code Analysis: Concepts

More information

Intermediate Representations

Intermediate Representations Intermediate Representations Intermediate Representations (EaC Chaper 5) Source Code Front End IR Middle End IR Back End Target Code Front end - produces an intermediate representation (IR) Middle end

More information

CodeSurfer/x86 A Platform for Analyzing x86 Executables

CodeSurfer/x86 A Platform for Analyzing x86 Executables CodeSurfer/x86 A Platform for Analyzing x86 Executables Gogul Balakrishnan 1, Radu Gruian 2, Thomas Reps 1,2, and Tim Teitelbaum 2 1 Comp. Sci. Dept., University of Wisconsin; {bgogul,reps}@cs.wisc.edu

More information

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan Language Processing Systems Prof. Mohamed Hamada Software ngineering Lab. The University of Aizu Japan Intermediate/Code Generation Source language Scanner (lexical analysis) tokens Parser (syntax analysis)

More information

CJT^jL rafting Cm ompiler

CJT^jL rafting Cm ompiler CJT^jL rafting Cm ompiler ij CHARLES N. FISCHER Computer Sciences University of Wisconsin Madison RON K. CYTRON Computer Science and Engineering Washington University RICHARD J. LeBLANC, Jr. Computer Science

More information

The Development of Static Single Assignment Form

The Development of Static Single Assignment Form The Development of Static Single Assignment Form Kenneth Zadeck NaturalBridge, Inc. zadeck@naturalbridge.com Ken's Graduate Optimization Seminar We learned: 1.what kinds of problems could be addressed

More information

Lecture 5: Declarative Programming. The Declarative Kernel Language Machine. September 12th, 2011

Lecture 5: Declarative Programming. The Declarative Kernel Language Machine. September 12th, 2011 Lecture 5: Declarative Programming. The Declarative Kernel Language Machine September 12th, 2011 1 Lecture Outline Declarative Programming contd Dataflow Variables contd Expressions and Statements Functions

More information

Problem Score Max Score 1 Syntax directed translation & type

Problem Score Max Score 1 Syntax directed translation & type CMSC430 Spring 2014 Midterm 2 Name Instructions You have 75 minutes for to take this exam. This exam has a total of 100 points. An average of 45 seconds per point. This is a closed book exam. No notes

More information

Static Program Analysis

Static Program Analysis Static Program Analysis Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-18/spa/ Preliminaries Outline of Lecture 1 Preliminaries Introduction

More information

Combining Optimizations: Sparse Conditional Constant Propagation

Combining Optimizations: Sparse Conditional Constant Propagation Comp 512 Spring 2011 Combining Optimizations: Sparse Conditional Constant Propagation Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University

More information

Compiler Optimization Intermediate Representation

Compiler Optimization Intermediate Representation Compiler Optimization Intermediate Representation Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology

More information

Where shall I parallelize?

Where shall I parallelize? Where shall I parallelize? Judit Dániel Tamás Melinda István Viktória Zoltán Kőszegi Horpácsi Kozsik Tóth Bozó Fördős Horváth Eötvös Loránd University and ELTE-Soft Kft. Budapest, Hungary Erlang User Conference

More information

6. Intermediate Representation!

6. Intermediate Representation! 6. Intermediate Representation! Prof. O. Nierstrasz! Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes.! http://www.cs.ucla.edu/~palsberg/!

More information

Where We Are. Lexical Analysis. Syntax Analysis. IR Generation. IR Optimization. Code Generation. Machine Code. Optimization.

Where We Are. Lexical Analysis. Syntax Analysis. IR Generation. IR Optimization. Code Generation. Machine Code. Optimization. Where We Are Source Code Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Machine Code Where We Are Source Code Lexical Analysis Syntax Analysis

More information

COP5621 Exam 4 - Spring 2005

COP5621 Exam 4 - Spring 2005 COP5621 Exam 4 - Spring 2005 Name: (Please print) Put the answers on these sheets. Use additional sheets when necessary. Show how you derived your answer when applicable (this is required for full credit

More information

Lecture 23 CIS 341: COMPILERS

Lecture 23 CIS 341: COMPILERS Lecture 23 CIS 341: COMPILERS Announcements HW6: Analysis & Optimizations Alias analysis, constant propagation, dead code elimination, register allocation Due: Wednesday, April 25 th Zdancewic CIS 341:

More information

Functional programming languages

Functional programming languages Functional programming languages Part V: functional intermediate representations Xavier Leroy INRIA Paris-Rocquencourt MPRI 2-4, 2015 2017 X. Leroy (INRIA) Functional programming languages MPRI 2-4, 2015

More information

Intermediate Representations & Symbol Tables

Intermediate Representations & Symbol Tables Intermediate Representations & Symbol Tables Copyright 2014, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit permission

More information

CMSC430 Spring 2014 Midterm 2 Solutions

CMSC430 Spring 2014 Midterm 2 Solutions CMSC430 Spring 2014 Midterm 2 Solutions 1. (12 pts) Syntax directed translation & type checking Consider the following grammar fragment for an expression for C--: exp CONST IDENT 1 IDENT 2 [ exp 1 ] Assume

More information

CS 314 Principles of Programming Languages

CS 314 Principles of Programming Languages CS 314 Principles of Programming Languages Lecture 16: Functional Programming Zheng (Eddy Zhang Rutgers University April 2, 2018 Review: Computation Paradigms Functional: Composition of operations on data.

More information

A Translation Framework for Automatic Translation of Annotated LLVM IR into OpenCL Kernel Function

A Translation Framework for Automatic Translation of Annotated LLVM IR into OpenCL Kernel Function A Translation Framework for Automatic Translation of Annotated LLVM IR into OpenCL Kernel Function Chen-Ting Chang, Yu-Sheng Chen, I-Wei Wu, and Jyh-Jiun Shann Dept. of Computer Science, National Chiao

More information

Compilers. Intermediate representations and code generation. Yannis Smaragdakis, U. Athens (original slides by Sam

Compilers. Intermediate representations and code generation. Yannis Smaragdakis, U. Athens (original slides by Sam Compilers Intermediate representations and code generation Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts) Today Intermediate representations and code generation Scanner Parser Semantic

More information

Comp 204: Computer Systems and Their Implementation. Lecture 22: Code Generation and Optimisation

Comp 204: Computer Systems and Their Implementation. Lecture 22: Code Generation and Optimisation Comp 204: Computer Systems and Their Implementation Lecture 22: Code Generation and Optimisation 1 Today Code generation Three address code Code optimisation Techniques Classification of optimisations

More information

EXAMINATIONS 2008 MID-YEAR COMP431 COMPILERS. Instructions: Read each question carefully before attempting it.

EXAMINATIONS 2008 MID-YEAR COMP431 COMPILERS. Instructions: Read each question carefully before attempting it. T E W H A R E W Ā N A N G A O T E Ū P O K O O T E I K A A M Ā U I VUW V I C T O R I A UNIVERSITY OF WELLINGTON EXAMINATIONS 2008 MID-YEAR COMP431 COMPILERS Time Allowed: 3 Hours Instructions: Read each

More information

CS415 Compilers. Intermediate Represeation & Code Generation

CS415 Compilers. Intermediate Represeation & Code Generation CS415 Compilers Intermediate Represeation & Code Generation These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Review - Types of Intermediate Representations

More information

Lecture Notes on Liveness Analysis

Lecture Notes on Liveness Analysis Lecture Notes on Liveness Analysis 15-411: Compiler Design Frank Pfenning André Platzer Lecture 4 1 Introduction We will see different kinds of program analyses in the course, most of them for the purpose

More information

Control in Sequential Languages

Control in Sequential Languages CS 242 2012 Control in Sequential Languages Reading: Chapter 8, Sections 8.1 8.3 (only) Section 7.3 of The Haskell 98 Report, Exception Handling in the I/O Monad, http://www.haskell.org/onlinelibrary/io-13.html

More information

Concepts Introduced in Chapter 6

Concepts Introduced in Chapter 6 Concepts Introduced in Chapter 6 types of intermediate code representations translation of declarations arithmetic expressions boolean expressions flow-of-control statements backpatching EECS 665 Compiler

More information

Lecture 31: Building a Runnable Program

Lecture 31: Building a Runnable Program The University of North Carolina at Chapel Hill Spring 2002 Lecture 31: Building a Runnable Program April 10 1 From Source Code to Executable Code program gcd(input, output); var i, j: integer; begin read(i,

More information

COS 320. Compiling Techniques

COS 320. Compiling Techniques Topic 5: Types COS 320 Compiling Techniques Princeton University Spring 2016 Lennart Beringer 1 Types: potential benefits (I) 2 For programmers: help to eliminate common programming mistakes, particularly

More information

Intermediate Representations

Intermediate Representations COMP 506 Rice University Spring 2018 Intermediate Representations source code IR Front End Optimizer Back End IR target code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved. Students

More information

The View from 35,000 Feet

The View from 35,000 Feet The View from 35,000 Feet This lecture is taken directly from the Engineering a Compiler web site with only minor adaptations for EECS 6083 at University of Cincinnati Copyright 2003, Keith D. Cooper,

More information

Intermediate Representations Part II

Intermediate Representations Part II Intermediate Representations Part II Types of Intermediate Representations Three major categories Structural Linear Hybrid Directed Acyclic Graph A directed acyclic graph (DAG) is an AST with a unique

More information

Announcements. My office hours are today in Gates 160 from 1PM-3PM. Programming Project 3 checkpoint due tomorrow night at 11:59PM.

Announcements. My office hours are today in Gates 160 from 1PM-3PM. Programming Project 3 checkpoint due tomorrow night at 11:59PM. IR Generation Announcements My office hours are today in Gates 160 from 1PM-3PM. Programming Project 3 checkpoint due tomorrow night at 11:59PM. This is a hard deadline and no late submissions will be

More information

Compiler Construction Lent Term 2015 Lectures 10, 11 (of 16)

Compiler Construction Lent Term 2015 Lectures 10, 11 (of 16) Compiler Construction Lent Term 15 Lectures 10, 11 (of 16) 1. Slang.2 (Lecture 10) 1. In lecture code walk of slang2_derive 2. Assorted topics (Lecture 11) 1. Exceptions 2. Objects 3. Stacks vs. Register

More information

The role of semantic analysis in a compiler

The role of semantic analysis in a compiler Semantic Analysis Outline The role of semantic analysis in a compiler A laundry list of tasks Scope Static vs. Dynamic scoping Implementation: symbol tables Types Static analyses that detect type errors

More information

Intermediate Representations. Reading & Topics. Intermediate Representations CS2210

Intermediate Representations. Reading & Topics. Intermediate Representations CS2210 Intermediate Representations CS2210 Lecture 11 Reading & Topics Muchnick: chapter 6 Topics today: Intermediate representations Automatic code generation with pattern matching Optimization Overview Control

More information

Chapter 3 (part 3) Describing Syntax and Semantics

Chapter 3 (part 3) Describing Syntax and Semantics Chapter 3 (part 3) Describing Syntax and Semantics Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Attribute Grammars Describing the Meanings

More information

Single-Pass Generation of Static Single Assignment Form for Structured Languages

Single-Pass Generation of Static Single Assignment Form for Structured Languages 1 Single-Pass Generation of Static Single Assignment Form for Structured Languages MARC M. BRANDIS and HANSPETER MÖSSENBÖCK ETH Zürich, Institute for Computer Systems Over the last few years, static single

More information

Computing Static Single Assignment (SSA) Form. Control Flow Analysis. Overview. Last Time. Constant propagation Dominator relationships

Computing Static Single Assignment (SSA) Form. Control Flow Analysis. Overview. Last Time. Constant propagation Dominator relationships Control Flow Analysis Last Time Constant propagation Dominator relationships Today Static Single Assignment (SSA) - a sparse program representation for data flow Dominance Frontier Computing Static Single

More information

Alternatives for semantic processing

Alternatives for semantic processing Semantic Processing Copyright c 2000 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies

More information

Computing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

Computing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target. COMP 412 FALL 2017 Computing Inside The Parser Syntax-Directed Translation Comp 412 source code IR IR target Front End Optimizer Back End code Copyright 2017, Keith D. Cooper & Linda Torczon, all rights

More information

LLVM code generation and implementation of nested functions for the SimpliC language

LLVM code generation and implementation of nested functions for the SimpliC language LLVM code generation and implementation of nested functions for the SimpliC language Oscar Legetth Lunds University dat12ole@student.lth.se Gustav Svensson Lunds University dat12gs1@student.lth.se Abstract

More information

Crafting a Compiler with C (II) Compiler V. S. Interpreter

Crafting a Compiler with C (II) Compiler V. S. Interpreter Crafting a Compiler with C (II) 資科系 林偉川 Compiler V S Interpreter Compilation - Translate high-level program to machine code Lexical Analyzer, Syntax Analyzer, Intermediate code generator(semantics Analyzer),

More information

Concepts Introduced in Chapter 6

Concepts Introduced in Chapter 6 Concepts Introduced in Chapter 6 types of intermediate code representations translation of declarations arithmetic expressions boolean expressions flow-of-control statements backpatching EECS 665 Compiler

More information

Code Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University

Code Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University Code Generation Frédéric Haziza Department of Computer Systems Uppsala University Spring 2008 Operating Systems Process Management Memory Management Storage Management Compilers Compiling

More information

A Deterministic Concurrent Language for Embedded Systems

A Deterministic Concurrent Language for Embedded Systems A Deterministic Concurrent Language for Embedded Systems Stephen A. Edwards Columbia University Joint work with Olivier Tardieu SHIM:A Deterministic Concurrent Language for Embedded Systems p. 1/38 Definition

More information

On the correctness of template metaprograms

On the correctness of template metaprograms Proceedings of the 7 th International Conference on Applied Informatics Eger, Hungary, January 28 31, 2007 Vol 2 pp 301 308 On the correctness of template metaprograms Ádám Sipos, István Zólyomi, Zoltán

More information

Project Compiler. CS031 TA Help Session November 28, 2011

Project Compiler. CS031 TA Help Session November 28, 2011 Project Compiler CS031 TA Help Session November 28, 2011 Motivation Generally, it s easier to program in higher-level languages than in assembly. Our goal is to automate the conversion from a higher-level

More information

CS 432 Fall Mike Lam, Professor. Code Generation

CS 432 Fall Mike Lam, Professor. Code Generation CS 432 Fall 2015 Mike Lam, Professor Code Generation Compilers "Back end" Source code Tokens Syntax tree Machine code char data[20]; int main() { float x = 42.0; return 7; } 7f 45 4c 46 01 01 01 00 00

More information

Verified compilers. Guest lecture for Compiler Construction, Spring Magnus Myréen. Chalmers University of Technology

Verified compilers. Guest lecture for Compiler Construction, Spring Magnus Myréen. Chalmers University of Technology Guest lecture for Compiler Construction, Spring 2015 Verified compilers Magnus Myréen Chalmers University of Technology Mentions joint work with Ramana Kumar, Michael Norrish, Scott Owens and many more

More information

Principles of Compiler Design

Principles of Compiler Design Principles of Compiler Design Intermediate Representation Compiler Lexical Analysis Syntax Analysis Semantic Analysis Source Program Token stream Abstract Syntax tree Unambiguous Program representation

More information

Closures. Mooly Sagiv. Michael Clarkson, Cornell CS 3110 Data Structures and Functional Programming

Closures. Mooly Sagiv. Michael Clarkson, Cornell CS 3110 Data Structures and Functional Programming Closures Mooly Sagiv Michael Clarkson, Cornell CS 3110 Data Structures and Functional Programming Summary 1. Predictive Parsing 2. Large Step Operational Semantics (Natural) 3. Small Step Operational Semantics

More information

Advanced Compiler Construction

Advanced Compiler Construction CS 526 Advanced Compiler Construction http://misailo.cs.illinois.edu/courses/cs526 Goals of the Course Develop a fundamental understanding of the major approaches to program analysis and optimization Understand

More information

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End Outline Semantic Analysis The role of semantic analysis in a compiler A laundry list of tasks Scope Static vs. Dynamic scoping Implementation: symbol tables Types Static analyses that detect type errors

More information

Extended SSA with factored use-def chains to support optimization and parallelism

Extended SSA with factored use-def chains to support optimization and parallelism Oregon Health & Science University OHSU Digital Commons CSETech June 1993 Extended SSA with factored use-def chains to support optimization and parallelism Eric Stoltz Michael P. Gerlek Michael Wolfe Follow

More information

Static Tokens: Using Dataflow to Automate Concurrent Pipeline Synthesis

Static Tokens: Using Dataflow to Automate Concurrent Pipeline Synthesis Static Tokens: Using Dataflow to Automate Concurrent Pipeline Synthesis John Teifel and Rajit Manohar Computer Systems Laboratory Cornell University Ithaca, NY 14853, U.S.A. Abstract We describe a new

More information

EECS 583 Class 6 Dataflow Analysis

EECS 583 Class 6 Dataflow Analysis EECS 583 Class 6 Dataflow Analysis University of Michigan September 26, 2016 Announcements & Reading Material HW 1» Congratulations if you are done!» The fun continues à HW2 out Wednesday, due Oct 17th,

More information

Compiler Construction

Compiler Construction Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ws-1819/cc/ Generation of Intermediate Code Outline of Lecture 15

More information

Static Single Assignment (SSA) Form

Static Single Assignment (SSA) Form Static Single Assignment (SSA) Form A sparse program representation for data-flow. CSL862 SSA form 1 Computing Static Single Assignment (SSA) Form Overview: What is SSA? Advantages of SSA over use-def

More information

Generic syntactic analyser: ParsErl

Generic syntactic analyser: ParsErl Generic syntactic analyser: ParsErl Róbert Kitlei, László Lövei, Tamás Nagy, Anikó Nagyné Vig, Zoltán Horváth, Zoltán Csörnyei Department of Programming Languages and Compilers, Eötvös Loránd University,

More information

Lecture Notes on Decompilation

Lecture Notes on Decompilation Lecture Notes on Decompilation 15411: Compiler Design Maxime Serrano Lecture 20 October 31, 2013 1 Introduction In this lecture, we consider the problem of doing compilation backwards - that is, transforming

More information

CS 314 Principles of Programming Languages. Lecture 3

CS 314 Principles of Programming Languages. Lecture 3 CS 314 Principles of Programming Languages Lecture 3 Zheng Zhang Department of Computer Science Rutgers University Wednesday 14 th September, 2016 Zheng Zhang 1 CS@Rutgers University Class Information

More information

An Approach to Polyvariant Binding Time Analysis for a Stack-Based Language

An Approach to Polyvariant Binding Time Analysis for a Stack-Based Language Supported by Russian Foundation for Basic Research project No. 06-01-00574-a and No. 08-07-00280-a, and Russian Federal Agency of Science and Innovation project No. 2007-4-1.4-18-02-064. An Approach to

More information

An Approach to Polyvariant Binding Time Analysis for a Stack-Based Language

An Approach to Polyvariant Binding Time Analysis for a Stack-Based Language An Approach to Polyvariant Binding Time Analysis for a Stack-Based Language Yuri A. Klimov Keldysh Institute of Applied Mathematics, Russian Academy of Sciences RU-125047 Moscow, Russia, yuklimov@keldysh.ru

More information

Data Structures and Algorithms in Compiler Optimization. Comp314 Lecture Dave Peixotto

Data Structures and Algorithms in Compiler Optimization. Comp314 Lecture Dave Peixotto Data Structures and Algorithms in Compiler Optimization Comp314 Lecture Dave Peixotto 1 What is a compiler Compilers translate between program representations Interpreters evaluate their input to produce

More information

Programs as data Parsing cont d; first-order functional language, type checking

Programs as data Parsing cont d; first-order functional language, type checking Programs as data Parsing cont d; first-order functional language, type checking Peter Sestoft Monday 2009-09-14 Plan for today Exercises week 2 Parsing: LR versus LL How does an LR parser work Hand-writing

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Winter 2017 Lecture 7b Andrew Tolmach Portland State University 1994-2017 Values and Types We divide the universe of values according to types A type is a set of values and

More information

CST-402(T): Language Processors

CST-402(T): Language Processors CST-402(T): Language Processors Course Outcomes: On successful completion of the course, students will be able to: 1. Exhibit role of various phases of compilation, with understanding of types of grammars

More information

CMSC 430 Introduction to Compilers. Spring Intermediate Representations and Bytecode Formats

CMSC 430 Introduction to Compilers. Spring Intermediate Representations and Bytecode Formats CMSC 430 Introduction to Compilers Spring 2016 Intermediate Representations and Bytecode Formats Introduction Front end Source code Lexer Parser Types AST/IR IR 2 IR n IR n.s Middle end Back end Front

More information

Chapter 7 Control I Expressions and Statements

Chapter 7 Control I Expressions and Statements Chapter 7 Control I Expressions and Statements Expressions Conditional Statements and Guards Loops and Variation on WHILE The GOTO Controversy Exception Handling Values and Effects Important Concepts in

More information

CS 432 Fall Mike Lam, Professor. Data-Flow Analysis

CS 432 Fall Mike Lam, Professor. Data-Flow Analysis CS 432 Fall 2018 Mike Lam, Professor Data-Flow Analysis Compilers "Back end" Source code Tokens Syntax tree Machine code char data[20]; int main() { float x = 42.0; return 7; } 7f 45 4c 46 01 01 01 00

More information

A Partial Correctness Proof for Programs with Decided Specifications

A Partial Correctness Proof for Programs with Decided Specifications Applied Mathematics & Information Sciences 1(2)(2007), 195-202 An International Journal c 2007 Dixie W Publishing Corporation, U. S. A. A Partial Correctness Proof for Programs with Decided Specifications

More information

LECTURE 18. Control Flow

LECTURE 18. Control Flow LECTURE 18 Control Flow CONTROL FLOW Sequencing: the execution of statements and evaluation of expressions is usually in the order in which they appear in a program text. Selection (or alternation): a

More information

Semantic analysis and intermediate representations. Which methods / formalisms are used in the various phases during the analysis?

Semantic analysis and intermediate representations. Which methods / formalisms are used in the various phases during the analysis? Semantic analysis and intermediate representations Which methods / formalisms are used in the various phases during the analysis? The task of this phase is to check the "static semantics" and generate

More information