Software Similarity Analysis. c April 28, 2011 Christian Collberg
|
|
- Darren Glenn
- 6 years ago
- Views:
Transcription
1 Software Similarity Analysis c April 28, 2011 Christian Collberg
2 Clone detection Duplicates are the result of copy-paste-modify programming. 2/49
3 Clone detection Duplicates are the result of copy-paste-modify programming. Problem during maintenance all copies of bugs need to be fixed. 2/49
4 Clone detection Detection phase: locating similar pieces of code in a program. P f1() f1,f3 P f2() f(r) f3() f2() 3/49
5 Clone detection Detection phase: locating similar pieces of code in a program. Abstraction phase: clones are extracted out into functions. P f1() f1,f3 f2() f3() P f(r) f2() 3/49
6 Clone detection algorithm Finding Clones Detect(P, threshold, minsize): 1 Build a representation rep of P from which it is convenient to find clone pairs. Collect code pairs that are sufficiently similar and sufficiently large to warrent their own abstraction: res rep convenient representation of P for every pair of code segments f, g rep, f g do if similarity(f, g) > threshold && size(f ) minsize && size(g) minsize then res res f, g 4/49
7 Clone detection algorithm Replace Clones Detect(P, threshold, minsize): 2 Break out the code-pairs found in the previous step into their own function and replace them with parameterized calls to this function: for every pair of code segments f, g res do h(r) a parameterized version of f and g P P h(r) replace f with a call to h(r 1) and g with h(r 2) 3 Return res,p 5/49
8 Attack model We don t expect programmers to be malicious! 6/49
9 Attack model We don t expect programmers to be malicious! The code becomes naturally obfuscated because of the specialization process. 6/49
10 Attack model We don t expect programmers to be malicious! The code becomes naturally obfuscated because of the specialization process. The programmer renames variables and replace literals with new values in the copied code. 6/49
11 Attack model We don t expect programmers to be malicious! The code becomes naturally obfuscated because of the specialization process. The programmer renames variables and replace literals with new values in the copied code. More complex changes are unusual. 6/49
12 What has this to do with software protection? Skype binary was protected by adding several hundred hash functions. 7/49
13 What has this to do with software protection? Skype binary was protected by adding several hundred hash functions. Could a clone dector have found them? 7/49
14 Plagiarism of programming assignments Hand in a verbatim copy of a friend s program. 8/49
15 Plagiarism of programming assignments Hand in a verbatim copy of a friend s program. Or, make radical changes to the program to hide the origin of the code. 8/49
16 Plagiarism of programming assignments Hand in a verbatim copy of a friend s program. Or, make radical changes to the program to hide the origin of the code. Borrowing one or more difficult functions from a friend. 8/49
17 Plagiarism of programming assignments Hand in a verbatim copy of a friend s program. Or, make radical changes to the program to hide the origin of the code. Borrowing one or more difficult functions from a friend. Fishing code out of the trash can. 8/49
18 Plagiarism of programming assignments Hand in a verbatim copy of a friend s program. Or, make radical changes to the program to hide the origin of the code. Borrowing one or more difficult functions from a friend. Fishing code out of the trash can. Nabbing code off the printer. 8/49
19 Plagiarism of programming assignments Hand in a verbatim copy of a friend s program. Or, make radical changes to the program to hide the origin of the code. Borrowing one or more difficult functions from a friend. Fishing code out of the trash can. Nabbing code off the printer. Outsource the assignments to an unscrupulous third party ( programming-mills ). 8/49
20 Plagiarism detection Make pair-wise comparisons between all the programs handed in by the students: P1 P2 P3 P1,P2 = 70% P1,P3 = 20% P2,P3 = 10% 9/49
21 Plagiarism detection Algorithm Detect(U, threshold): res for each pair of programs f, g do sim similarity(f, g) if sim > threshold then res res f, g, sim res res sorted on similarity return res 10/49
22 Attack model The student needs the code to look reasonable. 11/49
23 Attack model The student needs the code to look reasonable. General-purpose obfuscation probably not a good idea. 11/49
24 Attack model The student needs the code to look reasonable. General-purpose obfuscation probably not a good idea. Renaming windowsize to sizeofwindow OK. 11/49
25 Attack model The student needs the code to look reasonable. General-purpose obfuscation probably not a good idea. Renaming windowsize to sizeofwindow OK. Renaming windowsize to x93 not OK. 11/49
26 Attack model The student needs the code to look reasonable. General-purpose obfuscation probably not a good idea. Renaming windowsize to sizeofwindow OK. Renaming windowsize to x93 not OK. Replace a while-loop with a for-loop OK. 11/49
27 Attack model The student needs the code to look reasonable. General-purpose obfuscation probably not a good idea. Renaming windowsize to sizeofwindow OK. Renaming windowsize to x93 not OK. Replace a while-loop with a for-loop OK. Unroll the for-loop not OK. 11/49
28 Algorithm ssefm p. 631 AST-based clone detection * + + int + int var var var int a b c 9
29 ssefm: AST-based clone detection Look for clones in this program: (5 + (a + b)) * (7 + (c + 9)) Parse and build an AST S: * + + int + int var var var int a b c 9 13/49
30 An inefficient clone detector... Construct all tree patterns. 14/49
31 An inefficient clone detector... Construct all tree patterns. A tree pattern is a subtree of S where one or more subtrees have been replaced with a wildcard. 14/49
32 An inefficient clone detector... Construct all tree patterns. A tree pattern is a subtree of S where one or more subtrees have been replaced with a wildcard. 14/49
33 An inefficient clone detector... Construct all tree patterns. A tree pattern is a subtree of S where one or more subtrees have been replaced with a wildcard. We ll color the ASTs themselves blue and the tree patterns pink. 14/49
34 Some of the tree patterns? +? +?? * +? *?? +?? int +?? +? + * +? int + int +?? var var var int???? +?? *?? * + +??? + +?? + int +? var?? int? + int +? var var?? var? + int + 5 var var?? 15/49
35 What s a clone in an AST? What s a clone in the context of an AST? 16/49
36 What s a clone in an AST? What s a clone in the context of an AST? Simply a tree pattern for which there s more than one match! 16/49
37 What s a clone in an AST? What s a clone in the context of an AST? Simply a tree pattern for which there s more than one match! Which patterns would make a good clone? 1 has a large number of nodes 2 occurs a large number of times in the AST 3 has few holes 16/49
38 Which patterns would make good clones? This pattern seems like it might make a good choice + int +? var?? 17/49
39 Which patterns would make good clones? This pattern seems like it might make a good choice + int +? var? It matches two large subtrees of S:? + int + 7 var int c 9 + int + 5 var var a b 17/49
40 Extract clones! Now you can extract the clones and turn them into macros: #define CLONE(x,y,z) ((x)+((y)+(z))) CLONE(5,a,b) * CLONE(7,c,9) 18/49
41 A slow algorithm... Build a clone table, a mapping from each pattern to the locations in S where it occurs: 19/49
42 A slow algorithm... Build a clone table, a mapping from each pattern to the locations in S where it occurs: Sort the table with largest patterns, most number of occurrences, fewest number of holes first! 19/49
43 + int +? var?? + int + 7 var int c 9 + int + 5 var var a b int? int 5 int 7? +? + int + 7 var int c 9 + int + 5 var var a b + var int c 9 + var var a b? * + +??? + +?? * + + int + int var var var int a b c 9 var? var a var b var c
44 A heuristic algorithm... Won t scale: exponential number of tree patterns. 21/49
45 A heuristic algorithm... Won t scale: exponential number of tree patterns. Idea: iteratively grow larger tree patterns from smaller ones. 21/49
46 A heuristic algorithm... Won t scale: exponential number of tree patterns. Idea: iteratively grow larger tree patterns from smaller ones. Step 1:? +? + int + 7 var int c 9 + int + 5 var var a b + var int c 9 + var var a b 21/49
47 Step 2-3 We specialize, and the new pattern becomes larger (but only has 2 matches):? +? +? + int + 7 var int c 9 + int + 5 var var a b 22/49
48 Step 2-3 We specialize, and the new pattern becomes larger (but only has 2 matches):? +? +? + int + 7 var int c 9 After two more steps of specialization, we re done: + int + 5 var var a b int? + var? +? + + int 7 var c int 9 + int + 5 var var a b 22/49
49 They found this clone 10 times over some Java classes: for(int i=0; i<? 1; i++) if (? 2[i]!=? 3[i]) return false; The strength of the algorithm is that it allows structural matching: holes can accept any subtree. 23/49
50 Graph-based analysis p. 635
51 Programs are graphs! Control-flow graphs! 25/49
52 Programs are graphs! Control-flow graphs! Dependence graphs! 25/49
53 Programs are graphs! Control-flow graphs! Dependence graphs! Inheritance graphs! 25/49
54 Programs are graphs! Control-flow graphs! Dependence graphs! Inheritance graphs! Can program similarity be computed over graph representations of programs? 25/49
55 Unfortunately... Sub-graph isomorphism is NP-complete. 26/49
56 Unfortunately... Sub-graph isomorphism is NP-complete. Fortunately, graphs computed from programs are not general graphs. 26/49
57 Unfortunately... Sub-graph isomorphism is NP-complete. Fortunately, graphs computed from programs are not general graphs. Control-flow graphs will not be arbitrarily large. 26/49
58 Unfortunately... Sub-graph isomorphism is NP-complete. Fortunately, graphs computed from programs are not general graphs. Control-flow graphs will not be arbitrarily large. Call-graphs tend to be very sparse. 26/49
59 Unfortunately... Sub-graph isomorphism is NP-complete. Fortunately, graphs computed from programs are not general graphs. Control-flow graphs will not be arbitrarily large. Call-graphs tend to be very sparse. Heuristics can be very effective in approximating subgraph isomorphism. 26/49
60 Algorithm sskh p. 636 PDG-based clone detection S 0 S 5 S 2 S3 S 1 S 6 S 7 S 8 S 4
61 sskh: PDG-based clone detection The nodes of a PDF are the statements of a function. 28/49
62 sskh: PDG-based clone detection The nodes of a PDF are the statements of a function. There s an edge m n if 1 n is data-dependent on m, or 2 n is control-dependent on m. 28/49
63 sskh: PDG-based clone detection The nodes of a PDF are the statements of a function. There s an edge m n if 1 n is data-dependent on m, or 2 n is control-dependent on m. Semantics-preserving reordering of the statements of a function won t affect the graph. 28/49
64 Program Dependence Graph S 0 : int k = 0; S 1 : int s = 1; S 2 : while (k < w) { S 3 : if (x[k] == 1) S 4 : R = (s*y) % n; else S 5 : R = s; S 6 : s = R*R % n; S 7 : L = R; S 8 : k=k+1; } 29/49
65 Program Dependence Graph S 0 S 5 S 2 S3 S 1 S 6 S 7 S 8 S 4 30/49
66 sskh: basic idea Build a PDG for each function of the program 31/49
67 sskh: basic idea Build a PDG for each function of the program Compute two isomorphic subraphs by slicing backwards along dependency edges starting with every pair of matching nodes. 31/49
68 sskh: basic idea Build a PDG for each function of the program Compute two isomorphic subraphs by slicing backwards along dependency edges starting with every pair of matching nodes. Two nodes are matching if they have the same syntactic structure. 31/49
69 sskh: basic idea Build a PDG for each function of the program Compute two isomorphic subraphs by slicing backwards along dependency edges starting with every pair of matching nodes. Two nodes are matching if they have the same syntactic structure. Repeat until no more nodes can be added to the slice. 31/49
70 A (contrived) example a 1: a = g(8); b 1: b = z*3; a 2: while(a<10) a 3: a = f(a); b 2: while(b<20) b 3: b = f(b); a 4: if (a==10) { a 5: printf("foo\n"); a 6: x=x+2; } b 4: if (b==20) { b 5: printf("bar\n"); b 6: y=y+2; b 7: printf("baz\n"); } Two similar pieces of code have been intertwined within the same function. 32/49
71 A (contrived) example a 1: a=g(8) b 1: b=z*3 a 2: while(a<10) b 2: while(b<20) a 3: a=f(a) b 3: b=f(b) a 4: if(a==10) b 4: if(b==20) a 5: printf("foo\n") b 5: printf("bar\n") a 6: x=x+2 b 6: y=y+2 b 7: printf("baz\n") 33/49
72 Algorithm: Step 1-3. a 4 and b 4 match. Add them to the slice. 34/49
73 Algorithm: Step 1-3. a 4 and b 4 match. Add them to the slice. Consider a 4 and b 4 s predecessors, a 3 and b 3. 34/49
74 Algorithm: Step 1-3. a 4 and b 4 match. Add them to the slice. Consider a 4 and b 4 s predecessors, a 3 and b 3. a 3 and b 3 match, too. Add them to the slice. 34/49
75 Algorithm: Step 1-3. a 4 and b 4 match. Add them to the slice. Consider a 4 and b 4 s predecessors, a 3 and b 3. a 3 and b 3 match, too. Add them to the slice. Add a 2 and b 2 to the slice since they match and are predecessors of a 3 and b 3. 34/49
76 The PDF after Step 3 a 1: a=g(8) b 1: b=z*3 a 2: while(a<10) b 2: while(b<20) a 3: a=f(a) b 3: b=f(b) a 4: if(a==10) b 4: if(b==20) a 5: printf("foo\n") b 5: printf("bar\n") a 6: x=x+2 b 6: y=y+2 b 7: printf("baz\n") 35/49
77 Algorithm: Step 4 a 5 /b 5 and a 6 /b 6 really should belong to the clone! 36/49
78 Algorithm: Step 4 a 5 /b 5 and a 6 /b 6 really should belong to the clone! But, backwards slice won t include them. 36/49
79 Algorithm: Step 4 a 5 /b 5 and a 6 /b 6 really should belong to the clone! But, backwards slice won t include them. So, slice forward one step from any predicate in an if- and while-statement. 36/49
80 The PDG after Step 4 a 1: a=g(8) b 1: b=z*3 a 2: while(a<10) b 2: while(b<20) a 3: a=f(a) b 3: b=f(b) a 4: if(a==10) b 4: if(b==20) a 5: printf("foo\n") b 5: printf("bar\n") a 6: x=x+2 b 6: y=y+2 b 7: printf("baz\n") 37/49
81 The extracted clone #define CLONE(x,c,d,s,p,y)\ while(x<c) x = f(x);\ if (x==d){\ printf(s);\ y=y+2;\ p=1;}\ else p=0; a = g(8); b = z*3; CLONE(a,10,10,"foo\n",p,x) CLONE(b,20,20,"bar\n",p,y) if (p) printf("baz\n"); 38/49
82 Summary This algorithm handles clones where statements have been reordered, clones that are non-contiguous, and clones that have been intertwined with each other. 39/49
83 Summary This algorithm handles clones where statements have been reordered, clones that are non-contiguous, and clones that have been intertwined with each other. Depressing performance numbers. A 11,540 line C program takes 1 hour and 34 minutes to process. 39/49
84 Algorithm sslchy p. 640 PDG-based plagiarism detection
85 sslchy: PDG-based plagiarism detection Uses PDGs, but for plagiarism detection. 41/49
86 sslchy: PDG-based plagiarism detection Uses PDGs, but for plagiarism detection. Uses a general-purpose subgraph isomorphism algorithm. 41/49
87 sslchy: PDG-based plagiarism detection Uses PDGs, but for plagiarism detection. Uses a general-purpose subgraph isomorphism algorithm. Uses a preprocessing step to weed out unlikely plagiarism candidates. 41/49
88 Plagiarised PDGs? What does it mean for one PDG to be considered a plagiarised version of another? 42/49
89 Plagiarised PDGs? What does it mean for one PDG to be considered a plagiarised version of another? We expect some manner of obfuscation of the code equality is too strong! 42/49
90 Plagiarised PDGs? What does it mean for one PDG to be considered a plagiarised version of another? We expect some manner of obfuscation of the code equality is too strong! The two PDGs should be γ-isomorphic. 42/49
91 Plagiarised PDGs? What does it mean for one PDG to be considered a plagiarised version of another? We expect some manner of obfuscation of the code equality is too strong! The two PDGs should be γ-isomorphic. Set γ = 0.9, ( overhauling (without errors) 10% of a PDG of reasonable size is almost equivalent to rewriting the code. ) 42/49
92 Common Subgraphs Definition Common subgraphs Let G, G 1, and G 2 be graphs. G is a common subgraph of G 1 and G 2 if there exists subgraph isomorphisms from G to G 1 and from G to G 2. G is the maximal common subgraph of two graphs G 1 and G 2 (G = mcs(g 1, G 2 )) if G is a common subgraph of G 1 and G 2 and there exists no other common subgraph G of G 1 and G 2 that has more nodes than G. 43/49
93 Common Subgraphs Example The colored nodes induce a maximal common subgraph of G 1 and G 2 of four nodes: G 1 G 2 44/49
94 Graph similarity and containment Definition Graph similarity and containment Let G be the number of nodes in G. The similarity(g 1, G 2 ) of G 1 and G 2 is defined as similarity(g 1, G 2 ) = mcs(g 1, G 2 ) max( G 1, G 2 ) The containment(g 1, G 2 ) of G 1 within G 2 is defined as containment(g 1, G 2 ) = mcs(g 1, G 2 ). G 1 We say that G 1 is γ-isomorphic to G 2 if containment(g 1, G 2 ) γ, γ (0, 1]. 45/49
95 Graph similarity and containment Example G 1 G 2 similarity(g 1, G 2 ) = 4 7 and 46/49
96 Graph similarity and containment Example G 1 G 2 similarity(g 1, G 2 ) = 4 7 and containment(g 1, G 2 ) = /49
97 Filtering step Subgraph isomorphism testing is expensive prune out 9 10 of all program pairs from consideration: 1 ignore any graph which has too few nodes to be interesting. 47/49
98 Filtering step Subgraph isomorphism testing is expensive prune out 9 10 of all program pairs from consideration: 1 ignore any graph which has too few nodes to be interesting. 2 remove (g,g ) from consideration if g < γ g (would never pass a γ-isomorphism test). 47/49
99 Filtering step Subgraph isomorphism testing is expensive prune out 9 10 of all program pairs from consideration: 1 ignore any graph which has too few nodes to be interesting. 2 remove (g,g ) from consideration if g < γ g (would never pass a γ-isomorphism test). 3 remove (g,g ) if the frequency of their different node types are too different. For example, if g consists solely of function call nodes and g consists solely of nodes representing arithmetic operations, unlikely related. 47/49
100 Summary A PDG is not affected by 1 statement reordering, 48/49
101 Summary A PDG is not affected by 1 statement reordering, 2 variable renaming, 48/49
102 Summary A PDG is not affected by 1 statement reordering, 2 variable renaming, 3 replacing while-loops by for-loops, 48/49
103 Summary A PDG is not affected by 1 statement reordering, 2 variable renaming, 3 replacing while-loops by for-loops, 4 flipping the order of branches in if-statements. 48/49
104 Summary A PDG is not affected by 1 statement reordering, 2 variable renaming, 3 replacing while-loops by for-loops, 4 flipping the order of branches in if-statements. The PDG is affected by 1 inlining and outlining 48/49
105 Summary A PDG is not affected by 1 statement reordering, 2 variable renaming, 3 replacing while-loops by for-loops, 4 flipping the order of branches in if-statements. The PDG is affected by 1 inlining and outlining 2 add bogus dependencies to introduce spurious edges 48/49
Lecture 25 Clone Detection CCFinder. EE 382V Spring 2009 Software Evolution - Instructor Miryung Kim
Lecture 25 Clone Detection CCFinder Today s Agenda (1) Recap of Polymetric Views Class Presentation Suchitra (advocate) Reza (skeptic) Today s Agenda (2) CCFinder, Kamiya et al. TSE 2002 Recap of Polymetric
More informationSoftware Clone Detection. Kevin Tang Mar. 29, 2012
Software Clone Detection Kevin Tang Mar. 29, 2012 Software Clone Detection Introduction Reasons for Code Duplication Drawbacks of Code Duplication Clone Definitions in the Literature Detection Techniques
More informationCtcompare: Comparing Multiple Code Trees for Similarity
Ctcompare: Comparing Multiple Code Trees for Similarity Warren Toomey School of IT, Bond University Using lexical analysis with techniques borrowed from DNA sequencing, multiple code trees can be quickly
More informationUsing Slicing to Identify Duplication in Source Code
Using Slicing to Identify Duplication in Source Code Raghavan Komondoor Computer Sciences Department University of Wisconsin-Madison Madison, WI 53706 USA raghavan@cs.wisc.edu Susan Horwitz Computer Sciences
More informationTypeChef: Towards Correct Variability Analysis of Unpreprocessed C Code for Software Product Lines
TypeChef: Towards Correct Variability Analysis of Unpreprocessed C Code for Software Product Lines Paolo G. Giarrusso 04 March 2011 Software product lines (SPLs) Feature selection SPL = 1 software project
More informationgspan: Graph-Based Substructure Pattern Mining
University of Illinois at Urbana-Champaign February 3, 2017 Agenda What motivated the development of gspan? Technical Preliminaries Exploring the gspan algorithm Experimental Performance Evaluation Introduction
More informationSoftware Protection: How to Crack Programs, and Defend Against Cracking Lecture 4: Code Obfuscation Minsk, Belarus, Spring 2014
Software Protection: How to Crack Programs, and Defend Against Cracking Lecture 4: Code Obfuscation Minsk, Belarus, Spring 2014 Christian Collberg University of Arizona www.cs.arizona.edu/ collberg c June
More informationLearning Analytics. Dr. Bowen Hui Computer Science University of British Columbia Okanagan
Learning Analytics Dr. Bowen Hui Computer Science University of British Columbia Okanagan Last Time: Exercise Given the following code, build its PDG (follow table of vertex types) int sum( int array[]
More informationSoftware Protection: How to Crack Programs, and Defend Against Cracking Lecture 3: Program Analysis Moscow State University, Spring 2014
Software Protection: How to Crack Programs, and Defend Against Cracking Lecture 3: Program Analysis Moscow State University, Spring 2014 Christian Collberg University of Arizona www.cs.arizona.edu/ collberg
More informationCS415 Compilers. Intermediate Represeation & Code Generation
CS415 Compilers Intermediate Represeation & Code Generation These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Review - Types of Intermediate Representations
More informationComparing Multiple Source Code Trees, version 3.1
Comparing Multiple Source Code Trees, version 3.1 Warren Toomey School of IT Bond University April 2010 This is my 3 rd version of a tool to compare source code trees to find similarities. The latest algorithm
More informationRemote Entrusting by Orthogonal Client Replacement. Ceccato Mariano 1, Mila Dalla Preda 2, Anirban Majumbar 3, Paolo Tonella 1.
Remote Entrusting by Orthogonal Client Replacement Ceccato Mariano, Mila Dalla Preda 2, Anirban Majumbar 3, Paolo Tonella Fondazione Bruno Kessler, Trento, Italy 2 University of Verona, Italy 3 University
More informationCSc 520 Principles of Programming Languages
CSc 520 Principles of Programming Languages 32: Procedures Inlining Christian Collberg collberg@cs.arizona.edu Department of Computer Science University of Arizona Copyright c 2005 Christian Collberg [1]
More information1/30/18. Overview. Code Clones. Code Clone Categorization. Code Clones. Code Clone Categorization. Key Points of Code Clones
Overview Code Clones Definition and categories Clone detection Clone removal refactoring Spiros Mancoridis[1] Modified by Na Meng 2 Code Clones Code clone is a code fragment in source files that is identical
More informationCSE 413 Languages & Implementation. Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341)
CSE 413 Languages & Implementation Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341) 1 Goals Representing programs as data Racket structs as a better way to represent
More informationObfuscating Transformations. What is Obfuscator? Obfuscation Library. Obfuscation vs. Deobfuscation. Summary
? Obfuscating Outline? 1? 2 of Obfuscating 3 Motivation: Java Virtual Machine? difference between Java and others? Most programming languages: Java: Source Code Machine Code Predefined Architecture Java
More informationKClone: A Proposed Approach to Fast Precise Code Clone Detection
KClone: A Proposed Approach to Fast Precise Code Clone Detection Yue Jia 1, David Binkley 2, Mark Harman 1, Jens Krinke 1 and Makoto Matsushita 3 1 King s College London 2 Loyola College in Maryland 3
More informationRemote Entrusting by Orthogonal Client Replacement
Remote Entrusting by Orthogonal Client Replacement Mariano Ceccato 1, Mila Dalla Preda 2, Anirban Majumdar 3, Paolo Tonella 1 1 Fondazione Bruno Kessler, Trento, Italy 2 University of Verona, Italy 3 University
More informationCode generation for modern processors
Code generation for modern processors Definitions (1 of 2) What are the dominant performance issues for a superscalar RISC processor? Refs: AS&U, Chapter 9 + Notes. Optional: Muchnick, 16.3 & 17.1 Instruction
More informationDetecting Self-Mutating Malware Using Control-Flow Graph Matching
Detecting Self-Mutating Malware Using Control-Flow Graph Matching Danilo Bruschi Lorenzo Martignoni Mattia Monga Dipartimento di Informatica e Comunicazione Università degli Studi di Milano {bruschi,martign,monga}@dico.unimi.it
More informationCode generation for modern processors
Code generation for modern processors What are the dominant performance issues for a superscalar RISC processor? Refs: AS&U, Chapter 9 + Notes. Optional: Muchnick, 16.3 & 17.1 Strategy il il il il asm
More informationRearranging the Order of Program Statements for Code Clone Detection
Rearranging the Order of Program Statements for Code Clone Detection Yusuke Sabi, Yoshiki Higo, Shinji Kusumoto Graduate School of Information Science and Technology, Osaka University, Japan Email: {y-sabi,higo,kusumoto@ist.osaka-u.ac.jp
More informationCode Clone Detector: A Hybrid Approach on Java Byte Code
Code Clone Detector: A Hybrid Approach on Java Byte Code Thesis submitted in partial fulfillment of the requirements for the award of degree of Master of Engineering in Software Engineering Submitted By
More informationCode Duplication. Harald Gall seal.ifi.uzh.ch/evolution
Code Duplication Harald Gall seal.ifi.uzh.ch/evolution Code is Copied Small Example from the Mozilla Distribution (Milestone 9) Extract from /dom/src/base/nslocation.cpp [432] NS_IMETHODIMP [467] NS_IMETHODIMP
More informationProgramming. We will be introducing various new elements of Python and using them to solve increasingly interesting and complex problems.
Plan for the rest of the semester: Programming We will be introducing various new elements of Python and using them to solve increasingly interesting and complex problems. We saw earlier that computers
More informationLecture Notes on Contracts
Lecture Notes on Contracts 15-122: Principles of Imperative Computation Frank Pfenning Lecture 2 August 30, 2012 1 Introduction For an overview the course goals and the mechanics and schedule of the course,
More informationIntermediate Representation (IR)
Intermediate Representation (IR) Components and Design Goals for an IR IR encodes all knowledge the compiler has derived about source program. Simple compiler structure source code More typical compiler
More informationCHAPTER 3. Register allocation
CHAPTER 3 Register allocation In chapter 1 we simplified the generation of x86 assembly by placing all variables on the stack. We can improve the performance of the generated code considerably if we instead
More information(How Not To Do) Global Optimizations
(How Not To Do) Global Optimizations #1 One-Slide Summary A global optimization changes an entire method (consisting of multiple basic blocks). We must be conservative and only apply global optimizations
More informationTREES AND ORDERS OF GROWTH 7
TREES AND ORDERS OF GROWTH 7 COMPUTER SCIENCE 61A October 17, 2013 1 Trees In computer science, trees are recursive data structures that are widely used in various settings. This is a diagram of a simple
More informationEECS 583 Class 3 Region Formation, Predicated Execution
EECS 583 Class 3 Region Formation, Predicated Execution University of Michigan September 14, 2011 Reading Material Today s class» Trace Selection for Compiling Large C Applications to Microcode, Chang
More informationType checking. Jianguo Lu. November 27, slides adapted from Sean Treichler and Alex Aiken s. Jianguo Lu November 27, / 39
Type checking Jianguo Lu November 27, 2014 slides adapted from Sean Treichler and Alex Aiken s Jianguo Lu November 27, 2014 1 / 39 Outline 1 Language translation 2 Type checking 3 optimization Jianguo
More informationSourcererCC -- Scaling Code Clone Detection to Big-Code
SourcererCC -- Scaling Code Clone Detection to Big-Code What did this paper do? SourcererCC a token-based clone detector, that can detect both exact and near-miss clones from large inter project repositories
More informationCS 360 Programming Languages Interpreters
CS 360 Programming Languages Interpreters Implementing PLs Most of the course is learning fundamental concepts for using and understanding PLs. Syntax vs. semantics vs. idioms. Powerful constructs like
More informationHierarchical Program Representation for Program Element Matching
Hierarchical Program Representation for Program Element Matching Fernando Berzal, Juan-Carlos Cubero, and Aída Jiménez Dept. Computer Science and Artificial Intelligence, ETSIIT - University of Granada,
More informationIn Our Last Exciting Episode
In Our Last Exciting Episode #1 Lessons From Model Checking To find bugs, we need specifications What are some good specifications? To convert a program into a model, we need predicates/invariants and
More informationFormal Semantics. Chapter Twenty-Three Modern Programming Languages, 2nd ed. 1
Formal Semantics Chapter Twenty-Three Modern Programming Languages, 2nd ed. 1 Formal Semantics At the beginning of the book we saw formal definitions of syntax with BNF And how to make a BNF that generates
More informationConditional Elimination through Code Duplication
Conditional Elimination through Code Duplication Joachim Breitner May 27, 2011 We propose an optimizing transformation which reduces program runtime at the expense of program size by eliminating conditional
More informationEECS 219C: Formal Methods Boolean Satisfiability Solving. Sanjit A. Seshia EECS, UC Berkeley
EECS 219C: Formal Methods Boolean Satisfiability Solving Sanjit A. Seshia EECS, UC Berkeley The Boolean Satisfiability Problem (SAT) Given: A Boolean formula F(x 1, x 2, x 3,, x n ) Can F evaluate to 1
More informationComputer Science 210 Data Structures Siena College Fall Topic Notes: Complexity and Asymptotic Analysis
Computer Science 210 Data Structures Siena College Fall 2017 Topic Notes: Complexity and Asymptotic Analysis Consider the abstract data type, the Vector or ArrayList. This structure affords us the opportunity
More informationCompiler Code Generation COMP360
Compiler Code Generation COMP360 Students who acquire large debts putting themselves through school are unlikely to think about changing society. When you trap people in a system of debt, they can t afford
More informationAutomating Big Refactorings for Componentization and the Move to SOA
Automating Big Refactorings for Componentization and the Move to SOA IBM Programming Languages and Development Environments Seminar 2008 Aharon Abadi, Ran Ettinger and Yishai Feldman Software Asset Management
More informationCOMP520 - GoLite Type Checking Specification
COMP520 - GoLite Type Checking Specification Vincent Foley April 8, 2018 1 Introduction This document presents the typing rules for all the language constructs (i.e. declarations, statements, expressions)
More informationFinding Similar Sets. Applications Shingling Minhashing Locality-Sensitive Hashing
Finding Similar Sets Applications Shingling Minhashing Locality-Sensitive Hashing Goals Many Web-mining problems can be expressed as finding similar sets:. Pages with similar words, e.g., for classification
More informationBoolean Representations and Combinatorial Equivalence
Chapter 2 Boolean Representations and Combinatorial Equivalence This chapter introduces different representations of Boolean functions. It then discusses the applications of these representations for proving
More informationCS152: Programming Languages. Lecture 7 Lambda Calculus. Dan Grossman Spring 2011
CS152: Programming Languages Lecture 7 Lambda Calculus Dan Grossman Spring 2011 Where we are Done: Syntax, semantics, and equivalence For a language with little more than loops and global variables Now:
More informationRacket: Macros. Advanced Functional Programming. Jean-Noël Monette. November 2013
Racket: Macros Advanced Functional Programming Jean-Noël Monette November 2013 1 Today Macros pattern-based macros Hygiene Syntax objects and general macros Examples 2 Macros (According to the Racket Guide...)
More informationThe Lambda Calculus. notes by Don Blaheta. October 12, A little bondage is always a good thing. sk
The Lambda Calculus notes by Don Blaheta October 2, 2000 A little bondage is always a good thing. sk We ve seen that Scheme is, as languages go, pretty small. There are just a few keywords, and most of
More informationCOMP 410 Lecture 1. Kyle Dewey
COMP 410 Lecture 1 Kyle Dewey About Me I research automated testing techniques and their intersection with CS education My dissertation used logic programming extensively This is my second semester at
More informationA Tour of the Cool Support Code
A Tour of the Cool Support Code 1 Introduction The Cool compiler project provides a number of basic data types to make the task of writing a Cool compiler tractable in the timespan of the course. This
More information15-780: Problem Set #3
15-780: Problem Set #3 March 31, 2014 1. Partial-Order Planning [15 pts] Consider a version of the milk//banana//drill shopping problem used in the second planning lecture (slides 17 26). (a) Modify the
More informationWeb Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India
Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Abstract - The primary goal of the web site is to provide the
More informationTypes, Expressions, and States
8/27: solved Types, Expressions, and States CS 536: Science of Programming, Fall 2018 A. Why? Expressions represent values in programming languages, relative to a state. Types describe common properties
More informationCode Similarity Detection by Program Dependence Graph
2016 International Conference on Computer Engineering and Information Systems (CEIS-16) Code Similarity Detection by Program Dependence Graph Zhen Zhang, Hai-Hua Yan, Xiao-Wei Zhang Dept. of Computer Science,
More informationKeywords Code cloning, Clone detection, Software metrics, Potential clones, Clone pairs, Clone classes. Fig. 1 Code with clones
Volume 4, Issue 4, April 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Detection of Potential
More informationCode Clone Detection on Specialized PDGs with Heuristics
2011 15th European Conference on Software Maintenance and Reengineering Code Clone Detection on Specialized PDGs with Heuristics Yoshiki Higo Graduate School of Information Science and Technology Osaka
More informationFormal Specification and Verification
Formal Specification and Verification Introduction to Promela Bernhard Beckert Based on a lecture by Wolfgang Ahrendt and Reiner Hähnle at Chalmers University, Göteborg Formal Specification and Verification:
More informationAlgorithms for Nearest Neighbors
Algorithms for Nearest Neighbors Classic Ideas, New Ideas Yury Lifshits Steklov Institute of Mathematics at St.Petersburg http://logic.pdmi.ras.ru/~yura University of Toronto, July 2007 1 / 39 Outline
More informationProgramming Languages and Compilers Qualifying Examination. Answer 4 of 6 questions.
Programming Languages and Compilers Qualifying Examination Fall 2017 Answer 4 of 6 questions. GENERAL INSTRUCTIONS 1. Answer each question in a separate book. 2. Indicate on the cover of each book the
More informationAlgorithm classification
Types of Algorithms Algorithm classification Algorithms that use a similar problem-solving approach can be grouped together We ll talk about a classification scheme for algorithms This classification scheme
More informationMACHINE LEARNING FOR SOFTWARE MAINTAINABILITY
MACHINE LEARNING FOR SOFTWARE MAINTAINABILITY Anna Corazza, Sergio Di Martino, Valerio Maggio Alessandro Moschitti, Andrea Passerini, Giuseppe Scanniello, Fabrizio Silverstri JIMSE 2012 August 28, 2012
More informationProcessor. Lecture #2 Number Rep & Intro to C classic components of all computers Control Datapath Memory Input Output
CS61C L2 Number Representation & Introduction to C (1) insteecsberkeleyedu/~cs61c CS61C : Machine Structures Lecture #2 Number Rep & Intro to C Scott Beamer Instructor 2007-06-26 Review Continued rapid
More informationPointers. Chapter 8. Decision Procedures. An Algorithmic Point of View. Revision 1.0
Pointers Chapter 8 Decision Procedures An Algorithmic Point of View D.Kroening O.Strichman Revision 1.0 Outline 1 Introduction Pointers and Their Applications Dynamic Memory Allocation Analysis of Programs
More informationDetection of Non Continguous Clones in Software using Program Slicing
Detection of Non Continguous Clones in Software using Program Slicing Er. Richa Grover 1 Er. Narender Rana 2 M.Tech in CSE 1 Astt. Proff. In C.S.E 2 GITM, Kurukshetra University, INDIA Abstract Code duplication
More informationPattern Matching and Abstract Data Types
Pattern Matching and Abstract Data Types Tom Murphy VII 3 Dec 2002 0-0 Outline Problem Setup Views ( Views: A Way For Pattern Matching To Cohabit With Data Abstraction, Wadler, 1986) Active Patterns (
More informationLecture 6 Binary Search
Lecture 6 Binary Search 15-122: Principles of Imperative Computation (Spring 2018) Frank Pfenning One of the fundamental and recurring problems in computer science is to find elements in collections, such
More informationEECS 219C: Computer-Aided Verification Boolean Satisfiability Solving. Sanjit A. Seshia EECS, UC Berkeley
EECS 219C: Computer-Aided Verification Boolean Satisfiability Solving Sanjit A. Seshia EECS, UC Berkeley Project Proposals Due Friday, February 13 on bcourses Will discuss project topics on Monday Instructions
More informationCHAPTER 3. Register allocation
CHAPTER 3 Register allocation In chapter 1 we simplified the generation of x86 assembly by placing all variables on the stack. We can improve the performance of the generated code considerably if we instead
More informationIntellectual Property Protection using Obfuscation
Intellectual Property Protection using Obfuscation Stephen Drape Collaboration between Siemens AG, Munich and the University of Oxford Research Project sponsored by Siemens AG, Munich March 2009 Contents
More informationCOMPARISON AND EVALUATION ON METRICS
COMPARISON AND EVALUATION ON METRICS BASED APPROACH FOR DETECTING CODE CLONE D. Gayathri Devi 1 1 Department of Computer Science, Karpagam University, Coimbatore, Tamilnadu dgayadevi@gmail.com Abstract
More informationAn Evaluation of Web Plagiarism Detection Systems for Student Essays
99 An Evaluation of Web Plagiarism Detection Systems for Student Essays Tuomo Kakkonen, Maxim Mozgovoy Department of Computer Science and Statistics, University of Joensuu, Finland tuomo.kakkonen@cs.joensuu.fi
More informationvoid insert( Type const & ) void push_front( Type const & )
6.1 Binary Search Trees A binary search tree is a data structure that can be used for storing sorted data. We will begin by discussing an Abstract Sorted List or Sorted List ADT and then proceed to describe
More informationCode Generation. Lecture 12
Code Generation Lecture 12 1 Lecture Outline Topic 1: Basic Code Generation The MIPS assembly language A simple source language Stack-machine implementation of the simple language Topic 2: Code Generation
More informationData Structures and Algorithms
Data Structures and Algorithms CS245-2008S-19 B-Trees David Galles Department of Computer Science University of San Francisco 19-0: Indexing Operations: Add an element Remove an element Find an element,
More informationCODE GENERATION Monday, May 31, 2010
CODE GENERATION memory management returned value actual parameters commonly placed in registers (when possible) optional control link optional access link saved machine status local data temporaries A.R.
More informationSupercomputing in Plain English Part IV: Henry Neeman, Director
Supercomputing in Plain English Part IV: Henry Neeman, Director OU Supercomputing Center for Education & Research University of Oklahoma Wednesday September 19 2007 Outline! Dependency Analysis! What is
More informationNUMERIC SYSTEMS USED IN NETWORKING
NUMERIC SYSTEMS USED IN NETWORKING Decimal - Binary - Hexadecimal Table ASCII Code 128 64 32 16 8 4 2 1 The Letter A 0 1 0 0 0 0 0 1 Data Units Base 10 Numbering System Base 2 Numbering System Decimal
More information4.1 Review - the DPLL procedure
Applied Logic Lecture 4: Efficient SAT solving CS 4860 Spring 2009 Thursday, January 29, 2009 The main purpose of these notes is to help me organize the material that I used to teach today s lecture. They
More informationCSE 331 Software Design and Implementation. Lecture 3 Loop Reasoning
CSE 331 Software Design and Implementation Lecture 3 Loop Reasoning Zach Tatlock / Spring 2018 Reasoning about loops So far, two things made all our examples much easier: 1. When running the code, each
More informationCS 245 Midterm Exam Winter 2014
CS 245 Midterm Exam Winter 2014 This exam is open book and notes. You can use a calculator and your laptop to access course notes and videos (but not to communicate with other people). You have 70 minutes
More informationExperience with Software Watermarking. Jens Palsberg, Sowmya Krishnaswamy, Minseok Kwon, Di Ma, Qiuyun Shao, Yi Zhang
Experience with Software Watermarking Jens Palsberg, Sowmya Krishnaswamy, Minseok Kwon, Di Ma, Qiuyun Shao, Yi Zhang Properties of Watermarks Easy to create Easy to verify Difficult to remove Difficult
More informationCS164: Midterm I. Fall 2003
CS164: Midterm I Fall 2003 Please read all instructions (including these) carefully. Write your name, login, and circle the time of your section. Read each question carefully and think about what s being
More informationMachine-Independent Optimizations
Chapter 9 Machine-Independent Optimizations High-level language constructs can introduce substantial run-time overhead if we naively translate each construct independently into machine code. This chapter
More informationISSISP 2014 Code Obfuscation Verona, Italy
ISSISP 2014 Code Obfuscation Verona, Italy Christian Collberg University of Arizona www.cs.arizona.edu/ collberg c July 27, 2014 Christian Collberg Overview 3 / 109 Code obfuscation what is it? Informally,
More information2IS55 Software Evolution. Code duplication. Alexander Serebrenik
2IS55 Software Evolution Code duplication Alexander Serebrenik Assignments Assignment 2: Graded Assignment 3: Today 23:59. Assignment 4 already open. Code duplication Deadline: March 30, 2011, 23:59. /
More informationCSE413: Programming Languages and Implementation Racket structs Implementing languages with interpreters Implementing closures
CSE413: Programming Languages and Implementation Racket structs Implementing languages with interpreters Implementing closures Dan Grossman Fall 2014 Hi! I m not Hal J I love this stuff and have taught
More informationThe Dynamic Typing Interlude
CHAPTER 6 The Dynamic Typing Interlude In the prior chapter, we began exploring Python s core object types in depth with a look at Python numbers. We ll resume our object type tour in the next chapter,
More informationBehavior models and verification Lecture 6
Behavior models and verification Lecture 6 http://d3s.mff.cuni.cz Jan Kofroň, František Plášil Model checking For a Kripke structure M = (S, I, R, L) over AP and a (state based) temporal logic formula
More informationFormal Systems II: Applications
Formal Systems II: Applications Functional Verification of Java Programs: Java Dynamic Logic Bernhard Beckert Mattias Ulbrich SS 2017 KIT INSTITUT FÜR THEORETISCHE INFORMATIK KIT University of the State
More informationChapter 4: Association analysis:
Chapter 4: Association analysis: 4.1 Introduction: Many business enterprises accumulate large quantities of data from their day-to-day operations, huge amounts of customer purchase data are collected daily
More informationProblem Set 4. Danfei Xu CS 231A March 9th, (Courtesy of last year s slides)
Problem Set 4 Danfei Xu CS 231A March 9th, 2018 (Courtesy of last year s slides) Outline Part 1: Facial Detection via HoG Features + SVM Classifier Part 2: Image Segmentation with K-Means and Meanshift
More information2-3 Tree. Outline B-TREE. catch(...){ printf( "Assignment::SolveProblem() AAAA!"); } ADD SLIDES ON DISJOINT SETS
Outline catch(...){ printf( "Assignment::SolveProblem() AAAA!"); } Balanced Search Trees 2-3 Trees 2-3-4 Trees Slide 4 Why care about advanced implementations? Same entries, different insertion sequence:
More informationAssertions. Assertions - Example
References: internet notes; Bertrand Meyer, Object-Oriented Software Construction; 11/13/2003 1 Assertions Statements about input to a routine or state of a class Have two primary roles As documentation,
More informationCrit-bit Trees. Adam Langley (Version )
CRITBIT CWEB OUTPUT 1 Crit-bit Trees Adam Langley (agl@imperialviolet.org) (Version 20080926) 1. Introduction This code is taken from Dan Bernstein s qhasm and implements a binary crit-bit (alsa known
More informationSolving Problems Flow Control in C++ CS 16: Solving Problems with Computers I Lecture #3
Solving Problems Flow Control in C++ CS 16: Solving Problems with Computers I Lecture #3 Ziad Matni Dept. of Computer Science, UCSB A Word About Registration for CS16 FOR THOSE OF YOU NOT YET REGISTERED:
More informationApplications of Formal Verification
Applications of Formal Verification Model Checking: Introduction to PROMELA Bernhard Beckert Mattias Ulbrich SS 2017 KIT INSTITUT FÜR THEORETISCHE INFORMATIK KIT University of the State of Baden-Württemberg
More informationTwo-Level Logic Optimization ( Introduction to Computer-Aided Design) School of EECS Seoul National University
Two-Level Logic Optimization (4541.554 Introduction to Computer-Aided Design) School of EECS Seoul National University Minimization of Two-Level Functions Goals: Minimize cover cardinality Minimize number
More informationTable 1 lists the projects and teams. If you want to, you can switch teams with other students.
University of Arizona, Department of Computer Science CSc 620 Assignment 3 40% Christian Collberg August 27, 2008 1 Introduction This is your main project for the class. The project is worth 40% of your
More informationLexical and Syntax Analysis. Top-Down Parsing
Lexical and Syntax Analysis Top-Down Parsing Easy for humans to write and understand String of characters Lexemes identified String of tokens Easy for programs to transform Data structure Syntax A syntax
More informationCSc 520 Principles of Programming Languages
CSc 520 Principles of Programming Languages 16 : Types Polymorphism Christian Collberg collberg+520@gmail.com Department of Computer Science University of Arizona Copyright c 2008 Christian Collberg [1]
More information