Parsing All of C by Taming the Preprocessor. Robert Grimm, New York University Joint Work with Paul Gazzillo
|
|
- Lenard Tate
- 5 years ago
- Views:
Transcription
1 Parsing All of C by Taming the Preprocessor Robert Grimm, New York University Joint Work with Paul Gazzillo
2 Our Research Goal Better Tools for C 2
3 The Good Operating systems ios, OS X, Linux, Free/Open/Net BSD Databases SQLite, Berkeley DB, IBM DB2 Internet servers Apache, Bind, Sendmail 3
4 The Bad > emacs main.c > gcc main.c >./a.out > gdb a.out 4
5 We Need Better Tools Source browsing 7,500+ compilation units, 5.5 million lines [x86 Linux ] 5
6 We Need Better Tools Source browsing 7,500+ compilation units, 5.5 million lines [x86 Linux ] Bug finding 1,000+ bugs in single static analysis study [Chou et al., SOSP 01] 6
7 We Need Better Tools Source browsing 7,500+ compilation units, 5.5 million lines [x86 Linux ] Bug finding 1,000+ bugs in single static analysis study [Chou et al., SOSP 01] Automated refactoring 150+ bugs due to manual API evolution [Padioleau et al., EuroSys 08] 7
8 We Need Better Tools Source browsing Cxref, LXR Bug finding Astrée, Coverity Automated refactoring Apple s Xcode, Eclipse CDT 8
9 All These Tools Need to Parse Code First... 9
10 But Code Is Written in Two Languages! 10
11 The Fall From Grace Preprocessor enriches C Static conditionals (#if, #ifdef, ) Capture variability Macros (#define) Enable code transformation File includes (#include) Provide rudimentary modularity 11
12 The Fall From Grace Preprocessor obfuscates C Static conditionals (#if, #ifdef, ) Code has many meanings Macros (#define) Code doesn t mean what s written File includes (#include) Code is incomplete 12
13 The Fall From Grace Preprocessor obfuscates C Static conditionals (#if, #ifdef, ) Code has many meanings Macros (#define) Code doesn t mean what s written File includes (#include) Code is incomplete Preprocessor only works on tokens, not C constructs 13
14 Back to Eden? One configuration at a time (preprocess first) Causes exponential explosion Incomplete heuristic parsing algorithm Works, if you don t write the wrong code Plug-in architecture for further CPP idioms Creates arms race: tool builders vs developers 14
15 More of the Same Adams et al., AOSD 09 Akers et al., WCRE 05 Badros and Notkin, SP&E 00 Baxter and Mehlich, WCRE 01 Favre, IWPC 97 Garrido and Johnson, ICSM 05 McCloskey and Brewer, ESEC 05 Padioleau, CC 09 Spinellis, TSE 03 Vittek, CSMR 03 15
16 Not Quite the Same MAPR by Platoff et al., ICSM 91 TypeChef by Kästner et al., OOPSLA 11 16
17 This Talk Introduction Problem and Solution Approach The Configuration-Preserving Preprocessor The Configuration-Preserving Parser Our Tool and Evaluation Conclusion 17
18 #include major.h static int mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == MISC_MAJOR) i = 31; else i = iminor(inode) - 32; } 18
19 #define MISC_MAJOR 10 #include major.h static int mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == MISC_MAJOR) i = 31; else i = iminor(inode) - 32; } 19
20 #define MISC_MAJOR 10 static int mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == MISC_MAJOR) i = 31; else i = iminor(inode) - 32; } 20
21 #define MISC_MAJOR 10 static int mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == MISC_MAJOR) i = 31; else i = iminor(inode) - 32; } 20
22 #define MISC_MAJOR 10 static int mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == 10) i = 31; else i = iminor(inode) - 32; } 21
23 static int mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == 10) i = 31; else i = iminor(inode) - 32; } 22
24 Fork subparsers on conditional static int mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == 10) i = 31; else i = iminor(inode) - 32; } 22
25 Fork subparsers on conditional Parse entire static int if-then-else mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == 10) i = 31; else i = iminor(inode) - 32; } 22
26 Fork subparsers on conditional Parse entire static int if-then-else mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == 10) else i = 31; Parse just assignment i = iminor(inode) - 32; } 22
27 static int mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == 10) else i = 31; Merge and create static choice node i = iminor(inode) - 32; } Static Choice INPUT PSAUX If-Then-Else!INPUT PSAUX Assignment 23
28 Solution Approach 1. Lex to produce tokens (as before) 2. Preprocess while preserving conditionals 3. Parse across conditionals 24
29 Language Construct Lexer Layout Preprocessor Macro (Un)Definition Implementation Annotate tokens Surrounded by Contain Contain Multiply- Conditionals Conditionals Defined Macros Other Use conditional Add multiple entries Do not expand Trim infeasible entries macro table to macro table until invocation on redefinition Object-Like Expand all Ignore infeasible Expand nested Get ground truth for Macro Invocations definitions definitions macros built-ins from compiler Function-Like Expand all Ignore infeasible Hoist conditionals Expand nested Support di ering argument Macro Invocations definitions definitions around invocations macros numbers and variadics Token Pasting & Stringification File Includes Static Conditionals Conditional Expressions Error Directives Line, Warning, & Pragma Directives Parser Apply pasting & stringification Hoist conditionals around token pasting & stringification Include and Preprocess under Hoist conditionals Reinclude when guard preprocess files presence conditions around includes macro is not false Preprocess all branches Conjoin presence conditions Ignore infeasible definitions Evaluate presence Hoist conditionals Preserve order for nonconditions around expressions boolean expressions Ignore erroneous branches Treat as layout C Constructs Use FMLR Parser Fork and merge subparsers Typedef Names Use conditional Add multiple entries Fork subparsers on symbol table to symbol table ambiguous names 25
30 Three Key Surrounded by Techniques Contain Contain Multiply- Language Construct Implementation Other Conditionals Conditionals Defined Macros Lexer Layout Annotate tokens Preprocessor Macro (Un)Definition Use conditional Add multiple entries Do not expand Trim infeasible entries macro table to macro table until invocation on redefinition Track presence conditions Object-Like Expand all Ignore infeasible Expand nested Get ground truth for Macro Invocations definitions definitions macros built-ins from compiler Function-Like Expand all Ignore infeasible Hoist conditionals Expand nested Support di ering argument Macro Invocations definitions definitions around invocations macros numbers and variadics In preprocessor and parser Token Pasting & Hoist conditionals around Apply pasting & stringification Stringification token pasting & stringification File Includes Hoist conditionals Static Conditionals Include and Preprocess under Hoist conditionals Reinclude when guard preprocess files presence conditions around includes macro is not false Preprocess all branches Conjoin presence conditions Ignore infeasible definitions Evaluate presence Hoist conditionals Preserve order for nonconditions around expressions boolean expressions Conditional Expressions In preprocessor Error Directives Ignore erroneous branches Line, Warning, & Treat as Pragma Directives layout Parser Contain number of forked subparsers C Constructs Use FMLR Parser Fork and merge subparsers Typedef Names In parser Use conditional Add multiple entries Fork subparsers on symbol table to symbol table ambiguous names 25
31 This Talk Introduction Problem and Solution Approach The Configuration-Preserving Preprocessor The Configuration-Preserving Parser Our Tool and Evaluation Conclusion 26
32 Track Presence Conditions Do not evaluate conditional expressions Rather translate to boolean functions (BDDs) Enables reasoning about configurations Including equivalent and invalid branches Four base cases Constants, free macros, defined operator Straight-forward case analysis Arithmetic expressions Treat as opaque: no known efficient algorithm 27
33 Track Presence Conditions Maintain conditional and ordered macro table Contains all definitions of a macro Invocation is an implicit conditional #ifdef CONFIG_64BIT #define BITS_PER_LONG 64 #else #define BITS_PER_LONG 32 28
34 Track Presence Conditions Maintain conditional and ordered macro table Contains all definitions of a macro Invocation is an implicit conditional #ifdef CONFIG_64BIT #define BITS_PER_LONG 64 #else #define BITS_PER_LONG 32 BITS_PER _LONG 64BIT!64BIT
35 Hoist Conditionals Preprocessor invocations do not compose Directives are exactly one source line Operators only work on tokens le ## BITS_PER_LONG 29
36 Hoist Conditionals Preprocessor invocations do not compose Directives are exactly one source line Operators only work on tokens le ## BITS_PER_LONG Macro expands to conditional 29
37 Hoist Conditionals Preprocessor invocations do not compose Directives are exactly one source line Operators only work on tokens le ## BITS_PER_LONG le ## #ifdef CONFIG_64BIT 64 #else 32 29
38 Hoist Conditionals le ## BITS_PER_LONG le ## #ifdef CONFIG_64BIT 64 #else 32 30
39 Hoist Conditionals One operator, two operations le ## BITS_PER_LONG le ## #ifdef CONFIG_64BIT 64 #else 32 30
40 Hoist Conditionals le ## BITS_PER_LONG Hoist conditional around pasting le ## #ifdef CONFIG_64BIT 64 #else 32 #ifdef CONFIG_64BIT le ## 64 #else le ## 32 30
41 Algorithm 1 Hoisting Conditionals 1: procedure Hoist(c, ) 2: B Initialize a new conditional with an empty branch. 3: C [(c, )] 4: for all a 2 do 5: if a is a language token then 6: B Append a to all branches in C. 7: C [(c i, i a) (c i, i ) 2 C ] 8: else B a is a conditional. 9: B Recursively hoist conditionals in each branch. 10: B [ b b 2 Hoist(c i, i ) and (c i, i ) 2 a ] 11: B Combine with already hoisted conditionals. 12: C C B 13: end if 14: end for 15: return C 16: end procedure 31
42 Algorithm 1 Hoisting Conditionals The Power of Hoisting 1: procedure Hoist(c, ) 2: B Initialize a new conditional with an empty branch. 3: C [(c, )] 4: for all a 2 do 5: if a is a language token then 6: B Append a to all branches in C. 7: C [(c i, i a) (c i, i ) 2 C ] 8: else B a is a conditional. 9: B Recursively hoist conditionals in each branch. 10: B [ b b 2 Hoist(c i, i ) and (c i, i ) 2 a ] 11: B Combine with already hoisted conditionals. 12: C C B 13: end if 14: end for 15: return C 16: end procedure Iterates over conditional branches Recurses into nested conditionals Duplicates tokens across innermost branches Works for token pasting, stringification, includes, conditional expressions, macros 31
43 This Talk Introduction Problem and Solution Approach The Configuration-Preserving Preprocessor The Configuration-Preserving Parser Our Tool and Evaluation Conclusion 32
44 Review: LR Parser p Finite u Control S S u p e r C Input I Stack 33
45 Review: LR Parser p Finite u Control S u p e r C Input Shift token from input to stack S I Stack 33
46 Reduce top n elements to nonterminal Review: LR Parser p Finite u Control S u p e r C Input Shift token from input to stack S I Stack 33
47 Use table for actions and transitions Reduce top n elements to nonterminal Review: LR Parser p Finite u Control S u p e r C Input Shift token from input to stack S I Stack 33
48 Why Build on LR? Has explicit state and stack Straight-forward to fork and merge Is table-driven Can make it fast Supports left-recursion But shift-reduce and reduce-reduce conflicts 34
49 Fork-Merge LR (FMLR) Input: Preprocessed source code Ordinary tokens from C language Main data structure: Subparser Head, i.e., next token LR stack 35
50 Fork-Merge LR (FMLR) Input: Preprocessed source code Ordinary tokens from C language Entire conditionals Branches contain tokens and conditionals Main data structure: Subparser Presence condition Head, i.e., next token or conditional LR stack 36
51 Algorithm 2 Fork-Merge LR Parsing 1: procedure Parse(a 0 ) 2: Q.init((true, a 0, s 0 )) B The initial subparser for a 0. 3: while Q, ; do 4: p Q.pull() B Step the next subparser. 5: T Follow(p.c, p.a) 6: if T = 1 then 7: B Do an LR action and reschedule the subparser. 8: Q.insert(LR(T(1), p)) 9: else B The follow-set contains several tokens. 10: B Fork subparsers and reschedule them. 11: Q.insertAll(Fork(T, p)) 12: end if 13: Q Merge(Q) 14: end while 15: end procedure 37
52 Algorithm 2 Fork-Merge LR Parsing 1: procedure Parse(a 0 ) 2: Q.init((true, a 0, s 0 )) B The initial subparser for a 0. 3: while Q, ; do 4: p Q.pull() B Step the next subparser. 5: T Follow(p.c, p.a) 6: if T = 1 then 7: B Do an LR action and reschedule the subparser. 8: Q.insert(LR(T(1), p)) 9: else B The follow-set contains several tokens. 10: B Fork subparsers and reschedule them. 11: Q.insertAll(Fork(T, p)) 12: end if 13: Q Merge(Q) 14: end while 15: end procedure 37
53 Algorithm 2 Fork-Merge LR Parsing 1: procedure Parse(a 0 ) 2: Q.init((true, a 0, s 0 )) B The initial subparser for a 0. 3: while Q, ; do 4: p Q.pull() B Step the next subparser. 5: T Follow(p.c, p.a) 6: if T = 1 then 7: B Do an LR action and reschedule the subparser. 8: Q.insert(LR(T(1), p)) 9: else B The follow-set contains several tokens. 10: B Fork subparsers and reschedule them. 11: Q.insertAll(Fork(T, p)) 12: end if 13: Q Merge(Q) 14: end while 15: end procedure 37
54 Algorithm 2 Fork-Merge LR Parsing 1: procedure Parse(a 0 ) 2: Q.init((true, a 0, s 0 )) B The initial subparser for a 0. 3: while Q, ; do 4: p Q.pull() B Step the next subparser. 5: T Follow(p.c, p.a) 6: if T = 1 then 7: B Do an LR action and reschedule the subparser. 8: Q.insert(LR(T(1), p)) 9: else B The follow-set contains several tokens. 10: B Fork subparsers and reschedule them. 11: Q.insertAll(Fork(T, p)) 12: end if 13: Q Merge(Q) 14: end while 15: end procedure 37
55 Algorithm 2 Fork-Merge LR Parsing 1: procedure Parse(a 0 ) 2: Q.init((true, a 0, s 0 )) B The initial subparser for a 0. 3: while Q, ; do 4: p Q.pull() B Step the next subparser. 5: T Follow(p.c, p.a) 6: if T = 1 then 7: B Do an LR action and reschedule the subparser. 8: Q.insert(LR(T(1), p)) 9: else B The follow-set contains several tokens. 10: B Fork subparsers and reschedule them. 11: Q.insertAll(Fork(T, p)) 12: end if 13: Q Merge(Q) 14: end while 15: end procedure 37
56 Algorithm 2 Fork-Merge LR Parsing 1: procedure Parse(a 0 ) 2: Q.init((true, a 0, s 0 )) B The initial subparser for a 0. 3: while Q, ; do 4: p Q.pull() B Step the next subparser. 5: T Follow(p.c, p.a) 6: if T = 1 then 7: B Do an LR action and reschedule the subparser. 8: Q.insert(LR(T(1), p)) 9: else B The follow-set contains several tokens. 10: B Fork subparsers and reschedule them. 11: Q.insertAll(Fork(T, p)) 12: end if 13: Q Merge(Q) 14: end while 15: end procedure 37
57 Algorithm 2 Fork-Merge LR Parsing 1: procedure Parse(a 0 ) 2: Q.init((true, a 0, s 0 )) B The initial subparser for a 0. 3: while Q, ; do 4: p Q.pull() B Step the next subparser. 5: T Follow(p.c, p.a) 6: if T = 1 then 7: B Do an LR action and reschedule the subparser. 8: Q.insert(LR(T(1), p)) 9: else B The follow-set contains several tokens. 10: B Fork subparsers and reschedule them. 11: Q.insertAll(Fork(T, p)) 12: end if 13: Q Merge(Q) 14: end while 15: end procedure 37
58 Algorithm 2 Fork-Merge LR Parsing 1: procedure Parse(a 0 ) 2: Q.init((true, a 0, s 0 )) B The initial subparser for a 0. 3: while Q, ; do 4: p Q.pull() B Step the next subparser. 5: T Follow(p.c, p.a) 6: if T = 1 then 7: B Do an LR action and reschedule the subparser. 8: Q.insert(LR(T(1), p)) 9: else B The follow-set contains several tokens. 10: B Fork subparsers and reschedule them. 11: Q.insertAll(Fork(T, p)) 12: end if 13: Q Merge(Q) 14: end while 15: end procedure 37
59 Fork-Merge LR Highlights Fork subparsers from other subparsers Have different heads, presence conditions But reuse LR stack frames in a DAG 38
60 Fork-Merge LR Highlights Fork subparsers from other subparsers Have different heads, presence conditions But reuse LR stack frames in a DAG Merge subparsers if same heads and stacks Disjoin presence conditions Priority queue ensures merging happens ASAP 39
61 Fork-Merge LR Highlights Fork subparsers from other subparsers Have different heads, presence conditions But reuse LR stack frames in a DAG Merge subparsers if same heads and stacks Disjoin presence conditions Priority queue ensures merging happens ASAP Perform ordinary LR on ordinary tokens Reuse grammars and table generators 40
62 The Merge Criterion Merge only if same heads and stacks Implies same derivation of nonterminals Which also ensures a well-formed AST But may parse some code more than once #ifdef CONFIG PSAUX if ( ) i = 31; else i = ; 41
63 The Merge Criterion Merge only if same heads and stacks Implies same derivation of nonterminals Which also ensures a well-formed AST But may parse some code more than once #ifdef CONFIG PSAUX if ( ) i = 31; else i = ; Parse twice and merge thereafter 41
64 The Merge Criterion Merge only if same heads and stacks Implies same derivation of nonterminals Which also ensures a well-formed AST But may parse some code more than once #ifdef CONFIG PSAUX if ( ) i = 31; else i = ; Static Choice INPUT PSAUX!INPUT PSAUX If-Then-Else Assignment 41
65 The Serpent Is in the Details Which subparsers to fork? 42
66 Which Subparsers to Fork? Naively, subparser for each conditional branch But conditionals may Be directly nested in each other Directly follow each other Creates many unnecessary subparsers! 43
67 Which Subparsers to Fork? Insight: Real (LR) work performed on tokens Therefore, create subparser for each token Directly reachable from current position Across all configurations 44
68 Which Subparsers to Fork? Insight: Real (LR) work performed on tokens Therefore, create subparser for each token Directly reachable from current position Across all configurations Encoded in token follow-set 45
69 struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 46
70 6 subparsers struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 46
71 6 subparsers 3 subparsers struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 46
72 struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 46
73 struct rq { SCHED_HRTICK && SMP #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 46
74 struct rq { SCHED_HRTICK && SMP #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif! (SCHED_HRTICK && SMP) && SCHEDSTATS #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 46
75 struct rq { SCHED_HRTICK && SMP #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif! (SCHED_HRTICK && SMP) && SCHEDSTATS #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; };! (SCHED_HRTICK && SMP) &! SCHEDSTATS 46
76 Algorithm 3 The Token Follow-Set 1: procedure Follow(c, a) 2: T ; B Initialize the follow-set. 3: procedure First(c, a) 4: loop 5: if a is a language token then 6: T T [ {(c, a)} 7: return false 8: else B a is a conditional. 9: c r false B Initialize remaining condition. 10: for all (c i, i ) 2 a do 11: if i = then 12: c r c r _ c ^ c i 13: else 14: c r c r _ First(c ^ c i, i (1)) 15: end if 16: end for 17: if c r = false or a is last element in branch then 18: return c r 19: end if 20: c c r 21: a next token or conditional after a 22: end if 23: end loop 24: end procedure 25: loop 26: c First(c, a) 27: if c = false then return T end if B Done. 28: a next token or conditional after a 29: end loop 30: end procedure 47
77 struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 48
78 Find first token of each branch struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 48
79 Find first token of each branch struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif Recurse into nested conditionals #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 48
80 Find first token of each branch struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; Recurse into nested conditionals # endif Continue after empty branches #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 48
81 Find first token of each branch struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; Recurse into nested conditionals # endif Continue after empty branches #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; Stop after all reachable tokens found 48
82 The Serpent Is in the Details There is more! 49
83 static int (*check_part[])(struct *) = { #ifdef CONFIG_ACORN_PARTITION_ICS adfspart_check_ics, #ifdef CONFIG_ACORN_PARTITION_POWERTEC adfspart_check_powertec, // And so on, 16 more times NULL }; 50
84 Encodes binary number! static int (*check_part[])(struct *) = { #ifdef CONFIG_ACORN_PARTITION_ICS adfspart_check_ics, #ifdef CONFIG_ACORN_PARTITION_POWERTEC adfspart_check_powertec, // And so on, 16 more times NULL }; 50
85 static int (*check_part[])(struct *) = { #ifdef CONFIG_ACORN_PARTITION_ICS adfspart_check_ics, #ifdef CONFIG_ACORN_PARTITION_POWERTEC adfspart_check_powertec, // And so on, 16 more times NULL }; 50
86 Shared Reduces: Reduce same stack for several heads static int (*check_part[])(struct *) = { #ifdef CONFIG_ACORN_PARTITION_ICS adfspart_check_ics, #ifdef CONFIG_ACORN_PARTITION_POWERTEC adfspart_check_powertec, // And so on, 16 more times NULL }; 50
87 static int (*check_part[])(struct *) = { #ifdef CONFIG_ACORN_PARTITION_ICS adfspart_check_ics, #ifdef CONFIG_ACORN_PARTITION_POWERTEC adfspart_check_powertec, // And so on, 16 more times NULL }; Shared Reduces: Reduce same stack for several heads Lazy Shifts: Delay forking for several heads 50
88 static int (*check_part[])(struct *) = { #ifdef CONFIG_ACORN_PARTITION_ICS adfspart_check_ics, #ifdef CONFIG_ACORN_PARTITION_POWERTEC adfspart_check_powertec, }; // And so on, 16 more times NULL Shared Reduces: Reduce same stack for several heads Lazy Shifts: Delay forking for several heads Early Reduces: Pick reducing before shifting subparser 50
89 static int (*check_part[])(struct *) = { #ifdef CONFIG_ACORN_PARTITION_ICS adfspart_check_ics, #ifdef CONFIG_ACORN_PARTITION_POWERTEC adfspart_check_powertec, // And so on, 16 more times NULL }; Uses only two subparsers! 50
90 Shared Reduces & Lazy Shifts Require multi-headed subparser Several tokens and their presence conditions But only one LR stack Prioritized by first head Fork by removing one head, presence condition Merge by disjoining presence conditions For same heads, LR stack 51
91 This Talk Introduction Problem and Solution Approach The Configuration-Preserving Preprocessor The Configuration-Preserving Parser Our Tool and Evaluation Conclusion 52
92 SuperC Language Java Preprocessor Built from scratch Parser formalism Grammar Forking & Merging FMLR Modified Roskind s, ran through Bison Automatic AST Grammar annotations Presence conditions BDDs Complete Yes 53
93 SuperC TypeChef Language Java Java & Scala Preprocessor Built from scratch Modified existing Parser formalism FMLR LL Grammar Modified Roskind s, ran through Bison Rewritten using combinator library Forking & Merging Automatic 7 merge combinators AST Grammar annotations Semantic actions Presence conditions BDDs SAT solver Complete Yes Still missing features 54
94 And MAPR? Preprocessor is not documented Parser boils down to naive FMLR Fork on every conditional branch No other optimizations Reimplemented in SuperC as option 55
95 Who Is Our Serpent? 56
96 Who Is Our Serpent? Linux! 56
97 The Linux OS Critical to government and business Jarring flexibility and performance requirements From smart phones to cloud computing farms Large and complex Many developers with varying styles, skills 57
98 Evaluation Plan Focus on x86 Linux kernel How complex is preprocessor usage? Compare to MAPR along the way How well does SuperC perform? Compare to TypeChef along the way 58
99 Linux Preprocessor Usage Breadth of conditionals Force forking of parser states Incidence of incomplete C constructs Prevent merging of parser states 59
100 Linux Preprocessor Usage Breadth of conditionals Force forking of parser states Incidence of incomplete C constructs Prevent merging of parser states Count subparsers per FMLR loop iteration! 60
101 Cumulative Distribution of Subparser # per Iteration Fraction % 100% Shared All Follow-Set MAPR: >16,000 on 98% of C files Number of Subparsers 61
102 Cumulative Distribution of Latency per C File Max: 10.4 Max: 931 Fraction 0.50 SuperC TypeChef Latency in Seconds 62
103 Sidebar Totals, averages, and medians are deceiving We really need the entire distribution! 63
104 Latency vs File Size 12 Parse, Preprocess, Lex Latency in Seconds Size in Millions of Bytes 64
105 Future Work Fully characterize variability in Linux For all tools (Kconfig, Kbuild, Preprocessor) Over history of kernel Analyze names and types across conditionals Foundation for all language processing tools Annotate AST with layout, macros, includes Refactorings need to print source code again Modulo intended changes 65
106 Conclusions Preprocessor makes it hard to parse all of C No viable solution for 40 years! We performed a systematic analysis of problem Identified solution strategies Notably: Conditional hoisting, FMLR parsing And implemented a real-world tool, SuperC On x86 Linux kernel, SuperC is fast and scales well 66
107 SuperC 67
108 SuperC 68
109 69
110 SuperC Context Management C syntax is context-sensitive: T * p; If T is a type name, declare p as a pointer Otherwise, calculate the product of T and p Worse, C with conditionals can be ambiguous Same name may be both a type and an object Under different presence conditions 70
111 SuperC Context Management C syntax is context-sensitive: T * p; If T is a type name, declare p as a pointer Otherwise, calculate the product of T and p Worse, C with conditionals can be ambiguous Same name may be both a type and an object Under different presence conditions Fork and merge subparsers based on context! 71
112 SuperC Context Management Provide context object for each subparser Customize forking and merging Reclassify Fork context May merge Merge contexts 72
113 SuperC Context Management Provide context object for each subparser Conditional, scoped symbol table Customize forking and merging Reclassify: change into or add type name token Fork context: duplicate symbol table reference May merge: check number of nested scopes Merge contexts: combine symbol table entries 73
Enabling Variability-Aware Software Tools. Paul Gazzillo New York University
Enabling Variability-Aware Software Tools Paul Gazzillo New York University We Need Better C Tools Linux and other critical systems written in C Need source code browsers 20,000+ compilation units, 10+
More informationImplementation of Fork-Merge Parsing in OpenRefactory/C. Kavyashree Krishnappa
Implementation of Fork-Merge Parsing in OpenRefactory/C by Kavyashree Krishnappa A thesis submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree
More informationxtc Robert Grimm Making C Safely Extensible New York University
xtc Making C Safely Extensible Robert Grimm New York University The Problem Complexity of modern systems is staggering Increasingly, a seamless, global computing environment System builders continue to
More informationTypeChef: Towards Correct Variability Analysis of Unpreprocessed C Code for Software Product Lines
TypeChef: Towards Correct Variability Analysis of Unpreprocessed C Code for Software Product Lines Paolo G. Giarrusso 04 March 2011 Software product lines (SPLs) Feature selection SPL = 1 software project
More informationJavaCC Parser. The Compilation Task. Automated? JavaCC Parser
JavaCC Parser The Compilation Task Input character stream Lexer stream Parser Abstract Syntax Tree Analyser Annotated AST Code Generator Code CC&P 2003 1 CC&P 2003 2 Automated? JavaCC Parser The initial
More informationBetter Extensibility through Modular Syntax. Robert Grimm New York University
Better Extensibility through Modular Syntax Robert Grimm New York University Syntax Matters More complex syntactic specifications Extensions to existing programming languages Transactions, event-based
More informationLECTURE 3. Compiler Phases
LECTURE 3 Compiler Phases COMPILER PHASES Compilation of a program proceeds through a fixed series of phases. Each phase uses an (intermediate) form of the program produced by an earlier phase. Subsequent
More informationCS 426 Fall Machine Problem 1. Machine Problem 1. CS 426 Compiler Construction Fall Semester 2017
CS 426 Fall 2017 1 Machine Problem 1 Machine Problem 1 CS 426 Compiler Construction Fall Semester 2017 Handed Out: September 6, 2017. Due: September 21, 2017, 5:00 p.m. The machine problems for this semester
More informationIntroduction to Parsing. Lecture 5
Introduction to Parsing Lecture 5 1 Outline Regular languages revisited Parser overview Context-free grammars (CFG s) Derivations Ambiguity 2 Languages and Automata Formal languages are very important
More informationKmax: Finding All Configurations of Kbuild Makefiles Statically Paul Gazzillo Stevens Institute. ESEC/FSE 2017 Paderborn, Germany
Kmax: Finding All Configurations of Kbuild Makefiles Statically Paul Gazzillo Stevens Institute ESEC/FSE 2017 Paderborn, Germany Let s Talk About Makefiles 2 Variability in Linux Kbuild Kbuild is Linux
More informationRYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS
RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 STUDENT ID: INSTRUCTIONS Please write your student ID on this page. Do not write it or your name
More informationLanguage Translation. Compilation vs. interpretation. Compilation diagram. Step 1: compile. Step 2: run. compiler. Compiled program. program.
Language Translation Compilation vs. interpretation Compilation diagram Step 1: compile program compiler Compiled program Step 2: run input Compiled program output Language Translation compilation is translation
More informationSyntax-Directed Translation. Lecture 14
Syntax-Directed Translation Lecture 14 (adapted from slides by R. Bodik) 9/27/2006 Prof. Hilfinger, Lecture 14 1 Motivation: parser as a translator syntax-directed translation stream of tokens parser ASTs,
More informationCS 326 Operating Systems C Programming. Greg Benson Department of Computer Science University of San Francisco
CS 326 Operating Systems C Programming Greg Benson Department of Computer Science University of San Francisco Why C? Fast (good optimizing compilers) Not too high-level (Java, Python, Lisp) Not too low-level
More informationAbout the Authors... iii Introduction... xvii. Chapter 1: System Software... 1
Table of Contents About the Authors... iii Introduction... xvii Chapter 1: System Software... 1 1.1 Concept of System Software... 2 Types of Software Programs... 2 Software Programs and the Computing Machine...
More informationCSE 401 Midterm Exam Sample Solution 2/11/15
Question 1. (10 points) Regular expression warmup. For regular expression questions, you must restrict yourself to the basic regular expression operations covered in class and on homework assignments:
More informationCSE 413 Languages & Implementation. Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341)
CSE 413 Languages & Implementation Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341) 1 Goals Representing programs as data Racket structs as a better way to represent
More informationIntroduction to Parsing. Lecture 5
Introduction to Parsing Lecture 5 1 Outline Regular languages revisited Parser overview Context-free grammars (CFG s) Derivations Ambiguity 2 Languages and Automata Formal languages are very important
More informationCompilers and computer architecture From strings to ASTs (2): context free grammars
1 / 1 Compilers and computer architecture From strings to ASTs (2): context free grammars Martin Berger October 2018 Recall the function of compilers 2 / 1 3 / 1 Recall we are discussing parsing Source
More informationCompiler phases. Non-tokens
Compiler phases Compiler Construction Scanning Lexical Analysis source code scanner tokens regular expressions lexical analysis Lennart Andersson parser context free grammar Revision 2011 01 21 parse tree
More informationAnalysis Tool Project
Tool Overview The tool we chose to analyze was the Java static analysis tool FindBugs (http://findbugs.sourceforge.net/). FindBugs is A framework for writing static analyses Developed at the University
More informationSyntax Errors; Static Semantics
Dealing with Syntax Errors Syntax Errors; Static Semantics Lecture 14 (from notes by R. Bodik) One purpose of the parser is to filter out errors that show up in parsing Later stages should not have to
More informationCOMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table
COMPILER CONSTRUCTION Lab 2 Symbol table LABS Lab 3 LR parsing and abstract syntax tree construction using ''bison' Lab 4 Semantic analysis (type checking) PHASES OF A COMPILER Source Program Lab 2 Symtab
More informationCSE 374 Programming Concepts & Tools
CSE 374 Programming Concepts & Tools Hal Perkins Fall 2017 Lecture 8 C: Miscellanea Control, Declarations, Preprocessor, printf/scanf 1 The story so far The low-level execution model of a process (one
More informationTime : 1 Hour Max Marks : 30
Total No. of Questions : 6 P4890 B.E/ Insem.- 74 B.E ( Computer Engg) PRINCIPLES OF MODERN COMPILER DESIGN (2012 Pattern) (Semester I) Time : 1 Hour Max Marks : 30 Q.1 a) Explain need of symbol table with
More informationChapter 3: Lexing and Parsing
Chapter 3: Lexing and Parsing Aarne Ranta Slides for the book Implementing Programming Languages. An Introduction to Compilers and Interpreters, College Publications, 2012. Lexing and Parsing* Deeper understanding
More informationCOP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher
COP4020 ming Languages Compilers and Interpreters Robert van Engelen & Chris Lacher Overview Common compiler and interpreter configurations Virtual machines Integrated development environments Compiler
More informationDDMD AND AUTOMATED CONVERSION FROM C++ TO D
1 DDMD AND AUTOMATED CONVERSION FROM C++ TO D Daniel Murphy (aka yebblies ) ABOUT ME Using D since 2009 Compiler contributor since 2011 2 OVERVIEW Why convert the frontend to D What s so hard about it
More informationRobert Grimm New York University
xtc Towards extensible C Robert Grimm New York University The Problem Complexity of modern systems is staggering Increasingly, a seamless, global computing environment System builders continue to use C
More informationThe SPL Programming Language Reference Manual
The SPL Programming Language Reference Manual Leonidas Fegaras University of Texas at Arlington Arlington, TX 76019 fegaras@cse.uta.edu February 27, 2018 1 Introduction The SPL language is a Small Programming
More informationCS558 Programming Languages
CS558 Programming Languages Winter 2017 Lecture 7b Andrew Tolmach Portland State University 1994-2017 Values and Types We divide the universe of values according to types A type is a set of values and
More informationFaculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology
Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology exam Compiler Construction in4020 July 5, 2007 14.00-15.30 This exam (8 pages) consists of 60 True/False
More informationImplementing a VSOP Compiler. March 20, 2018
Implementing a VSOP Compiler Cyril SOLDANI, Pr. Pierre GEURTS March 20, 2018 Part III Semantic Analysis 1 Introduction In this assignment, you will do the semantic analysis of VSOP programs, according
More informationPartial Preprocessing C Code for Variability Analysis
Partial Preprocessing C Code for Variability Analysis Christian Kästner, Paolo G. Giarrusso, and Klaus Ostermann Philipps University Marburg Marburg, Germany ABSTRACT The C preprocessor is commonly used
More informationUniversity of Passau Department of Informatics and Mathematics Chair for Programming. Bachelor Thesis. Andreas Janker
University of Passau Department of Informatics and Mathematics Chair for Programming Bachelor Thesis Implementing Conditional Compilation Preserving Refactorings on C Code Andreas Janker Date: March 21,
More informationEDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:
EDAN65: Compilers, Lecture 06 A LR parsing Görel Hedin Revised: 2017-09-11 This lecture Regular expressions Context-free grammar Attribute grammar Lexical analyzer (scanner) Syntactic analyzer (parser)
More informationSemantic Analysis. Lecture 9. February 7, 2018
Semantic Analysis Lecture 9 February 7, 2018 Midterm 1 Compiler Stages 12 / 14 COOL Programming 10 / 12 Regular Languages 26 / 30 Context-free Languages 17 / 21 Parsing 20 / 23 Extra Credit 4 / 6 Average
More informationCompilers. Type checking. Yannis Smaragdakis, U. Athens (original slides by Sam
Compilers Type checking Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts) Summary of parsing Parsing A solid foundation: context-free grammars A simple parser: LL(1) A more powerful parser:
More informationUnderstanding Linux Feature Distribution
Understanding Linux Feature Distribution FOSD Meeting 2012, Braunschweig Reinhard Tartler March 22, 2012 VAMOS Research Questions What s the cause of variability and variability implementations In Systems
More informationCS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find
CS1622 Lecture 15 Semantic Analysis CS 1622 Lecture 15 1 Semantic Analysis How to build symbol tables How to use them to find multiply-declared and undeclared variables. How to perform type checking CS
More informationCS 406/534 Compiler Construction Putting It All Together
CS 406/534 Compiler Construction Putting It All Together Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy
More informationIntermediate Representations & Symbol Tables
Intermediate Representations & Symbol Tables Copyright 2014, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit permission
More informationFaculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology
Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology exam Compiler Construction in4303 April 9, 2010 14.00-15.30 This exam (6 pages) consists of 52 True/False
More informationUsing an LALR(1) Parser Generator
Using an LALR(1) Parser Generator Yacc is an LALR(1) parser generator Developed by S.C. Johnson and others at AT&T Bell Labs Yacc is an acronym for Yet another compiler compiler Yacc generates an integrated
More informationShort Notes of CS201
#includes: Short Notes of CS201 The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with < and > if the file is a system
More informationCS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).
CS 2210 Sample Midterm 1. Determine if each of the following claims is true (T) or false (F). F A language consists of a set of strings, its grammar structure, and a set of operations. (Note: a language
More informationSemantic actions for declarations and expressions
Semantic actions for declarations and expressions Semantic actions Semantic actions are routines called as productions (or parts of productions) are recognized Actions work together to build up intermediate
More informationS Y N T A X A N A L Y S I S LR
LR parsing There are three commonly used algorithms to build tables for an LR parser: 1. SLR(1) = LR(0) plus use of FOLLOW set to select between actions smallest class of grammars smallest tables (number
More informationCSE 431S Final Review. Washington University Spring 2013
CSE 431S Final Review Washington University Spring 2013 What You Should Know The six stages of a compiler and what each stage does. The input to and output of each compilation stage (especially the back-end).
More informationClass Information ANNOUCEMENTS
Class Information ANNOUCEMENTS Third homework due TODAY at 11:59pm. Extension? First project has been posted, due Monday October 23, 11:59pm. Midterm exam: Friday, October 27, in class. Don t forget to
More informationAbout Codefrux While the current trends around the world are based on the internet, mobile and its applications, we try to make the most out of it. As for us, we are a well established IT professionals
More informationParsing II Top-down parsing. Comp 412
COMP 412 FALL 2018 Parsing II Top-down parsing Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled
More informationCS201 - Introduction to Programming Glossary By
CS201 - Introduction to Programming Glossary By #include : The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with
More informationCS558 Programming Languages
CS558 Programming Languages Fall 2016 Lecture 3a Andrew Tolmach Portland State University 1994-2016 Formal Semantics Goal: rigorous and unambiguous definition in terms of a wellunderstood formalism (e.g.
More informationflex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.
flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input. More often than not, though, you ll want to use flex to generate a scanner that divides
More informationSyntax-Directed Translation
Syntax-Directed Translation ALSU Textbook Chapter 5.1 5.4, 4.8, 4.9 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 What is syntax-directed translation? Definition: The compilation
More informationThe PCAT Programming Language Reference Manual
The PCAT Programming Language Reference Manual Andrew Tolmach and Jingke Li Dept. of Computer Science Portland State University September 27, 1995 (revised October 15, 2002) 1 Introduction The PCAT language
More informationAbout the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design
i About the Tutorial A compiler translates the codes written in one language to some other language without changing the meaning of the program. It is also expected that a compiler should make the target
More informationCS558 Programming Languages
CS558 Programming Languages Fall 2017 Lecture 3a Andrew Tolmach Portland State University 1994-2017 Binding, Scope, Storage Part of being a high-level language is letting the programmer name things: variables
More informationModules, Structs, Hashes, and Operational Semantics
CS 152: Programming Language Paradigms Modules, Structs, Hashes, and Operational Semantics Prof. Tom Austin San José State University Lab Review (in-class) Modules Review Modules from HW 1 (in-class) How
More informationCS164: Midterm I. Fall 2003
CS164: Midterm I Fall 2003 Please read all instructions (including these) carefully. Write your name, login, and circle the time of your section. Read each question carefully and think about what s being
More informationSemantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End
Outline Semantic Analysis The role of semantic analysis in a compiler A laundry list of tasks Scope Static vs. Dynamic scoping Implementation: symbol tables Types Static analyses that detect type errors
More informationCS5363 Final Review. cs5363 1
CS5363 Final Review cs5363 1 Programming language implementation Programming languages Tools for describing data and algorithms Instructing machines what to do Communicate between computers and programmers
More informationCSE413: Programming Languages and Implementation Racket structs Implementing languages with interpreters Implementing closures
CSE413: Programming Languages and Implementation Racket structs Implementing languages with interpreters Implementing closures Dan Grossman Fall 2014 Hi! I m not Hal J I love this stuff and have taught
More informationxtc Towards an extensbile Compiler Step 1: The Parser Robert Grimm New York University
xtc Towards an extensbile Compiler Step 1: The Parser Robert Grimm New York University Motivation Wide-spread need for extensible languages and compilers Systems researchers libasync to build asynchronous,
More informationCode Structure Visualization
TECHNISCHE UNIVERSITEIT EINDHOVEN Department of Mathematics and Computer Science MASTER S THESIS Code Structure Visualization by G.L.P.M. Lommerse Supervisor: Dr. Ir. A.C. Telea (TUE) Eindhoven, August
More informationLet us construct the LR(1) items for the grammar given below to construct the LALR parsing table.
MODULE 18 LALR parsing After understanding the most powerful CALR parser, in this module we will learn to construct the LALR parser. The CALR parser has a large set of items and hence the LALR parser is
More informationCSE P 501 Exam Sample Solution 12/1/11
Question 1. (10 points, 5 each) Regular expressions. Give regular expressions that generate the following sets of strings. You may only use the basic operations of concatenation, choice ( ), and repetition
More informationCompiler, Assembler, and Linker
Compiler, Assembler, and Linker Minsoo Ryu Department of Computer Science and Engineering Hanyang University msryu@hanyang.ac.kr What is a Compilation? Preprocessor Compiler Assembler Linker Loader Contents
More informationASTEC: A New Approach to Refactoring C
ASTEC: A New Approach to Refactoring C Bill McCloskey Computer Science Division, EECS University of California, Berkeley {billm,brewer}@cs.berkeley.edu Eric Brewer ABSTRACT The C language is among the
More informationUNIT-4 (COMPILER DESIGN)
UNIT-4 (COMPILER DESIGN) An important part of any compiler is the construction and maintenance of a dictionary containing names and their associated values, such type of dictionary is called a symbol table.
More information2068 (I) Attempt all questions.
2068 (I) 1. What do you mean by compiler? How source program analyzed? Explain in brief. 2. Discuss the role of symbol table in compiler design. 3. Convert the regular expression 0 + (1 + 0)* 00 first
More informationCS558 Programming Languages
CS558 Programming Languages Winter 2017 Lecture 4a Andrew Tolmach Portland State University 1994-2017 Semantics and Erroneous Programs Important part of language specification is distinguishing valid from
More informationLexical Considerations
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.035, Fall 2005 Handout 6 Decaf Language Wednesday, September 7 The project for the course is to write a
More informationCompiler construction in4020 lecture 5
Compiler construction in4020 lecture 5 Semantic analysis Assignment #1 Chapter 6.1 Overview semantic analysis identification symbol tables type checking CS assignment yacc LLgen language grammar parser
More informationCSE 401 Midterm Exam Sample Solution 11/4/11
Question 1. (12 points, 2 each) The front end of a compiler consists of three parts: scanner, parser, and (static) semantics. Collectively these need to analyze the input program and decide if it is correctly
More informationAlternatives for semantic processing
Semantic Processing Copyright c 2000 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies
More informationStructure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.
More detailed overview of compiler front end Structure of a compiler Today we ll take a quick look at typical parts of a compiler. This is to give a feeling for the overall structure. source program lexical
More informationError Detection in LALR Parsers. LALR is More Powerful. { b + c = a; } Eof. Expr Expr + id Expr id we can first match an id:
Error Detection in LALR Parsers In bottom-up, LALR parsers syntax errors are discovered when a blank (error) entry is fetched from the parser action table. Let s again trace how the following illegal CSX-lite
More informationRecitation: Cache Lab & C
15-213 Recitation: Cache Lab & C Jack Biggs 16 Feb 2015 Agenda Buffer Lab! C Exercises! C Conventions! C Debugging! Version Control! Compilation! Buffer Lab... Is due soon. So maybe do it soon Agenda Buffer
More informationUNIVERSITY OF CALIFORNIA
UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division CS164 Fall 1997 P. N. Hilfinger CS 164: Midterm Name: Please do not discuss the contents of
More informationSEMANTIC ANALYSIS TYPES AND DECLARATIONS
SEMANTIC ANALYSIS CS 403: Type Checking Stefan D. Bruda Winter 2015 Parsing only verifies that the program consists of tokens arranged in a syntactically valid combination now we move to check whether
More informationCS 432 Fall Mike Lam, Professor. Code Generation
CS 432 Fall 2015 Mike Lam, Professor Code Generation Compilers "Back end" Source code Tokens Syntax tree Machine code char data[20]; int main() { float x = 42.0; return 7; } 7f 45 4c 46 01 01 01 00 00
More informationComputing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.
COMP 412 FALL 2017 Computing Inside The Parser Syntax-Directed Translation Comp 412 source code IR IR target Front End Optimizer Back End code Copyright 2017, Keith D. Cooper & Linda Torczon, all rights
More informationCSE341: Programming Languages Lecture 17 Implementing Languages Including Closures. Dan Grossman Autumn 2018
CSE341: Programming Languages Lecture 17 Implementing Languages Including Closures Dan Grossman Autumn 2018 Typical workflow concrete syntax (string) "(fn x => x + x) 4" Parsing Possible errors / warnings
More informationCS664 Compiler Theory and Design LIU 1 of 16 ANTLR. Christopher League* 17 February Figure 1: ANTLR plugin installer
CS664 Compiler Theory and Design LIU 1 of 16 ANTLR Christopher League* 17 February 2016 ANTLR is a parser generator. There are other similar tools, such as yacc, flex, bison, etc. We ll be using ANTLR
More informationChapter 4. Abstract Syntax
Chapter 4 Abstract Syntax Outline compiler must do more than recognize whether a sentence belongs to the language of a grammar it must do something useful with that sentence. The semantic actions of a
More informationAgenda. CS 61C: Great Ideas in Computer Architecture. Lecture 2: Numbers & C Language 8/29/17. Recap: Binary Number Conversion
CS 61C: Great Ideas in Computer Architecture Lecture 2: Numbers & C Language Krste Asanović & Randy Katz http://inst.eecs.berkeley.edu/~cs61c Numbers wrap-up This is not on the exam! Break C Primer Administrivia,
More informationCS 61C: Great Ideas in Computer Architecture. Lecture 2: Numbers & C Language. Krste Asanović & Randy Katz
CS 61C: Great Ideas in Computer Architecture Lecture 2: Numbers & C Language Krste Asanović & Randy Katz http://inst.eecs.berkeley.edu/~cs61c Numbers wrap-up This is not on the exam! Break C Primer Administrivia,
More information1 Lexical Considerations
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.035, Spring 2013 Handout Decaf Language Thursday, Feb 7 The project for the course is to write a compiler
More informationLL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012
Predictive Parsers LL(k) Parsing Can we avoid backtracking? es, if for a given input symbol and given nonterminal, we can choose the alternative appropriately. his is possible if the first terminal of
More informationParser Tools: lex and yacc-style Parsing
Parser Tools: lex and yacc-style Parsing Version 5.0 Scott Owens June 6, 2010 This documentation assumes familiarity with lex and yacc style lexer and parser generators. 1 Contents 1 Lexers 3 1.1 Creating
More informationGrammars & Parsing. Lecture 12 CS 2112 Fall 2018
Grammars & Parsing Lecture 12 CS 2112 Fall 2018 Motivation The cat ate the rat. The cat ate the rat slowly. The small cat ate the big rat slowly. The small cat ate the big rat on the mat slowly. The small
More informationConcepts of programming languages
Concepts of programming languages Lecture 7 Wouter Swierstra 1 Last time Relating evaluation and types How to handle variable binding in embedded languages? 2 DSLs: approaches A stand-alone DSL typically
More informationRYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 97 INSTRUCTIONS
RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 97 STUDENT ID: INSTRUCTIONS Please write your student ID on this page. Do not write it or your name
More informationCSE 12 Abstract Syntax Trees
CSE 12 Abstract Syntax Trees Compilers and Interpreters Parse Trees and Abstract Syntax Trees (AST's) Creating and Evaluating AST's The Table ADT and Symbol Tables 16 Using Algorithms and Data Structures
More informationAppendix. Grammar. A.1 Introduction. A.2 Keywords. There is no worse danger for a teacher than to teach words instead of things.
A Appendix Grammar There is no worse danger for a teacher than to teach words instead of things. Marc Block Introduction keywords lexical conventions programs expressions statements declarations declarators
More informationCompiler Theory. (GCC the GNU Compiler Collection) Sandro Spina 2009
Compiler Theory (GCC the GNU Compiler Collection) Sandro Spina 2009 GCC Probably the most used compiler. Not only a native compiler but it can also cross-compile any program, producing executables for
More informationGujarat Technological University Sankalchand Patel College of Engineering, Visnagar B.E. Semester VII (CE) July-Nov Compiler Design (170701)
Gujarat Technological University Sankalchand Patel College of Engineering, Visnagar B.E. Semester VII (CE) July-Nov 2014 Compiler Design (170701) Question Bank / Assignment Unit 1: INTRODUCTION TO COMPILING
More informationUNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger
UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division CS 164 Spring 2010 P. N. Hilfinger CS 164: Final Examination (revised) Name: Login: You have
More information