Parsing All of C by Taming the Preprocessor. Robert Grimm, New York University Joint Work with Paul Gazzillo

Size: px
Start display at page:

Download "Parsing All of C by Taming the Preprocessor. Robert Grimm, New York University Joint Work with Paul Gazzillo"

Transcription

1 Parsing All of C by Taming the Preprocessor Robert Grimm, New York University Joint Work with Paul Gazzillo

2 Our Research Goal Better Tools for C 2

3 The Good Operating systems ios, OS X, Linux, Free/Open/Net BSD Databases SQLite, Berkeley DB, IBM DB2 Internet servers Apache, Bind, Sendmail 3

4 The Bad > emacs main.c > gcc main.c >./a.out > gdb a.out 4

5 We Need Better Tools Source browsing 7,500+ compilation units, 5.5 million lines [x86 Linux ] 5

6 We Need Better Tools Source browsing 7,500+ compilation units, 5.5 million lines [x86 Linux ] Bug finding 1,000+ bugs in single static analysis study [Chou et al., SOSP 01] 6

7 We Need Better Tools Source browsing 7,500+ compilation units, 5.5 million lines [x86 Linux ] Bug finding 1,000+ bugs in single static analysis study [Chou et al., SOSP 01] Automated refactoring 150+ bugs due to manual API evolution [Padioleau et al., EuroSys 08] 7

8 We Need Better Tools Source browsing Cxref, LXR Bug finding Astrée, Coverity Automated refactoring Apple s Xcode, Eclipse CDT 8

9 All These Tools Need to Parse Code First... 9

10 But Code Is Written in Two Languages! 10

11 The Fall From Grace Preprocessor enriches C Static conditionals (#if, #ifdef, ) Capture variability Macros (#define) Enable code transformation File includes (#include) Provide rudimentary modularity 11

12 The Fall From Grace Preprocessor obfuscates C Static conditionals (#if, #ifdef, ) Code has many meanings Macros (#define) Code doesn t mean what s written File includes (#include) Code is incomplete 12

13 The Fall From Grace Preprocessor obfuscates C Static conditionals (#if, #ifdef, ) Code has many meanings Macros (#define) Code doesn t mean what s written File includes (#include) Code is incomplete Preprocessor only works on tokens, not C constructs 13

14 Back to Eden? One configuration at a time (preprocess first) Causes exponential explosion Incomplete heuristic parsing algorithm Works, if you don t write the wrong code Plug-in architecture for further CPP idioms Creates arms race: tool builders vs developers 14

15 More of the Same Adams et al., AOSD 09 Akers et al., WCRE 05 Badros and Notkin, SP&E 00 Baxter and Mehlich, WCRE 01 Favre, IWPC 97 Garrido and Johnson, ICSM 05 McCloskey and Brewer, ESEC 05 Padioleau, CC 09 Spinellis, TSE 03 Vittek, CSMR 03 15

16 Not Quite the Same MAPR by Platoff et al., ICSM 91 TypeChef by Kästner et al., OOPSLA 11 16

17 This Talk Introduction Problem and Solution Approach The Configuration-Preserving Preprocessor The Configuration-Preserving Parser Our Tool and Evaluation Conclusion 17

18 #include major.h static int mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == MISC_MAJOR) i = 31; else i = iminor(inode) - 32; } 18

19 #define MISC_MAJOR 10 #include major.h static int mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == MISC_MAJOR) i = 31; else i = iminor(inode) - 32; } 19

20 #define MISC_MAJOR 10 static int mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == MISC_MAJOR) i = 31; else i = iminor(inode) - 32; } 20

21 #define MISC_MAJOR 10 static int mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == MISC_MAJOR) i = 31; else i = iminor(inode) - 32; } 20

22 #define MISC_MAJOR 10 static int mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == 10) i = 31; else i = iminor(inode) - 32; } 21

23 static int mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == 10) i = 31; else i = iminor(inode) - 32; } 22

24 Fork subparsers on conditional static int mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == 10) i = 31; else i = iminor(inode) - 32; } 22

25 Fork subparsers on conditional Parse entire static int if-then-else mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == 10) i = 31; else i = iminor(inode) - 32; } 22

26 Fork subparsers on conditional Parse entire static int if-then-else mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == 10) else i = 31; Parse just assignment i = iminor(inode) - 32; } 22

27 static int mousedev_open( ) { #ifdef CONFIG_INPUT_MOUSEDEV_PSAUX if (imajor(inode) == 10) else i = 31; Merge and create static choice node i = iminor(inode) - 32; } Static Choice INPUT PSAUX If-Then-Else!INPUT PSAUX Assignment 23

28 Solution Approach 1. Lex to produce tokens (as before) 2. Preprocess while preserving conditionals 3. Parse across conditionals 24

29 Language Construct Lexer Layout Preprocessor Macro (Un)Definition Implementation Annotate tokens Surrounded by Contain Contain Multiply- Conditionals Conditionals Defined Macros Other Use conditional Add multiple entries Do not expand Trim infeasible entries macro table to macro table until invocation on redefinition Object-Like Expand all Ignore infeasible Expand nested Get ground truth for Macro Invocations definitions definitions macros built-ins from compiler Function-Like Expand all Ignore infeasible Hoist conditionals Expand nested Support di ering argument Macro Invocations definitions definitions around invocations macros numbers and variadics Token Pasting & Stringification File Includes Static Conditionals Conditional Expressions Error Directives Line, Warning, & Pragma Directives Parser Apply pasting & stringification Hoist conditionals around token pasting & stringification Include and Preprocess under Hoist conditionals Reinclude when guard preprocess files presence conditions around includes macro is not false Preprocess all branches Conjoin presence conditions Ignore infeasible definitions Evaluate presence Hoist conditionals Preserve order for nonconditions around expressions boolean expressions Ignore erroneous branches Treat as layout C Constructs Use FMLR Parser Fork and merge subparsers Typedef Names Use conditional Add multiple entries Fork subparsers on symbol table to symbol table ambiguous names 25

30 Three Key Surrounded by Techniques Contain Contain Multiply- Language Construct Implementation Other Conditionals Conditionals Defined Macros Lexer Layout Annotate tokens Preprocessor Macro (Un)Definition Use conditional Add multiple entries Do not expand Trim infeasible entries macro table to macro table until invocation on redefinition Track presence conditions Object-Like Expand all Ignore infeasible Expand nested Get ground truth for Macro Invocations definitions definitions macros built-ins from compiler Function-Like Expand all Ignore infeasible Hoist conditionals Expand nested Support di ering argument Macro Invocations definitions definitions around invocations macros numbers and variadics In preprocessor and parser Token Pasting & Hoist conditionals around Apply pasting & stringification Stringification token pasting & stringification File Includes Hoist conditionals Static Conditionals Include and Preprocess under Hoist conditionals Reinclude when guard preprocess files presence conditions around includes macro is not false Preprocess all branches Conjoin presence conditions Ignore infeasible definitions Evaluate presence Hoist conditionals Preserve order for nonconditions around expressions boolean expressions Conditional Expressions In preprocessor Error Directives Ignore erroneous branches Line, Warning, & Treat as Pragma Directives layout Parser Contain number of forked subparsers C Constructs Use FMLR Parser Fork and merge subparsers Typedef Names In parser Use conditional Add multiple entries Fork subparsers on symbol table to symbol table ambiguous names 25

31 This Talk Introduction Problem and Solution Approach The Configuration-Preserving Preprocessor The Configuration-Preserving Parser Our Tool and Evaluation Conclusion 26

32 Track Presence Conditions Do not evaluate conditional expressions Rather translate to boolean functions (BDDs) Enables reasoning about configurations Including equivalent and invalid branches Four base cases Constants, free macros, defined operator Straight-forward case analysis Arithmetic expressions Treat as opaque: no known efficient algorithm 27

33 Track Presence Conditions Maintain conditional and ordered macro table Contains all definitions of a macro Invocation is an implicit conditional #ifdef CONFIG_64BIT #define BITS_PER_LONG 64 #else #define BITS_PER_LONG 32 28

34 Track Presence Conditions Maintain conditional and ordered macro table Contains all definitions of a macro Invocation is an implicit conditional #ifdef CONFIG_64BIT #define BITS_PER_LONG 64 #else #define BITS_PER_LONG 32 BITS_PER _LONG 64BIT!64BIT

35 Hoist Conditionals Preprocessor invocations do not compose Directives are exactly one source line Operators only work on tokens le ## BITS_PER_LONG 29

36 Hoist Conditionals Preprocessor invocations do not compose Directives are exactly one source line Operators only work on tokens le ## BITS_PER_LONG Macro expands to conditional 29

37 Hoist Conditionals Preprocessor invocations do not compose Directives are exactly one source line Operators only work on tokens le ## BITS_PER_LONG le ## #ifdef CONFIG_64BIT 64 #else 32 29

38 Hoist Conditionals le ## BITS_PER_LONG le ## #ifdef CONFIG_64BIT 64 #else 32 30

39 Hoist Conditionals One operator, two operations le ## BITS_PER_LONG le ## #ifdef CONFIG_64BIT 64 #else 32 30

40 Hoist Conditionals le ## BITS_PER_LONG Hoist conditional around pasting le ## #ifdef CONFIG_64BIT 64 #else 32 #ifdef CONFIG_64BIT le ## 64 #else le ## 32 30

41 Algorithm 1 Hoisting Conditionals 1: procedure Hoist(c, ) 2: B Initialize a new conditional with an empty branch. 3: C [(c, )] 4: for all a 2 do 5: if a is a language token then 6: B Append a to all branches in C. 7: C [(c i, i a) (c i, i ) 2 C ] 8: else B a is a conditional. 9: B Recursively hoist conditionals in each branch. 10: B [ b b 2 Hoist(c i, i ) and (c i, i ) 2 a ] 11: B Combine with already hoisted conditionals. 12: C C B 13: end if 14: end for 15: return C 16: end procedure 31

42 Algorithm 1 Hoisting Conditionals The Power of Hoisting 1: procedure Hoist(c, ) 2: B Initialize a new conditional with an empty branch. 3: C [(c, )] 4: for all a 2 do 5: if a is a language token then 6: B Append a to all branches in C. 7: C [(c i, i a) (c i, i ) 2 C ] 8: else B a is a conditional. 9: B Recursively hoist conditionals in each branch. 10: B [ b b 2 Hoist(c i, i ) and (c i, i ) 2 a ] 11: B Combine with already hoisted conditionals. 12: C C B 13: end if 14: end for 15: return C 16: end procedure Iterates over conditional branches Recurses into nested conditionals Duplicates tokens across innermost branches Works for token pasting, stringification, includes, conditional expressions, macros 31

43 This Talk Introduction Problem and Solution Approach The Configuration-Preserving Preprocessor The Configuration-Preserving Parser Our Tool and Evaluation Conclusion 32

44 Review: LR Parser p Finite u Control S S u p e r C Input I Stack 33

45 Review: LR Parser p Finite u Control S u p e r C Input Shift token from input to stack S I Stack 33

46 Reduce top n elements to nonterminal Review: LR Parser p Finite u Control S u p e r C Input Shift token from input to stack S I Stack 33

47 Use table for actions and transitions Reduce top n elements to nonterminal Review: LR Parser p Finite u Control S u p e r C Input Shift token from input to stack S I Stack 33

48 Why Build on LR? Has explicit state and stack Straight-forward to fork and merge Is table-driven Can make it fast Supports left-recursion But shift-reduce and reduce-reduce conflicts 34

49 Fork-Merge LR (FMLR) Input: Preprocessed source code Ordinary tokens from C language Main data structure: Subparser Head, i.e., next token LR stack 35

50 Fork-Merge LR (FMLR) Input: Preprocessed source code Ordinary tokens from C language Entire conditionals Branches contain tokens and conditionals Main data structure: Subparser Presence condition Head, i.e., next token or conditional LR stack 36

51 Algorithm 2 Fork-Merge LR Parsing 1: procedure Parse(a 0 ) 2: Q.init((true, a 0, s 0 )) B The initial subparser for a 0. 3: while Q, ; do 4: p Q.pull() B Step the next subparser. 5: T Follow(p.c, p.a) 6: if T = 1 then 7: B Do an LR action and reschedule the subparser. 8: Q.insert(LR(T(1), p)) 9: else B The follow-set contains several tokens. 10: B Fork subparsers and reschedule them. 11: Q.insertAll(Fork(T, p)) 12: end if 13: Q Merge(Q) 14: end while 15: end procedure 37

52 Algorithm 2 Fork-Merge LR Parsing 1: procedure Parse(a 0 ) 2: Q.init((true, a 0, s 0 )) B The initial subparser for a 0. 3: while Q, ; do 4: p Q.pull() B Step the next subparser. 5: T Follow(p.c, p.a) 6: if T = 1 then 7: B Do an LR action and reschedule the subparser. 8: Q.insert(LR(T(1), p)) 9: else B The follow-set contains several tokens. 10: B Fork subparsers and reschedule them. 11: Q.insertAll(Fork(T, p)) 12: end if 13: Q Merge(Q) 14: end while 15: end procedure 37

53 Algorithm 2 Fork-Merge LR Parsing 1: procedure Parse(a 0 ) 2: Q.init((true, a 0, s 0 )) B The initial subparser for a 0. 3: while Q, ; do 4: p Q.pull() B Step the next subparser. 5: T Follow(p.c, p.a) 6: if T = 1 then 7: B Do an LR action and reschedule the subparser. 8: Q.insert(LR(T(1), p)) 9: else B The follow-set contains several tokens. 10: B Fork subparsers and reschedule them. 11: Q.insertAll(Fork(T, p)) 12: end if 13: Q Merge(Q) 14: end while 15: end procedure 37

54 Algorithm 2 Fork-Merge LR Parsing 1: procedure Parse(a 0 ) 2: Q.init((true, a 0, s 0 )) B The initial subparser for a 0. 3: while Q, ; do 4: p Q.pull() B Step the next subparser. 5: T Follow(p.c, p.a) 6: if T = 1 then 7: B Do an LR action and reschedule the subparser. 8: Q.insert(LR(T(1), p)) 9: else B The follow-set contains several tokens. 10: B Fork subparsers and reschedule them. 11: Q.insertAll(Fork(T, p)) 12: end if 13: Q Merge(Q) 14: end while 15: end procedure 37

55 Algorithm 2 Fork-Merge LR Parsing 1: procedure Parse(a 0 ) 2: Q.init((true, a 0, s 0 )) B The initial subparser for a 0. 3: while Q, ; do 4: p Q.pull() B Step the next subparser. 5: T Follow(p.c, p.a) 6: if T = 1 then 7: B Do an LR action and reschedule the subparser. 8: Q.insert(LR(T(1), p)) 9: else B The follow-set contains several tokens. 10: B Fork subparsers and reschedule them. 11: Q.insertAll(Fork(T, p)) 12: end if 13: Q Merge(Q) 14: end while 15: end procedure 37

56 Algorithm 2 Fork-Merge LR Parsing 1: procedure Parse(a 0 ) 2: Q.init((true, a 0, s 0 )) B The initial subparser for a 0. 3: while Q, ; do 4: p Q.pull() B Step the next subparser. 5: T Follow(p.c, p.a) 6: if T = 1 then 7: B Do an LR action and reschedule the subparser. 8: Q.insert(LR(T(1), p)) 9: else B The follow-set contains several tokens. 10: B Fork subparsers and reschedule them. 11: Q.insertAll(Fork(T, p)) 12: end if 13: Q Merge(Q) 14: end while 15: end procedure 37

57 Algorithm 2 Fork-Merge LR Parsing 1: procedure Parse(a 0 ) 2: Q.init((true, a 0, s 0 )) B The initial subparser for a 0. 3: while Q, ; do 4: p Q.pull() B Step the next subparser. 5: T Follow(p.c, p.a) 6: if T = 1 then 7: B Do an LR action and reschedule the subparser. 8: Q.insert(LR(T(1), p)) 9: else B The follow-set contains several tokens. 10: B Fork subparsers and reschedule them. 11: Q.insertAll(Fork(T, p)) 12: end if 13: Q Merge(Q) 14: end while 15: end procedure 37

58 Algorithm 2 Fork-Merge LR Parsing 1: procedure Parse(a 0 ) 2: Q.init((true, a 0, s 0 )) B The initial subparser for a 0. 3: while Q, ; do 4: p Q.pull() B Step the next subparser. 5: T Follow(p.c, p.a) 6: if T = 1 then 7: B Do an LR action and reschedule the subparser. 8: Q.insert(LR(T(1), p)) 9: else B The follow-set contains several tokens. 10: B Fork subparsers and reschedule them. 11: Q.insertAll(Fork(T, p)) 12: end if 13: Q Merge(Q) 14: end while 15: end procedure 37

59 Fork-Merge LR Highlights Fork subparsers from other subparsers Have different heads, presence conditions But reuse LR stack frames in a DAG 38

60 Fork-Merge LR Highlights Fork subparsers from other subparsers Have different heads, presence conditions But reuse LR stack frames in a DAG Merge subparsers if same heads and stacks Disjoin presence conditions Priority queue ensures merging happens ASAP 39

61 Fork-Merge LR Highlights Fork subparsers from other subparsers Have different heads, presence conditions But reuse LR stack frames in a DAG Merge subparsers if same heads and stacks Disjoin presence conditions Priority queue ensures merging happens ASAP Perform ordinary LR on ordinary tokens Reuse grammars and table generators 40

62 The Merge Criterion Merge only if same heads and stacks Implies same derivation of nonterminals Which also ensures a well-formed AST But may parse some code more than once #ifdef CONFIG PSAUX if ( ) i = 31; else i = ; 41

63 The Merge Criterion Merge only if same heads and stacks Implies same derivation of nonterminals Which also ensures a well-formed AST But may parse some code more than once #ifdef CONFIG PSAUX if ( ) i = 31; else i = ; Parse twice and merge thereafter 41

64 The Merge Criterion Merge only if same heads and stacks Implies same derivation of nonterminals Which also ensures a well-formed AST But may parse some code more than once #ifdef CONFIG PSAUX if ( ) i = 31; else i = ; Static Choice INPUT PSAUX!INPUT PSAUX If-Then-Else Assignment 41

65 The Serpent Is in the Details Which subparsers to fork? 42

66 Which Subparsers to Fork? Naively, subparser for each conditional branch But conditionals may Be directly nested in each other Directly follow each other Creates many unnecessary subparsers! 43

67 Which Subparsers to Fork? Insight: Real (LR) work performed on tokens Therefore, create subparser for each token Directly reachable from current position Across all configurations 44

68 Which Subparsers to Fork? Insight: Real (LR) work performed on tokens Therefore, create subparser for each token Directly reachable from current position Across all configurations Encoded in token follow-set 45

69 struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 46

70 6 subparsers struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 46

71 6 subparsers 3 subparsers struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 46

72 struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 46

73 struct rq { SCHED_HRTICK && SMP #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 46

74 struct rq { SCHED_HRTICK && SMP #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif! (SCHED_HRTICK && SMP) && SCHEDSTATS #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 46

75 struct rq { SCHED_HRTICK && SMP #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif! (SCHED_HRTICK && SMP) && SCHEDSTATS #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; };! (SCHED_HRTICK && SMP) &! SCHEDSTATS 46

76 Algorithm 3 The Token Follow-Set 1: procedure Follow(c, a) 2: T ; B Initialize the follow-set. 3: procedure First(c, a) 4: loop 5: if a is a language token then 6: T T [ {(c, a)} 7: return false 8: else B a is a conditional. 9: c r false B Initialize remaining condition. 10: for all (c i, i ) 2 a do 11: if i = then 12: c r c r _ c ^ c i 13: else 14: c r c r _ First(c ^ c i, i (1)) 15: end if 16: end for 17: if c r = false or a is last element in branch then 18: return c r 19: end if 20: c c r 21: a next token or conditional after a 22: end if 23: end loop 24: end procedure 25: loop 26: c First(c, a) 27: if c = false then return T end if B Done. 28: a next token or conditional after a 29: end loop 30: end procedure 47

77 struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 48

78 Find first token of each branch struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 48

79 Find first token of each branch struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; # endif Recurse into nested conditionals #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 48

80 Find first token of each branch struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; Recurse into nested conditionals # endif Continue after empty branches #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; 48

81 Find first token of each branch struct rq { #ifdef CONFIG_SCHED_HRTICK # ifdef CONFIG_SMP int hrtick_csd_pending; Recurse into nested conditionals # endif Continue after empty branches #ifdef CONFIG_SCHEDSTATS struct sched_info rq_sched_info; }; Stop after all reachable tokens found 48

82 The Serpent Is in the Details There is more! 49

83 static int (*check_part[])(struct *) = { #ifdef CONFIG_ACORN_PARTITION_ICS adfspart_check_ics, #ifdef CONFIG_ACORN_PARTITION_POWERTEC adfspart_check_powertec, // And so on, 16 more times NULL }; 50

84 Encodes binary number! static int (*check_part[])(struct *) = { #ifdef CONFIG_ACORN_PARTITION_ICS adfspart_check_ics, #ifdef CONFIG_ACORN_PARTITION_POWERTEC adfspart_check_powertec, // And so on, 16 more times NULL }; 50

85 static int (*check_part[])(struct *) = { #ifdef CONFIG_ACORN_PARTITION_ICS adfspart_check_ics, #ifdef CONFIG_ACORN_PARTITION_POWERTEC adfspart_check_powertec, // And so on, 16 more times NULL }; 50

86 Shared Reduces: Reduce same stack for several heads static int (*check_part[])(struct *) = { #ifdef CONFIG_ACORN_PARTITION_ICS adfspart_check_ics, #ifdef CONFIG_ACORN_PARTITION_POWERTEC adfspart_check_powertec, // And so on, 16 more times NULL }; 50

87 static int (*check_part[])(struct *) = { #ifdef CONFIG_ACORN_PARTITION_ICS adfspart_check_ics, #ifdef CONFIG_ACORN_PARTITION_POWERTEC adfspart_check_powertec, // And so on, 16 more times NULL }; Shared Reduces: Reduce same stack for several heads Lazy Shifts: Delay forking for several heads 50

88 static int (*check_part[])(struct *) = { #ifdef CONFIG_ACORN_PARTITION_ICS adfspart_check_ics, #ifdef CONFIG_ACORN_PARTITION_POWERTEC adfspart_check_powertec, }; // And so on, 16 more times NULL Shared Reduces: Reduce same stack for several heads Lazy Shifts: Delay forking for several heads Early Reduces: Pick reducing before shifting subparser 50

89 static int (*check_part[])(struct *) = { #ifdef CONFIG_ACORN_PARTITION_ICS adfspart_check_ics, #ifdef CONFIG_ACORN_PARTITION_POWERTEC adfspart_check_powertec, // And so on, 16 more times NULL }; Uses only two subparsers! 50

90 Shared Reduces & Lazy Shifts Require multi-headed subparser Several tokens and their presence conditions But only one LR stack Prioritized by first head Fork by removing one head, presence condition Merge by disjoining presence conditions For same heads, LR stack 51

91 This Talk Introduction Problem and Solution Approach The Configuration-Preserving Preprocessor The Configuration-Preserving Parser Our Tool and Evaluation Conclusion 52

92 SuperC Language Java Preprocessor Built from scratch Parser formalism Grammar Forking & Merging FMLR Modified Roskind s, ran through Bison Automatic AST Grammar annotations Presence conditions BDDs Complete Yes 53

93 SuperC TypeChef Language Java Java & Scala Preprocessor Built from scratch Modified existing Parser formalism FMLR LL Grammar Modified Roskind s, ran through Bison Rewritten using combinator library Forking & Merging Automatic 7 merge combinators AST Grammar annotations Semantic actions Presence conditions BDDs SAT solver Complete Yes Still missing features 54

94 And MAPR? Preprocessor is not documented Parser boils down to naive FMLR Fork on every conditional branch No other optimizations Reimplemented in SuperC as option 55

95 Who Is Our Serpent? 56

96 Who Is Our Serpent? Linux! 56

97 The Linux OS Critical to government and business Jarring flexibility and performance requirements From smart phones to cloud computing farms Large and complex Many developers with varying styles, skills 57

98 Evaluation Plan Focus on x86 Linux kernel How complex is preprocessor usage? Compare to MAPR along the way How well does SuperC perform? Compare to TypeChef along the way 58

99 Linux Preprocessor Usage Breadth of conditionals Force forking of parser states Incidence of incomplete C constructs Prevent merging of parser states 59

100 Linux Preprocessor Usage Breadth of conditionals Force forking of parser states Incidence of incomplete C constructs Prevent merging of parser states Count subparsers per FMLR loop iteration! 60

101 Cumulative Distribution of Subparser # per Iteration Fraction % 100% Shared All Follow-Set MAPR: >16,000 on 98% of C files Number of Subparsers 61

102 Cumulative Distribution of Latency per C File Max: 10.4 Max: 931 Fraction 0.50 SuperC TypeChef Latency in Seconds 62

103 Sidebar Totals, averages, and medians are deceiving We really need the entire distribution! 63

104 Latency vs File Size 12 Parse, Preprocess, Lex Latency in Seconds Size in Millions of Bytes 64

105 Future Work Fully characterize variability in Linux For all tools (Kconfig, Kbuild, Preprocessor) Over history of kernel Analyze names and types across conditionals Foundation for all language processing tools Annotate AST with layout, macros, includes Refactorings need to print source code again Modulo intended changes 65

106 Conclusions Preprocessor makes it hard to parse all of C No viable solution for 40 years! We performed a systematic analysis of problem Identified solution strategies Notably: Conditional hoisting, FMLR parsing And implemented a real-world tool, SuperC On x86 Linux kernel, SuperC is fast and scales well 66

107 SuperC 67

108 SuperC 68

109 69

110 SuperC Context Management C syntax is context-sensitive: T * p; If T is a type name, declare p as a pointer Otherwise, calculate the product of T and p Worse, C with conditionals can be ambiguous Same name may be both a type and an object Under different presence conditions 70

111 SuperC Context Management C syntax is context-sensitive: T * p; If T is a type name, declare p as a pointer Otherwise, calculate the product of T and p Worse, C with conditionals can be ambiguous Same name may be both a type and an object Under different presence conditions Fork and merge subparsers based on context! 71

112 SuperC Context Management Provide context object for each subparser Customize forking and merging Reclassify Fork context May merge Merge contexts 72

113 SuperC Context Management Provide context object for each subparser Conditional, scoped symbol table Customize forking and merging Reclassify: change into or add type name token Fork context: duplicate symbol table reference May merge: check number of nested scopes Merge contexts: combine symbol table entries 73

Enabling Variability-Aware Software Tools. Paul Gazzillo New York University

Enabling Variability-Aware Software Tools. Paul Gazzillo New York University Enabling Variability-Aware Software Tools Paul Gazzillo New York University We Need Better C Tools Linux and other critical systems written in C Need source code browsers 20,000+ compilation units, 10+

More information

Implementation of Fork-Merge Parsing in OpenRefactory/C. Kavyashree Krishnappa

Implementation of Fork-Merge Parsing in OpenRefactory/C. Kavyashree Krishnappa Implementation of Fork-Merge Parsing in OpenRefactory/C by Kavyashree Krishnappa A thesis submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree

More information

xtc Robert Grimm Making C Safely Extensible New York University

xtc Robert Grimm Making C Safely Extensible New York University xtc Making C Safely Extensible Robert Grimm New York University The Problem Complexity of modern systems is staggering Increasingly, a seamless, global computing environment System builders continue to

More information

TypeChef: Towards Correct Variability Analysis of Unpreprocessed C Code for Software Product Lines

TypeChef: Towards Correct Variability Analysis of Unpreprocessed C Code for Software Product Lines TypeChef: Towards Correct Variability Analysis of Unpreprocessed C Code for Software Product Lines Paolo G. Giarrusso 04 March 2011 Software product lines (SPLs) Feature selection SPL = 1 software project

More information

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser JavaCC Parser The Compilation Task Input character stream Lexer stream Parser Abstract Syntax Tree Analyser Annotated AST Code Generator Code CC&P 2003 1 CC&P 2003 2 Automated? JavaCC Parser The initial

More information

Better Extensibility through Modular Syntax. Robert Grimm New York University

Better Extensibility through Modular Syntax. Robert Grimm New York University Better Extensibility through Modular Syntax Robert Grimm New York University Syntax Matters More complex syntactic specifications Extensions to existing programming languages Transactions, event-based

More information

LECTURE 3. Compiler Phases

LECTURE 3. Compiler Phases LECTURE 3 Compiler Phases COMPILER PHASES Compilation of a program proceeds through a fixed series of phases. Each phase uses an (intermediate) form of the program produced by an earlier phase. Subsequent

More information

CS 426 Fall Machine Problem 1. Machine Problem 1. CS 426 Compiler Construction Fall Semester 2017

CS 426 Fall Machine Problem 1. Machine Problem 1. CS 426 Compiler Construction Fall Semester 2017 CS 426 Fall 2017 1 Machine Problem 1 Machine Problem 1 CS 426 Compiler Construction Fall Semester 2017 Handed Out: September 6, 2017. Due: September 21, 2017, 5:00 p.m. The machine problems for this semester

More information

Introduction to Parsing. Lecture 5

Introduction to Parsing. Lecture 5 Introduction to Parsing Lecture 5 1 Outline Regular languages revisited Parser overview Context-free grammars (CFG s) Derivations Ambiguity 2 Languages and Automata Formal languages are very important

More information

Kmax: Finding All Configurations of Kbuild Makefiles Statically Paul Gazzillo Stevens Institute. ESEC/FSE 2017 Paderborn, Germany

Kmax: Finding All Configurations of Kbuild Makefiles Statically Paul Gazzillo Stevens Institute. ESEC/FSE 2017 Paderborn, Germany Kmax: Finding All Configurations of Kbuild Makefiles Statically Paul Gazzillo Stevens Institute ESEC/FSE 2017 Paderborn, Germany Let s Talk About Makefiles 2 Variability in Linux Kbuild Kbuild is Linux

More information

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 STUDENT ID: INSTRUCTIONS Please write your student ID on this page. Do not write it or your name

More information

Language Translation. Compilation vs. interpretation. Compilation diagram. Step 1: compile. Step 2: run. compiler. Compiled program. program.

Language Translation. Compilation vs. interpretation. Compilation diagram. Step 1: compile. Step 2: run. compiler. Compiled program. program. Language Translation Compilation vs. interpretation Compilation diagram Step 1: compile program compiler Compiled program Step 2: run input Compiled program output Language Translation compilation is translation

More information

Syntax-Directed Translation. Lecture 14

Syntax-Directed Translation. Lecture 14 Syntax-Directed Translation Lecture 14 (adapted from slides by R. Bodik) 9/27/2006 Prof. Hilfinger, Lecture 14 1 Motivation: parser as a translator syntax-directed translation stream of tokens parser ASTs,

More information

CS 326 Operating Systems C Programming. Greg Benson Department of Computer Science University of San Francisco

CS 326 Operating Systems C Programming. Greg Benson Department of Computer Science University of San Francisco CS 326 Operating Systems C Programming Greg Benson Department of Computer Science University of San Francisco Why C? Fast (good optimizing compilers) Not too high-level (Java, Python, Lisp) Not too low-level

More information

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1 Table of Contents About the Authors... iii Introduction... xvii Chapter 1: System Software... 1 1.1 Concept of System Software... 2 Types of Software Programs... 2 Software Programs and the Computing Machine...

More information

CSE 401 Midterm Exam Sample Solution 2/11/15

CSE 401 Midterm Exam Sample Solution 2/11/15 Question 1. (10 points) Regular expression warmup. For regular expression questions, you must restrict yourself to the basic regular expression operations covered in class and on homework assignments:

More information

CSE 413 Languages & Implementation. Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341)

CSE 413 Languages & Implementation. Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341) CSE 413 Languages & Implementation Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341) 1 Goals Representing programs as data Racket structs as a better way to represent

More information

Introduction to Parsing. Lecture 5

Introduction to Parsing. Lecture 5 Introduction to Parsing Lecture 5 1 Outline Regular languages revisited Parser overview Context-free grammars (CFG s) Derivations Ambiguity 2 Languages and Automata Formal languages are very important

More information

Compilers and computer architecture From strings to ASTs (2): context free grammars

Compilers and computer architecture From strings to ASTs (2): context free grammars 1 / 1 Compilers and computer architecture From strings to ASTs (2): context free grammars Martin Berger October 2018 Recall the function of compilers 2 / 1 3 / 1 Recall we are discussing parsing Source

More information

Compiler phases. Non-tokens

Compiler phases. Non-tokens Compiler phases Compiler Construction Scanning Lexical Analysis source code scanner tokens regular expressions lexical analysis Lennart Andersson parser context free grammar Revision 2011 01 21 parse tree

More information

Analysis Tool Project

Analysis Tool Project Tool Overview The tool we chose to analyze was the Java static analysis tool FindBugs (http://findbugs.sourceforge.net/). FindBugs is A framework for writing static analyses Developed at the University

More information

Syntax Errors; Static Semantics

Syntax Errors; Static Semantics Dealing with Syntax Errors Syntax Errors; Static Semantics Lecture 14 (from notes by R. Bodik) One purpose of the parser is to filter out errors that show up in parsing Later stages should not have to

More information

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table COMPILER CONSTRUCTION Lab 2 Symbol table LABS Lab 3 LR parsing and abstract syntax tree construction using ''bison' Lab 4 Semantic analysis (type checking) PHASES OF A COMPILER Source Program Lab 2 Symtab

More information

CSE 374 Programming Concepts & Tools

CSE 374 Programming Concepts & Tools CSE 374 Programming Concepts & Tools Hal Perkins Fall 2017 Lecture 8 C: Miscellanea Control, Declarations, Preprocessor, printf/scanf 1 The story so far The low-level execution model of a process (one

More information

Time : 1 Hour Max Marks : 30

Time : 1 Hour Max Marks : 30 Total No. of Questions : 6 P4890 B.E/ Insem.- 74 B.E ( Computer Engg) PRINCIPLES OF MODERN COMPILER DESIGN (2012 Pattern) (Semester I) Time : 1 Hour Max Marks : 30 Q.1 a) Explain need of symbol table with

More information

Chapter 3: Lexing and Parsing

Chapter 3: Lexing and Parsing Chapter 3: Lexing and Parsing Aarne Ranta Slides for the book Implementing Programming Languages. An Introduction to Compilers and Interpreters, College Publications, 2012. Lexing and Parsing* Deeper understanding

More information

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher COP4020 ming Languages Compilers and Interpreters Robert van Engelen & Chris Lacher Overview Common compiler and interpreter configurations Virtual machines Integrated development environments Compiler

More information

DDMD AND AUTOMATED CONVERSION FROM C++ TO D

DDMD AND AUTOMATED CONVERSION FROM C++ TO D 1 DDMD AND AUTOMATED CONVERSION FROM C++ TO D Daniel Murphy (aka yebblies ) ABOUT ME Using D since 2009 Compiler contributor since 2011 2 OVERVIEW Why convert the frontend to D What s so hard about it

More information

Robert Grimm New York University

Robert Grimm New York University xtc Towards extensible C Robert Grimm New York University The Problem Complexity of modern systems is staggering Increasingly, a seamless, global computing environment System builders continue to use C

More information

The SPL Programming Language Reference Manual

The SPL Programming Language Reference Manual The SPL Programming Language Reference Manual Leonidas Fegaras University of Texas at Arlington Arlington, TX 76019 fegaras@cse.uta.edu February 27, 2018 1 Introduction The SPL language is a Small Programming

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Winter 2017 Lecture 7b Andrew Tolmach Portland State University 1994-2017 Values and Types We divide the universe of values according to types A type is a set of values and

More information

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology exam Compiler Construction in4020 July 5, 2007 14.00-15.30 This exam (8 pages) consists of 60 True/False

More information

Implementing a VSOP Compiler. March 20, 2018

Implementing a VSOP Compiler. March 20, 2018 Implementing a VSOP Compiler Cyril SOLDANI, Pr. Pierre GEURTS March 20, 2018 Part III Semantic Analysis 1 Introduction In this assignment, you will do the semantic analysis of VSOP programs, according

More information

Partial Preprocessing C Code for Variability Analysis

Partial Preprocessing C Code for Variability Analysis Partial Preprocessing C Code for Variability Analysis Christian Kästner, Paolo G. Giarrusso, and Klaus Ostermann Philipps University Marburg Marburg, Germany ABSTRACT The C preprocessor is commonly used

More information

University of Passau Department of Informatics and Mathematics Chair for Programming. Bachelor Thesis. Andreas Janker

University of Passau Department of Informatics and Mathematics Chair for Programming. Bachelor Thesis. Andreas Janker University of Passau Department of Informatics and Mathematics Chair for Programming Bachelor Thesis Implementing Conditional Compilation Preserving Refactorings on C Code Andreas Janker Date: March 21,

More information

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised: EDAN65: Compilers, Lecture 06 A LR parsing Görel Hedin Revised: 2017-09-11 This lecture Regular expressions Context-free grammar Attribute grammar Lexical analyzer (scanner) Syntactic analyzer (parser)

More information

Semantic Analysis. Lecture 9. February 7, 2018

Semantic Analysis. Lecture 9. February 7, 2018 Semantic Analysis Lecture 9 February 7, 2018 Midterm 1 Compiler Stages 12 / 14 COOL Programming 10 / 12 Regular Languages 26 / 30 Context-free Languages 17 / 21 Parsing 20 / 23 Extra Credit 4 / 6 Average

More information

Compilers. Type checking. Yannis Smaragdakis, U. Athens (original slides by Sam

Compilers. Type checking. Yannis Smaragdakis, U. Athens (original slides by Sam Compilers Type checking Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts) Summary of parsing Parsing A solid foundation: context-free grammars A simple parser: LL(1) A more powerful parser:

More information

Understanding Linux Feature Distribution

Understanding Linux Feature Distribution Understanding Linux Feature Distribution FOSD Meeting 2012, Braunschweig Reinhard Tartler March 22, 2012 VAMOS Research Questions What s the cause of variability and variability implementations In Systems

More information

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find CS1622 Lecture 15 Semantic Analysis CS 1622 Lecture 15 1 Semantic Analysis How to build symbol tables How to use them to find multiply-declared and undeclared variables. How to perform type checking CS

More information

CS 406/534 Compiler Construction Putting It All Together

CS 406/534 Compiler Construction Putting It All Together CS 406/534 Compiler Construction Putting It All Together Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy

More information

Intermediate Representations & Symbol Tables

Intermediate Representations & Symbol Tables Intermediate Representations & Symbol Tables Copyright 2014, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit permission

More information

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology exam Compiler Construction in4303 April 9, 2010 14.00-15.30 This exam (6 pages) consists of 52 True/False

More information

Using an LALR(1) Parser Generator

Using an LALR(1) Parser Generator Using an LALR(1) Parser Generator Yacc is an LALR(1) parser generator Developed by S.C. Johnson and others at AT&T Bell Labs Yacc is an acronym for Yet another compiler compiler Yacc generates an integrated

More information

Short Notes of CS201

Short Notes of CS201 #includes: Short Notes of CS201 The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with < and > if the file is a system

More information

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F). CS 2210 Sample Midterm 1. Determine if each of the following claims is true (T) or false (F). F A language consists of a set of strings, its grammar structure, and a set of operations. (Note: a language

More information

Semantic actions for declarations and expressions

Semantic actions for declarations and expressions Semantic actions for declarations and expressions Semantic actions Semantic actions are routines called as productions (or parts of productions) are recognized Actions work together to build up intermediate

More information

S Y N T A X A N A L Y S I S LR

S Y N T A X A N A L Y S I S LR LR parsing There are three commonly used algorithms to build tables for an LR parser: 1. SLR(1) = LR(0) plus use of FOLLOW set to select between actions smallest class of grammars smallest tables (number

More information

CSE 431S Final Review. Washington University Spring 2013

CSE 431S Final Review. Washington University Spring 2013 CSE 431S Final Review Washington University Spring 2013 What You Should Know The six stages of a compiler and what each stage does. The input to and output of each compilation stage (especially the back-end).

More information

Class Information ANNOUCEMENTS

Class Information ANNOUCEMENTS Class Information ANNOUCEMENTS Third homework due TODAY at 11:59pm. Extension? First project has been posted, due Monday October 23, 11:59pm. Midterm exam: Friday, October 27, in class. Don t forget to

More information

About Codefrux While the current trends around the world are based on the internet, mobile and its applications, we try to make the most out of it. As for us, we are a well established IT professionals

More information

Parsing II Top-down parsing. Comp 412

Parsing II Top-down parsing. Comp 412 COMP 412 FALL 2018 Parsing II Top-down parsing Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled

More information

CS201 - Introduction to Programming Glossary By

CS201 - Introduction to Programming Glossary By CS201 - Introduction to Programming Glossary By #include : The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Fall 2016 Lecture 3a Andrew Tolmach Portland State University 1994-2016 Formal Semantics Goal: rigorous and unambiguous definition in terms of a wellunderstood formalism (e.g.

More information

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input. flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input. More often than not, though, you ll want to use flex to generate a scanner that divides

More information

Syntax-Directed Translation

Syntax-Directed Translation Syntax-Directed Translation ALSU Textbook Chapter 5.1 5.4, 4.8, 4.9 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 What is syntax-directed translation? Definition: The compilation

More information

The PCAT Programming Language Reference Manual

The PCAT Programming Language Reference Manual The PCAT Programming Language Reference Manual Andrew Tolmach and Jingke Li Dept. of Computer Science Portland State University September 27, 1995 (revised October 15, 2002) 1 Introduction The PCAT language

More information

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design i About the Tutorial A compiler translates the codes written in one language to some other language without changing the meaning of the program. It is also expected that a compiler should make the target

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Fall 2017 Lecture 3a Andrew Tolmach Portland State University 1994-2017 Binding, Scope, Storage Part of being a high-level language is letting the programmer name things: variables

More information

Modules, Structs, Hashes, and Operational Semantics

Modules, Structs, Hashes, and Operational Semantics CS 152: Programming Language Paradigms Modules, Structs, Hashes, and Operational Semantics Prof. Tom Austin San José State University Lab Review (in-class) Modules Review Modules from HW 1 (in-class) How

More information

CS164: Midterm I. Fall 2003

CS164: Midterm I. Fall 2003 CS164: Midterm I Fall 2003 Please read all instructions (including these) carefully. Write your name, login, and circle the time of your section. Read each question carefully and think about what s being

More information

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End Outline Semantic Analysis The role of semantic analysis in a compiler A laundry list of tasks Scope Static vs. Dynamic scoping Implementation: symbol tables Types Static analyses that detect type errors

More information

CS5363 Final Review. cs5363 1

CS5363 Final Review. cs5363 1 CS5363 Final Review cs5363 1 Programming language implementation Programming languages Tools for describing data and algorithms Instructing machines what to do Communicate between computers and programmers

More information

CSE413: Programming Languages and Implementation Racket structs Implementing languages with interpreters Implementing closures

CSE413: Programming Languages and Implementation Racket structs Implementing languages with interpreters Implementing closures CSE413: Programming Languages and Implementation Racket structs Implementing languages with interpreters Implementing closures Dan Grossman Fall 2014 Hi! I m not Hal J I love this stuff and have taught

More information

xtc Towards an extensbile Compiler Step 1: The Parser Robert Grimm New York University

xtc Towards an extensbile Compiler Step 1: The Parser Robert Grimm New York University xtc Towards an extensbile Compiler Step 1: The Parser Robert Grimm New York University Motivation Wide-spread need for extensible languages and compilers Systems researchers libasync to build asynchronous,

More information

Code Structure Visualization

Code Structure Visualization TECHNISCHE UNIVERSITEIT EINDHOVEN Department of Mathematics and Computer Science MASTER S THESIS Code Structure Visualization by G.L.P.M. Lommerse Supervisor: Dr. Ir. A.C. Telea (TUE) Eindhoven, August

More information

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table. MODULE 18 LALR parsing After understanding the most powerful CALR parser, in this module we will learn to construct the LALR parser. The CALR parser has a large set of items and hence the LALR parser is

More information

CSE P 501 Exam Sample Solution 12/1/11

CSE P 501 Exam Sample Solution 12/1/11 Question 1. (10 points, 5 each) Regular expressions. Give regular expressions that generate the following sets of strings. You may only use the basic operations of concatenation, choice ( ), and repetition

More information

Compiler, Assembler, and Linker

Compiler, Assembler, and Linker Compiler, Assembler, and Linker Minsoo Ryu Department of Computer Science and Engineering Hanyang University msryu@hanyang.ac.kr What is a Compilation? Preprocessor Compiler Assembler Linker Loader Contents

More information

ASTEC: A New Approach to Refactoring C

ASTEC: A New Approach to Refactoring C ASTEC: A New Approach to Refactoring C Bill McCloskey Computer Science Division, EECS University of California, Berkeley {billm,brewer}@cs.berkeley.edu Eric Brewer ABSTRACT The C language is among the

More information

UNIT-4 (COMPILER DESIGN)

UNIT-4 (COMPILER DESIGN) UNIT-4 (COMPILER DESIGN) An important part of any compiler is the construction and maintenance of a dictionary containing names and their associated values, such type of dictionary is called a symbol table.

More information

2068 (I) Attempt all questions.

2068 (I) Attempt all questions. 2068 (I) 1. What do you mean by compiler? How source program analyzed? Explain in brief. 2. Discuss the role of symbol table in compiler design. 3. Convert the regular expression 0 + (1 + 0)* 00 first

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Winter 2017 Lecture 4a Andrew Tolmach Portland State University 1994-2017 Semantics and Erroneous Programs Important part of language specification is distinguishing valid from

More information

Lexical Considerations

Lexical Considerations Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.035, Fall 2005 Handout 6 Decaf Language Wednesday, September 7 The project for the course is to write a

More information

Compiler construction in4020 lecture 5

Compiler construction in4020 lecture 5 Compiler construction in4020 lecture 5 Semantic analysis Assignment #1 Chapter 6.1 Overview semantic analysis identification symbol tables type checking CS assignment yacc LLgen language grammar parser

More information

CSE 401 Midterm Exam Sample Solution 11/4/11

CSE 401 Midterm Exam Sample Solution 11/4/11 Question 1. (12 points, 2 each) The front end of a compiler consists of three parts: scanner, parser, and (static) semantics. Collectively these need to analyze the input program and decide if it is correctly

More information

Alternatives for semantic processing

Alternatives for semantic processing Semantic Processing Copyright c 2000 by Antony L. Hosking. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies

More information

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler. More detailed overview of compiler front end Structure of a compiler Today we ll take a quick look at typical parts of a compiler. This is to give a feeling for the overall structure. source program lexical

More information

Error Detection in LALR Parsers. LALR is More Powerful. { b + c = a; } Eof. Expr Expr + id Expr id we can first match an id:

Error Detection in LALR Parsers. LALR is More Powerful. { b + c = a; } Eof. Expr Expr + id Expr id we can first match an id: Error Detection in LALR Parsers In bottom-up, LALR parsers syntax errors are discovered when a blank (error) entry is fetched from the parser action table. Let s again trace how the following illegal CSX-lite

More information

Recitation: Cache Lab & C

Recitation: Cache Lab & C 15-213 Recitation: Cache Lab & C Jack Biggs 16 Feb 2015 Agenda Buffer Lab! C Exercises! C Conventions! C Debugging! Version Control! Compilation! Buffer Lab... Is due soon. So maybe do it soon Agenda Buffer

More information

UNIVERSITY OF CALIFORNIA

UNIVERSITY OF CALIFORNIA UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division CS164 Fall 1997 P. N. Hilfinger CS 164: Midterm Name: Please do not discuss the contents of

More information

SEMANTIC ANALYSIS TYPES AND DECLARATIONS

SEMANTIC ANALYSIS TYPES AND DECLARATIONS SEMANTIC ANALYSIS CS 403: Type Checking Stefan D. Bruda Winter 2015 Parsing only verifies that the program consists of tokens arranged in a syntactically valid combination now we move to check whether

More information

CS 432 Fall Mike Lam, Professor. Code Generation

CS 432 Fall Mike Lam, Professor. Code Generation CS 432 Fall 2015 Mike Lam, Professor Code Generation Compilers "Back end" Source code Tokens Syntax tree Machine code char data[20]; int main() { float x = 42.0; return 7; } 7f 45 4c 46 01 01 01 00 00

More information

Computing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

Computing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target. COMP 412 FALL 2017 Computing Inside The Parser Syntax-Directed Translation Comp 412 source code IR IR target Front End Optimizer Back End code Copyright 2017, Keith D. Cooper & Linda Torczon, all rights

More information

CSE341: Programming Languages Lecture 17 Implementing Languages Including Closures. Dan Grossman Autumn 2018

CSE341: Programming Languages Lecture 17 Implementing Languages Including Closures. Dan Grossman Autumn 2018 CSE341: Programming Languages Lecture 17 Implementing Languages Including Closures Dan Grossman Autumn 2018 Typical workflow concrete syntax (string) "(fn x => x + x) 4" Parsing Possible errors / warnings

More information

CS664 Compiler Theory and Design LIU 1 of 16 ANTLR. Christopher League* 17 February Figure 1: ANTLR plugin installer

CS664 Compiler Theory and Design LIU 1 of 16 ANTLR. Christopher League* 17 February Figure 1: ANTLR plugin installer CS664 Compiler Theory and Design LIU 1 of 16 ANTLR Christopher League* 17 February 2016 ANTLR is a parser generator. There are other similar tools, such as yacc, flex, bison, etc. We ll be using ANTLR

More information

Chapter 4. Abstract Syntax

Chapter 4. Abstract Syntax Chapter 4 Abstract Syntax Outline compiler must do more than recognize whether a sentence belongs to the language of a grammar it must do something useful with that sentence. The semantic actions of a

More information

Agenda. CS 61C: Great Ideas in Computer Architecture. Lecture 2: Numbers & C Language 8/29/17. Recap: Binary Number Conversion

Agenda. CS 61C: Great Ideas in Computer Architecture. Lecture 2: Numbers & C Language 8/29/17. Recap: Binary Number Conversion CS 61C: Great Ideas in Computer Architecture Lecture 2: Numbers & C Language Krste Asanović & Randy Katz http://inst.eecs.berkeley.edu/~cs61c Numbers wrap-up This is not on the exam! Break C Primer Administrivia,

More information

CS 61C: Great Ideas in Computer Architecture. Lecture 2: Numbers & C Language. Krste Asanović & Randy Katz

CS 61C: Great Ideas in Computer Architecture. Lecture 2: Numbers & C Language. Krste Asanović & Randy Katz CS 61C: Great Ideas in Computer Architecture Lecture 2: Numbers & C Language Krste Asanović & Randy Katz http://inst.eecs.berkeley.edu/~cs61c Numbers wrap-up This is not on the exam! Break C Primer Administrivia,

More information

1 Lexical Considerations

1 Lexical Considerations Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.035, Spring 2013 Handout Decaf Language Thursday, Feb 7 The project for the course is to write a compiler

More information

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012 Predictive Parsers LL(k) Parsing Can we avoid backtracking? es, if for a given input symbol and given nonterminal, we can choose the alternative appropriately. his is possible if the first terminal of

More information

Parser Tools: lex and yacc-style Parsing

Parser Tools: lex and yacc-style Parsing Parser Tools: lex and yacc-style Parsing Version 5.0 Scott Owens June 6, 2010 This documentation assumes familiarity with lex and yacc style lexer and parser generators. 1 Contents 1 Lexers 3 1.1 Creating

More information

Grammars & Parsing. Lecture 12 CS 2112 Fall 2018

Grammars & Parsing. Lecture 12 CS 2112 Fall 2018 Grammars & Parsing Lecture 12 CS 2112 Fall 2018 Motivation The cat ate the rat. The cat ate the rat slowly. The small cat ate the big rat slowly. The small cat ate the big rat on the mat slowly. The small

More information

Concepts of programming languages

Concepts of programming languages Concepts of programming languages Lecture 7 Wouter Swierstra 1 Last time Relating evaluation and types How to handle variable binding in embedded languages? 2 DSLs: approaches A stand-alone DSL typically

More information

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 97 INSTRUCTIONS

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 97 INSTRUCTIONS RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 97 STUDENT ID: INSTRUCTIONS Please write your student ID on this page. Do not write it or your name

More information

CSE 12 Abstract Syntax Trees

CSE 12 Abstract Syntax Trees CSE 12 Abstract Syntax Trees Compilers and Interpreters Parse Trees and Abstract Syntax Trees (AST's) Creating and Evaluating AST's The Table ADT and Symbol Tables 16 Using Algorithms and Data Structures

More information

Appendix. Grammar. A.1 Introduction. A.2 Keywords. There is no worse danger for a teacher than to teach words instead of things.

Appendix. Grammar. A.1 Introduction. A.2 Keywords. There is no worse danger for a teacher than to teach words instead of things. A Appendix Grammar There is no worse danger for a teacher than to teach words instead of things. Marc Block Introduction keywords lexical conventions programs expressions statements declarations declarators

More information

Compiler Theory. (GCC the GNU Compiler Collection) Sandro Spina 2009

Compiler Theory. (GCC the GNU Compiler Collection) Sandro Spina 2009 Compiler Theory (GCC the GNU Compiler Collection) Sandro Spina 2009 GCC Probably the most used compiler. Not only a native compiler but it can also cross-compile any program, producing executables for

More information

Gujarat Technological University Sankalchand Patel College of Engineering, Visnagar B.E. Semester VII (CE) July-Nov Compiler Design (170701)

Gujarat Technological University Sankalchand Patel College of Engineering, Visnagar B.E. Semester VII (CE) July-Nov Compiler Design (170701) Gujarat Technological University Sankalchand Patel College of Engineering, Visnagar B.E. Semester VII (CE) July-Nov 2014 Compiler Design (170701) Question Bank / Assignment Unit 1: INTRODUCTION TO COMPILING

More information

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division CS 164 Spring 2010 P. N. Hilfinger CS 164: Final Examination (revised) Name: Login: You have

More information