
SGML and Exceptions

Pekka Kilpeläinen and Derick Wood

Technical Report HKUST-CS96-30
June 1996

Pekka Kilpeläinen: Department of Computer Science, University of Helsinki, Helsinki, Finland
Derick Wood: Department of Computer Science, Hong Kong University of Science & Technology, Clear Water Bay, Kowloon, Hong Kong

Abstract

The Standard Generalized Markup Language (SGML) allows users to define document type definitions (DTDs), which are essentially extended context-free grammars in a notation that is similar to extended Backus-Naur form. The right-hand side of a production is called a content model and its semantics can be modified by exceptions. We give precise definitions of the semantics of exceptions and prove that they do not increase the expressive power of SGML. For each DTD with exceptions we can construct a structurally equivalent extended context-free grammar. On the other hand, exceptions are a powerful shorthand notation; eliminating them may cause exponential growth in the size of a DTD.

The research of the first author was supported by the Academy of Finland and the research of the second author was supported by grants from the Natural Sciences and Engineering Research Council of Canada and from the Information Technology Research Centre of Ontario.

The Hong Kong University of Science & Technology Technical Report Series, Department of Computer Science

Pekka Kilpeläinen (Department of Computer Science, University of Helsinki, Helsinki, Finland; kilpelai@cs.helsinki.fi) and Derick Wood (Department of Computer Science, Hong Kong University of Science & Technology, Clear Water Bay, Kowloon, Hong Kong; dwood@cs.ust.hk)

July 3, 1996

1 Introduction

The Standard Generalized Markup Language (SGML) [9, 11] promotes the interchangeability and application-independent management of electronic documents by providing a syntactic metalanguage for the definition of textual markup systems. An SGML document consists of an SGML prolog and a marked-up document instance. The prolog contains a document type definition (DTD), which is an extended context-free grammar in which the right-hand sides of productions are both extended and restricted regular expressions called content models. Fig. 1 gives an example of a simple SGML DTD.

    <!DOCTYPE message [
    <!ELEMENT message - - (head, body)>
    <!ELEMENT head - - (from & to & subject)>
    <!ELEMENT from - - (person)>
    <!ELEMENT to - - (person)+>
    <!ELEMENT person - - (alias | (forename?, surname))>
    <!ELEMENT body - - (paragraph)*>
    <!ELEMENT subject, alias, forename, surname, paragraph - - (#PCDATA)>
    ]>

Figure 1: An example SGML DTD.

The DTD in Fig. 1 defines a document type for messages, which consist of a head followed by a body. The element (or nonterminal) head consists of subelements from, to, and subject that can appear in any order. The element from is defined to be a person that can be denoted either by an alias or by an optional forename followed by a surname. The element to consists of a nonempty list of persons. The body of a message consists of a (possibly empty) sequence of paragraphs. Finally, the last element definition specifies that elements subject, alias, forename, surname, and paragraph are unstructured strings, denoted by the keyword #PCDATA.

The structural elements of a document instance are made visible by enclosing them in matching pairs of start tags and end tags. A possible instance of the DTD of Fig. 1 is given in Fig. 2.

    <message>
    <head>
    <from><person><alias>boss</alias></person></from>
    <subject>tomorrow's meeting...</subject>
    <to><person><surname>franklin</surname></person>
    <person><alias>betty</alias></person></to>
    </head>
    <body><paragraph>..has been cancelled.</paragraph></body>
    </message>

Figure 2: An SGML document instance.

The semantics of content models can be modified by what the Standard calls exceptions. Inclusion exceptions allow named elements to appear anywhere within a content model and exclusion exceptions preclude named elements from appearing in a content model. To define the placement of sidebars, figures, equations, footnotes, and similar objects in a DTD using the usual grammatical approach is laborious; exceptions provide an alternative, concise, and formal mechanism. For example, with the DTD of Fig. 1, we might want to allow notes to appear anywhere in the bodies of messages, except within notes themselves. We could add the inclusion exception

    <!ELEMENT body - - (paragraph)* +(note)>

to the definition of element body. This modification allows notes to appear within notes; therefore, to prevent such recursive appearances we add an exclusion exception to the definition of element type note:

    <!ELEMENT note - - (#PCDATA) -(note)>

Exclusion exceptions seem to be a useful concept, but their exact meaning is unclear from the Standard [11] and from Goldfarb's annotation of the Standard [9]. We give rigorous definitions for the meaning of exceptions. In the full paper [10], we also give algorithms for transforming grammars with exceptions to grammars without exceptions, as well as giving complete proofs of the results mentioned here. The correctness proofs of these methods imply that exceptions do not increase the expressiveness of SGML DTDs.

An application that requires the elimination of exceptions from content models is the translation of DTDs into static database schemas. This method of integrating textual documents into an object-oriented database has been suggested by Christofides et al. [8].

The SGML Standard requires content models to be unambiguous, meaning that each nonempty prefix of an input string determines uniquely which symbols of the content model match the symbols of the prefix. Our methods of eliminating exceptions preserve the unambiguity of the original content models. In this respect our work extends the work of Brüggemann-Klein and Wood [3, 4, 5, 6, 7].

The Standard gives rather vague restrictions on the applicability of exclusion exceptions. We propose a simple and rigorous definition for the applicability of exclusions; in the full paper [10], we also present an optimal algorithm for testing applicability.

In this extended abstract we focus on the essential ideas underlying our approach. For this reason, we consider the removal of exceptions from only extended context-free grammars with exceptions, although we mention the problems of transferring this approach to DTDs. We refer the reader to the full paper [10] for more details.

2 Extended Context-Free Grammars with Exceptions

We introduce extended context-free grammars as a model for SGML DTDs. We treat extended context-free grammars as context-free grammars in which the right-hand sides of productions are regular expressions. Let $V$ be an alphabet. Then, we define a regular expression over $V$ and its language in the usual way [1, 12]. The symbol $\epsilon$ denotes the empty string. We denote by $\mathrm{sym}(E)$ the set of symbols of $V$ that appear in a regular expression $E$.

An extended context-free grammar $G$ is specified by a tuple $(N, \Sigma, P, S)$, where $N$ and $\Sigma$ are disjoint finite alphabets of nonterminal symbols and terminal symbols, respectively, $P$ is a finite set of production schemas, and the nonterminal $S$ is the sentence symbol. Each production schema has the form $A \to E$, where $A$ is a nonterminal and $E$ is a regular expression over $V = N \cup \Sigma$. When $\beta = \beta_1 A \beta_2 \in V^*$, $A \to E \in P$, and $\alpha \in L(E)$, the string $\beta_1 \alpha \beta_2$ can be derived from the string $\beta$ and we denote this fact by writing $\beta \Rightarrow \beta_1 \alpha \beta_2$.

The language $L(G)$ of an extended context-free grammar $G$ is the set of terminal strings derivable from the sentence symbol of $G$. Formally, $L(G) = \{ w \in \Sigma^* \mid S \Rightarrow^+ w \}$, where $\Rightarrow^+$ denotes the transitive closure of the derivability relation. Even though a production schema may correspond to an infinite number of ordinary context-free productions, it is known that extended and ordinary CFGs allow us to describe exactly the same languages; for example, see the text of Wood [12].

An extended context-free grammar $G$ with exceptions is specified by a tuple $(N, \Sigma, P, S)$ and is similar to an extended context-free grammar except that the production schemas in $P$ have the form $A \to E + I - X$, where $A$ is in $N$, $E$ is a regular expression over $V = N \cup \Sigma$, and $I$ and $X$ are subsets of $N$. The intuitive idea is that the derivation of any string $w$ from the nonterminal $A$ using the production schema $A \to E + I - X$ must not involve any nonterminal in $X$, yet $w$ may contain, in any position, strings that are derivable from nonterminals in $I$. When a nonterminal is both included and excluded, its exclusion overrides its inclusion.

We now define the effect of inclusions and exclusions on languages. Let $L$ be a language over the alphabet $V$ and let $I, X \subseteq V$. We define a language $L$ with inclusions $I$ as the language

$$L_{+I} = \{ w_0 a_1 w_1 \cdots a_n w_n \mid a_1 \cdots a_n \in L, \text{ for } n \ge 0, \text{ and } w_i \in I^*, \text{ for } i = 0, \ldots, n \}.$$

Thus, $L_{+I}$ consists of the strings in $L$ with arbitrary strings from $I^*$ inserted into them. The language $L$ with exclusions $X$ is defined as the language $L_{-X}$ that consists of the strings in $L$ that do not contain any symbol in $X$. Notice that $(L_{+I})_{-X} \subseteq (L_{-X})_{+I}$, but the converse does not hold in general. In the sequel we will write $L_{+I-X}$ for $(L_{+I})_{-X}$.

We formally describe the global effect of exceptions by attaching exceptions to nonterminals and by defining derivations from nonterminals with exceptions. We denote a nonterminal $A$ with inclusions $I$ and exclusions $X$ with the symbol $A_{+I-X}$. Normally, we rewrite the nonterminal $A$, say, with a string $\alpha$, where $A \to E$ is the production schema for $A$ and $\alpha \in L(E)$. But when $A$ has inclusions $I$ and exclusions $X$, and the production schema for $A$ is $A \to E + I_A - X_A$, we must cumulate the inclusions and exclusions in the string $\alpha$. Observe that $I$ and $X$ are the exceptions associated with $A$, whereas $I_A$ and $X_A$ are the exceptions to be applied to $A$'s derived strings. We, therefore, replace $A_{+I-X}$ with $\alpha_{\pm(I \cup I_A,\, X \cup X_A)}$. This cumulation of inclusions and exclusions is described informally in the Standard.
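The two language operators can be tried out on small finite languages. The following Python sketch is an illustration only and not part of the paper: it assumes that symbols are single characters and that a language is a finite set of strings, and the helper names (`inserted`, `in_plus`, `in_plus_minus`, `in_minus_plus`) are hypothetical.

```python
def inserted(word, target, inclusions):
    """Return True if `word` is `target` with symbols from `inclusions` inserted."""
    # reach holds every j such that the symbols read so far can be explained as
    # target[:j] plus inserted inclusion symbols.
    reach = {0}
    for symbol in word:
        nxt = set()
        for j in reach:
            if symbol in inclusions:
                nxt.add(j)          # treat the symbol as an insertion
            if j < len(target) and target[j] == symbol:
                nxt.add(j + 1)      # treat the symbol as the next symbol of target
        reach = nxt
    return len(target) in reach


def in_plus(word, lang, inclusions):
    """Membership in L_{+I} for a finite language `lang`."""
    return any(inserted(word, u, inclusions) for u in lang)


def in_plus_minus(word, lang, inclusions, exclusions):
    """Membership in (L_{+I})_{-X}: insert first, then forbid excluded symbols."""
    return in_plus(word, lang, inclusions) and not (set(word) & set(exclusions))


def in_minus_plus(word, lang, inclusions, exclusions):
    """Membership in (L_{-X})_{+I}: filter L first, then insert."""
    kept = {u for u in lang if not (set(u) & set(exclusions))}
    return in_plus(word, kept, inclusions)


L = {"ab"}
print(in_plus("cacbc", L, {"c"}))               # True
print(in_plus_minus("acb", L, {"c"}, {"c"}))    # False: c is excluded
print(in_minus_plus("acb", L, {"c"}, {"c"}))    # True
```

With $L = \{ab\}$ and $I = X = \{c\}$, the string acb lies in $(L_{-X})_{+I}$ but not in $(L_{+I})_{-X}$, which is one concrete witness that the converse containment fails.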

We modify the standard definition of a derivation step in an extended context-free grammar as follows. For a string $w$ over $\Sigma \cup N$, we denote by $w_{\pm(I,X)}$ the string obtained from $w$ by replacing every nonterminal $A \in \mathrm{sym}(w)$ with $A_{+I-X}$. Thus, we have attached the same inclusions and exclusions to every nonterminal in $w$. Let $\beta_1 A_{+I-X} \beta_2$ be a string of nonterminal symbols with exceptions and terminal symbols. We say that the string $\beta_1 \alpha' \beta_2$ can be derived from $\beta_1 A_{+I-X} \beta_2$, when the following two conditions hold:

1. $A \to E + I_A - X_A$ is a production schema in $P$.
2. For some string $\alpha$ in $L(E)_{+(I \cup I_A)-(X \cup X_A)}$, $\alpha' = \alpha_{\pm(I \cup I_A,\, X \cup X_A)}$.

Observe that the second condition reflects the idea that exceptions are propagated and cumulated by derivations. We illustrate these ideas with the following example grammar with exceptions. This grammar is also used to show that the exception-removal method we design can lead to an exponential blow-up in grammar size.

Example 1 The example grammar is specified as follows:

$$
\begin{aligned}
A &\to (A_1 \mid \cdots \mid A_m) + \emptyset - \emptyset, \\
A_1 &\to (a_1 \mid A) + \{A_2\} - \emptyset, \\
A_2 &\to (a_2 \mid A) + \{A_3\} - \emptyset, \\
&\;\;\vdots \\
A_m &\to (a_m \mid A) + \{A_1\} - \emptyset.
\end{aligned}
$$

We now demonstrate how exception propagation works. Consider a derivation step from $A_1$ with empty inclusions and empty exclusions (that is, from $(A_1)_{+\emptyset-\emptyset}$). Now, $(A_1)_{+\emptyset-\emptyset}$ derives

$$(A A_2 A_2)_{\pm(\{A_2\},\emptyset)} = A_{+\{A_2\}-\emptyset}\,(A_2)_{+\{A_2\}-\emptyset}\,(A_2)_{+\{A_2\}-\emptyset}$$

since the production schema $A_1 \to (a_1 \mid A) + \{A_2\} - \emptyset$ is in the grammar and $A A_2 A_2 \in L(a_1 \mid A)_{+\{A_2\}-\emptyset}$. □
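The derivation step of Example 1 can be replayed mechanically. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it fixes $m = 3$, represents each $L(E)$ as a finite set of symbol tuples (which is enough here because the right-hand sides contain no iteration), and uses hypothetical names (`SCHEMAS`, `inserted`, `derive_step`). An annotated nonterminal $A_{+I-X}$ is modelled as the triple `(name, I, X)`.

```python
EMPTY = frozenset()

# Example 1 with m = 3. Each schema is (finite language of E, I_A, X_A).
SCHEMAS = {
    "A":  ({("A1",), ("A2",), ("A3",)}, EMPTY, EMPTY),
    "A1": ({("a1",), ("A",)}, frozenset({"A2"}), EMPTY),
    "A2": ({("a2",), ("A",)}, frozenset({"A3"}), EMPTY),
    "A3": ({("a3",), ("A",)}, frozenset({"A1"}), EMPTY),
}


def inserted(word, target, inclusions):
    """Return True if `word` is `target` with symbols from `inclusions` inserted."""
    reach = {0}
    for symbol in word:
        nxt = set()
        for j in reach:
            if symbol in inclusions:
                nxt.add(j)
            if j < len(target) and target[j] == symbol:
                nxt.add(j + 1)
        reach = nxt
    return len(target) in reach


def derive_step(annotated, alpha, schemas=SCHEMAS):
    """Rewrite one annotated nonterminal with alpha, cumulating the exceptions."""
    name, inc, exc = annotated
    lang, inc_a, exc_a = schemas[name]
    new_inc, new_exc = inc | inc_a, exc | exc_a
    # Condition 2: alpha must lie in L(E) with the cumulated inclusions and exclusions.
    in_language = any(inserted(alpha, u, new_inc) for u in lang)
    if not in_language or (set(alpha) & new_exc):
        raise ValueError("alpha is not in L(E) with the cumulated exceptions")
    # Attach the cumulated exceptions to every nonterminal; terminals stay bare.
    return tuple((s, new_inc, new_exc) if s in schemas else s for s in alpha)


result = derive_step(("A1", EMPTY, EMPTY), ("A", "A2", "A2"))
for item in result:
    print(item)
```

Every nonterminal of the result carries the inclusion set `{"A2"}` and an empty exclusion set, matching the derivation in Example 1.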

Finally, the language $L(G)$ of an extended context-free grammar $G$ with exceptions consists of the terminal strings derivable from the sentence symbol with empty inclusions and exclusions. Formally,

$$L(G) = \{ w \in \Sigma^* \mid S_{+\emptyset-\emptyset} \Rightarrow^+ w \}.$$

Exceptions seem to be a context-dependent feature: Legal expansions of a nonterminal depend on the context in which the nonterminal appears. We show, however, that exceptions do not extend the descriptive power of extended context-free grammars by giving a transformation that produces an extended context-free grammar that is structurally equivalent to an extended context-free grammar with exceptions. The transformation propagates exceptions to production schemas and modifies their associated regular expressions to capture the effect of exceptions.

Step 1: We explain how to modify regular expressions to capture the effect of exceptions. Let $E$ be a regular expression over $V = \Sigma \cup N$ and let $I = \{i_1, \ldots, i_k\}$ be a set of inclusion exceptions. First, observe that we can remove the $\emptyset$ symbol from the regular expression $E$ and maintain equivalence, if the language of the expression is not $\emptyset$. We modify $E$ to obtain a regular expression $E_{+I}$ such that $L(E_{+I}) = L(E)_{+I}$ by replacing each occurrence of a symbol $a \in \mathrm{sym}(E)$ with

$$(i_1 \mid i_2 \mid \cdots \mid i_k)^*\, a\, (i_1 \mid i_2 \mid \cdots \mid i_k)^*$$

and each occurrence of $\epsilon$ with

$$(i_1 \mid i_2 \mid \cdots \mid i_k)^*.$$

For a set $X$ of excluded elements, we obtain a regular expression $E_{-X}$ such that $L(E_{-X}) = L(E)_{-X}$ by replacing each occurrence of a symbol $a \in X$ in $E$ with $\emptyset$.

Step 2: We describe an algorithm for eliminating exceptions from an extended context-free grammar $G = (N, \Sigma, P, S)$ with exceptions. It propagates the exceptions in a production schema to nonterminals in the schema; see Fig. 3. The algorithm produces an extended context-free grammar $G' = (N', \Sigma', P', S')$ that is structurally equivalent to $G$. The nonterminals of $G'$ have the form $A_{+I-X}$, where $A \in N$ and $I, X \subseteq N$. A derivation step using a new production schema $A_{+I-X} \to E$ in $P'$ corresponds to a derivation step using an old production schema for nonterminal $A$ under inclusions $I$ and exclusions $X$.

    $N' := \{A_{+\emptyset-\emptyset} \mid A \in N\}$; $S' := S_{+\emptyset-\emptyset}$; $\Sigma' := \Sigma$;
    $Q := \{A_{+\emptyset-\emptyset} \to E + I - X \mid A \to E + I - X \in P\}$;
    $P'' := \emptyset$;
    for all $A_{+I_A-X_A} \to E + I - X \in Q$ do
        for all ($B \in (\mathrm{sym}(E) \cup I) - X$) and $B_{+I-X} \notin N'$ do
            $N' := N' \cup \{B_{+I-X}\}$;
            $Q := Q \cup \{B_{+I-X} \to E_B + (I \cup I_B) - (X \cup X_B) \mid B_{+\emptyset-\emptyset} \to E_B + I_B - X_B \in Q\}$
        od;
        $Q := Q - \{A_{+I_A-X_A} \to E + I - X\}$;
        $P'' := P'' \cup \{A_{+I_A-X_A} \to E + I - X\}$
    od;
    $P' := \{A_{+I_A-X_A} \to E_A \mid A_{+I_A-X_A} \to E + I - X \in P''$ and $E_A = ((E_{+I})_{-X})_{\pm(I,X)}\}$;

Figure 3: Exception elimination from an extended context-free grammar $(N, \Sigma, P, S)$ with exceptions.

Termination: The algorithm terminates since it generates, from each nonterminal $A$, at most $2^{2|N|}$ new nonterminals of the form $A_{+I-X}$. In the worst case the algorithm can exhibit this potentially exponential behavior. Given the grammar with exceptions that we defined in Example 1, the algorithm produces production schemas of the form $A_{+I-\emptyset} \to E$ for every subset $I \subseteq \{A_1, \ldots, A_m\}$.

We do not know whether this exponential behavior can be avoided. Is it always possible to obtain an extended context-free grammar $G'$ without exceptions that is (structurally) equivalent to an extended context-free grammar $G$ with exceptions such that the size of $G'$ is bounded by a polynomial in the size of $G$? We conjecture that the answer is negative.
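The growth claimed for Example 1 can be observed with a small experiment. The Python sketch below is a worklist variant written for illustration, not a transcription of Fig. 3: it only enumerates the annotated nonterminals $A_{+I-X}$ reachable from the sentence symbol and does not build the rewritten regular expressions. It assumes each schema is given as `(sym(E), I_A, X_A)`; the names `reachable_nonterminals` and `example1` are hypothetical.

```python
from collections import deque


def reachable_nonterminals(schemas, start):
    """Enumerate the triples (name, I, X) reachable from start with empty exceptions."""
    first = (start, frozenset(), frozenset())
    seen = {first}
    queue = deque([first])
    while queue:
        name, inc, exc = queue.popleft()
        symbols, inc_a, exc_a = schemas[name]
        new_inc, new_exc = inc | inc_a, exc | exc_a      # cumulate the exceptions
        for b in (symbols | new_inc) - new_exc:
            if b not in schemas:
                continue                                  # terminals need no schema
            item = (b, new_inc, new_exc)
            if item not in seen:
                seen.add(item)
                queue.append(item)
    return seen


def example1(m):
    """Grammar of Example 1: A -> (A1|...|Am), Ai -> (ai|A) + {A(i+1)}, cyclically."""
    names = [f"A{i}" for i in range(1, m + 1)]
    schemas = {"A": (frozenset(names), frozenset(), frozenset())}
    for i, name in enumerate(names):
        successor = names[(i + 1) % m]
        schemas[name] = (frozenset({f"a{i + 1}", "A"}),
                         frozenset({successor}), frozenset())
    return schemas


for m in range(1, 5):
    found = reachable_nonterminals(example1(m), "A")
    copies_of_a = sum(1 for name, _, _ in found if name == "A")
    print(m, copies_of_a, len(found))
```

In this sketch the number of annotated copies of $A$ doubles with each increment of $m$, one copy per subset $I \subseteq \{A_1, \ldots, A_m\}$, in line with the termination discussion above.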

3 Exception-Removal for DTDs

Document type definitions (DTDs) are, essentially, extended context-free grammars that have restricted and generalized regular expressions on the right-hand sides of their productions, called content models in the ISO Standard [9, 11]. The major difference between regular expressions and content models is that content models have the additional operators $F \& G$, $F?$, and $F^+$, where $F \& G \equiv FG \mid GF$.

The SGML Standard describes the basic meaning of inclusions as follows: "Elements named in an inclusion can occur anywhere within the content of the element being defined, including anywhere in the content of its subelements." The description is refined by the rule specifying that "... an element that can satisfy an element token in the content model is considered to do so, even if the element is also an inclusion." This refinement means, for example, that given the content model $(a \mid b)$ with inclusion $a$, $baa$ is a valid string of the content model as one would expect intuitively; however, $aab$ is not a valid string of the content model. The reason is that the first $a$ in $aab$ must correspond to the $a$ in the content model and then the suffix $ab$ cannot be obtained. On the other hand, the string $aaa$ is a valid string of the content model.

The Standard recommends that inclusions "... should be used only for elements that are not logically part of the content"; for example, neither for $a$ nor for $b$ in the preceding example. Since the difficulty of understanding inclusions is caused, however, by the inclusion of elements that appear in the content model, we have to take them into account.

The basic idea of compiling the inclusion of the set $I = \{i_1, \ldots, i_k\}$ of symbols in a content model $E$ is to insert new subexpressions of the form $(i_1 \mid \cdots \mid i_k)^*$ in $E$. Preserving the unambiguity of the content model requires some extra care.
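The rule that a symbol satisfying an element token must be taken as that token makes matching deterministic, which a short Python sketch can make concrete. It is an illustration under simplifying assumptions, not an SGML parser: the content model is replaced by a finite language given as a set of strings over single-character symbols, and the names `next_symbols` and `sgml_accepts` are hypothetical.

```python
def next_symbols(lang, matched):
    """Symbols that the content model allows directly after the prefix `matched`."""
    n = len(matched)
    return {u[n] for u in lang if len(u) > n and u[:n] == matched}


def sgml_accepts(word, lang, inclusions):
    """Check `word` against `lang` with inclusions, preferring model tokens."""
    matched = ""
    for symbol in word:
        if symbol in next_symbols(lang, matched):
            matched += symbol        # the symbol satisfies an element token
        elif symbol in inclusions:
            continue                 # otherwise it may be an included element
        else:
            return False
    return matched in lang


model = {"a", "b"}                   # language of the content model (a|b)
for word in ("baa", "aab", "aaa"):
    print(word, sgml_accepts(word, model, {"a"}))
# baa True, aab False, aaa True
```

The three results agree with the discussion of $(a \mid b)$ with inclusion $a$: in $aab$ the first $a$ consumes the only token, so the trailing $b$ can be neither a token nor an inclusion.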
We define the SGML effect of inclusions $I$ on a language $L \subseteq V^*$, where $V$ is an alphabet, as the language

$$L_{\oplus I} = \{ w_0 a_1 \cdots w_{n-1} a_n w_n \mid a_1 \cdots a_n \in L,\ n \ge 0,\ w_i \in (I - \mathrm{first}(\mathrm{tail}(L, a_1 \cdots a_i)))^*,\ i = 0, \ldots, n \},$$

where

$$\mathrm{first}(L) = \{ a \in V \mid au \in L \text{ for some } u \in V^* \}$$

and

$$\mathrm{tail}(L, w) = \{ u \in V^* \mid wu \in L \}.$$

For example, the language $\{ab, ba\}_{\oplus\{a\}}$ consists of all strings of the forms $a^k b a^l$ and $b a^k$, where $k \ge 1$ and $l \ge 0$.

We introduce the difficulties caused by the $\&$ operator with the following example. Consider the content model $E = a? \& b$, which is unambiguous. A content model that captures the inclusion of symbol $a$ in $E$ should match an arbitrary sequence of $a$s after the $b$. A straightforward transformation would produce a content model of the form $F \& (ba^*)$ or of the form $(F \& b)a^*$, where $a \in \mathrm{first}(L(F))$ and $\epsilon \in L(F)$. It is easy to see that these content models are ambiguous since, in each case, any $a$ following an initial $b$ can be matched by both $F$ and $a^*$.

Our strategy to handle such problematic subexpressions $F \& G$ is first to replace them by the equivalent subexpression $(FG \mid GF)$. (Notice that this substitution may not suffice, since $FG \mid GF$ can be ambiguous even if $F \& G$ is unambiguous. For example, the content model $(a?b \mid ba?)$ is ambiguous, whereas the content model $a? \& b$ is unambiguous.) Then, given a content model $E$ and a set $I$ of inclusions, we compute a new content model $E_{\oplus I}$ such that $L(E_{\oplus I}) = L(E)_{\oplus I}$.

Example 2 Let $E = (a? \& b?)c$ and $I = \{a, c\}$. We first transform it into the content model $(ab? \mid ba?)?c$ and then into the content model

$$(aa^*(ba^*)? \mid b(aa^*)?)?\,c\,(a \mid c)^*.$$ □

In the full paper [10], we give a complete algorithm for computing the content model $E_{\oplus I}$ from a given content model $E$ and a given set of inclusions $I$.

The SGML Standard states that "... exclusions modify the effect of model groups to which they apply by precluding options that would otherwise have been available". The exact meaning of the phrase "precluding options" is not clear from the Standard. Our first task is, therefore, to formalize the intuitive notion of exclusion. As a motivating example, consider excluding the symbol $b$ from the content model $E = a(b \mid c)c$, which defines the language $L(E) = \{abc, acc\}$. The element $b$ is clearly an alternative to the first occurrence of $c$, and we can realize its exclusion by modifying the content model to obtain $E' = acc$. Now, consider excluding $b$ from the content model $F = a(bc \mid cc)$. The case is not as clear since $b$ appears in a seq subexpression. On the other hand, both $E$ and $F$ define the same language.

Let $L \subseteq V^*$ be a language and let $X \subseteq V$. Motivated by the preceding examples, we define the effect of excluding $X$ from $L$, which we denote by $L_{-X}$, to be the set of all strings in $L$ that do not contain any symbol of $X$. As an example, the effect of excluding $\{b\}$ from the language of the preceding content models $E$ and $F$ is

$$L(E)_{-\{b\}} = L(F)_{-\{b\}} = \{acc\}.$$

Notice that an exclusion always specifies a subset of the original language. In the full paper [10], we show how to compute a content model $E_{\ominus X}$ such that $L(E_{\ominus X}) = L(E)_{-X}$ from a given content model $E$ and a given set $X$ of exclusions. The modified content model $E_{\ominus X}$ is unambiguous if the original content model $E$ is unambiguous, and its computation takes time linear in the size of $E$.

As a restriction of the applicability of exclusions the Standard states that "... an exclusion cannot affect a specification in a model group that indicates that an element is required." The Standard does not specify rigorously how a model group (a subexpression of a content model) indicates that an element is required. The intent of the Standard appears to be that when $A$ is an element, then in the contexts $A?$, $(A \mid B)$, and $A^*$, the $A$ is optional, but in the contexts $A$, $A^+$, and $A \& B$, it is required. Note that a content model cannot denote a language that is either $\emptyset$ or $\{\epsilon\}$. The Standard gives a syntactic definition of applicability of exclusions; we prefer to give a semantic definition. Therefore, a reasonable requirement for the applicability of excluding $X$ from a content model $E$ is that $L(E)_{-X} \not\subseteq \{\epsilon\}$.
Intuitively, $E_{\ominus X} \equiv \emptyset$ or $E_{\ominus X} \equiv \epsilon$ means that excluding $X$ from $E$ precludes all elements from the content of $E$. On the other hand, $E_{\ominus X} \not\equiv \emptyset$ and $E_{\ominus X} \not\equiv \{\epsilon\}$ means that $X$ precludes only elements that are optional in $L(E)$. We propose that the preceding requirement be the formalization of how a content model indicates that an element is required. Notice that computing $E_{\ominus X}$ is a reasonable and efficient test for the applicability of exclusions $X$ to a content model $E$.
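At the level of languages, the proposed applicability requirement is easy to state in code. The following Python sketch is an assumption-laden illustration and not the linear-time content-model algorithm of the full paper [10]: it works on a finite language written out as a set of strings over single-character symbols, and the function names `exclude` and `exclusion_applicable` are hypothetical.

```python
def exclude(lang, excluded):
    """Strings of `lang` that contain no symbol of `excluded`."""
    return {u for u in lang if not (set(u) & set(excluded))}


def exclusion_applicable(lang, excluded):
    """Proposed test: what remains must be neither empty nor just the empty string."""
    return not (exclude(lang, excluded) <= {""})


E = {"abc", "acc"}                               # language of a(b|c)c and of a(bc|cc)
print(exclude(E, {"b"}))                         # {'acc'}
print(exclusion_applicable(E, {"b"}))            # True
print(exclusion_applicable(E, {"a"}))            # False: no string is left
print(exclusion_applicable({"", "a"}, {"a"}))    # False: only the empty string is left
```

Because the test looks only at the language, the content models $a(b \mid c)c$ and $a(bc \mid cc)$ receive the same answer, which is the point of preferring a semantic definition.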

We are now in a position to consider the removal of exceptions from a DTD. Let $G_1 = (N_1, \Sigma, P_1, S_1)$ be an extended context-free grammar with exceptions and let $G_2 = (N_2, \Sigma, P_2, S_2)$ be the extended context-free grammar that results by eliminating exceptions from $G_1$ using the algorithm in Fig. 3. If $B_{+I-X} \in N_2$, then there is a production schema $B_{+I-X} \to E_B$ in $P_2$ if and only if there is a production schema $B \to E + I_B - X_B$ in $P_1$ such that

$$E_B = (E_{+(I \cup I_B)-(X \cup X_B)})_{\pm(I \cup I_B,\, X \cup X_B)}.$$

Lastly, we can apply the same idea to an SGML DTD with exceptions to obtain a structurally equivalent DTD without exceptions.

4 Concluding Remarks and Open Problems

When we apply the exception removal transformation of Fig. 3 to an SGML DTD with exceptions, then we do indeed obtain a new DTD without exceptions. Unfortunately, the original DTD-document instances are not conformant to the new DTD since the new DTD has new elements and new tags that correspond to those elements that do not appear in the old DTD instances. Therefore, how useful are our results?

First, the results are interesting in their own right as a contribution to the theory of extended context-free grammars and SGML DTDs. We can eliminate exceptions to give structurally equivalent grammars and DTDs while preserving their SGML unambiguity.

Second, during the DTD design phase, it may be convenient to use exceptions. Our results imply that we can eliminate the exceptions and produce a final DTD design without exceptions before any document instances are created.

Third, rather than producing a new DTD, we can emulate it with an extended context-free grammar. We first apply the exception-removal transformation to the extended context-free grammar with exceptions given by the original DTD with exceptions. We then modify its productions to explicitly include the old tags. For example, we transform a production of the form

$$A_{+I-X} \to E_A$$

into a production of the form

$$A_{+I-X} \to \text{`<A>'}\; E_A\; \text{`</A>'},$$

where `<A>' and `</A>' $\in \Sigma'$ are the start and end tags that the new grammar has to use as delimiters for the element $A$. The new productions can be applied to the old DTD instances.

Lastly, we can attack the document-instance problem head on by translating old instances into new instances. A convenient technique is to use a generalization of syntax-directed translation grammars (see Aho and Ullman [1, 2] and Wood [12]) to give extended context-free transduction grammars and the corresponding transduction version of DTDs that we call "Document Type Transduction Definitions." We are currently investigating this approach, which would also be applicable to the DTD database schema issue raised by Christofides et al. [8]. It could also be used to convert a document marked up according to one DTD into a document marked up according to a different, but related, DTD.

Acknowledgements

We would like to thank Anne Brüggemann-Klein and Gaston Gonnet for the discussions that encouraged us to continue our investigation of the exception problem in SGML.

References

[1] A.V. Aho and J.D. Ullman. The Theory of Parsing, Translation, and Compiling, Vol. I: Parsing. Prentice-Hall, Inc., Englewood Cliffs, NJ.
[2] A.V. Aho and J.D. Ullman. The Theory of Parsing, Translation and Compiling, Vol. II: Compiling. Prentice-Hall, Inc., Englewood Cliffs, NJ.
[3] A. Brüggemann-Klein. Unambiguity of extended regular expressions in SGML document grammars. In Th. Lengauer, editor, Algorithms - ESA 93. Springer-Verlag.
[4] A. Brüggemann-Klein. Regular expressions into finite automata. Theoretical Computer Science, 120:197-213.

[5] A. Brüggemann-Klein. Compiler-construction tools and techniques for SGML parsers: Difficulties and solutions. To appear in EPODD.
[6] A. Brüggemann-Klein and D. Wood. One-unambiguous regular languages. To appear in Information and Computation.
[7] A. Brüggemann-Klein and D. Wood. The validation of SGML content models. To appear in Mathematical and Computer Modelling.
[8] V. Christofides, S. Christofides, S. Cluet, and M. Scholl. From structured documents to novel query facilities. SIGMOD Record, 23(2):313-324, June 1994. (Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data.)
[9] C. F. Goldfarb. The SGML Handbook. Clarendon Press, Oxford.
[10] P. Kilpeläinen and D. Wood. Exceptions in SGML document grammars. Submitted for publication.
[11] International Organization for Standardization. ISO 8879: Information Processing - Text and Office Systems - Standard Generalized Markup Language (SGML), October.
[12] D. Wood. Theory of Computation. John Wiley, New York, NY.


A Simplied NP-complete MAXSAT Problem. Abstract. It is shown that the MAX2SAT problem is NP-complete even if every variable A Simplied NP-complete MAXSAT Problem Venkatesh Raman 1, B. Ravikumar 2 and S. Srinivasa Rao 1 1 The Institute of Mathematical Sciences, C. I. T. Campus, Chennai 600 113. India 2 Department of Computer

More information

This book is licensed under a Creative Commons Attribution 3.0 License

This book is licensed under a Creative Commons Attribution 3.0 License 6. Syntax Learning objectives: syntax and semantics syntax diagrams and EBNF describe context-free grammars terminal and nonterminal symbols productions definition of EBNF by itself parse tree grammars

More information

Semantics via Syntax. f (4) = if define f (x) =2 x + 55.

Semantics via Syntax. f (4) = if define f (x) =2 x + 55. 1 Semantics via Syntax The specification of a programming language starts with its syntax. As every programmer knows, the syntax of a language comes in the shape of a variant of a BNF (Backus-Naur Form)

More information

Syntax Analysis. Chapter 4

Syntax Analysis. Chapter 4 Syntax Analysis Chapter 4 Check (Important) http://www.engineersgarage.com/contributio n/difference-between-compiler-andinterpreter Introduction covers the major parsing methods that are typically used

More information

ONE-STACK AUTOMATA AS ACCEPTORS OF CONTEXT-FREE LANGUAGES *

ONE-STACK AUTOMATA AS ACCEPTORS OF CONTEXT-FREE LANGUAGES * ONE-STACK AUTOMATA AS ACCEPTORS OF CONTEXT-FREE LANGUAGES * Pradip Peter Dey, Mohammad Amin, Bhaskar Raj Sinha and Alireza Farahani National University 3678 Aero Court San Diego, CA 92123 {pdey, mamin,

More information

A Note on the Succinctness of Descriptions of Deterministic Languages

A Note on the Succinctness of Descriptions of Deterministic Languages INFORMATION AND CONTROL 32, 139-145 (1976) A Note on the Succinctness of Descriptions of Deterministic Languages LESLIE G. VALIANT Centre for Computer Studies, University of Leeds, Leeds, United Kingdom

More information
