Evaluating the Role of Context in Syntax Directed Compression of XML Documents


S. Hariharan, Priti Shankar
Department of Computer Science and Automation
Indian Institute of Science
Bangalore 560012, India

November 16, 0

Abstract

We propose a new technique based on recursive finite state machines for tracking context to be used in a statistical code compression scheme for XML documents. We also study the tradeoffs between space and compression ratio by observing the effects of either using or ignoring root to leaf contexts for textual content in the associated tree structures. The advantage of our scheme is that it is syntax aware, and the compressor and decompressor can be generated automatically from the Document Type Definition (DTD).

1 Introduction

Extensible Markup Language (XML) [7] is a standard meta language used to describe a class of data objects, called XML documents, and to specify how they are to be processed by computer programs. XML is rapidly becoming a standard for the creation and parsing of documents. However, a significant disadvantage is document size, which is a consequence of the overhead of markup information. Present day XML databases are massive and the need for compression is pressing. XML documents have their structure dictated by a Document Type Definition (DTD), which specifies the syntax of the documents. It is therefore natural to investigate the use of syntactic models for the compression of such data.

We propose a syntax directed scheme for compression which is totally automatic: the user is required to specify just the DTD, and the compressor and decompressor are generated from the syntactic specification. The model that is constructed mirrors the DTD, in that it tracks the structure of the document and is able to make accurate predictions of the expected elements.
Therefore, whenever the predicted element is unique, there is no need to encode it at all, as the decoder generates the same model from the DTD and is thus able to generate the unique expected symbol. Most markup symbols fall into this category. Character data associated with a single element can either be automatically directed to the same model for arithmetic compression irrespective of the instance of the element in the DTD, in which case the model is said to be path agnostic, or one may choose to have a separate model for each root to leaf path in the underlying tree for the document, in which case the scheme is path sensitive. We evaluate both schemes in this paper. Since elements may be nested recursively, the set of models used is in general a set of mutually recursive automata. A stack is used to store root to leaf context in the underlying structure tree, and operations on the stack are governed by syntax.

We have run experiments on five large databases and compared the performance of our tool with that of two well known XML-aware compression schemes, XMill [9] and XMLPPM [10]. We do not address the problem of querying compressed documents in this paper.

Section 2 describes related work. Section 3 provides the relevant background. Section 4 describes the new syntactic model proposed and used in our experiments. Section 5 gives the results of experiments run using our tool and compares the results with those obtained using XMill and XMLPPM. Finally, Section 6 concludes the paper.

2 Related Work

The XML-specific compression schemes that we are aware of are XMLZIP [8], XMill [9] and XMLPPM [10]. The last two have tried to take advantage of the structure in XML data by either transforming the file after parsing, breaking up the tree into components (as in the case of XMill), or injecting hierarchical element structure symbols into a model that multiplexes several models based on the syntactic structure of XML (in the case of XMLPPM). They do not require the DTD to compress the document.

XMLZIP parses XML data and creates the underlying tree. It then breaks up the tree into many components: the root component at depth d and a component for each of the subtrees at depth d. Each of the subtrees is compressed using Java's ZIP-DEFLATE archive library. The advantage of such a scheme is that it allows limited random access to parts of the document without the need to have the whole tree in main memory.

XMill separates the structure from the content and compresses them separately.
Data items are grouped into containers and each container is compressed separately. Different compressors are applied to different containers depending on the content. The criterion for grouping data into a container is not just the tag name but also the path from the root to the tag name.

XMLPPM uses a modeling technique called Multiplexed Hierarchical Modeling (MHM), based on the SAX [4] encoding and on PPM [3] modeling. The technique employs two basic ideas: multiplexing several text compression models based on the syntactic structure of XML (one model for element structure, one for attributes, and so on), and injecting hierarchical element structure symbols into the multiplexed models (these are essentially root to leaf paths to the element). Multiplexing enables more effective hierarchical structure modeling. A common case for these dependencies is for the enclosing element tag to be strongly correlated with enclosed data. MHM exploits this by injecting the enclosing tag symbol into the element, attribute or string model immediately before an element, attribute or string is encoded. Injecting a symbol means telling the model that it has been seen but not explicitly encoding or decoding it.

Earlier, our tool XAUST (XML Compression with AUtomata and STack) was compared with XMLPPM and XMill using unbounded memory for PCDATA; here we evaluate the role of contexts in the bounded memory case.

3 Background

3.1 Arithmetic Coding

Arithmetic coding does not replace every input symbol with a specific code. Instead, it processes a stream of input symbols and replaces it with a single number greater than or equal to 0 and less than 1. This single number can be uniquely decoded to recreate the exact stream of symbols that went into its construction. In order to construct the output number, the symbols being encoded need to have a set of probabilities assigned to them. Initially the range of the message is the interval [0, 1). As each symbol is processed, the range is narrowed to that portion of it allocated to the symbol. The range thus gets narrower and narrower, requiring an increasing number of bits to represent it as successive symbols are encoded. At the end, a single number in the final interval encodes the stream. The decoder works in exactly the same manner and mimics the action of the encoder. For this scheme to be effective, the model should produce probabilities that deviate from a uniform distribution. The better the model is at making such predictions, the better the compression ratios will be.

3.1.1 Finite Context Modeling

In a finite context scheme, the probabilities of each symbol are calculated based on the context the symbol appears in. In its traditional setting, the context is just the symbols that have been previously encountered. The order of the model refers to the number of previous symbols that make up the context. In an adaptive order k model, both the compressor and the decompressor start with the same model. The compressor encodes a symbol using the existing model and then updates the model to account for the new symbol. Typically a model is a set of frequency tables, one for each context. After seeing a symbol, the frequency counts in the tables are updated. The frequency counts are used to approximate the probabilities, and the scheme is adaptive because this is being done as the symbols are being scanned.
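The interval narrowing and the adaptive frequency update described above can be made concrete with a small sketch. It is a minimal illustration for a single context (order 0), using exact fractions rather than the fixed-precision integer arithmetic a practical coder needs. The names `AdaptiveModel`, `encode` and `decode`, and the choice to start every count at 1, are assumptions made for the example and are not taken from the paper.

```python
from fractions import Fraction


class AdaptiveModel:
    """Order-0 adaptive model: one frequency table, every count starts at 1."""

    def __init__(self, alphabet):
        self.symbols = sorted(alphabet)
        self.counts = {s: 1 for s in self.symbols}

    def interval(self, symbol):
        """Return the (low, high) share of [0, 1) currently owned by symbol."""
        total = sum(self.counts.values())
        cum = 0
        for s in self.symbols:
            if s == symbol:
                return Fraction(cum, total), Fraction(cum + self.counts[s], total)
            cum += self.counts[s]
        raise KeyError(symbol)

    def update(self, symbol):
        self.counts[symbol] += 1


def encode(message, alphabet):
    """Narrow [0, 1) once per symbol; return a number in the final range and its width."""
    model = AdaptiveModel(alphabet)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = model.interval(sym)
        low, width = low + width * s_low, width * (s_high - s_low)
        model.update(sym)  # encode with the old model, then adapt
    return low + width / 2, width


def decode(code, length, alphabet):
    """Mirror the encoder: same initial model, same updates."""
    model = AdaptiveModel(alphabet)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        for sym in model.symbols:
            s_low, s_high = model.interval(sym)
            if low + width * s_low <= code < low + width * s_high:
                out.append(sym)
                low, width = low + width * s_low, width * (s_high - s_low)
                model.update(sym)
                break
    return "".join(out)


code, width = encode("aaab", "ab")
print(decode(code, 4, "ab"))  # prints aaab
```

Under this model the skewed message aaab ends with a final interval of width 1/20, while abab ends with width 1/30. A wider final interval needs fewer bits to identify, which is the sense in which a model that predicts well compresses better.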
The decompressor similarly decodes a symbol using the existing model and then updates the model. Since there are potentially $q^k$ possibilities for level k contexts, where q is the size of the symbol space, update can be a costly process, and the tables consume a large amount of space. This causes arithmetic coding to be somewhat slow.

3.2 XML Syntax

XML documents contain element tags, which include start tags like <name> and end tags like </name>. Elements can nest other elements, and therefore a tree structure can be associated with an XML document. Elements can also contain plain text, comments and special processing instructions for XML processors. In addition, opening element tags can have attributes with values, such as gender in <person gender="female">. Detailed specifications are given in [7]. XML documents have to conform to a specified syntax, usually in the form of a DTD. Usually XML documents are parsed to ensure that only valid data reaches an application. Most XML parsing libraries use either the SAX interface or the DOM (Document Object Model) interface. SAX is an event based interface suitable for search tools and algorithms that need one pass. The DOM model, on the other hand, is suitable for algorithms that have to make multiple passes.

Since XML documents are stored as plain text files, one possibility is to use standard compression tools like bzip2 or ppm*. Cheney [10] has performed a study of the compression using such general purpose tools and observes that each general purpose compressor performs poorly on at least one document. Since XML documents are governed by a rather restrictive set of rules, the obvious way to go is to try to use the rules to predict what symbols to expect. Further, if the rules are known a priori, then a compressor which is tuned to take advantage of the rules can be generated directly from the rules themselves. This is what we achieve with our tool XAUST (XML Compression with AUtomata and STack).

The scheme proposed in this paper assumes that the DTD describing the data is known to both the sender and the receiver. Typically, an element of a DTD consists of distinct beginning and ending tags enclosing regular expressions over other elements. Elements can also contain plain text, comments and special instructions for XML processors ("processing instructions"). Opening element tags can have attributes with values.

Example 1 Consider a DTD defined as follows:

    <!DOCTYPE addressbook [
      <!ELEMENT addressbook (card*)>
      <!ELEMENT card ((name | (givenname, familyname)), email, note?)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT givenname (#PCDATA)>
      <!ELEMENT familyname (#PCDATA)>
      <!ELEMENT email (#PCDATA)>
      <!ELEMENT note (#PCDATA)>
    ]>

Below is an instance of an XML document conforming to this DTD.

    <addressbook>
      <card>
        <givenname>Hariharan</givenname>
        <familyname>Iyer</familyname>
        <email>hari@gmail.com</email>
      </card>
      <card>
        <name>Priti Shankar</name>
        <email>priti@gmail.com</email>
        <note>Hariharan's advisor</note>
      </card>
    </addressbook>

It can be seen that each rule has an element name followed by a regular expression involving elements.
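The observation that each rule is a regular expression over element names can be checked mechanically. The sketch below is a hypothetical helper, not part of XAUST: it rewrites the content model of card by hand as a Python regular expression over space-terminated child names and tests the child sequences of the two card instances from Example 1. The element name `email` is an assumption, since that name does not survive in the transcribed DTD.

```python
import re

# Content model of card, (name | (givenname, familyname)), email, note?,
# written by hand as a regex in which every child name is followed by a space.
CARD_MODEL = re.compile(r"(?:name |givenname familyname )email (?:note )?")


def conforms(children):
    """Return True if the sequence of child element names matches the model."""
    text = "".join(name + " " for name in children)
    return CARD_MODEL.fullmatch(text) is not None


first_card = ["givenname", "familyname", "email"]
second_card = ["name", "email", "note"]
print(conforms(first_card), conforms(second_card))  # prints True True
```

A sequence such as name followed directly by note is rejected, because the model requires email between them.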
It is thus natural to associate a deterministic finite automaton (DFA) with an element definition in a rule. For example, the DFA in Figure 1 represents the rule for the element card.

Figure 1: DFA for the element card in Example 1 (edge labels: name, givenname, familyname, email, note)

There are two kinds of states in this automaton: those having a single output transition and those with multiple output transitions. Symbols that begin elements which label single output transitions need not be encoded, as their occurrence probability is 1. Thus encoding of symbols by the arithmetic compressor needs to be performed only at states with more than one outgoing transition. An arithmetic encoding procedure is called at each such state for each element. As we observed in Section 3, the arithmetic encoder maintains a set of tables of frequencies which it updates each time it encodes a symbol. Each element whose content is #PCDATA results in a call to an arithmetic encoder which uses a common set of tables for all instances of that element, whenever a path agnostic scheme is used. If a path sensitive scheme is used, different sets of tables are used for each state which has a transition labeled by that element. An example will illustrate the difference.

Example 2 Consider the elements below:

    <!ELEMENT Project (date, date,...) >
    <!ELEMENT Employee (date,...) >
    <!ELEMENT date (#PCDATA)>

XAUST provides the choice of either using a single set of tables for date, or using the contexts Project and Employee to route textual data associated with the element date to two separate sets of tables.

A typical sequence of actions is then as follows. Enter the start state of a DFA representing the right side of a rule. If there is only one edge out of the state, do nothing. If that element has #PCDATA content, encode the string of symbols using the frequency tables associated with that element. If there is more than one edge, encode the tag beginning the element labeling the edge taken, using an arithmetic encoder for that state, and transit to the start state of the DFA for that element.
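The sequence of actions above can be sketched for the card automaton of Figure 1. The sketch only identifies which symbols have to be passed to the arithmetic coder; the coding itself is omitted. The state numbering, the `CARD_DFA` table, the `END` pseudo-symbol standing for the closing tag, the element name `email`, and the function `symbols_to_encode` are all assumptions made for illustration.

```python
END = "</card>"  # pseudo-symbol for the closing tag (assumed convention)

# Hand-built DFA for (name | (givenname, familyname)), email, note?
CARD_DFA = {
    0: {"name": 2, "givenname": 1},
    1: {"familyname": 2},
    2: {"email": 3},
    3: {"note": 4, END: None},
    4: {END: None},
}


def symbols_to_encode(children):
    """Walk the DFA and list the symbols taken at states with a real choice."""
    state, encoded = 0, []
    for symbol in list(children) + [END]:
        edges = CARD_DFA[state]
        if symbol not in edges:
            raise ValueError(f"{symbol!r} not allowed in state {state}")
        if len(edges) > 1:      # more than one outgoing edge: must be encoded
            encoded.append(symbol)
        state = edges[symbol]   # single edge: the decoder can infer the symbol
    return encoded


print(symbols_to_encode(["givenname", "familyname", "email"]))
# prints ['givenname', '</card>']
print(symbols_to_encode(["name", "email", "note"]))
# prints ['name', 'note']
```

In both cards only two of the four symbols (three children plus the closing tag) carry information; familyname and email are certain once the preceding state is known.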
The decoder mimics the action of the encoder, generating symbols that are certain and using the arithmetic decoder for symbols that are not. We now define the model more formally.

4 The Recursive Finite State Machine

We recall that the strings following each element declaration are just regular expressions over element names, and therefore each of them can be associated with a deterministic finite automaton.

The collection of elements is described by a recursive finite state machine, which we now define.

Definition 4.1 A recursive finite state machine $M$ over an alphabet $\Sigma$ is specified by a tuple $\langle M_1, M_2, \ldots, M_k \rangle$ where each element of the tuple is a finite state machine $M_i = (S_i, \Sigma, \delta_i, s_{0i}, F_i)$, where $S_i$ is a finite set of states, $\Sigma$ is the input alphabet, $s_{0i}$ is the start state, $F_i \subseteq S_i$ is the set of final states, and $\delta_i$ is a mapping from $S_i \times \Sigma$ to $S_i$.

In the present setting, the members of $\Sigma$ are the elements of the DTD and $k$ is the number of elements. There is one finite state machine for each element. The recursive finite state machine maintains a stack during its operation. A configuration of $M$ is a quadruple (index, state, stack, string), where index is the index of the current DFA which $M$ is traversing, state is the state of the DFA where $M$ is currently stationed, stack represents the content of the context stack, which is initially empty, and string represents the unconsumed suffix of the input string, namely, the XML document to be compressed.

Assume that the current configuration of $M$ is $(i, s_{li}, \alpha, o_m s)$, where $o_m$ is an open tag for element $m$ and $s$ is the suffix of the input string after $o_m$. When an open tag is encountered for element $m$ in the document, the pair $(i, s_{li})$ is stored on the calling stack and the start state $s_{0m}$ of the DFA for the element $m$ is entered. The current configuration of $M$ now becomes $(m, s_{0m}, \alpha(i, s_{li}), s)$, where the pair $(i, s_{li})$ is concatenated with the stack contents. When the closing tag $c_m$ is encountered for element $m$, the stack is popped and the new configuration of $M$ becomes $(i, s_{l'i}, \alpha, s')$, where $\delta_i(s_{li}, m) = s_{l'i}$ and $s'$ is the suffix of the input following $c_m$.

We now indicate how to use the states of $M$ to refine probability estimates. Each state of $M$ is associated with a frequency table if there is more than one output transition from the state.
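The configuration changes on open and close tags can be traced with a short sketch. It follows the definition above for the DTD of Example 1, ignoring character data and the encoding itself. The `AUTOMATA` tables, their state numbering, the element name `email`, the event tuples and the function `run` are hypothetical names chosen for illustration.

```python
# One transition table per element: state -> {child element: next state}.
# The tables are written by hand for the DTD of Example 1 (assumed numbering).
AUTOMATA = {
    "addressbook": {0: {"card": 0}},
    "card": {0: {"name": 2, "givenname": 1}, 1: {"familyname": 2},
             2: {"email": 3}, 3: {"note": 4}, 4: {}},
    "name": {0: {}}, "givenname": {0: {}}, "familyname": {0: {}},
    "email": {0: {}}, "note": {0: {}},
}


def run(events, root):
    """Return the (index, state, stack) configuration after every event.

    events describes the content of the root element as a list of
    ("open", element) and ("close", element) pairs.
    """
    index, state, stack = root, 0, []
    trace = []
    for kind, element in events:
        if kind == "open":
            if element not in AUTOMATA[index][state]:
                raise ValueError(f"<{element}> not allowed in {index}:{state}")
            stack.append((index, state))   # save the calling context (i, s_li)
            index, state = element, 0      # enter the start state s_0m
        else:
            index, state = stack.pop()     # return to the caller ...
            state = AUTOMATA[index][state][element]  # ... and apply delta_i(s_li, m)
        trace.append((index, state, tuple(stack)))
    return trace


events = [("open", "card"), ("open", "name"), ("close", "name"),
          ("open", "email"), ("close", "email"), ("close", "card")]
for configuration in run(events, "addressbook"):
    print(configuration)
```

At every step the stack spells out the path from the root to the current element, for example (addressbook, 0), (card, 0) while inside name. This is the information a path sensitive scheme can use to select a set of tables.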
The elements in the table are the labels of edges leaving that state. The frequencies are the frequencies with which the edges are taken. An order 0 arithmetic encoder is used at each state with the appropriate table to represent probabilities.

The machine M begins in the start state of the first element, i.e. the element specified in the DOCTYPE statement. Each time it sees an opening tag $o_e$, it takes the transition labeled with element e, pushes the current state and the index of the current machine $M_i$ on the context stack as described, and moves to the start state of the machine associated with element e. Each state initiates an encoding (or decoding, if decompression is being carried out) action. If there is a single transition out of that state, then the element is not encoded, as its probability is one, and there is no need to maintain any table. If there is more than one transition, then an order 0 frequency table is maintained which gives the probabilities at that state. In the example below, we need not encode the tag D but we have to encode B and C.

    <!ELEMENT A ((B | C), D)>

We note here that there is an implicit transition out of every final state of every DFA $M_i$ to the state on top of the context stack. However, such transitions depend on the calling context and are detected only at runtime (i.e. during compression or decompression). These transitions will be taken on encountering the closing tag of the element.

If the element is associated with PCDATA, then a path sensitive scheme uses the contents of the stack to route the compressor to the correct set of tables. This corresponds to possibly having a different model for PCDATA associated with each instance of the element in the DTD. In contrast, a path insensitive scheme will route all PCDATA associated with any instance of that element in the DTD to the same set of tables. We implement both schemes in this paper to study space/compression-ratio tradeoffs. We note here that the stack contents denote the root to element path in the implied tree representation of the structure. Consider the element below.

    <!ELEMENT A ((#PCDATA | B)*)>

There are two transitions from the start state of the DFA for element A. The first invokes the arithmetic model for PCDATA. The second invokes the DFA for element B after pushing the current pair on the stack.

The pseudo-code for the Encoder (Compressor) is given below. The pseudo-code for the Decoder (Decompressor) is similar. Encoding attributes is similar to encoding PCDATA and hence is not shown.

    void Encoder()
    {
        ExitLoop = false;
        // StateStruct is the pair of int (ElementIndex, StateIndex):
        // ElementIndex identifies the automaton,
        // StateIndex is the state within that automaton.
        StateStruct CurrState(0, 0);
        while (ExitLoop == false)
        {
            Type = GetNextType(FilePointer, ElementIndex);
            switch (Type)
            {
            case OPENTAG:
                // Encode ElementIndex in CurrState context
                EncodeOpenTag(CurrState, ElementIndex);
                Stack.push(CurrState);
                CurrState = StateStruct(ElementIndex, 0);
                break;
            case CLOSETAG:
                // Encode CLOSETAG in CurrState context
                EncodeCloseTag(CurrState);
                if (Stack.empty() == true)
                {
                    ExitLoop = true;
                }
                else
                {
                    CurrState = Stack.pop();
                    // Make state transition in CurrState.ElementIndex
                    // automaton and get the next state
                    CurrState.StateIndex = MakeStateTransition(CurrState, ElementIndex);
                }
                break;
            case PCDATA:
                // Encode Pcdata in path sensitive or path agnostic context
                EncodePcdata(CurrState);
                CurrState.StateIndex = MakeStateTransition(CurrState, PCDATA);
                break;
            }
        }
    }

5 Experimental Results

We have experimented with allocation of a memory block of fixed size for runtime memory during compression. We have examined the performance of the tool in terms of the compression ratio as a function of two parameters. The first one is the strategy for flushing context when the maximum memory allocated for PCDATA is full. The three strategies implemented are: flushing out all context tables, flushing out context tables such that the size reduces to half of the memory allocated, and flushing the largest table. The second is the size of that memory block. The memory allocated for PCDATA varies from to 0 Mb, and as an optimization strategy the memory allocated for encoding states and attributes is unbounded. The sizes of the documents compressed are displayed in Table 1.

Table 1: Sizes of XML documents that were compressed

    Name      Size (in MB)
    XMark1    113
    XMark2    230
    DBLP      302
    UniRef    79

We define the Compression Ratio as the ratio of the size of the compressed document to the size of the original document, expressed as a percentage. The compression ratios achieved for various sizes of the memory block allocated are displayed for each strategy. We have also measured the effect of using root to leaf context, that is, path sensitive tables, for PCDATA. In this case the range of the memory block allocated is higher, as the tables need more space. It is observed that the path agnostic scheme seems to perform better under a limited block size constraint. When the results are compared with those of XMLPPM, we see that ours is better for XMark by 2.% and DBLP by 0.1%, and XMLPPM is better for UniRef by 2.7%.
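The interaction between the flushing strategies and the choice of context can be sketched with a toy table store. Everything in it is an assumption made for illustration: the class `TableStore`, the strategy names, measuring memory as the number of distinct symbols held across all tables, and evicting the oldest tables first in the halving strategy. The paper does not specify these details.

```python
from collections import OrderedDict


class TableStore:
    """Frequency tables for PCDATA under a fixed budget (toy cost model)."""

    def __init__(self, budget, strategy, path_sensitive):
        self.budget = budget          # maximum number of table entries (assumed unit)
        self.strategy = strategy      # "all", "half" or "largest"
        self.path_sensitive = path_sensitive
        self.tables = OrderedDict()   # key -> {symbol: count}
        self.flushes = 0

    def key(self, path):
        """Path sensitive: whole root to element path. Path agnostic: last element."""
        return tuple(path) if self.path_sensitive else path[-1]

    def size(self):
        return sum(len(table) for table in self.tables.values())

    def flush(self):
        self.flushes += 1
        if self.strategy == "all":
            self.tables.clear()
        elif self.strategy == "half":
            while self.tables and self.size() > self.budget // 2:
                self.tables.popitem(last=False)  # oldest table first (assumption)
        elif self.strategy == "largest":
            biggest = max(self.tables, key=lambda k: len(self.tables[k]))
            del self.tables[biggest]
        else:
            raise ValueError(self.strategy)

    def update(self, path, text):
        """Count the characters of text in the table selected by path."""
        table = self.tables.setdefault(self.key(path), {})
        for ch in text:
            if ch not in table and self.size() >= self.budget:
                self.flush()
                # The current table may have been evicted; fetch or recreate it.
                table = self.tables.setdefault(self.key(path), {})
            table[ch] = table.get(ch, 0) + 1


def tables_in_use(path_sensitive):
    store = TableStore(budget=100, strategy="all", path_sensitive=path_sensitive)
    store.update(["Project", "date"], "2005-11-16")
    store.update(["Employee", "date"], "1999-01-02")
    return len(store.tables)


print(tables_in_use(False), tables_in_use(True))  # prints 1 2
```

With the elements of Example 2, the path agnostic store keeps one table for date while the path sensitive store keeps two. Under the same budget the path sensitive store therefore fills up and flushes sooner, which is one plausible reading of why the path agnostic scheme does better when the block size is limited.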

6 Conclusion and Future Work

We present and evaluate new schemes for syntax directed compression of XML documents, where the underlying context model for the compression of tags is a recursive finite state automaton generated directly from the DTD of the document. The model is automatically switched on transiting from one automaton to another, storing enough information on the stack so that return to the right state is possible; this ensures that the correct model is always used for compression. (In fact, it precisely achieves the multiplexing of models mentioned in XMLPPM in a completely natural manner.) We have measured the effects of allocating a fixed size block of runtime memory for the compressor, as well as varying strategies for flushing out the context tables. We have also compared the path sensitive and path agnostic schemes for storing context for PCDATA. Our experiments indicate that path sensitive schemes are less effective in the fixed memory model.

Future work will concentrate on modifying this scheme to facilitate simple tree queries on the XML text. The fact that the tree structure is implicit in the textual representation, and that function calls to elements may be augmented with parameters, makes it feasible to handle tree queries which require only a forward pass over the implicit tree, while the document is being decompressed.

References

[1] DBLP Computer Science Bibliography, ley/db
[2] Nelson, M.: Arithmetic Coding. Dr. Dobbs Journal (1991)
[3] Teahan, W.J.: PPMD+, PPM* source code. wjt/.
[4] Megginson, D.: SAX: A Simple API for XML.
[5] Witten, I.H., Neal, R.M., Cleary, J.G.: Arithmetic Coding for Data Compression. Communications of the ACM, 30(6): -40, June
[6] XMark - An XML Benchmark project. Efficient query evaluation over compressed XML data. In Proc. of EDBT 04.
[7] Extensible Markup Language (XML) 1.0. W3C Recommendation Feb, Reference: REC-xml
[8] XML Solutions. XMLZIP.
[9] Liefke, H., Suciu, D.: XMill: an efficient compressor for XML data. Proceedings of ACM SIGMOD, 00.
[10] Cheney, J.: Compressing XML with Multiplexed Hierarchical Models. Proceedings of the 01 IEEE Data Compression Conference, pp
[11] UniProt (Universal Protein Resource).

Figure 2: Statistics for Compression Ratios Versus Memory Usage for XAUST and Compression Ratios for XMLPPM (continued in Figure 3). Panels: XMark1 (no path context), XMark2 (no path context), DBLP (no path context), UniRef (no path context).

Figure 3: Continued from Figure 2. Panels: XMark1 (path context), XMark2 (path context), DBLP (path context), XMLPPM (DBLP, XMark, UniRef).


More information

8 Integer encoding. scritto da: Tiziano De Matteis

8 Integer encoding. scritto da: Tiziano De Matteis 8 Integer encoding scritto da: Tiziano De Matteis 8.1 Unary code... 8-2 8.2 Elias codes: γ andδ... 8-2 8.3 Rice code... 8-3 8.4 Interpolative coding... 8-4 8.5 Variable-byte codes and (s,c)-dense codes...

More information

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design i About the Tutorial A compiler translates the codes written in one language to some other language without changing the meaning of the program. It is also expected that a compiler should make the target

More information

Midterm Exam. CSCI 3136: Principles of Programming Languages. February 20, Group 2

Midterm Exam. CSCI 3136: Principles of Programming Languages. February 20, Group 2 Banner number: Name: Midterm Exam CSCI 336: Principles of Programming Languages February 2, 23 Group Group 2 Group 3 Question. Question 2. Question 3. Question.2 Question 2.2 Question 3.2 Question.3 Question

More information

MQEB: Metadata-based Query Evaluation of Bi-labeled XML data

MQEB: Metadata-based Query Evaluation of Bi-labeled XML data MQEB: Metadata-based Query Evaluation of Bi-labeled XML data Rajesh Kumar A and P Sreenivasa Kumar Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai 600036, India.

More information
