TRIE BASED METHODS FOR STRING SIMILARITY JOINS


Venkat Charan Varma Buddharaju
Department of Computer and Information Science, University of Mississippi
ENGR-654 INFORMATION SYSTEM PRINCIPLES, RESEARCH TRACK FOR FIRST HALF

ABSTRACT:
A string similarity join finds similar pairs between two collections of strings. It relies on a metric that measures the similarity or dissimilarity between two strings for approximate string matching or comparison in fuzzy string search. Its major applications include data integration, data cleaning and query matching in search engines. Conventional trie-based string similarity join algorithms are very efficient for short strings. However, the existing approaches compute the prefix nodes known as active nodes for every search operation, and these pile up when the strings are longer. This work is a study of two techniques presented in two different publications. The two efficient trie-based string similarity joins studied and evaluated are Pre-join and Trie-join. Pre-join finds all similar string pairs using a new active-node generation method and a dynamic preorder traversal of the trie index. Trie-join uses a trie structure to index the strings and exploits that structure to find similar string pairs efficiently, based on subtrie pruning. Additionally, several Trie-join algorithms and pruning techniques are used to gain higher performance. All these algorithms are discussed in this paper. The reported experiments are then analyzed to get a clearer picture of these methodologies.

INTRODUCTION TO TRIES:
A trie, also known as a digital tree or prefix tree, is an ordered tree data structure used to store a dynamic set or associative array whose keys are usually strings. Values are normally not associated with every node, only with leaves and with those inner nodes that correspond to keys of interest. In the figure shown, keys are listed in the nodes and values below them. Each complete English word has an arbitrary integer value associated with it. A trie can be seen as a tree-shaped deterministic finite automaton, that is, one without loops. Though tries are most commonly keyed by character strings, they don't need to be.
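As an illustration of the structure just described, the following is a minimal Python sketch of a trie used as an associative array. The class and method names (`TrieNode`, `Trie`, `insert`, `get`, `keys_with_prefix`) and the sample words and values are assumptions made for this example; they do not come from either of the studied papers.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # maps one character to the child node
        self.is_key = False  # True when a complete key ends at this node
        self.value = None    # value stored for that key, if any


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):
        """Walk down from the root, creating nodes as needed."""
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True
        node.value = value

    def get(self, key, default=None):
        """Return the value stored for key, or default if key is absent."""
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return default
        return node.value if node.is_key else default

    def keys_with_prefix(self, prefix):
        """Return, in sorted order, every stored key that starts with prefix."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        stack = [(node, prefix)]
        while stack:
            current, key = stack.pop()
            if current.is_key:
                found.append(key)
            for ch, child in current.children.items():
                stack.append((child, key + ch))
        return sorted(found)


trie = Trie()
for word, number in [("tea", 3), ("ten", 12), ("to", 7), ("inn", 9)]:
    trie.insert(word, number)  # the integer values are arbitrary
```

Note that no node stores its key: the key is recovered from the path of characters leading to the node, which is why `get("te")` finds a node but returns the default, since no key ends there.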

Tries can be more advantageous than hash tables and binary search trees in many scenarios. Unlike a binary search tree, no node in a trie stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. Tries have none of the key collisions that occur in hash tables, and there is no need to provide a hash function or to change hash functions as more keys are added. Some of the major applications of tries are:
1. Dictionary representation
2. Sorting
3. Full-text search
4. Similarity joins

INTRODUCTION TO SIMILARITY JOINS:
A similarity measure is a metric that quantifies the similarity between two text strings. Under an edit-based measure, the similarity of two strings is judged by the minimal number of alterations (insertions, deletions and substitutions) needed to turn the first string into the second. String similarity join is an important operation in data integration and cleansing that finds similar pairs from two collections of strings. Its wide range of applications includes fraud detection, fingerprint analysis, plagiarism detection, ontology merging, DNA and RNA analysis, image analysis, evidence-based machine learning, data mining, web interfaces and semantic knowledge integration.

Commonly used string similarity measures include Levenshtein distance (edit distance), Needleman-Wunsch distance, Smith-Waterman distance, Gotoh distance (Smith-Waterman-Gotoh distance), block distance (L1 or city block distance), Jaro-Winkler distance, Soundex distance, Dice's coefficient, Tversky index, overlap coefficient, variational distance, skew divergence, confusion probability, maximal matches and Lee distance. The most widely used of these is edit distance. It takes two input strings and returns the minimum number of insertions, deletions and substitutions required to transform one string into the other.

The conventional similarity join methods have the following drawbacks, which led to the introduction of new techniques like Trie-join and Pre-join:
1. They are inefficient for data sets with short strings.
2. They involve large indices.
3. They are expensive when dynamic updates of the data sets must be supported.
4. They create many active nodes that have to be removed again later.

Calculating the active node:
Given two strings $r = r_1 r_2 \ldots r_n$ and $s = s_1 s_2 \ldots s_m$, let $D$ denote a matrix with $n+1$ rows and $m+1$ columns, and let $D(i, j)$ be the edit distance between the prefix $r_1 r_2 \ldots r_i$ and the prefix $s_1 s_2 \ldots s_j$. We use the dynamic-programming algorithm to compute the matrix:

\[
D(i, 0) = i \ \text{for } 0 \le i \le n, \qquad D(0, j) = j \ \text{for } 0 \le j \le m,
\]
\[
D(i, j) = \min\bigl( D(i-1, j) + 1,\; D(i, j-1) + 1,\; D(i-1, j-1) + \theta \bigr),
\]

where $\theta = 0$ if $r_i = s_j$; otherwise, $\theta = 1$. Here, $D(i, j)$ is called an active entry if $D(i, j) \le \tau$, where $\tau$ is the edit-distance threshold. An example in (Feng, Wang, & Li, 2012) illustrates the calculation of active-node entries for a given trie structure. It calculates the edit distance for every pair of prefixes of the input strings and enters it into the $(n+1) \times (m+1)$ matrix.
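The dynamic-programming recurrence above can be sketched in Python as follows. This is a minimal illustration, not code from (Feng, Wang, & Li, 2012); the function names `edit_distance_matrix` and `active_entries` are assumptions introduced for this example.

```python
def edit_distance_matrix(r, s):
    """Return D, where D[i][j] is the edit distance between r[:i] and s[:j]."""
    n, m = len(r), len(s)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i  # delete all i characters of the prefix of r
    for j in range(m + 1):
        D[0][j] = j  # insert all j characters of the prefix of s
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            theta = 0 if r[i - 1] == s[j - 1] else 1
            D[i][j] = min(
                D[i - 1][j] + 1,          # deletion
                D[i][j - 1] + 1,          # insertion
                D[i - 1][j - 1] + theta,  # match or substitution
            )
    return D


def active_entries(D, tau):
    """Return the (i, j) positions whose value does not exceed tau."""
    return [
        (i, j)
        for i, row in enumerate(D)
        for j, value in enumerate(row)
        if value <= tau
    ]
```

The bottom-right cell `D[len(r)][len(s)]` is the edit distance between the full strings, and `active_entries(D, tau)` lists the prefix pairs that are still within the threshold.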

The (n+1) × (m+1) matrix gives the edit distance between every pair of prefixes of the two input strings. The active entries are then read off by picking the values that do not exceed the given threshold τ.

TRIE-JOIN
Trie-join rests on two observations: subtrie pruning and dual subtrie pruning. For a given trie and a string s, a node n is said to be an active node of s if the edit distance ED(s, n) ≤ τ. Subtrie pruning states that if n is not an active node of any prefix of s, then none of the strings under n can be similar to s. Dual subtrie pruning relates two trie nodes u and v: the strings under u and v cannot be similar to each other if u is not an active node of any ancestor of v and v is not an active node of any ancestor of u. Three algorithms are discussed in (Feng, Wang, & Li, 2012): Trie-Traverse, Trie-Dynamic and Trie-PathStack. Further algorithms are explained that support dynamic data updates and similarity joins between two different sets. In addition, there are algorithms that extend Trie-PathStack to work for larger edit distances.

Trie-Traverse:
Trie-Traverse was developed with the intention of improving the standard Trie-Search algorithm. In general, Trie-Search computes duplicate active nodes, and Trie-Traverse avoids this computation overhead through dual subtrie pruning. Trie-Traverse first constructs the trie structure for the given input strings. After the trie has been constructed, the active-node set of every node is computed only once in the whole process, no matter how many strings have that node as a prefix.

The process can be explained by walking through the algorithm. Given a string collection S and an edit-distance threshold, a trie is constructed for the collection. The active-node set is computed for the root node r, and for each child node c of r the output set P is united with the result of the function findsimilarpair, called with c, r and the active-node set of r. The function findsimilarpair takes as input a trie node c, its parent node and the active-node set of the parent. It computes the active-node set of c and prunes that set. The function then calls itself recursively on the children until the current node is a leaf node. If the current node is a leaf node, the function outputsimilarpair is called on the current node and its active-node set. Pruning is performed again in outputsimilarpair, which produces the output pairs.

Trie-Dynamic:
This algorithm is designed around the symmetry property of active nodes. The main idea is that if u is an active node of v, then v must be an active node of u. If the computation is organized around this property, unnecessary computations can be avoided. Trie-Dynamic takes as input a collection of strings and an edit-distance threshold, and outputs the similar string pairs. A trie structure is constructed, and for each input string the trie node that corresponds to its longest prefix already in the trie is located. A loop then repeats the following step from that node until the current node reaches the leaf node of the string: the active-node set of the current node is computed and appended to the existing sets. When the loop reaches the leaf node, the function outputsimilarpair is called on it.
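To make the traversal idea concrete, here is a minimal Python sketch of a self-join in the spirit of Trie-Traverse: the trie is walked in preorder, and the active-node set of each node is derived once from the active-node set of its parent. It is a simplified illustration under stated assumptions, not the pseudocode of (Feng, Wang, & Li, 2012). All names (`TrieNode`, `build_trie`, `root_active`, `child_active`, `similarity_self_join`) are hypothetical, the input strings are assumed to be distinct, and the additional pruning techniques of the paper are left out.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # character -> child node
        self.word = None    # the full string, if one ends at this node


def build_trie(strings):
    root = TrieNode()
    for s in strings:
        node = root
        for ch in s:
            node = node.children.setdefault(ch, TrieNode())
        node.word = s
    return root


def add(active, node, dist, tau):
    """Record node with distance dist if within tau and better than before."""
    if dist <= tau and dist < active.get(node, tau + 1):
        active[node] = dist


def descend(active, node, dist, tau):
    """Add node, then its descendants, each extra level costing one insertion."""
    add(active, node, dist, tau)
    if dist < tau:
        for child in node.children.values():
            descend(active, child, dist + 1, tau)


def root_active(root, tau):
    """Active nodes of the empty prefix: every node of depth at most tau."""
    active = {}
    descend(active, root, 0, tau)
    return active


def child_active(parent_active, ch, tau):
    """Derive the active-node set of a child (reached via ch) from its parent's."""
    active = {}
    for v, d in parent_active.items():
        add(active, v, d + 1, tau)          # ch is deleted
        for a, w in v.children.items():
            if a == ch:
                descend(active, w, d, tau)  # match, followed by insertions
            else:
                add(active, w, d + 1, tau)  # substitution
    return active


def similarity_self_join(strings, tau):
    """Return {(s, t, distance)} for all pairs s < t with edit distance <= tau."""
    root = build_trie(strings)
    results = set()

    def visit(node, active):
        if node.word is not None:
            for v, d in active.items():
                if v.word is not None and node.word < v.word:
                    results.add((node.word, v.word, d))
        for ch, child in node.children.items():
            visit(child, child_active(active, ch, tau))

    visit(root, root_active(root, tau))
    return results
```

Because each pair is reported only from the node of the lexicographically smaller string, the sketch relies on the symmetry property mentioned above: if t is within the threshold of s, the node of t is guaranteed to appear in the active-node set of s.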
Trie-PathStack:
The two algorithms above are each effective in only one dimension: Trie-Traverse uses very little memory but computes unnecessary active nodes, whereas Trie-Dynamic computes only the required active nodes but uses more memory. Trie-PathStack is designed to overcome both problems by integrating the ideas of Trie-Traverse and Trie-Dynamic. To achieve this, a virtual partial subtrie is maintained while traversing the trie nodes, to keep a record of the visited nodes. When the active-node set of an unvisited node is to be computed, the node is first marked as visited, and its active-node set is computed as if the node were part of the virtual partial subtrie, which avoids redundant computation. To keep memory usage low, the nodes are traversed in preorder and a stack is maintained for the nodes that need to be updated. The stack holds the active-node sets of all nodes on the path from the root to the current node, so the active-node set of the current node is computed from that of its parent, found on top of the stack. Since Trie-PathStack uses the symmetry property of active nodes, it has the same time complexity as Trie-Dynamic, while its space complexity is the same as that of Trie-Traverse.

Trie-PathStack takes as input a collection of strings and an edit-distance threshold. A trie is built for the collection and a new stack is initialized. The root node r is marked as visited in the virtual partial subtrie, its active-node set is computed, and both are pushed onto the stack. The first child of the root is taken as the current node c. While the stack is not empty and c is not null, the top element of the stack holds the parent node and its active-node set; c is marked as visited, its active-node set is computed from the parent's set, and pruning is done at the same time. The active-node sets in the stack are updated, and the process is repeated for the child nodes until a leaf node is reached.

Pruning Techniques:
Dual subtrie pruning has been used to develop the three trie-based algorithms. To reduce the active-node sets further, three more types of pruning are introduced:

Length Pruning: For two nodes u and v, a range is maintained for each node, bounded by the length of the shortest string and the length of the longest string in its subtrie. If the two ranges are more than the edit-distance threshold apart, that is, the shortest string under one node is more than τ characters longer than the longest string under the other, then v can be pruned from the active-node set of u.
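The length pruning test can be sketched as follows. This is an illustrative Python fragment under the assumption that every trie node is annotated with the shortest and longest string length found in its subtrie; the names `Node`, `build`, `annotate`, `find` and `can_length_prune` are hypothetical and not taken from the paper, and the collection is assumed to be non-empty.

```python
class Node:
    def __init__(self):
        self.children = {}   # character -> child node
        self.is_end = False  # True when a string ends at this node
        self.lo = None       # length of the shortest string in this subtrie
        self.hi = None       # length of the longest string in this subtrie


def annotate(node, depth):
    """Fill in lo and hi bottom-up; depth is the length of the node's prefix."""
    lengths = [depth] if node.is_end else []
    for child in node.children.values():
        annotate(child, depth + 1)
        lengths += [child.lo, child.hi]
    node.lo, node.hi = min(lengths), max(lengths)


def build(strings):
    """Build an annotated trie from a non-empty collection of strings."""
    root = Node()
    for s in strings:
        node = root
        for ch in s:
            if ch not in node.children:
                node.children[ch] = Node()
            node = node.children[ch]
        node.is_end = True
    annotate(root, 0)
    return root


def find(root, prefix):
    """Return the node reached by following prefix from the root."""
    node = root
    for ch in prefix:
        node = node.children[ch]
    return node


def can_length_prune(u, v, tau):
    """True when no string under u can be within tau edits of one under v."""
    return u.lo - v.hi > tau or v.lo - u.hi > tau
```

The check is safe because the difference in length between two strings is a lower bound on their edit distance: each insertion or deletion changes the length by one, and substitutions do not change it at all.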

Single-Branch Pruning: Consider two nodes u and v where v is an ancestor of u and both nodes have the same leaf descendants. Then v can be pruned from the active-node set of u, even if v is an active node of u. Since there is only a single branch from v to u, no new active nodes are generated for v. This is called single-branch pruning.

Count Pruning: Given two nodes u and v, if there is only one string that has both nodes u and v as prefixes, node u can be safely pruned from A_v, the active-node set of v, because we cannot find two strings in their subtries.

Three other algorithms have been developed to improve the performance of the Trie-join algorithms described above:
1. The Incremental Trie-Join algorithm, which supports dynamic data updates.
2. The Trie-PathStack+ algorithm for joining two different data sets.
3. The Incremental Trie-PathStack+, or Bi-Trie-PathStack, algorithm for larger edit-distance thresholds.
All of these algorithms are explained with theorems, pseudocode and examples in (Feng, Wang, & Li, 2012).

PRE-JOIN
In general, trie-based similarity join approaches compute the active nodes in a separate phase, and many of the generated active nodes are false candidates that must be pruned later. Both issues add computational overhead to the process. To overcome this, a new active-node generation method has been devised that computes only the required active nodes, so that the pruning phase can be eliminated. This is very helpful for longer strings. Pre-join is thus a combination of preorder traversal and the new active-node generation method. There are three steps in Pre-join. First, the active-node set is calculated for the siblings of the current node while the active-node set of the next node to be visited is being computed. Next, Pre-join uses its own traversal order, in contrast to the conventional preorder traversal. Finally, the new active-node generation method is employed, which avoids adding false active nodes to the set.

Pre-join takes as input a collection of strings and an edit-distance threshold. A trie is constructed for the given set. PreorderTraverse is performed on the root node; it imposes an order on the children, which are treated as a set. If the current node is an EOS (end-of-string) node, the function Out_Similar is called with the current element, the children of the current node, the set of active nodes of that node, its value and the edit-distance threshold. If the element is a leaf node, the function Gen_ActiveNode is called with the same arguments. PreorderTraverse is then performed on the current element. In the Gen_ActiveNode() function, for each node in the active-node set at a given distance, if the character label c of the i-th node equals the character label c of the current node, a push-down operation is performed with the i-th node, the current node, the distance, the threshold and 1; otherwise the push-down operation is performed with the i-th node, the current node, the distance, the threshold and 0.

While computing the active nodes, false nodes can be avoided by imposing two rules that are part of the new active-node generation method.
Rule 1: Apply the symmetry property of edit distance early in the generation process.
Rule 2: While generating the active-node set from the parent's active-node set, the remaining siblings are not added to the set.

COMPARING THE EXPERIMENTAL RESULTS

TRIE-JOIN:
The experiments were performed on the following datasets:
1. English Dict: English words derived from the Aspell spell checker.
2. DBLP Author: author names from the DBLP dataset.
3. AOL Query Log: one million random distinct queries.
4. DBLP Authors+Title: strings that concatenate the authors' names and the title of a publication.

The experiments cover four algorithms: Trie-Search, Trie-Traverse, Trie-Dynamic and Trie-PathStack. The latter three are compared to the standard Trie-Search algorithm. According to the statistics provided in (Feng, Wang, & Li, 2012), the three Trie-join algorithms perform better than standard Trie-Search. Of these, Trie-Dynamic and Trie-PathStack use the symmetry property of active-node computation, which Trie-Traverse lacks. As a result, the computation overhead is lower in the Dynamic and PathStack algorithms. The reported graphs also show that Trie-Traverse is approximately two times slower than Trie-Dynamic and Trie-PathStack for this very reason. The number of active nodes for Trie-PathStack is smaller than that of Trie-Search and Trie-Traverse, since Trie-PathStack utilizes the symmetry property of two active nodes. The graph below shows the comparison of the four algorithms.

The following figure compares the running times of Trie-PathStack and Bi-Trie-PathStack. Bi-Trie-PathStack performed very well at the higher edit-distance thresholds for which it was designed, while conventional Trie-PathStack worked well for smaller edit-distance thresholds. Three algorithms, namely Ed-Join, Trie-PathStack and Bi-Trie-PathStack, are compared in graphs that plot running time against string length for different edit-distance thresholds. Across the different thresholds, the algorithms behaved similarly apart from small changes. Ed-Join and Bi-Trie-PathStack were more efficient for longer strings and less efficient for shorter strings. Trie-PathStack performed very well on short strings, and its running time increased gradually as the strings grew longer.


PRE-JOIN:
The experiments were performed on the following datasets: DBLP Authors, DBLP Authors+Title and AOL Query Log. The Pre-join algorithm, with its novel active-node generation method, is compared against the Trie-Traverse and Trie-PathStack algorithms for different edit-distance thresholds. The graphs are taken from (Gouda & Rashad, 2012) and give a clearer picture of the experimental results.

Figure: Comparing Pre-join, Trie-PathStack and Trie-Traverse on the DBLP Authors dataset with edit-distance thresholds 1, 2 and 3.

Figure: Comparing Pre-join, Trie-PathStack and Trie-Traverse on the AOL Query Log dataset with edit-distance thresholds 1, 2 and 3.

Figure: Comparing Pre-join, Trie-PathStack and Trie-Traverse on the DBLP Authors+Title dataset with edit-distance thresholds 1, 2 and 3.

The graphs above show clearly that Pre-join is the most efficient of the three algorithms, taking the least time among Pre-join, Trie-PathStack and Trie-Traverse. As the string size increases, the latter two algorithms become less efficient, whereas Pre-join still has the shortest running time.

CONCLUSION
This paper analyzed the techniques developed by two different groups of researchers. Both techniques address the problem of string similarity joins with edit-distance constraints. Trie-join presents a trie-based similarity join framework and introduces several pruning techniques to enhance the performance of the stated algorithms. To serve scenarios with large thresholds effectively, improved algorithms have been devised with Trie-Search. The experimental results show that the Trie-join algorithms, namely Trie-Traverse, Trie-Dynamic and Trie-PathStack, work more efficiently than the standard Trie-Search algorithm. Pre-join proposes techniques that find all similar pairs using a new active-node set generation method, which reduces the overhead of active-node computation found in conventional trie-based algorithms. It also proposes a dynamic preorder traversal of the trie index. The experimental results show that Pre-join serves both large and small datasets well and works very efficiently for large edit-distance thresholds. Both techniques work very well on paper and look interesting when the results produced by the different research groups are analyzed. No real-world applications have been found that use these methodologies.

References
Feng, J., Wang, J., & Li, G. (2012). Trie-join: a trie-based method for efficient string similarity joins. The VLDB Journal: The International Journal on Very Large Data Bases.
Gouda, K., & Rashad, M. (2012). PreJoin: An Efficient Trie-based String Similarity Join Algorithm. Informatics and Systems (INFOS), International Conference (pp. DE-37 - DE-43). Cairo: IEEE.
Jiang, Y., Li, G., Feng, J., & Li, W.-S. (2014). String Similarity Joins: An Experimental Evaluation. Proceedings of the VLDB Endowment.


More information

Chapter 13: Query Processing

Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

[ DATA STRUCTURES ] Fig. (1) : A Tree

[ DATA STRUCTURES ] Fig. (1) : A Tree [ DATA STRUCTURES ] Chapter - 07 : Trees A Tree is a non-linear data structure in which items are arranged in a sorted sequence. It is used to represent hierarchical relationship existing amongst several

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

Database index structures

Database index structures Database index structures From: Database System Concepts, 6th edijon Avi Silberschatz, Henry Korth, S. Sudarshan McGraw- Hill Architectures for Massive DM D&K / UPSay 2015-2016 Ioana Manolescu 1 Chapter

More information

XML Query Processing. Announcements (March 31) Overview. CPS 216 Advanced Database Systems. Course project milestone 2 due today

XML Query Processing. Announcements (March 31) Overview. CPS 216 Advanced Database Systems. Course project milestone 2 due today XML Query Processing CPS 216 Advanced Database Systems Announcements (March 31) 2 Course project milestone 2 due today Hardcopy in class or otherwise email please I will be out of town next week No class

More information

Lec 17 April 8. Topics: binary Trees expression trees. (Chapter 5 of text)

Lec 17 April 8. Topics: binary Trees expression trees. (Chapter 5 of text) Lec 17 April 8 Topics: binary Trees expression trees Binary Search Trees (Chapter 5 of text) Trees Linear access time of linked lists is prohibitive Heap can t support search in O(log N) time. (takes O(N)

More information

Friday Four Square! 4:15PM, Outside Gates

Friday Four Square! 4:15PM, Outside Gates Binary Search Trees Friday Four Square! 4:15PM, Outside Gates Implementing Set On Monday and Wednesday, we saw how to implement the Map and Lexicon, respectively. Let's now turn our attention to the Set.

More information

Final Examination CSE 100 UCSD (Practice)

Final Examination CSE 100 UCSD (Practice) Final Examination UCSD (Practice) RULES: 1. Don t start the exam until the instructor says to. 2. This is a closed-book, closed-notes, no-calculator exam. Don t refer to any materials other than the exam

More information

FORTH SEMESTER DIPLOMA EXAMINATION IN ENGINEERING/ TECHNOLIGY- MARCH, 2012 DATA STRUCTURE (Common to CT and IF) [Time: 3 hours

FORTH SEMESTER DIPLOMA EXAMINATION IN ENGINEERING/ TECHNOLIGY- MARCH, 2012 DATA STRUCTURE (Common to CT and IF) [Time: 3 hours TED (10)-3071 Reg. No.. (REVISION-2010) (Maximum marks: 100) Signature. FORTH SEMESTER DIPLOMA EXAMINATION IN ENGINEERING/ TECHNOLIGY- MARCH, 2012 DATA STRUCTURE (Common to CT and IF) [Time: 3 hours PART

More information

Tree Structures. A hierarchical data structure whose point of entry is the root node

Tree Structures. A hierarchical data structure whose point of entry is the root node Binary Trees 1 Tree Structures A tree is A hierarchical data structure whose point of entry is the root node This structure can be partitioned into disjoint subsets These subsets are themselves trees and

More information

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far

Introduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far Chapter 5 Hashing 2 Introduction hashing performs basic operations, such as insertion, deletion, and finds in average time better than other ADTs we ve seen so far 3 Hashing a hash table is merely an hashing

More information

Module 2: Classical Algorithm Design Techniques

Module 2: Classical Algorithm Design Techniques Module 2: Classical Algorithm Design Techniques Dr. Natarajan Meghanathan Associate Professor of Computer Science Jackson State University Jackson, MS 39217 E-mail: natarajan.meghanathan@jsums.edu Module

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

Intro to DB CHAPTER 12 INDEXING & HASHING

Intro to DB CHAPTER 12 INDEXING & HASHING Intro to DB CHAPTER 12 INDEXING & HASHING Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing

More information

Database System Concepts

Database System Concepts Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth

More information

TwigList: Make Twig Pattern Matching Fast

TwigList: Make Twig Pattern Matching Fast TwigList: Make Twig Pattern Matching Fast Lu Qin, Jeffrey Xu Yu, and Bolin Ding The Chinese University of Hong Kong, China {lqin,yu,blding}@se.cuhk.edu.hk Abstract. Twig pattern matching problem has been

More information

Backtracking. Chapter 5

Backtracking. Chapter 5 1 Backtracking Chapter 5 2 Objectives Describe the backtrack programming technique Determine when the backtracking technique is an appropriate approach to solving a problem Define a state space tree for

More information

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Chapter 13: Query Processing Basic Steps in Query Processing

Chapter 13: Query Processing Basic Steps in Query Processing Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

CS8391-DATA STRUCTURES QUESTION BANK UNIT I

CS8391-DATA STRUCTURES QUESTION BANK UNIT I CS8391-DATA STRUCTURES QUESTION BANK UNIT I 2MARKS 1.Define data structure. The data structure can be defined as the collection of elements and all the possible operations which are required for those

More information

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph. Trees 1 Introduction Trees are very special kind of (undirected) graphs. Formally speaking, a tree is a connected graph that is acyclic. 1 This definition has some drawbacks: given a graph it is not trivial

More information

PART IV. Given 2 sorted arrays, What is the time complexity of merging them together?

PART IV. Given 2 sorted arrays, What is the time complexity of merging them together? General Questions: PART IV Given 2 sorted arrays, What is the time complexity of merging them together? Array 1: Array 2: Sorted Array: Pointer to 1 st element of the 2 sorted arrays Pointer to the 1 st

More information

CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics

CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics 1 Sorting 1.1 Problem Statement You are given a sequence of n numbers < a 1, a 2,..., a n >. You need to

More information

Efficiently Mining Frequent Trees in a Forest

Efficiently Mining Frequent Trees in a Forest Efficiently Mining Frequent Trees in a Forest Mohammed J. Zaki Computer Science Department, Rensselaer Polytechnic Institute, Troy NY 8 zaki@cs.rpi.edu, http://www.cs.rpi.edu/ zaki ABSTRACT Mining frequent

More information

Binary Trees

Binary Trees Binary Trees 4-7-2005 Opening Discussion What did we talk about last class? Do you have any code to show? Do you have any questions about the assignment? What is a Tree? You are all familiar with what

More information

TABLE OF CONTENTS PAGE TITLE NO.

TABLE OF CONTENTS PAGE TITLE NO. TABLE OF CONTENTS CHAPTER PAGE TITLE ABSTRACT iv LIST OF TABLES xi LIST OF FIGURES xii LIST OF ABBREVIATIONS & SYMBOLS xiv 1. INTRODUCTION 1 2. LITERATURE SURVEY 14 3. MOTIVATIONS & OBJECTIVES OF THIS

More information

Chapter 20: Binary Trees

Chapter 20: Binary Trees Chapter 20: Binary Trees 20.1 Definition and Application of Binary Trees Definition and Application of Binary Trees Binary tree: a nonlinear linked list in which each node may point to 0, 1, or two other

More information

Trees. Truong Tuan Anh CSE-HCMUT

Trees. Truong Tuan Anh CSE-HCMUT Trees Truong Tuan Anh CSE-HCMUT Outline Basic concepts Trees Trees A tree consists of a finite set of elements, called nodes, and a finite set of directed lines, called branches, that connect the nodes

More information

DATA STRUCTURES AND ALGORITHMS

DATA STRUCTURES AND ALGORITHMS LECTURE 11 Babeş - Bolyai University Computer Science and Mathematics Faculty 2017-2018 In Lecture 10... Hash tables Separate chaining Coalesced chaining Open Addressing Today 1 Open addressing - review

More information

Trees. Q: Why study trees? A: Many advance ADTs are implemented using tree-based data structures.

Trees. Q: Why study trees? A: Many advance ADTs are implemented using tree-based data structures. Trees Q: Why study trees? : Many advance DTs are implemented using tree-based data structures. Recursive Definition of (Rooted) Tree: Let T be a set with n 0 elements. (i) If n = 0, T is an empty tree,

More information

Query Processing & Optimization

Query Processing & Optimization Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction

More information

Trees in java.util. A set is an object that stores unique elements In Java, two implementations are available:

Trees in java.util. A set is an object that stores unique elements In Java, two implementations are available: Trees in java.util A set is an object that stores unique elements In Java, two implementations are available: The class HashSet implements the set with a hash table and a hash function The class TreeSet,

More information

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Bumjoon Jo and Sungwon Jung (&) Department of Computer Science and Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul 04107,

More information

Introduction to Algorithms I

Introduction to Algorithms I Summer School on Algorithms and Optimization Organized by: ACM Unit, ISI and IEEE CEDA. Tutorial II Date: 05.07.017 Introduction to Algorithms I (Q1) A binary tree is a rooted tree in which each node has

More information

Lecture 8 Index (B+-Tree and Hash)

Lecture 8 Index (B+-Tree and Hash) CompSci 516 Data Intensive Computing Systems Lecture 8 Index (B+-Tree and Hash) Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 HW1 due tomorrow: Announcements Due on 09/21 (Thurs),

More information

Chapter 12 Digital Search Structures

Chapter 12 Digital Search Structures Chapter Digital Search Structures Digital Search Trees Binary Tries and Patricia Multiway Tries C-C Tsai P. Digital Search Tree A digital search tree is a binary tree in which each node contains one element.

More information

Computer Science 136 Spring 2004 Professor Bruce. Final Examination May 19, 2004

Computer Science 136 Spring 2004 Professor Bruce. Final Examination May 19, 2004 Computer Science 136 Spring 2004 Professor Bruce Final Examination May 19, 2004 Question Points Score 1 10 2 8 3 15 4 12 5 12 6 8 7 10 TOTAL 65 Your name (Please print) I have neither given nor received

More information

Computer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Introduction and Overview

Computer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Introduction and Overview Computer Science 385 Analysis of Algorithms Siena College Spring 2011 Topic Notes: Introduction and Overview Welcome to Analysis of Algorithms! What is an Algorithm? A possible definition: a step-by-step

More information

Physical Level of Databases: B+-Trees

Physical Level of Databases: B+-Trees Physical Level of Databases: B+-Trees Adnan YAZICI Computer Engineering Department METU (Fall 2005) 1 B + -Tree Index Files l Disadvantage of indexed-sequential files: performance degrades as file grows,

More information

On Label Stream Partition for Efficient Holistic Twig Join

On Label Stream Partition for Efficient Holistic Twig Join On Label Stream Partition for Efficient Holistic Twig Join Bo Chen 1, Tok Wang Ling 1,M.TamerÖzsu2, and Zhenzhou Zhu 1 1 School of Computing, National University of Singapore {chenbo, lingtw, zhuzhenz}@comp.nus.edu.sg

More information

Lecture Notes on Tries

Lecture Notes on Tries Lecture Notes on Tries 15-122: Principles of Imperative Computation Thomas Cortina, Frank Pfenning, Rob Simmons, Penny Anderson Lecture 22 June 20, 2014 1 Introduction In the data structures implementing

More information

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology Introduction Chapter 4 Trees for large input, even linear access time may be prohibitive we need data structures that exhibit average running times closer to O(log N) binary search tree 2 Terminology recursive

More information

Lecture Notes on Tries

Lecture Notes on Tries Lecture Notes on Tries 15-122: Principles of Imperative Computation Thomas Cortina Notes by Frank Pfenning Lecture 24 April 19, 2011 1 Introduction In the data structures implementing associative arrays

More information

UNIT 5 GRAPH. Application of Graph Structure in real world:- Graph Terminologies:

UNIT 5 GRAPH. Application of Graph Structure in real world:- Graph Terminologies: UNIT 5 CSE 103 - Unit V- Graph GRAPH Graph is another important non-linear data structure. In tree Structure, there is a hierarchical relationship between, parent and children that is one-to-many relationship.

More information

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion, Introduction Chapter 5 Hashing hashing performs basic operations, such as insertion, deletion, and finds in average time 2 Hashing a hash table is merely an of some fixed size hashing converts into locations

More information

CE 221 Data Structures and Algorithms

CE 221 Data Structures and Algorithms CE 221 Data Structures and Algorithms Chapter 4: Trees (Binary) Text: Read Weiss, 4.1 4.2 Izmir University of Economics 1 Preliminaries - I (Recursive) Definition: A tree is a collection of nodes. The

More information

Recursion Problems. Warm Ups. Enumeration 1 / 7

Recursion Problems. Warm Ups. Enumeration 1 / 7 1 / 7 Recursion Problems Warm Ups 1. Write a recursive implementation of the factorial function. Recall that n! = 1 2 n, with the special case that 0! = 1. 2. Write a recursive function that, given a number

More information

MID TERM MEGA FILE SOLVED BY VU HELPER Which one of the following statement is NOT correct.

MID TERM MEGA FILE SOLVED BY VU HELPER Which one of the following statement is NOT correct. MID TERM MEGA FILE SOLVED BY VU HELPER Which one of the following statement is NOT correct. In linked list the elements are necessarily to be contiguous In linked list the elements may locate at far positions

More information

Tree: non-recursive definition. Trees, Binary Search Trees, and Heaps. Tree: recursive definition. Tree: example.

Tree: non-recursive definition. Trees, Binary Search Trees, and Heaps. Tree: recursive definition. Tree: example. Trees, Binary Search Trees, and Heaps CS 5301 Fall 2013 Jill Seaman Tree: non-recursive definition Tree: set of nodes and directed edges - root: one node is distinguished as the root - Every node (except

More information

Greedy Algorithms. CLRS Chapters Introduction to greedy algorithms. Design of data-compression (Huffman) codes

Greedy Algorithms. CLRS Chapters Introduction to greedy algorithms. Design of data-compression (Huffman) codes Greedy Algorithms CLRS Chapters 16.1 16.3 Introduction to greedy algorithms Activity-selection problem Design of data-compression (Huffman) codes (Minimum spanning tree problem) (Shortest-path problem)

More information

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 The data of the problem is of 2GB and the hard disk is of 1GB capacity, to solve this problem we should Use better data structures

More information

Cpt S 122 Data Structures. Data Structures Trees

Cpt S 122 Data Structures. Data Structures Trees Cpt S 122 Data Structures Data Structures Trees Nirmalya Roy School of Electrical Engineering and Computer Science Washington State University Motivation Trees are one of the most important and extensively

More information

First Semester - Question Bank Department of Computer Science Advanced Data Structures and Algorithms...

First Semester - Question Bank Department of Computer Science Advanced Data Structures and Algorithms... First Semester - Question Bank Department of Computer Science Advanced Data Structures and Algorithms.... Q1) What are some of the applications for the tree data structure? Q2) There are 8, 15, 13, and

More information

Basic Search Algorithms

Basic Search Algorithms Basic Search Algorithms Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Abstract The complexities of various search algorithms are considered in terms of time, space, and cost

More information

Dynamic Programming Algorithms

Dynamic Programming Algorithms Based on the notes for the U of Toronto course CSC 364 Dynamic Programming Algorithms The setting is as follows. We wish to find a solution to a given problem which optimizes some quantity Q of interest;

More information

Lecture Notes. char myarray [ ] = {0, 0, 0, 0, 0 } ; The memory diagram associated with the array can be drawn like this

Lecture Notes. char myarray [ ] = {0, 0, 0, 0, 0 } ; The memory diagram associated with the array can be drawn like this Lecture Notes Array Review An array in C++ is a contiguous block of memory. Since a char is 1 byte, then an array of 5 chars is 5 bytes. For example, if you execute the following C++ code you will allocate

More information

a) State the need of data structure. Write the operations performed using data structures.

a) State the need of data structure. Write the operations performed using data structures. Important Instructions to examiners: 1) The answers should be examined by key words and not as word-to-word as given in the model answer scheme. 2) The model answer and the answer written by candidate

More information

CSCI-401 Examlet #5. Name: Class: Date: True/False Indicate whether the sentence or statement is true or false.

CSCI-401 Examlet #5. Name: Class: Date: True/False Indicate whether the sentence or statement is true or false. Name: Class: Date: CSCI-401 Examlet #5 True/False Indicate whether the sentence or statement is true or false. 1. The root node of the standard binary tree can be drawn anywhere in the tree diagram. 2.

More information

Top-k String Similarity Search with Edit-Distance Constraints

Top-k String Similarity Search with Edit-Distance Constraints String Similarity Search with Edit-Distance Constraints Dong Deng Guoliang Li Jianhua Feng Wen-Syan Li Department of Computer Science, Tsinghua National Laboratory for Information Science and Technology,

More information

Evaluating XPath Queries

Evaluating XPath Queries Chapter 8 Evaluating XPath Queries Peter Wood (BBK) XML Data Management 201 / 353 Introduction When XML documents are small and can fit in memory, evaluating XPath expressions can be done efficiently But

More information