Automated Generation of Object Summaries from Relational Databases: A Novel Keyword Searching Paradigm GEORGIOS FAKAS
1 Automated Generation of Object Summaries from Relational Databases: A Novel Keyword Searching Paradigm. Georgios Fakas, Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, UK. g.fakas@mmu.ac.uk
2 Related Work: Web Search Engines: Keyword Search. Kw Search: Peacock. Result: a ranked set of web pages.
4 Related Work: Keyword Search in Relational DBs. Full-text search (e.g. Oracle 9i Text); keyword searching in relational DBs (DISCOVER, BANKS). Kw Search: Leverling, Peacock. Result: joining networks of tuples, e.g. e3-o2-c2 and e4-o6-c2.
6 A Novel Keyword Searching Paradigm: Object Summaries (OSs). Kw Search: Peacock. Result: a ranked set of OSs. Problems and challenges: how can we automatically (1) generate OSs, (2) generate size-l OSs and (3) rank OSs, liberating users from any knowledge of (1) the schema and (2) the query language?
7 A Novel Keyword Searching Paradigm: Object Summaries (OSs). 1. Automated generation of OSs: Affinity. 2. Generation of size-l OSs: efficient greedy algorithms, and ValueRank, a PageRank-inspired ranking system.
8 OS Generation - Methodology. t_DS is the central tuple containing the keyword; the tuples around t_DS contain additional information about the Data Subject. R_DS is the corresponding central relation; similarly, the relations around R_DS contain additional information.
9 OS Generation - Methodology (figure: Northwind data graph for KW-ID = Janet Leverling, with tuples of Employees (e1-e4), EmployeeTerritories (et1-et4), Territories (t1-t4), Region (r1, r2), Orders (o1-o7), Customers (c1-c3), Shippers (s1-s3), Order Details (od1-od6), Products (p1, p2), Suppliers (su1) and Categories (ca1)).
12 OS Generation - Methodology (figure: the same Northwind tuples for KW-ID = Janet Leverling arranged by relation, labelled G_DS).
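As a rough illustration of the methodology above, an OS can be built by starting from t_DS and following G_DS outwards, attaching to each tuple the tuples it joins with in each child relation. This is a minimal in-memory sketch under stated assumptions: G_DS is given as a tree of relation names (`gds_children`), the `joins` dictionaries are hypothetical stand-ins for the foreign-key joins a real system would run in SQL, and the tuple ids are an invented fragment around employee e3, not the actual Northwind data.

```python
from dataclasses import dataclass, field


@dataclass
class OSNode:
    """One tuple of an Object Summary (OS) tree."""
    relation: str
    tuple_id: str
    children: list = field(default_factory=list)


def generate_os(t_ds, r_ds, gds_children, joins):
    """Build the OS tree rooted at t_DS by following the relations of G_DS.

    gds_children: G_DS as a tree, mapping a relation to its child relations.
    joins: hypothetical lookup, (parent_relation, child_relation) ->
           {parent tuple id: [ids of child tuples joined with it]}.
    """
    root = OSNode(r_ds, t_ds)
    stack = [root]
    while stack:
        node = stack.pop()
        for child_rel in gds_children.get(node.relation, []):
            lookup = joins.get((node.relation, child_rel), {})
            for child_id in lookup.get(node.tuple_id, []):
                child = OSNode(child_rel, child_id)
                node.children.append(child)
                stack.append(child)  # expand this tuple in a later iteration
    return root


def os_size(node):
    """Number of tuples in the OS tree, i.e. |OS|."""
    return 1 + sum(os_size(c) for c in node.children)


# Hypothetical Northwind-like fragment around employee e3.
gds_children = {
    "Employees": ["Orders", "EmployeeTerritories"],
    "Orders": ["Customers", "Shippers"],
    "EmployeeTerritories": ["Territories"],
}
joins = {
    ("Employees", "Orders"): {"e3": ["o1", "o2"]},
    ("Orders", "Customers"): {"o1": ["c1"], "o2": ["c2"]},
    ("Orders", "Shippers"): {"o1": ["s1"], "o2": ["s1"]},
    ("Employees", "EmployeeTerritories"): {"e3": ["et1"]},
    ("EmployeeTerritories", "Territories"): {"et1": ["t1"]},
}
os_tree = generate_os("e3", "Employees", gds_children, joins)
print(os_size(os_tree))  # 9
```

Shipper s1 appears twice because both orders use it: an OS is a tree, so a tuple reached through two different paths is repeated rather than shared.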
13 OS Generation - Methodology: G_DS. Problem: not all relations in G_DS are relevant. How do we decide (1) which relations to select and (2) when to stop traversing? Solution: investigate relational semantics (schema connectivity, cardinality, relative cardinality, etc.) and quantify the Affinity of relations.
16 Af(R_i, R_DS): Affinity of Relations to R_DS in G_DS. Distance: physical (fd) and logical (ld), where ld = fd minus the number of M:N relationships along the path; e.g. Orders is closer to Employees than Customer and CustomerDemo are. Hubs: spurious shortcuts that lead to rather irrelevant or lateral information (figure: R_DS ... N:1 R_hub 1:M R_2, annotated with RC(R_1, R_2)).
17 Af(R_i, R_DS): Affinity of Relations to R_DS in G_DS. Connectivity: schema connectivity (Co_i); data-graph connectivity: relative cardinality (RC_ij), i.e. the average number of tuples of R_i that are connected with each tuple of R_j: for 1:M, $RC_{ij} = |R_i| / |R_j|$; for M:1, $RC_{ij} = 1$. The reverse relative cardinality (RRC_ij) is the reverse of RC_ij, i.e. $RRC_{ij} = RC_{ji}$.
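Both cardinality measures can be computed directly from relation sizes. This is a minimal sketch; the `relationship` argument and its "1:M"/"M:1" encoding (read from R_j towards R_i) are assumptions of the sketch. With the Northwind cardinalities given later in the deck (Employees 9, Orders 830, Customers 91), it reproduces the RC of Order (92.2) and the RRC of Customer (9.1) listed in the affinity table for R_DS = Employees.

```python
def relative_cardinality(card_i, card_j, relationship):
    """RC_ij: average number of R_i tuples connected with each R_j tuple.

    relationship describes R_j -> R_i: "1:M" means each R_j tuple is linked
    to many R_i tuples; "M:1" means each R_j tuple is linked to one R_i tuple.
    """
    if relationship == "1:M":
        return card_i / card_j  # |R_i| / |R_j|
    if relationship == "M:1":
        return 1.0
    raise ValueError(f"unsupported relationship: {relationship}")


def reverse_relative_cardinality(card_i, card_j, relationship):
    """RRC_ij = RC_ji: swap the roles of R_i and R_j and flip the direction."""
    flipped = {"1:M": "M:1", "M:1": "1:M"}[relationship]
    return relative_cardinality(card_j, card_i, flipped)


# Northwind cardinalities: Employees 9, Orders 830, Customers 91.
print(round(relative_cardinality(830, 9, "1:M"), 1))           # 92.2 (Orders w.r.t. Employees)
print(relative_cardinality(91, 830, "M:1"))                     # 1.0 (Customers w.r.t. Orders)
print(round(reverse_relative_cardinality(91, 830, "M:1"), 1))  # 9.1
```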
18 Af(R_i, R_DS): Affinity of Relations to R_DS in G_DS. The affinity measures and weights of R_i are $DAf(R_i) = \{(m_1, w_1), (m_2, w_2), \ldots, (m_n, w_n)\}$, with $m_1 = f_1(ld_i)$, $m_2 = f_1(\log(10 \cdot rc_i))$, $m_3 = f_1(\log(10 \cdot rrc_i))$, $m_4 = f_1(\log(10 \cdot co_i))$ and $f_1(\alpha) = (11 - \alpha)/10$. For a hub child, $m_1 = f_1(ld_i \cdot h_i)$ and $m_2 = f_1(rc_i)$. Formula 1 (Semantic Affinity): the affinity of R_i to R_DS, denoted Af(R_i, R_DS), with respect to a schema and a database conforming to the schema, can be calculated as $Af(R_i, R_{DS}) = \left(\sum_j m_j w_j\right) \cdot Af(R_{Parent}, R_{DS})$, where Af(R_Parent, R_DS) is the affinity of R_i's parent to R_DS, or 1 if R_Parent is R_DS.
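Formula 1 can be sketched as follows for a relation that is not a hub child. Two assumptions: `log` is taken to be base 10 (this reproduces the m_2 = 0.8 listed for Order, whose RC is 92.2, in the affinity table), and the weights are the common w_i = 0.25 used in the experiments. The function names are illustrative only.

```python
import math


def f1(a):
    """f1(alpha) = (11 - alpha) / 10."""
    return (11 - a) / 10


def affinity_measures(ld, rc, rrc, co):
    """m1..m4 for a non-hub-child relation; log is assumed to be base 10."""
    return [
        f1(ld),
        f1(math.log10(10 * rc)),
        f1(math.log10(10 * rrc)),
        f1(math.log10(10 * co)),
    ]


def affinity(measures, weights, parent_affinity=1.0):
    """Formula 1: Af(R_i, R_DS) = (sum_j m_j * w_j) * Af(R_Parent, R_DS)."""
    return sum(m * w for m, w in zip(measures, weights)) * parent_affinity


# Order w.r.t. R_DS = Employees: ld = 1, RC = 92.2, RRC = 1, Co = 4.
m = affinity_measures(1, 92.2, 1, 4)
print([round(x, 2) for x in m])           # [1.0, 0.8, 1.0, 0.94]
print(round(affinity(m, [0.25] * 4), 3))  # 0.936 (the parent is R_DS, so its affinity is 1)
```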
19 Af(R_i, R_DS): Affinity of Relations to R_DS in G_DS (figure: G_DS(θ)).
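One way to read G_DS(θ) is as the part of G_DS whose relations have affinity of at least θ (the experiments use θ = 0.70). The sketch below is a hypothetical illustration, not the paper's algorithm: it computes affinities top-down with Formula 1 and stops descending at the first relation below θ. Stopping early is only safe under the assumption that every local score Σ m_j·w_j is at most 1, so that affinity never increases along a path; the local scores used here are invented.

```python
def select_gds_theta(children, local_score, root, theta):
    """Return {relation: affinity} for the relations kept in G_DS(theta).

    children: G_DS as a tree (relation -> child relations).
    local_score: sum_j m_j * w_j for each non-root relation (assumed <= 1).
    A child's affinity is its local score times its parent's affinity; a
    relation below theta is dropped together with its whole subtree.
    """
    selected = {root: 1.0}
    stack = [root]
    while stack:
        rel = stack.pop()
        for child in children.get(rel, []):
            af = local_score[child] * selected[rel]
            if af >= theta:
                selected[child] = af
                stack.append(child)
    return selected


# Hypothetical G_DS fragment and local scores for R_DS = Employees.
children = {
    "Employees": ["Orders", "Territories"],
    "Orders": ["Customers", "OrderDetails"],
    "OrderDetails": ["Products"],
    "Territories": ["Region"],
}
local_score = {"Orders": 0.94, "Customers": 0.93, "OrderDetails": 0.9,
               "Products": 0.85, "Territories": 0.9, "Region": 0.75}
sel = select_gds_theta(children, local_score, "Employees", theta=0.70)
print(sorted(sel))  # ['Customers', 'Employees', 'OrderDetails', 'Orders', 'Products', 'Territories']
```

Region is excluded because its affinity, 0.9 × 0.75 = 0.675, falls below θ.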
20 Experimental Evaluation. Databases: MS Northwind and TPC-H. Metrics: precision, recall and F-score. Compare the G_DSs and OSs produced by G_DS(θ) against G_DS(h), where G_DS(h) was proposed by 10 participants. G_DS: average F-score 86.77; OSs: average F-score 83. (Figures: G_DS and OS precision, recall and F-score (averages) <0.5, 0.4, 0.05, 0.05> for Customers, Employees, Suppliers, Shippers, Orders and Products (Northwind) and Customer, Supplier, Parts, Orders, Nation and Region (TPC-H).)
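21 Affinity Ranking Correctness (Averages) (figure: average correctness of the Affinity ranking for Customers, Employees, Suppliers, Shippers, Orders and Products (Northwind) and Customer, Supplier, Parts, Orders, Nation and Region (TPC-H)), measured as $100 - 100 \cdot d(r^{Af}_{R_i}, r^{h}_{R_i})$, where $r^{Af}_{R_i}$ and $r^{h}_{R_i}$ are the ranks of relation R_i according to Affinity and according to the participants, and d is their distance.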
23 Generation of Size-l Object Summaries. Definition: a size-l OS keyword query consists of (1) a set of keywords and (2) a value for l; e.g. Faloutsos with l = 15. Query result: a partial OS comprising only l tuples that meets two criteria: (1) all l tuples are connected with the root of the OS tree, and (2) the Importance of the size-l OS is the maximum. Importance of a size-l OS: $Im(\text{OS-size-}l) = \sum_{t_i \in \text{OS-size-}l} Im(OS, t_i)$. Local Importance of a tuple: $Im(OS, t_i) = Im(t_i) \cdot Af(t_i)$.
24 Generation of Size-l Object Summaries. 1. Brute-Force Algorithm: first generates the complete OS (i.e. |OS| tuple extractions, an I/O cost), then considers all candidate size-l OSs in order to find the optimal one (exponential in-memory operations). A very expensive solution.
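The brute-force approach is easy to state in code, which also makes its cost visible: it tries every (l-1)-subset of the non-root tuples. In this sketch the OS is assumed to be already extracted into a `parent` map, and `importance` holds each tuple's local Importance Im(OS, t_i) = Im(t_i)·Af(t_i); the small tree and its scores are hypothetical.

```python
from itertools import combinations


def brute_force_size_l(parent, importance, root, l):
    """Exhaustively find the size-l OS with maximum total Importance.

    parent: tuple -> parent tuple in the OS tree (None for the root t_DS).
    importance: tuple -> local Importance Im(OS, t_i).
    A candidate is valid if every chosen tuple's parent is also chosen,
    i.e. all l tuples stay connected with the root.
    """
    others = [t for t in parent if t != root]
    best, best_score = None, float("-inf")
    for combo in combinations(others, l - 1):  # C(|OS| - 1, l - 1) candidates
        chosen = set(combo) | {root}
        if all(parent[t] in chosen for t in combo):
            score = sum(importance[t] for t in chosen)
            if score > best_score:
                best, best_score = chosen, score
    return best, best_score


# Hypothetical OS: root r with children a and b; a has one child x.
parent = {"r": None, "a": "r", "b": "r", "x": "a"}
importance = {"r": 10.0, "a": 1.0, "b": 3.0, "x": 9.0}
best, score = brute_force_size_l(parent, importance, "r", 3)
print(sorted(best), score)  # ['a', 'r', 'x'] 20.0
```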
25 Generation of Size-l Object Summaries: Greedy Size-l OS Generation Algorithms. OS Property 1: Im(OS, t_i) usually decreases with depth from t_DS. 2.1 Bottom-Up Pruning Size-l Algorithm: first generates the complete OS (like the brute-force algorithm) and then prunes from the bottom of the tree the k - l leaf nodes with the currently smallest Im(OS, t_i), where k = |OS|. Lemma 1: when the local Importance scores of the nodes of an OS decrease monotonically with their distance from the root (i.e. the score of an ancestor is always greater than its children's), the algorithm returns the optimal size-l OS. Efficiency characteristics: |OS| I/Os but only loglinear in-memory operations; very efficient when k is not significantly bigger than l, since fewer pruning operations are required (k - l is smaller); considerably cheaper than the brute-force algorithm. Correctness: very good approximations of the optimal size-l OS.
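A minimal sketch of Bottom-Up Pruning, using a min-heap of the current leaves so that each of the k - l removals costs O(log k). The `parent`/`importance` representation and the example tree are assumptions of the sketch, not the author's code. The tree deliberately violates the monotonicity condition of Lemma 1 (a = 1.0 has a child x = 9.0).

```python
import heapq


def bottom_up_pruning(parent, importance, root, l):
    """Greedy size-l OS: repeatedly prune the leaf with the smallest Im(OS, t_i).

    parent: tuple -> parent tuple in the complete OS (None for the root t_DS).
    importance: tuple -> local Importance Im(OS, t_i). Assumes l >= 1.
    Returns the kept tuples and their total Importance.
    """
    kept = set(parent)
    child_count = {t: 0 for t in parent}
    for t, p in parent.items():
        if p is not None:
            child_count[p] += 1
    # Min-heap of the current leaves; the root is never pruned.
    leaves = [(importance[t], t) for t in parent if t != root and child_count[t] == 0]
    heapq.heapify(leaves)
    while len(kept) > l:  # k - l removals in total, k = |OS|
        _, leaf = heapq.heappop(leaves)
        kept.remove(leaf)
        p = parent[leaf]
        child_count[p] -= 1
        if p != root and child_count[p] == 0:
            heapq.heappush(leaves, (importance[p], p))  # the parent became a leaf
    return kept, sum(importance[t] for t in kept)


parent = {"r": None, "a": "r", "b": "r", "x": "a"}
importance = {"r": 10.0, "a": 1.0, "b": 3.0, "x": 9.0}
kept, score = bottom_up_pruning(parent, importance, "r", 3)
print(sorted(kept), score)  # ['a', 'r', 'x'] 20.0
```

For l = 3 it finds the optimum. For l = 2 on the same tree it keeps {r, a} (11.0) while the optimum is {r, b} (13.0): b is pruned first because it is the smallest leaf at that moment, which is the kind of non-monotone case Lemma 1 excludes.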
26 Size-l OS Generation Bottom-Up Pruning Size-l Algorithm
37 Generation of Size-l Object Summaries: Greedy Size-l OS Generation Algorithms. 2.2 Top-Down Size-l Algorithm: uses a priority queue to build the OS by expanding on the current tuple with the biggest local Importance score. Lemma 2: when the local Importance scores of the nodes of an OS decrease monotonically with their distance from the root, the Top-Down Algorithm returns the optimal size-l OS. Efficiency characteristics: more efficient than both aforementioned algorithms when l is significantly smaller than k, with fewer I/O operations (the complete OS is not needed) and fewer in-memory operations. On the other hand, when k is not much bigger than l, this algorithm becomes more expensive than Bottom-Up Pruning. Correctness: less effective, because expanding on the best current local Importance value does not always lead to a good (near-)optimal solution.
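A matching sketch of the Top-Down algorithm, using Python's `heapq` with negated scores as a max-priority queue. The `children` lists are built here from the same hypothetical `parent` map; in a real system they would stand in for the join that fetches a tuple's children only when that tuple is expanded, which is why the complete OS is not needed.

```python
import heapq


def top_down(parent, importance, root, l):
    """Greedy size-l OS: grow from the root, always adding the best frontier tuple.

    parent: tuple -> parent tuple (None for the root t_DS).
    importance: tuple -> local Importance Im(OS, t_i).
    """
    children = {t: [] for t in parent}
    for t, p in parent.items():
        if p is not None:
            children[p].append(t)
    kept = {root}
    # heapq is a min-heap, so scores are negated to pop the largest first.
    frontier = [(-importance[c], c) for c in children[root]]
    heapq.heapify(frontier)
    while len(kept) < l and frontier:
        _, t = heapq.heappop(frontier)
        kept.add(t)
        for c in children[t]:  # in a real system: fetch t's children on demand
            heapq.heappush(frontier, (-importance[c], c))
    return kept, sum(importance[t] for t in kept)


parent = {"r": None, "a": "r", "b": "r", "x": "a"}
importance = {"r": 10.0, "a": 1.0, "b": 3.0, "x": 9.0}
kept, score = top_down(parent, importance, "r", 3)
print(sorted(kept), score)  # ['a', 'b', 'r'] 14.0
```

On this tree, for l = 3 it returns {r, b, a} (14.0) instead of the optimal {r, a, x} (20.0): the high-scoring x sits under the low-scoring a, which illustrates the correctness weakness noted for this algorithm.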
38 Size-l OS Generation Top-Down Size-l Algorithm
49 Top-Down v Bottom-Up Pruning Size-l Algorithms. 1. For small OSs, Bottom-Up Pruning is evidently the best choice, since it always achieves better correctness while requiring equal or even less time than the Top-Down Algorithm. 2. For larger OSs (e.g. |OS| > 300), there are two alternatives: (1) speed (Top-Down is at least twice as fast for any l < 50) or (2) correctness (Bottom-Up achieves at least 10% better correctness).
50 Experimental Evaluation
Database | Cardinalities | Size (MB)
DBLP | 2,959, |
TPC-H | 8,661,245 | 1,100
Northwind | 3, |
Parameter | Range
G_A | GA1, GA2, GA3
d | d1, d2, d3 = 0.85, 0.10, 0.99
All G_DS(θ)s were generated with a common weight (w_i = 0.25), θ = 0.70 and normalized Affinity.
51 Experimental Evaluation: Efficiency of the two Size-l Algorithms (figure: time (s) v l for Top-Down and Bottom-Up Pruning on (a) DBLP Author (average |OS| = 486), (b) DBLP Paper (average |OS| = 377), (c) TPC-H Customer (average |OS| = 179) and (d) TPC-H Supplier (average |OS| = 1426)).
52 Experimental Evaluation: Correctness of the two Greedy Algorithms (figure: correctness v l for Top-Down and Bottom-Up Pruning on (a) DBLP Author (average |OS| = 364), (b) DBLP Paper (average |OS| = 279), (c) TPC-H Customer (average |OS| = 179), (d) TPC-H Supplier (average |OS| = 1425) and (e) DBLP Author (|OS| = 67); (f) DBLP Author (average |OS| = 364) under the settings that produced global Importance: GA1-d1, GA2-d1, GA3-d1, GA1-d2 and GA1-d3).
53 Experimental Evaluation: Effectiveness of Size-l OSs (figure: effectiveness v l under settings GA1-d1, GA2-d1, GA3-d1, GA1-d2 and GA1-d3 for (a) DBLP Author, (b) DBLP Paper, (c) Northwind Employee and (d) Northwind Order; (e) size-15 OSs and (f) size-30 OSs for Author, Paper, Employee and Order).
54 Conclusions - Novel Contributions
- The formal definition of a novel search paradigm that automatically produces OSs for a Data Subject, with minimum contribution from the user (i.e. only a keyword) and no prior knowledge of the DB schema or query language needed; excellent precision, recall and F-score results.
- The formal definition and quantification of relations' Affinity in the context of G_DS, considering both schema design and data distributions.
- Generation of size-l OSs: efficient algorithms are proposed.
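55 Preliminaries: ObjectRank. The ObjectRank vector r over the nodes v_i can be calculated as $r = dAr + \frac{1-d}{|S|}s$, where $A_{ij} = \alpha(e)$ if there is an edge $e = (v_j \rightarrow v_i)$ in $E^A_D$ and 0 otherwise, d controls the importance of the Base Set, and $s = [s_1, \ldots, s_n]^T$ is the Base Set vector for S. (Figure: authority transfer schema graph over Conference, Year, Paper and Author, with a Paper "cites" edge of rate 0.7 and a "cited" edge of rate 0.)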
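The ObjectRank equation is usually solved by power iteration. This sketch assumes a small dense matrix in plain Python, with A[i][j] the rate of authority flowing along an edge from node j into node i. The citation graph is hypothetical; it reuses the 0.7 "cites" rate from the figure, split evenly over a paper's outgoing citations, as ObjectRank does for edges of the same type.

```python
def object_rank(A, base_set, d=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for r = d*A*r + ((1 - d)/|S|)*s.

    A[i][j]: authority transfer rate from node j to node i (0 if no edge).
    base_set: indices of the nodes in the Base Set S (s_i = 1 inside S, else 0).
    """
    n = len(A)
    s = [1.0 if i in base_set else 0.0 for i in range(n)]
    r = [1.0 / n] * n
    for _ in range(max_iter):
        r_next = [
            d * sum(A[i][j] * r[j] for j in range(n)) + (1 - d) / len(base_set) * s[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(r_next, r)) < tol:  # L1 convergence test
            return r_next
        r = r_next
    return r


# Hypothetical citation graph: paper 0 cites papers 1 and 2; paper 1 cites paper 2.
A = [
    [0.0, 0.0, 0.0],
    [0.35, 0.0, 0.0],
    [0.35, 0.7, 0.0],
]
ranks = object_rank(A, base_set={0})
print([round(x, 4) for x in ranks])
```

Paper 2 ends up above paper 1 because it receives authority both directly from the base-set paper and through paper 1.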
56 Global Ranking of Tuples (Im(t_i)): ValueRank. The ValueRank vector r can be calculated using the same formula, $r = dAr + (1-d)s$. The entry s_i of a node v_i in S can be calculated as $s_i = \alpha + \beta f(v_i)$, and the authority transfer rate of an edge, either forward or backward, denoted $\alpha(e)$, can be calculated as $\alpha(e) = \gamma + \delta f(v_i \rightarrow v_j)$, where α, β, γ and δ are tuning constants such that $\alpha + \beta \le 1$ and $\gamma + \delta \le 1$, and f(.) is a normalisation function of the values of v_i and v_j; base-set entries therefore lie in the range [0, 1] rather than being just 1 as in ObjectRank. (Figure: Northwind authority transfer schema graph with relation cardinalities Territories (3), Region (4), Employees (9), Categories (8), Orders (830), OrderDetails (2155), Customers (91), Products (77), Shippers (3) and Suppliers (29); base-set values are derived from attributes such as UnitPrice*Quantity, Price and Freights, and sample edge rates include 0.2 and 0.3.)
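A minimal ValueRank sketch under several assumptions, none of which comes from the slides: f(.) divides each value by the largest value in the input (one possible normalisation), α(e) uses the normalised value of the edge's source tuple, each rate is split over the source's outgoing edges (borrowed from ObjectRank to keep the iteration convergent), every tuple is in the base set, and the default constants of 0.5 are arbitrary. The two orders and two employees are hypothetical.

```python
def value_rank(nodes, edges, values, alpha=0.5, beta=0.5, gamma=0.5, delta=0.5,
               d=0.85, iterations=100):
    """Sketch of ValueRank: r = d*A*r + (1 - d)*s with value-aware s and alpha(e).

    nodes: tuple ids; edges: (source, target) authority transfer edges;
    values: numeric value of a tuple (e.g. an order's total); missing means 0.
    """
    top = max(values.values(), default=0.0)
    # Hypothetical f(.): scale by the maximum value so that f lies in [0, 1].
    f = {v: values.get(v, 0.0) / top if top else 0.0 for v in nodes}
    s = {v: alpha + beta * f[v] for v in nodes}  # s_i = alpha + beta * f(v_i)
    out_degree = {v: 0 for v in nodes}
    for src, _ in edges:
        out_degree[src] += 1
    # alpha(e) = gamma + delta * f(source), split over the source's out-edges.
    rate = {(src, dst): (gamma + delta * f[src]) / out_degree[src] for src, dst in edges}
    r = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1 - d) * s[v] for v in nodes}
        for (src, dst), a in rate.items():
            nxt[dst] += d * a * r[src]
        r = nxt
    return r


# Two hypothetical orders of different value, each handled by a different employee.
r = value_rank(
    nodes=["o1", "o2", "e1", "e2"],
    edges=[("o1", "e1"), ("o2", "e2")],
    values={"o1": 100.0, "o2": 300.0},
)
print(r["e2"] > r["e1"])  # True
```

With identical connectivity, the employee linked to the more valuable order ranks higher, which is the behaviour ObjectRank's connectivity-only scores cannot express.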
57 Preliminary Evaluation: ValueRank v ObjectRank (table: ObjectRank and ValueRank scores, with a Total Orders column {UnitPrice*Quantity, Freight, Price}, for sample Employee, Shipper, Product, Customer (SAVEA, QUICK) and Supplier tuples). ObjectRank captures connectivity only, whereas ValueRank captures values and connectivity. Maximum values per relation are indicated in bold.
58 Local Ranking of Tuples (Im(OS, t_i)). The local Importance of each tuple t_i of an OS can be calculated as $Im(OS, t_i) = Im(t_i)^{\alpha} \cdot Af(t_i)^{\beta}$, where Im(t_i) is the global Importance of t_i (e.g. its ValueRank or ObjectRank), Af(t_i) is the Affinity of t_i to t_DS, and α and β are tuning constants. Multiplying Im(t_i) by Af(t_i) scales down each tuple's contribution to the overall Im(OS) according to its Affinity, so tuples of less related relations contribute less.
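A tiny sketch of the local Importance formula, showing how the exponents trade global Importance against Affinity. The tuple scores are hypothetical; with α = β = 1 the formula reduces to the product Im(t_i)·Af(t_i) used for size-l OSs.

```python
def local_importance(im, af, alpha=1.0, beta=1.0):
    """Im(OS, t_i) = Im(t_i)**alpha * Af(t_i)**beta."""
    return (im ** alpha) * (af ** beta)


# Hypothetical tuples: (global Importance, Affinity to t_DS).
tuples = {"t_a": (0.9, 0.6), "t_b": (0.6, 0.95)}


def rank(alpha, beta):
    """Order the tuples by local Importance, highest first."""
    return sorted(tuples, key=lambda t: local_importance(*tuples[t], alpha, beta),
                  reverse=True)


print(rank(1.0, 1.0))  # ['t_b', 't_a']: 0.57 > 0.54, affinity tips the balance
print(rank(1.0, 0.0))  # ['t_a', 't_b']: beta = 0 ignores affinity
```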
59 Inter-Relation Tuple Ranking: Summary of ValueRank for Northwind (table: minimum, median and maximum ValueRank of the tuples of each relation R_i: Employees, Territories, Region, Orders, Customer, Shipper, OrderDetails, Product, Supplier and Categories). The results are based on GA_northwind and d = 0.85. The earlier work, ObjectRank, did not investigate inter-relation ranking of tuples in depth.
60 Inter-Relation Tuple Ranking: ValueRank v ObjectRank (table: ValueRank, ObjectRank and Total Orders {UnitPrice*Quantity, Freight, Price} for sample Customer (SAVEA, QUICK), Shipper, Product, Employee and Supplier tuples). ObjectRank captures connectivity only, whereas ValueRank captures values and connectivity. Maximum values per relation are indicated in bold.
61 Af(R_i, R_DS): Affinity of Relations to R_DS in G_DS (Northwind)
R_i | ld_i, RC_i, RRC_i, Co_i | m_1..m_4 | Customer: r(R_i) | Order: Af(R_i) (r(R_i)) | Shipper: Af(R_i) (r(R_i))
Employees | R_DS | R_DS | (3) | 0.97 (4) | 0.82 (4)
Employees (ReportsTo) | 1, 1, 0.9, 4 | 1, 1, 1, | (5) | 0.91 (5) | 0.73 (5)
Employees (ReportedBy) | 1, 0.9, 1, 4 | 1, 1, 1, | (7) | 0.85 (7) | 0.66 (7)
Territories | 1, 5.4, 1, 2 | 1, 0.9, 1, | (10) | 0.66 (10) | 0.51 (10)
Region | 2, 1, 13.2, 1 | 0.9, 1, 0.88, | (11) | 0.59 (11) | 0.43 (11)
Order | 1, 92.2, 1, 4 | 1, 0.8, 1, | (1) | 1 (R_DS) | 0.89 (1)
Customer | 2, 1, 9.1, 2 | 0.9, 1, 0.9, | (R_DS) | 0.99 (1) | 0.83 (2)
Shipper | 2, 1, 276.6, 1 | 0.9, 1, 0.75, | (2) | 0.98 (2) | 1 (R_DS)
OrderDetails | 2, 2.5, 1, 2 | 0.9, 0.96, 1, | (4) | 0.97 (3) | 0.82 (3)
Product | 3, 1, 43.9, 4 | 0.8, 1, 0.83, | (6) | 0.91 (6) | 0.73 (6)
Supplier | 4, 1, 1.6, 1 | 0.7, 1, 0.9, | (8) | 0.82 (8) | 0.62 (8)
Categories | 4, 1, 6.1, 1 | 0.7, 1, 0.92, | (9) | 0.81 (9) | 0.61 (9)
CustDemographics | 3, null, null, 1 | 0.8, null, null, 1 | Null | Null | Null
Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey G. Shivaprasad, N. V. Subbareddy and U. Dinesh Acharya
More informationInformation Retrieval. Wesley Mathew
Information Retrieval Wesley Mathew 30-11-2012 Introduction and motivation Indexing methods B-Tree and the B+ Tree R-Tree IR- Tree Location-aware top-k text query 2 An increasing amount of trajectory data
More informationLink Structure Analysis
Link Structure Analysis Kira Radinsky All of the following slides are courtesy of Ronny Lempel (Yahoo!) Link Analysis In the Lecture HITS: topic-specific algorithm Assigns each page two scores a hub score
More informationBidirectional Expansion For Keyword Search on Graph Databases
Bidirectional Expansion For Keyword Search on Graph Databases Varun Kacholia Shashank Pandit Soumen Chakrabarti S. Sudarshan Rushi Desai Hrishikesh Karambelkar Indian Institute of Technology, Bombay varunk@acm.org
More informationReverse Engineering Aggregation Queries
Reverse Engineering Aggregation Queries Wei Chit Tan Meihui Zhang Hazem Elmeleegy 2 Divesh Srivastava 3 Singapore University of Technology and Design, 2 Turn Inc., 3 AT&T Labs-Research weichit tan@mymail.sutd.edu.sg,
More informationClassification and Regression Trees
Classification and Regression Trees David S. Rosenberg New York University April 3, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 April 3, 2018 1 / 51 Contents 1 Trees 2 Regression
More informationMining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports
Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports R. Uday Kiran P. Krishna Reddy Center for Data Engineering International Institute of Information Technology-Hyderabad Hyderabad,
More informationBasant Group of Institution
Basant Group of Institution Visual Basic 6.0 Objective Question Q.1 In the relational modes, cardinality is termed as: (A) Number of tuples. (B) Number of attributes. (C) Number of tables. (D) Number of
More informationHeaps 2. Recall Priority Queue ADT. Heaps 3/19/14
Heaps 3// Presentation for use with the textbook Data Structures and Algorithms in Java, th edition, by M. T. Goodrich, R. Tamassia, and M. H. Goldwasser, Wiley, 0 Heaps Heaps Recall Priority Queue ADT
More informationExploring Econometric Model Selection Using Sensitivity Analysis
Exploring Econometric Model Selection Using Sensitivity Analysis William Becker Paolo Paruolo Andrea Saltelli Nice, 2 nd July 2013 Outline What is the problem we are addressing? Past approaches Hoover
More informationΕΠΛ660. Ανάκτηση µε το µοντέλο διανυσµατικού χώρου
Ανάκτηση µε το µοντέλο διανυσµατικού χώρου Σηµερινό ερώτηµα Typically we want to retrieve the top K docs (in the cosine ranking for the query) not totally order all docs in the corpus can we pick off docs
More informationLECTURE 11 TREE TRAVERSALS
DATA STRUCTURES AND ALGORITHMS LECTURE 11 TREE TRAVERSALS IMRAN IHSAN ASSISTANT PROFESSOR AIR UNIVERSITY, ISLAMABAD BACKGROUND All the objects stored in an array or linked list can be accessed sequentially
More informationMahathma Gandhi University
Mahathma Gandhi University BSc Computer science III Semester BCS 303 OBJECTIVE TYPE QUESTIONS Choose the correct or best alternative in the following: Q.1 In the relational modes, cardinality is termed
More informationLocality- Sensitive Hashing Random Projections for NN Search
Case Study 2: Document Retrieval Locality- Sensitive Hashing Random Projections for NN Search Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 18, 2017 Sham Kakade
More informationlooking ahead to see the optimum
! Make choice based on immediate rewards rather than looking ahead to see the optimum! In many cases this is effective as the look ahead variation can require exponential time as the number of possible
More informationLecture 8 13 March, 2012
6.851: Advanced Data Structures Spring 2012 Prof. Erik Demaine Lecture 8 13 March, 2012 1 From Last Lectures... In the previous lecture, we discussed the External Memory and Cache Oblivious memory models.
More informationBinary Trees, Binary Search Trees
Binary Trees, Binary Search Trees Trees Linear access time of linked lists is prohibitive Does there exist any simple data structure for which the running time of most operations (search, insert, delete)
More informationLOCAL STRUCTURE AND DETERMINISM IN PROBABILISTIC DATABASES. Theodoros Rekatsinas, Amol Deshpande, Lise Getoor
LOCAL STRUCTURE AND DETERMINISM IN PROBABILISTIC DATABASES Theodoros Rekatsinas, Amol Deshpande, Lise Getoor Motivation Probabilistic databases store, manage and query uncertain data Numerous applications
More informationKeyword query interpretation over structured data
Keyword query interpretation over structured data Advanced Methods of IR Elena Demidova Materials used in the slides: Jeffrey Xu Yu, Lu Qin, Lijun Chang. Keyword Search in Databases. Synthesis Lectures
More informationLecture 6: Suffix Trees and Their Construction
Biosequence Algorithms, Spring 2007 Lecture 6: Suffix Trees and Their Construction Pekka Kilpeläinen University of Kuopio Department of Computer Science BSA Lecture 6: Intro to suffix trees p.1/46 II:
More informationCS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics
CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics 1 Sorting 1.1 Problem Statement You are given a sequence of n numbers < a 1, a 2,..., a n >. You need to
More informationNED: An Inter-Graph Node Metric Based On Edit Distance
NED: An Inter-Graph Node Metric Based On Edit Distance Haohan Zhu Facebook Inc. zhuhaohan@fb.com Xianrui Meng Apple Inc. xmeng@apple.com George Kollios Boston University gkollios@cs.bu.edu ABSTRACT Node
More informationN N Sudoku Solver. Sequential and Parallel Computing
N N Sudoku Solver Sequential and Parallel Computing Abdulaziz Aljohani Computer Science. Rochester Institute of Technology, RIT Rochester, United States aaa4020@rit.edu Abstract 'Sudoku' is a logic-based
More informationPriority Queues. Priority Queues Trees and Heaps Representations of Heaps Algorithms on Heaps Building a Heap Heapsort.
Priority Queues Trees and Heaps Representations of Heaps Algorithms on Heaps Building a Heap Heapsort Philip Bille Priority Queues Trees and Heaps Representations of Heaps Algorithms on Heaps Building
More information01/01/2017. Chapter 5: The Relational Data Model and Relational Database Constraints: Outline. Chapter 5: Relational Database Constraints
Chapter 5: The Relational Data Model and Relational Database Constraints: Outline Ramez Elmasri, Shamkant B. Navathe(2017) Fundamentals of Database Systems (7th Edition),pearson, isbn 10: 0-13-397077-9;isbn-13:978-0-13-397077-7.
More informationSearch Techniques for Fourier-Based Learning
Search Techniques for Fourier-Based Learning Adam Drake and Dan Ventura Computer Science Department Brigham Young University {acd2,ventura}@cs.byu.edu Abstract Fourier-based learning algorithms rely on
More information8. Relational Calculus (Part II)
8. Relational Calculus (Part II) Relational Calculus, as defined in the previous chapter, provides the theoretical foundations for the design of practical data sub-languages (DSL). In this chapter, we
More informationState Space Search. Many problems can be represented as a set of states and a set of rules of how one state is transformed to another.
State Space Search Many problems can be represented as a set of states and a set of rules of how one state is transformed to another. The problem is how to reach a particular goal state, starting from
More informationConstraint Satisfaction Problems
Constraint Satisfaction Problems Search and Lookahead Bernhard Nebel, Julien Hué, and Stefan Wölfl Albert-Ludwigs-Universität Freiburg June 4/6, 2012 Nebel, Hué and Wölfl (Universität Freiburg) Constraint
More informationHigh Dimensional Indexing by Clustering
Yufei Tao ITEE University of Queensland Recall that, our discussion so far has assumed that the dimensionality d is moderately high, such that it can be regarded as a constant. This means that d should
More informationBalanced BST. Balanced BSTs guarantee O(logN) performance at all times
Balanced BST Balanced BSTs guarantee O(logN) performance at all times the height or left and right sub-trees are about the same simple BST are O(N) in the worst case Categories of BSTs AVL, SPLAY trees:
More informationA model of navigation history
A model of navigation history Connor G. Brewster Alan Jeffrey August, 6 arxiv:68.5v [cs.se] 8 Aug 6 Abstract: Navigation has been a core component of the web since its inception: users and scripts can
More informationLearning Goals. CS221: Algorithms and Data Structures Lecture #3 Mind Your Priority Queues. Today s Outline. Back to Queues. Priority Queue ADT
CS: Algorithms and Data Structures Lecture # Mind Your Priority Queues Steve Wolfman 0W Learning Goals Provide examples of appropriate applications for priority queues. Describe efficient implementations
More informationChapter 6. Dynamic Programming
Chapter 6 Dynamic Programming CS 573: Algorithms, Fall 203 September 2, 203 6. Maximum Weighted Independent Set in Trees 6..0. Maximum Weight Independent Set Problem Input Graph G = (V, E) and weights
More informationEfficient Non-Sequential Access and More Ordering Choices in a Search Tree
Efficient Non-Sequential Access and More Ordering Choices in a Search Tree Lubomir Stanchev Computer Science Department Indiana University - Purdue University Fort Wayne Fort Wayne, IN, USA stanchel@ipfw.edu
More informationSPARK: Top-k Keyword Query in Relational Database
SPARK: Top-k Keyword Query in Relational Database Wei Wang University of New South Wales Australia 20/03/2007 1 Outline Demo & Introduction Ranking Query Evaluation Conclusions 20/03/2007 2 Demo 20/03/2007
More informationComposition Systems. Composition Systems. Contents. Contents. What s s in this paper. Introduction. On-Line Character Recognition.
Contents S. Geman, D.F. Potter, Z. Chi Presented by Haibin Ling 12. 2. 2003 Definition: Compositionality refers to the evident ability of humans to represent entities as hierarchies of parts,, with these
More informationTrees 2: Linked Representation, Tree Traversal, and Binary Search Trees
Trees 2: Linked Representation, Tree Traversal, and Binary Search Trees Linked representation of binary tree Again, as with linked list, entire tree can be represented with a single pointer -- in this
More informationScalable Evaluation of k-nn Queries on Large Uncertain Graphs
Scalable Evaluation of k-nn Queries on Large Uncertain Graphs Xiaodong Li 1, Reynold Cheng 1, Yixiang Fang 1, Jiafeng Hu 1, Silviu Maniu 2 1 The University of Hong Kong, China, 2 Université Paris-Sud,
More informationTREES cs2420 Introduction to Algorithms and Data Structures Spring 2015
TREES cs2420 Introduction to Algorithms and Data Structures Spring 2015 1 administrivia 2 -assignment 7 due Thursday at midnight -asking for regrades through assignment 5 and midterm must be complete by
More informationChapter 5. Binary Trees
Chapter 5 Binary Trees Definitions and Properties A binary tree is made up of a finite set of elements called nodes It consists of a root and two subtrees There is an edge from the root to its children
More informationImplementation of Skyline Sweeping Algorithm
Implementation of Skyline Sweeping Algorithm BETHINEEDI VEERENDRA M.TECH (CSE) K.I.T.S. DIVILI Mail id:veeru506@gmail.com B.VENKATESWARA REDDY Assistant Professor K.I.T.S. DIVILI Mail id: bvr001@gmail.com
More informationOn The Complexity of Virtual Topology Design for Multicasting in WDM Trees with Tap-and-Continue and Multicast-Capable Switches
On The Complexity of Virtual Topology Design for Multicasting in WDM Trees with Tap-and-Continue and Multicast-Capable Switches E. Miller R. Libeskind-Hadas D. Barnard W. Chang K. Dresner W. M. Turner
More information