itrails: Pay-as-you-go Information Integration in Dataspaces

Size: px
Start display at page:

Download "itrails: Pay-as-you-go Information Integration in Dataspaces"

Transcription

1 itrails: Pay-as-you-go Information Integration in Dataspaces Marcos Vaz Salles Jens Dittrich Shant Karakashian Olivier Girard Lukas Blunschi ETH Zurich VLDB 2007

2 Outline Motivation itrails Experiments Conclusions and Future Work 2

3 Problem: Querying Several Sources Query What is the impact of global warming in Zurich????? Systems Data Sources Laptop Server Web Server DB Server 3

4 Solution 1: Use a Search Engine Job! Query global warming zurich System Graph IR Search Engine Drawback: Query semantics are not precise! TopX [VLDB05], FleXPath [SIGMOD04], XSearch [VLDB03], XRank [SIGMOD03] Data Sources Laptop text, links text, links Server text, links Web Server text, links DB Server 4

5 Solution 2: Use an Information Integration System //Temperatures/*[city = zurich ] Query... Laptop Server Information Integration System Temps Cities Drawback: Too much... CO 2 effort to provide Sunspots schema mappings!. GAV (e.g. [ICDE95]), LAV (e.g. [VLDB96]), GLAV [AAAI99], P2P (e.g. [SIGMOD04]) missing schema mapping. missing schema mapping Web Server schema mapping DB Server schema mapping System Data Sources 5

6 Research Challenge: Is There an Integration Solution in-between These Two Extremes? global warming zurich global warming zurich //Temperatures/*[city = zurich ] Graph IR Search Engine Dataspace System Temps CO 2. Cities Sunspots Information Integration System Pay-as-you-go Information text, links Integration text, links? text, links text, links text, links full-blown schema mappings Data Sources Laptop Server Web Server DB Server Data Sources Dataspace Vision by Franklin, Halevy, and Maier [SIGMOD Record 05] 6

7 Outline Motivation itrails Experiments Conclusions and Future Work 7

8 itrails Core Idea: Add Integration Hints Incrementally Step 1: Provide a search service over all the data Use a general graph data model (see VLDB 2006) Works for unstructured documents, XML, and relations Step 2: Add integration semantics via hints (trails) on top of the graph Works across data sources, not only between sources Step 3: If more semantics needed, go back to step 2 Impact: Smooth transition between search and data integration Semantics added incrementally improve precision / recall 8

9 itrails: Defining Trails Basic Form of a Trail Q L [.C L ] Q R [.C R ] Queries: NEXI-like keyword and path expressions Attribute projections Intuition: When I query for Q L [.C L ], you should also query for Q R [.C R ] 9

10 Trail Examples: Global Warming Zurich global warming zurich Temperatures date 24-Sep 24-Sep 25-Sep 26-Sep city Bern Uster Zurich Zurich region BE ZH ZH ZH celsius Trail for Implicit Meaning: When I query for global warming, you should also query for Temperature data above 10 degrees global warming //Temperatures/*[celsius > 10] Trail for an Entity: When I query for zurich, you should also query for references of zurich as a region DB Server zurich //*[region = ZH ] 10

11 Trail Example: Deep Web Bookmarks train home Web Server Trail for a Bookmark: When I query for train home, you should also query for the TrainCompany s website with origin at ETH Uni and destination at Seilbahn Rigiblick train home //traincompany.com//*[origin= ETH Uni and dest = Seilbahn Rigiblick ] 11

12 Trail Examples: Thesauri, Dictionaries, Language-agnostic Search Laptop Server car auto Trail for Thesauri: When I query for car, you should also query for auto car auto car carro Trails for Dictionary: When I query for car, you should also query for carro and vice-versa car carro carro car 12

13 Trail Examples: Schema Equivalences Employee empid empname salary Trail for schema match on names: When I query for Employee.empName, you should also query for Person.name DB Server Person //Employee//*.tuple.empName //Person//*.tuple.name SSN name age income Trail for schema match on salaries: When I query for Employee.salary, you should also query for Person.income //Employee//*.tuple.salary //Person//*.tuple.income 13

14 Outline Motivation itrails Experiments Conclusion and Future Work Core Idea Trail Examples How are Trails Created? Uncertainty and Trails Rewriting Queries with Trails Recursive Matches 14

15 How are Trails Created? Given by the user Explicitly Via Relevance Feedback (Semi-)Automatically Information extraction techniques Automatic schema matching Ontologies and thesauri (e.g., wordnet) User communities (e.g., trails on gene data, bookmarks) 15

16 Uncertainty and Trails Probabilistic Trails: model uncertain trails probabilities used to rank trails Q L [.C L ] Q R [.C R ], 0 p 1 p Example: car auto p =

17 Certainty and Trails Scored Trails: give higher value to certain trails scoring factors used to boost scores of query results obtained by the trail Examples: Q L [.C L ] Q R [.C R ], sf > 1 - T 1 :weather //Temperatures/* sf p = 0.9, sf = 2 - T 2 :yesterday //*[date = today() 1] p = 1, sf = 3 17

18 Rewriting Queries with Trails Query U weather yesterday T 2 matches U (3) Merging weather U //*[date = today() 1] yesterday Trail T 2 : yesterday //*[date = today() 1] (1) Matching (2) Transformation 18

19 Replacing Trails Trails that use replace instead of union semantics Query U U (3) Merging weather yesterday weather //*[date = today() 1] T 2 matches Trail T 2 : yesterday //*[date = today() 1] (1) Matching (2) Transformation 19

20 Problem: Recursive Matches (1/2) U weather yesterday T 2 matches U //*[date = today() 1] New query still matches T 2, so T 2 could be applied again U T 2 : yesterday //*[date = today() 1] T 2 matches weather U //*[date = today() 1] U //*[date = today() 1]... U yesterday //*[date = today() 1] U... //*[date = today() 1] Infinite recursion 20

21 Problem: Recursive Matches (2/2) U weather yesterday T 3 matches U Trails may be //*[date = today() 1] mutually recursive U T 3 : //*.tuple.date //*.tuple.modified weather U yesterday U //*[modified = today() 1] //*[date = today() 1] T 10 matches T 10 : //*.tuple.modified //*.tuple.date U weather U yesterday U and enter an infinite loop //*[date = today() 1] U //*[date = today() 1] //*[modified = today() 1] We again match T 3 21

22 Solution: Multiple Match Coloring Algorithm U weather yesterday T 1 matches First Level T 2 matches weather weather T 1 : weather //Temperatures/* T 2 : yesterday //*[date = today() 1] T 3 : //*.tuple.date //*.tuple.modified T 4 : //*.tuple.date //*.tuple.received U U U //*[date = today() 1] yesterday //Temperatures/* U U Second Level U yesterday //Temperatures/* U U //*[modified = today() 1] T 3, T 4 match //*[date = today() 1] //*[received = today() 1] 22

23 Multiple Match Coloring Algorithm Analysis Problem: MMCA is exponential in number of levels Solution: Trail Pruning Prune by number of levels Prune by top-k trails matched in each level Prune by both top-k trails and number of levels 23

24 Outline Motivation itrails Experiments Conclusion and Future Work 24

25 itrails Evaluation in imemex imemex Dataspace System: Open-source prototype available at Main Questions in Evaluation Quality: Top-K Precision and Recall Performance: Use of Materialization Scalability: Query-rewrite Time vs. Number of Trails 25

26 itrails Evaluation in imemex Scenario 1: Few High-quality Trails Closer to information integration use cases Obtained real datasets and indexed them 18 hand-crafted trails 14 hand-crafted queries Scenario 2: Many Low-quality Trails Closer to search use cases Generated up to 10,000 trails 26

27 itrails Evaluation in imemex: Scenario 1 Configured imemex to act in three modes Baseline: Graph / IR search engine itrails: Rewrite search queries with trails Perfect Query: Semantics-aware query Data: shipped to central index sizes in MB Laptop Web Server Server DB Server 27

28 Quality: Top-K Precision and Recall K = 20 Scenario 1: few high-quality trails (18 trails) Search Engine misses relevant results Queries perfect query Perfect Query always has precision and recall equal to 1 Search Query is partially semantics-aware Q3: pdf yesterday Q13: to = raimund.grube@ enron.com 28

29 Performance: Use of Materialization Scenario 1: few high-quality trails (18 trails) Trail merging adds overhead to query execution Trail Materialization provides interactive times for all queries response times in sec. 29

30 Scalability: Query-rewrite Time vs. Number of Trails Scenario 2: many low-quality trails Query-rewrite time can be controlled with pruning 30

31 Conclusion: Pay-as-you-go Information Integration Step 1: Provide a search service over all the data Step 2: Add integration semantics via trails global warming zurich text, links Data Sources Step 3: If more semantics needed, go back to step 2 Our Contributions itrails: generic method to model semantic relationships (e.g. implicit meaning, bookmarks, dictionaries, thesauri, attribute matches,...) We propose a framework and algorithms for Pay-as-yougo Information Integration Smooth transition between search and data integration Dataspace System 31

32 Future Work Trail Creation Use collections (ontologies, thesauri, wikipedia) Work on automatic mining of trails from the dataspace Other types of trails Associations Lineage 32

33 Questions? Thanks in advance for your feedback! 33

34 Backup Slides 34

35 Problem: Global Warming in Zurich Query: What is the impact of global warming in Zurich? Search for: global warming zurich Meaning of keyword query global warming should lead to query on Temperatures zurich should lead to a query for a city 35

36 Problem: PDF Yesterday Query: Retrieve all PDF documents added/modified yesterday Search for: pdf yesterday Meaning of keywords pdf and yesterday Different sources, different schemas: Laptop: modified received DBMS: changed 36

37 Related Work: Search vs. Data Integration vs. Dataspaces Integration Solution Search Dataspaces Data Integration Features Integration Effort Low Pay-as-yougo High Query Semantics Precision / Recall Precision / Recall Precise Need for Schema Schemalater Schemanever Schemafirst 37

38 Personal Dataspaces Literature Dittrich, Salles, Kossmann, Blunschi. imemex: Escapes from the Personal Information Jungle (Demo Paper). VLDB, September Dittrich, Salles. idm: A Unified and Versatile Data Model for Personal Dataspace Management. VLDB, September 2006 Dittrich. imemex: A Platform for Personal Dataspace Management. SIGIR PIM, August Blunschi, Dittrich, Girard, Karakashian, Salles. A Dataspace Odyssey: The imemex Personal Dataspace Management System (Demo Paper). CIDR, January Dittrich, Blunschi, Färber, Girard, Karakashian, Salles. From Personal Desktops to Personal Dataspaces: A Report on Building the imemex Personal Dataspace Management System. BTW 2007, March 2007 Salles, Dittrich, Karakashian, Girard, Blunschi. itrails: Pay-as-yougo Information Integration in Dataspaces. VLDB, September

39 idm: imemex Data Model Our approach: get the data model closer to personal information not the other way around Supports: Unstructured, semi-structured and structured data, e.g., files&folders, XML, relations Clearly separation of logical and physical representation of data Arbitrary directed graph structures, e.g., section references in LaTeX documents, links in filesystems, etc Lazily computed data, e.g., ActiveXML (Abiteboul et. al.) Infinite data, e.g., media and data streams See VLDB

40 Data Model Options Bag of Words Data Models Relational XML idm Nonschematic data Support for Personal Data Serialization independent Support for Graph data Support for Lazy Computation Specific schema View mechanism Extension: XLink/ XPointer Extension: ActiveXML Support for Infinite data Extension: Document streams Extension: Relational streams Extension: XML streams 40

41 Data Models for Personal Information lower Abstraction Level higher Physical Level Relational idm Personal Information XML Document / Bag of Words 41

42 Architectural Perspective of imemex Complex operators (query algebra)... Search & Browse iql Query Processor Application Layer imemex PDSMS Operators Catalog Office Tools... Data Store Physical Algebra Result Cache Indexes&Replicas access (warehousing) idm Query Processor Operators Catalog Data Store Data Cleaning Indexes & Replicas Data source access (mediation) Data Source Query Processor Data Source Plugins Operators Catalog Content Converters... File System Data Source Layer IMAP DBMS... 42

imemex: From Search to Information Integration and Back

imemex: From Search to Information Integration and Back imemex: From Search to Information Integration and Back Jens Dittrich Marcos Antonio Vaz Salles Lukas Blunschi Saarland University Cornell University ETH Zurich Abstract In this paper, we report on lessons

More information

A Dataspace Odyssey: The imemex Personal Dataspace Management System

A Dataspace Odyssey: The imemex Personal Dataspace Management System A Dataspace Odyssey: The imemex Personal Dataspace Management System Lukas Blunschi Jens-Peter Dittrich Olivier René Girard Shant Kirakos Karakashian ETH Zurich, Switzerland dbis.ethz.ch imemex.org Marcos

More information

Dataspaces: The Tutorial. Day 2

Dataspaces: The Tutorial. Day 2 Dataspaces: The Tutorial Day 2 Alon Halevy, David Maier VLDB 28 Auckland, New Zealand Outline Dataspaces: why? What are they? Dataspaces: why? What are they? Examples and motivation Dataspace techniques:

More information

Principles of Dataspaces

Principles of Dataspaces Principles of Dataspaces Seminar From Databases to Dataspaces Summer Term 2007 Monika Podolecheva University of Konstanz Department of Computer and Information Science Tutor: Prof. M. Scholl, Alexander

More information

Dataspaces: A New Abstraction for Data Management. Mike Franklin, Alon Halevy, David Maier, Jennifer Widom

Dataspaces: A New Abstraction for Data Management. Mike Franklin, Alon Halevy, David Maier, Jennifer Widom Dataspaces: A New Abstraction for Data Management Mike Franklin, Alon Halevy, David Maier, Jennifer Widom Today s Agenda Why databases are great. What problems people really have Why databases are not

More information

Flexible Dataspace Management Through Model Management

Flexible Dataspace Management Through Model Management Flexible Dataspace Management Through Model Management Cornelia Hedeler, Khalid Belhajjame, Lu Mao, Norman W. Paton, Alvaro A.A. Fernandes, Chenjuan Guo, and Suzanne M. Embury School of Computer Science,

More information

Computer-based Tracking Protocols: Improving Communication between Databases

Computer-based Tracking Protocols: Improving Communication between Databases Computer-based Tracking Protocols: Improving Communication between Databases Amol Deshpande Database Group Department of Computer Science University of Maryland Overview Food tracking and traceability

More information

OKKAM-based instance level integration

OKKAM-based instance level integration OKKAM-based instance level integration Paolo Bouquet W3C RDF2RDB This work is co-funded by the European Commission in the context of the Large-scale Integrated project OKKAM (GA 215032) RoadMap Using the

More information

Extracting and Querying Probabilistic Information From Text in BayesStore-IE

Extracting and Querying Probabilistic Information From Text in BayesStore-IE Extracting and Querying Probabilistic Information From Text in BayesStore-IE Daisy Zhe Wang, Michael J. Franklin, Minos Garofalakis 2, Joseph M. Hellerstein University of California, Berkeley Technical

More information

Research Collection. Indexed semantic mapping rules. Master Thesis. ETH Library. Author(s): Zhang, Haoning. Publication Date: 2009

Research Collection. Indexed semantic mapping rules. Master Thesis. ETH Library. Author(s): Zhang, Haoning. Publication Date: 2009 Research Collection Master Thesis Indexed semantic mapping rules Author(s): Zhang, Haoning Publication Date: 29 Permanent Link: https://doi.org/1.3929/ethz-a-577358 Rights / License: In Copyright - Non-Commercial

More information

Intuitive and Interactive Query Formulation to Improve the Usability of Query Systems for Heterogeneous Graphs

Intuitive and Interactive Query Formulation to Improve the Usability of Query Systems for Heterogeneous Graphs Intuitive and Interactive Query Formulation to Improve the Usability of Query Systems for Heterogeneous Graphs Nandish Jayaram University of Texas at Arlington PhD Advisors: Dr. Chengkai Li, Dr. Ramez

More information

Data Integration and Data Warehousing Database Integration Overview

Data Integration and Data Warehousing Database Integration Overview Data Integration and Data Warehousing Database Integration Overview Sergey Stupnikov Institute of Informatics Problems, RAS ssa@ipi.ac.ru Outline Information Integration Problem Heterogeneous Information

More information

Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016

Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016 + Databases and Information Retrieval Integration TIETS42 Autumn 2016 Kostas Stefanidis kostas.stefanidis@uta.fi http://www.uta.fi/sis/tie/dbir/index.html http://people.uta.fi/~kostas.stefanidis/dbir16/dbir16-main.html

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs

More information

Jennifer Widom. Stanford University

Jennifer Widom. Stanford University Principled Research in Database Systems Stanford University What Academics Give Talks About Other people s papers Thesis and new results Significant research projects The research field BIG VISION Other

More information

Shrapnels in Baghdad. Dataspaces. Google Base. Data is the plural of anecdote. The Web is Getting Semantic. + Data Integration & The Web

Shrapnels in Baghdad. Dataspaces. Google Base. Data is the plural of anecdote. The Web is Getting Semantic. + Data Integration & The Web Shrapnels in Baghdad Dataspaces + Data Integration & The Web Story courtesy of Phil Bernstein Personal Information Management AttachedTo [Semex: Dong et al.] Google Base Recipient ExperimentOf Cites ArticleAbout

More information

On the Use of Query-driven XML Auto-Indexing

On the Use of Query-driven XML Auto-Indexing On the Use of Query-driven XML Auto-Indexing Karsten Schmidt and Theo Härder SMDB'10 (ICDE), Long Beach March, 1 Motivation Self-Tuning '10 The last 10+ years Index tuning What-if Wizards, Guides, Druids

More information

Advances in Data Management - Web Data Integration A.Poulovassilis

Advances in Data Management - Web Data Integration A.Poulovassilis Advances in Data Management - Web Data Integration A.Poulovassilis 1 1 Integrating Deep Web Data Traditionally, the web has made available vast amounts of information in unstructured form (i.e. text).

More information

Supporting Fuzzy Keyword Search in Databases

Supporting Fuzzy Keyword Search in Databases I J C T A, 9(24), 2016, pp. 385-391 International Science Press Supporting Fuzzy Keyword Search in Databases Jayavarthini C.* and Priya S. ABSTRACT An efficient keyword search system computes answers as

More information

Semantic Web Mining and its application in Human Resource Management

Semantic Web Mining and its application in Human Resource Management International Journal of Computer Science & Management Studies, Vol. 11, Issue 02, August 2011 60 Semantic Web Mining and its application in Human Resource Management Ridhika Malik 1, Kunjana Vasudev 2

More information

(Big Data Integration) : :

(Big Data Integration) : : (Big Data Integration) : : 3 # $%&'! ()* +$,- 2/30 ()* + # $%&' = 3 : $ 2 : 17 ;' $ # < 2 6 ' $%&',# +'= > 0 - '? @0 A 1 3/30 3?. - B 6 @* @(C : E6 - > ()* (C :(C E6 1' +'= - ''3-6 F :* 2G '> H-! +'-?

More information

Improving Precision of Office Document Search by Metadata-based Queries

Improving Precision of Office Document Search by Metadata-based Queries Improving Precision of Office Document Search by Metadata-based Queries Somchai Chatvichienchai Dept. of Info-Media, Siebold University of Nagasaki, 1-1-1 Manabino, Nagayo Cho, Nishisonogi Gun, Nagasaki,

More information

Intensional Associations in Dataspaces

Intensional Associations in Dataspaces Intensional Associations in Dataspaces Marcos Antonio Vaz Salles #1, Jens Dittrich, Lukas Blunschi +3 # Cornell niversity Saarland niversity + ETH Zurich 1 vmarcos@cs.cornell.edu jens.dittrich@cs.uni-sb.de

More information

Asking the Right Questions in Crowd Data Sourcing

Asking the Right Questions in Crowd Data Sourcing MoDaS Mob Data Sourcing Asking the Right Questions in Crowd Data Sourcing Tova Milo Tel Aviv University Outline Introduction to crowd (data) sourcing Databases and crowds Declarative is good How to best

More information

EMC Documentum xdb. High-performance native XML database optimized for storing and querying large volumes of XML content

EMC Documentum xdb. High-performance native XML database optimized for storing and querying large volumes of XML content DATA SHEET EMC Documentum xdb High-performance native XML database optimized for storing and querying large volumes of XML content The Big Picture Ideal for content-oriented applications like dynamic publishing

More information

Probabilistic/Uncertain Data Management

Probabilistic/Uncertain Data Management Probabilistic/Uncertain Data Management 1. Dalvi, Suciu. Efficient query evaluation on probabilistic databases, VLDB Jrnl, 2004 2. Das Sarma et al. Working models for uncertain data, ICDE 2006. Slides

More information

Module 4. Implementation of XQuery. Part 0: Background on relational query processing

Module 4. Implementation of XQuery. Part 0: Background on relational query processing Module 4 Implementation of XQuery Part 0: Background on relational query processing The Data Management Universe Lecture Part I Lecture Part 2 2 What does a Database System do? Input: SQL statement Output:

More information

Evaluation of Keyword Search System with Ranking

Evaluation of Keyword Search System with Ranking Evaluation of Keyword Search System with Ranking P.Saranya, Dr.S.Babu UG Scholar, Department of CSE, Final Year, IFET College of Engineering, Villupuram, Tamil nadu, India Associate Professor, Department

More information

Processing Structural Constraints

Processing Structural Constraints SYNONYMS None Processing Structural Constraints Andrew Trotman Department of Computer Science University of Otago Dunedin New Zealand DEFINITION When searching unstructured plain-text the user is limited

More information

Ontology Based Prediction of Difficult Keyword Queries

Ontology Based Prediction of Difficult Keyword Queries Ontology Based Prediction of Difficult Keyword Queries Lubna.C*, Kasim K Pursuing M.Tech (CSE)*, Associate Professor (CSE) MEA Engineering College, Perinthalmanna Kerala, India lubna9990@gmail.com, kasim_mlp@gmail.com

More information

Introduction to Information Retrieval. Hongning Wang

Introduction to Information Retrieval. Hongning Wang Introduction to Information Retrieval Hongning Wang CS@UVa What is information retrieval? 2 Why information retrieval Information overload It refers to the difficulty a person can have understanding an

More information

Indexing Bi-temporal Windows. Chang Ge 1, Martin Kaufmann 2, Lukasz Golab 1, Peter M. Fischer 3, Anil K. Goel 4

Indexing Bi-temporal Windows. Chang Ge 1, Martin Kaufmann 2, Lukasz Golab 1, Peter M. Fischer 3, Anil K. Goel 4 Indexing Bi-temporal Windows Chang Ge 1, Martin Kaufmann 2, Lukasz Golab 1, Peter M. Fischer 3, Anil K. Goel 4 1 2 3 4 Outline Introduction Bi-temporal Windows Related Work The BiSW Index Experiments Conclusion

More information

Conceptual Database Modeling

Conceptual Database Modeling Course A7B36DBS: Database Systems Lecture 01: Conceptual Database Modeling Martin Svoboda Irena Holubová Tomáš Skopal Faculty of Electrical Engineering, Czech Technical University in Prague Course Plan

More information

Information Management (IM)

Information Management (IM) 1 2 3 4 5 6 7 8 9 Information Management (IM) Information Management (IM) is primarily concerned with the capture, digitization, representation, organization, transformation, and presentation of information;

More information

Data Integration with Uncertainty

Data Integration with Uncertainty Noname manuscript No. (will be inserted by the editor) Xin Dong Alon Halevy Cong Yu Data Integration with Uncertainty the date of receipt and acceptance should be inserted later Abstract This paper reports

More information

Rank-aware XML Data Model and Algebra: Towards Unifying Exact Match and Similar Match in XML

Rank-aware XML Data Model and Algebra: Towards Unifying Exact Match and Similar Match in XML Proceedings of the 7th WSEAS International Conference on Multimedia, Internet & Video Technologies, Beijing, China, September 15-17, 2007 253 Rank-aware XML Data Model and Algebra: Towards Unifying Exact

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

Chapter 6: Entity-Relationship Model. The Next Step: Designing DB Schema. Identifying Entities and their Attributes. The E-R Model.

Chapter 6: Entity-Relationship Model. The Next Step: Designing DB Schema. Identifying Entities and their Attributes. The E-R Model. Chapter 6: Entity-Relationship Model The Next Step: Designing DB Schema Our Story So Far: Relational Tables Databases are structured collections of organized data The Relational model is the most common

More information

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores Announcements Shumo office hours change See website for details HW2 due next Thurs

More information

Limitations of XPath & XQuery in an Environment with Diverse Schemes

Limitations of XPath & XQuery in an Environment with Diverse Schemes Exploiting Structure, Annotation, and Ontological Knowledge for Automatic Classification of XML-Data Martin Theobald, Ralf Schenkel, and Gerhard Weikum Saarland University Saarbrücken, Germany 23.06.2003

More information

Will Dataspaces Make Data Integration Obsolete?

Will Dataspaces Make Data Integration Obsolete? DBKDA 2011, January 23-28, 2011 St. Maarten, The Netherlands Antilles DBKDA 2011 Panel Discussion: Will Dataspaces Make Data Integration Obsolete? Moderator: Fritz Laux, Reutlingen Univ., Germany Panelists:

More information

The Next Step: Designing DB Schema. Chapter 6: Entity-Relationship Model. The E-R Model. Identifying Entities and their Attributes.

The Next Step: Designing DB Schema. Chapter 6: Entity-Relationship Model. The E-R Model. Identifying Entities and their Attributes. Chapter 6: Entity-Relationship Model Our Story So Far: Relational Tables Databases are structured collections of organized data The Relational model is the most common data organization model The Relational

More information

Learning mappings and queries

Learning mappings and queries Learning mappings and queries Marie Jacob University Of Pennsylvania DEIS 2010 1 Schema mappings Denote relationships between schemas Relates source schema S and target schema T Defined in a query language

More information

Keyword search in relational databases. By SO Tsz Yan Amanda & HON Ka Lam Ethan

Keyword search in relational databases. By SO Tsz Yan Amanda & HON Ka Lam Ethan Keyword search in relational databases By SO Tsz Yan Amanda & HON Ka Lam Ethan 1 Introduction Ubiquitous relational databases Need to know SQL and database structure Hard to define an object 2 Query representation

More information

Presented by: Dimitri Galmanovich. Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu

Presented by: Dimitri Galmanovich. Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu Presented by: Dimitri Galmanovich Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu 1 When looking for Unstructured data 2 Millions of such queries every day

More information

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper. Semantic Web Company PoolParty - Server PoolParty - Technical White Paper http://www.poolparty.biz Table of Contents Introduction... 3 PoolParty Technical Overview... 3 PoolParty Components Overview...

More information

The challenge of reasoning for OLF s s IO G2

The challenge of reasoning for OLF s s IO G2 The challenge of reasoning for OLF s s IO G2 Arild Waaler Department of Informatics University of Oslo March 25, 2007 The challenge of reasoning for IO G2 1 OLF s vision of IO G2 Potential Integration

More information

CSE 444: Database Internals. Lecture 22 Distributed Query Processing and Optimization

CSE 444: Database Internals. Lecture 22 Distributed Query Processing and Optimization CSE 444: Database Internals Lecture 22 Distributed Query Processing and Optimization CSE 444 - Spring 2014 1 Readings Main textbook: Sections 20.3 and 20.4 Other textbook: Database management systems.

More information

Open Data Integration. Renée J. Miller

Open Data Integration. Renée J. Miller Open Data Integration Renée J. Miller miller@northeastern.edu !2 Open Data Principles Timely & Comprehensive Accessible and Usable Complete - All public data is made available. Public data is data that

More information

#mstrworld. Analyzing Multiple Data Sources with Multisource Data Federation and In-Memory Data Blending. Presented by: Trishla Maru.

#mstrworld. Analyzing Multiple Data Sources with Multisource Data Federation and In-Memory Data Blending. Presented by: Trishla Maru. Analyzing Multiple Data Sources with Multisource Data Federation and In-Memory Data Blending Presented by: Trishla Maru Agenda Overview MultiSource Data Federation Use Cases Design Considerations Data

More information

Answering Queries Using Cooperative Semantic Caching

Answering Queries Using Cooperative Semantic Caching Answering Queries Using Cooperative Caching Andrei Vancea 1, Prof. Dr. Burkhard Stiller 1,2 1 Department of Informatics IFI, Communication Systems Group CSG, University of Zürich 2 associated with the

More information

II. Data Models. Importance of Data Models. Entity Set (and its attributes) Data Modeling and Data Models. Data Model Basic Building Blocks

II. Data Models. Importance of Data Models. Entity Set (and its attributes) Data Modeling and Data Models. Data Model Basic Building Blocks Data Modeling and Data Models II. Data Models Model: Abstraction of a real-world object or event Data modeling: Iterative and progressive process of creating a specific data model for a specific problem

More information

LOCAL STRUCTURE AND DETERMINISM IN PROBABILISTIC DATABASES. Theodoros Rekatsinas, Amol Deshpande, Lise Getoor

LOCAL STRUCTURE AND DETERMINISM IN PROBABILISTIC DATABASES. Theodoros Rekatsinas, Amol Deshpande, Lise Getoor LOCAL STRUCTURE AND DETERMINISM IN PROBABILISTIC DATABASES Theodoros Rekatsinas, Amol Deshpande, Lise Getoor Motivation Probabilistic databases store, manage and query uncertain data Numerous applications

More information

An Ameliorated Methodology to Eliminate Redundancy in Databases Using SQL

An Ameliorated Methodology to Eliminate Redundancy in Databases Using SQL An Ameliorated Methodology to Eliminate Redundancy in Databases Using SQL Praveena M V 1, Dr. Ajeet A. Chikkamannur 2 1 Department of CSE, Dr Ambedkar Institute of Technology, VTU, Karnataka, India 2 Department

More information

Data Modeling with the Entity Relationship Model. CS157A Chris Pollett Sept. 7, 2005.

Data Modeling with the Entity Relationship Model. CS157A Chris Pollett Sept. 7, 2005. Data Modeling with the Entity Relationship Model CS157A Chris Pollett Sept. 7, 2005. Outline Conceptual Data Models and Database Design An Example Application Entity Types, Sets, Attributes and Keys Relationship

More information

Columbia University High-Level Feature Detection: Parts-based Concept Detectors

Columbia University High-Level Feature Detection: Parts-based Concept Detectors TRECVID 2005 Workshop Columbia University High-Level Feature Detection: Parts-based Concept Detectors Dong-Qing Zhang, Shih-Fu Chang, Winston Hsu, Lexin Xie, Eric Zavesky Digital Video and Multimedia Lab

More information

Data about data is database Select correct option: True False Partially True None of the Above

Data about data is database Select correct option: True False Partially True None of the Above Within a table, each primary key value. is a minimal super key is always the first field in each table must be numeric must be unique Foreign Key is A field in a table that matches a key field in another

More information

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.

More information

Chapter 1: Introduction

Chapter 1: Introduction Chapter 1: Introduction Chapter 2: Intro. To the Relational Model Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Database Management System (DBMS) DBMS is Collection of

More information

Collage: A Declarative Programming Model for Compositional Development and Evolution of Cross-Organizational Applications

Collage: A Declarative Programming Model for Compositional Development and Evolution of Cross-Organizational Applications Collage: A Declarative Programming Model for Compositional Development and Evolution of Cross-Organizational Applications Bruce Lucas, IBM T J Watson Research Center (bdlucas@us.ibm.com) Charles F Wiecha,

More information

Natural Language Processing

Natural Language Processing Natural Language Processing Information Retrieval Potsdam, 14 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the course book Outline 2 1 Introduction 2 Indexing Block Document

More information

Introduction Data Integration Summary. Data Integration. COCS 6421 Advanced Database Systems. Przemyslaw Pawluk. CSE, York University.

Introduction Data Integration Summary. Data Integration. COCS 6421 Advanced Database Systems. Przemyslaw Pawluk. CSE, York University. COCS 6421 Advanced Database Systems CSE, York University March 20, 2008 Agenda 1 Problem description Problems 2 3 Open questions and future work Conclusion Bibliography Problem description Problems Why

More information

Information Retrieval

Information Retrieval Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,

More information

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Copyright 2012, Oracle and/or its affiliates. All rights reserved. 1 Oracle NoSQL Database and Oracle Relational Database - A Perfect Fit Dave Rubin Director NoSQL Database Development 2 The following is intended to outline our general product direction. It is intended

More information

Mobile Query Interfaces

Mobile Query Interfaces Mobile Query Interfaces Matthew Krog Abstract There are numerous alternatives to the application-oriented mobile interfaces. Since users use their mobile devices to manage personal information, a PIM interface

More information

Multiple Graphs Updatable views & Choices

Multiple Graphs Updatable views & Choices CIP2017-06-18 and related Multiple Graphs Updatable views & Choices Presentation at ocim 4 on May 22-24, 2018 Stefan Plantikow, Andrés Taylor, Petra Selmer (Neo4j) History Started working on multiple graphs

More information

CSE-6490B Final Exam

CSE-6490B Final Exam February 2009 CSE-6490B Final Exam Fall 2008 p 1 CSE-6490B Final Exam In your submitted work for this final exam, please include and sign the following statement: I understand that this final take-home

More information

Wrap-Up: Data Sharing and the Web

Wrap-Up: Data Sharing and the Web Wrap-Up: Data Sharing and the Web Zachary G. Ives University of Pennsylvania April 16, 2003 Administrivia I Reminder: Monday is your project presentation About 5-7 minutes each Slides are allowed (but

More information

CSE4701 Principles of Databases Midterm Exam

CSE4701 Principles of Databases Midterm Exam CSE4701 Principles of Databases Midterm Exam Name: Problem Points Score 1 15 2 15 3 15 4 15 5 15 Total 75 Please show all work to receive ANY credit!!!! Roughly equate 15 points to fifteen (15) minutes

More information

Notes. These slides are based on slide sets provided by M. T. Öszu and by R. Ramakrishnan and J. Gehrke. CS 640 Views Winter / 15.

Notes. These slides are based on slide sets provided by M. T. Öszu and by R. Ramakrishnan and J. Gehrke. CS 640 Views Winter / 15. Views and View Management Olaf Hartig David R. Cheriton School of Computer Science University of Waterloo CS 640 Principles of Database Management and Use Winter 2013 These slides are based on slide sets

More information

Self-Tuning Database Systems: The AutoAdmin Experience

Self-Tuning Database Systems: The AutoAdmin Experience Self-Tuning Database Systems: The AutoAdmin Experience Surajit Chaudhuri Data Management and Exploration Group Microsoft Research http://research.microsoft.com/users/surajitc surajitc@microsoft.com 5/10/2002

More information

Keyword Search on Form Results

Keyword Search on Form Results Keyword Search on Form Results Aditya Ramesh (Stanford) * S Sudarshan (IIT Bombay) Purva Joshi (IIT Bombay) * Work done at IIT Bombay Keyword Search on Structured Data Allows queries to be specified without

More information

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University CS377: Database Systems Data Warehouse and Data Mining Li Xiong Department of Mathematics and Computer Science Emory University 1 1960s: Evolution of Database Technology Data collection, database creation,

More information

Empowering People with Knowledge the Next Frontier for Web Search. Wei-Ying Ma Assistant Managing Director Microsoft Research Asia

Empowering People with Knowledge the Next Frontier for Web Search. Wei-Ying Ma Assistant Managing Director Microsoft Research Asia Empowering People with Knowledge the Next Frontier for Web Search Wei-Ying Ma Assistant Managing Director Microsoft Research Asia Important Trends for Web Search Organizing all information Addressing user

More information

Progressive Data Integration and Semantic Enrichment Based on LinkedScales and Trails

Progressive Data Integration and Semantic Enrichment Based on LinkedScales and Trails Progressive Data Integration and Semantic Enrichment Based on LinkedScales and Trails Matheus Silva Mota, Fagner Leal Pantoja, Julio Cesar dos Reis, and André Santanchè Institute of Computing, University

More information

NoSQL Schema Evolution and Big Data Migration at Scale

NoSQL Schema Evolution and Big Data Migration at Scale NoSQL Schema Evolution and Big Data Migration at Scale Uta Störl Darmstadt University of Applied Sciences, Germany April 2018 Joint work with Meike Klettke (University of Rostock, Germany) and Stefanie

More information

Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search

Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search 1 / 33 Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search Bernd Wittefeld Supervisor Markus Löckelt 20. July 2012 2 / 33 Teaser - Google Web History http://www.google.com/history

More information

Massive Online Analysis - Storm,Spark

Massive Online Analysis - Storm,Spark Massive Online Analysis - Storm,Spark presentation by R. Kishore Kumar Research Scholar Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur Kharagpur-721302, India (R

More information

Prof. Dr. Christian Bizer

Prof. Dr. Christian Bizer STI Summit July 6 th, 2011, Riga, Latvia Global Data Integration and Global Data Mining Prof. Dr. Christian Bizer Freie Universität ität Berlin Germany Outline 1. Topology of the Web of Data What data

More information

ABSTRACT. expanding data ensures that all of it cannot be fit in one conventional database management

ABSTRACT. expanding data ensures that all of it cannot be fit in one conventional database management ABSTRACT Nowadays, data is available everywhere in every field and the abundance of this rapidly expanding data ensures that all of it cannot be fit in one conventional database management system. In general

More information

Semantic Document Architecture for Desktop Data Integration and Management

Semantic Document Architecture for Desktop Data Integration and Management Semantic Document Architecture for Desktop Data Integration and Management Saša Nešić 1, Dragan Gašević 2, Mehdi Jazayeri 1 1 Faculty of Informatics, University of Lugano, Lugano, Switzerland 2 School

More information

ADT 2009 Other Approaches to XQuery Processing

ADT 2009 Other Approaches to XQuery Processing Other Approaches to XQuery Processing Stefan Manegold Stefan.Manegold@cwi.nl http://www.cwi.nl/~manegold/ 12.11.2009: Schedule 2 RDBMS back-end support for XML/XQuery (1/2): Document Representation (XPath

More information

SEMANTIC WEB POWERED PORTAL INFRASTRUCTURE

SEMANTIC WEB POWERED PORTAL INFRASTRUCTURE SEMANTIC WEB POWERED PORTAL INFRASTRUCTURE YING DING 1 Digital Enterprise Research Institute Leopold-Franzens Universität Innsbruck Austria DIETER FENSEL Digital Enterprise Research Institute National

More information

State of the Art and Trends in Search Engine Technology. Gerhard Weikum

State of the Art and Trends in Search Engine Technology. Gerhard Weikum State of the Art and Trends in Search Engine Technology Gerhard Weikum (weikum@mpi-inf.mpg.de) Commercial Search Engines Web search Google, Yahoo, MSN simple queries, chaotic data, many results key is

More information

Information Retrieval

Information Retrieval Information Retrieval CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have

More information

Deep Character-Level Click-Through Rate Prediction for Sponsored Search

Deep Character-Level Click-Through Rate Prediction for Sponsored Search Deep Character-Level Click-Through Rate Prediction for Sponsored Search Bora Edizel - Phd Student UPF Amin Mantrach - Criteo Research Xiao Bai - Oath This work was done at Yahoo and will be presented as

More information

Structural characterizations of schema mapping languages

Structural characterizations of schema mapping languages Structural characterizations of schema mapping languages Balder ten Cate INRIA and ENS Cachan (research done while visiting IBM Almaden and UC Santa Cruz) Joint work with Phokion Kolaitis (ICDT 09) Schema

More information

Data Integration: Schema Mapping

Data Integration: Schema Mapping Data Integration: Schema Mapping Jan Chomicki University at Buffalo and Warsaw University March 8, 2007 Jan Chomicki (UB/UW) Data Integration: Schema Mapping March 8, 2007 1 / 13 Data integration Data

More information

Data Integration: Schema Mapping

Data Integration: Schema Mapping Data Integration: Schema Mapping Jan Chomicki University at Buffalo and Warsaw University March 8, 2007 Jan Chomicki (UB/UW) Data Integration: Schema Mapping March 8, 2007 1 / 13 Data integration Jan Chomicki

More information

CIP Multiple Graphs. Presentation at ocig 8 on April 11, Stefan Plantikow, Andrés Taylor, Petra Selmer (Neo4j)

CIP Multiple Graphs. Presentation at ocig 8 on April 11, Stefan Plantikow, Andrés Taylor, Petra Selmer (Neo4j) CIP2017-06-18 Multiple Graphs Presentation at ocig 8 on April 11, 2018 Stefan Plantikow, Andrés Taylor, Petra Selmer (Neo4j) History Started working on multiple graphs early 2017 Parallel work in LDBC

More information

Resilient Distributed Datasets

Resilient Distributed Datasets Resilient Distributed Datasets A Fault- Tolerant Abstraction for In- Memory Cluster Computing Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin,

More information

Kikori-KS: An Effective and Efficient Keyword Search System for Digital Libraries in XML

Kikori-KS: An Effective and Efficient Keyword Search System for Digital Libraries in XML Kikori-KS An Effective and Efficient Keyword Search System for Digital Libraries in XML Toshiyuki Shimizu 1, Norimasa Terada 2, and Masatoshi Yoshikawa 1 1 Graduate School of Informatics, Kyoto University

More information

AMAχOS Abstract Machine for Xcerpt

AMAχOS Abstract Machine for Xcerpt AMAχOS Abstract Machine for Xcerpt Principles Architecture François Bry, Tim Furche, Benedikt Linse PPSWR 06, Budva, Montenegro, June 11th, 2006 Abstract Machine(s) Definition and Variants abstract machine

More information

Information Integration. Mediators Warehousing Answering Queries Using Views

Information Integration. Mediators Warehousing Answering Queries Using Views Information Integration Mediators Warehousing Answering Queries Using Views 1 Example Applications 1. Enterprise Information Integration: making separate DB s, all owned by one company, work together.

More information

Keywords Data alignment, Data annotation, Web database, Search Result Record

Keywords Data alignment, Data annotation, Web database, Search Result Record Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Annotating Web

More information

Chapter 1 Overview. data, documents, multi-media objects, accessible in computer networks content is published

Chapter 1 Overview. data, documents, multi-media objects, accessible in computer networks content is published Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 1 Overview Overview "Content" data, documents, multi-media objects,

More information

A Data Cube Model for Analysis of High Volumes of Ambient Data

A Data Cube Model for Analysis of High Volumes of Ambient Data A Data Cube Model for Analysis of High Volumes of Ambient Data Hao Gui and Mark Roantree March 25, 2013 1 Introduction The concept of the smart city [9] of which there are many initiatives, projects and

More information

Database Group Research Overview. Immanuel Trummer

Database Group Research Overview. Immanuel Trummer Database Group Research Overview Immanuel Trummer Talk Overview User Query Data Analysis Result Processing Talk Overview Fact Checking Query User Data Vocalization Data Analysis Result Processing Query

More information

Databases Lectures 1 and 2

Databases Lectures 1 and 2 Databases Lectures 1 and 2 Timothy G. Griffin Computer Laboratory University of Cambridge, UK Databases, Lent 2009 T. Griffin (cl.cam.ac.uk) Databases Lectures 1 and 2 DB 2009 1 / 36 Re-ordered Syllabus

More information

Reinforcement learning for intensional data management

Reinforcement learning for intensional data management Reinforcement learning for intensional data management Pierre Senellart ÉCOLE NORMALE SUPÉRIEURE RESEARCH UNIVERSITY PARIS 2 April 218, SEQUEL Seminar 1/44 Uncertain data is everywhere Numerous sources

More information