itrails: Pay-as-you-go Information Integration in Dataspaces
|
|
- Randolph Bishop
- 5 years ago
- Views:
Transcription
1 itrails: Pay-as-you-go Information Integration in Dataspaces Marcos Vaz Salles Jens Dittrich Shant Karakashian Olivier Girard Lukas Blunschi ETH Zurich VLDB 2007
2 Outline Motivation itrails Experiments Conclusions and Future Work 2
3 Problem: Querying Several Sources Query What is the impact of global warming in Zurich????? Systems Data Sources Laptop Server Web Server DB Server 3
4 Solution 1: Use a Search Engine Job! Query global warming zurich System Graph IR Search Engine Drawback: Query semantics are not precise! TopX [VLDB05], FleXPath [SIGMOD04], XSearch [VLDB03], XRank [SIGMOD03] Data Sources Laptop text, links text, links Server text, links Web Server text, links DB Server 4
5 Solution 2: Use an Information Integration System //Temperatures/*[city = zurich ] Query... Laptop Server Information Integration System Temps Cities Drawback: Too much... CO 2 effort to provide Sunspots schema mappings!. GAV (e.g. [ICDE95]), LAV (e.g. [VLDB96]), GLAV [AAAI99], P2P (e.g. [SIGMOD04]) missing schema mapping. missing schema mapping Web Server schema mapping DB Server schema mapping System Data Sources 5
6 Research Challenge: Is There an Integration Solution in-between These Two Extremes? global warming zurich global warming zurich //Temperatures/*[city = zurich ] Graph IR Search Engine Dataspace System Temps CO 2. Cities Sunspots Information Integration System Pay-as-you-go Information text, links Integration text, links? text, links text, links text, links full-blown schema mappings Data Sources Laptop Server Web Server DB Server Data Sources Dataspace Vision by Franklin, Halevy, and Maier [SIGMOD Record 05] 6
7 Outline Motivation itrails Experiments Conclusions and Future Work 7
8 itrails Core Idea: Add Integration Hints Incrementally Step 1: Provide a search service over all the data Use a general graph data model (see VLDB 2006) Works for unstructured documents, XML, and relations Step 2: Add integration semantics via hints (trails) on top of the graph Works across data sources, not only between sources Step 3: If more semantics needed, go back to step 2 Impact: Smooth transition between search and data integration Semantics added incrementally improve precision / recall 8
9 itrails: Defining Trails Basic Form of a Trail Q L [.C L ] Q R [.C R ] Queries: NEXI-like keyword and path expressions Attribute projections Intuition: When I query for Q L [.C L ], you should also query for Q R [.C R ] 9
10 Trail Examples: Global Warming Zurich global warming zurich Temperatures date 24-Sep 24-Sep 25-Sep 26-Sep city Bern Uster Zurich Zurich region BE ZH ZH ZH celsius Trail for Implicit Meaning: When I query for global warming, you should also query for Temperature data above 10 degrees global warming //Temperatures/*[celsius > 10] Trail for an Entity: When I query for zurich, you should also query for references of zurich as a region DB Server zurich //*[region = ZH ] 10
11 Trail Example: Deep Web Bookmarks train home Web Server Trail for a Bookmark: When I query for train home, you should also query for the TrainCompany s website with origin at ETH Uni and destination at Seilbahn Rigiblick train home //traincompany.com//*[origin= ETH Uni and dest = Seilbahn Rigiblick ] 11
12 Trail Examples: Thesauri, Dictionaries, Language-agnostic Search Laptop Server car auto Trail for Thesauri: When I query for car, you should also query for auto car auto car carro Trails for Dictionary: When I query for car, you should also query for carro and vice-versa car carro carro car 12
13 Trail Examples: Schema Equivalences Employee empid empname salary Trail for schema match on names: When I query for Employee.empName, you should also query for Person.name DB Server Person //Employee//*.tuple.empName //Person//*.tuple.name SSN name age income Trail for schema match on salaries: When I query for Employee.salary, you should also query for Person.income //Employee//*.tuple.salary //Person//*.tuple.income 13
14 Outline Motivation itrails Experiments Conclusion and Future Work Core Idea Trail Examples How are Trails Created? Uncertainty and Trails Rewriting Queries with Trails Recursive Matches 14
15 How are Trails Created? Given by the user Explicitly Via Relevance Feedback (Semi-)Automatically Information extraction techniques Automatic schema matching Ontologies and thesauri (e.g., wordnet) User communities (e.g., trails on gene data, bookmarks) 15
16 Uncertainty and Trails Probabilistic Trails: model uncertain trails probabilities used to rank trails Q L [.C L ] Q R [.C R ], 0 p 1 p Example: car auto p =
17 Certainty and Trails Scored Trails: give higher value to certain trails scoring factors used to boost scores of query results obtained by the trail Examples: Q L [.C L ] Q R [.C R ], sf > 1 - T 1 :weather //Temperatures/* sf p = 0.9, sf = 2 - T 2 :yesterday //*[date = today() 1] p = 1, sf = 3 17
18 Rewriting Queries with Trails Query U weather yesterday T 2 matches U (3) Merging weather U //*[date = today() 1] yesterday Trail T 2 : yesterday //*[date = today() 1] (1) Matching (2) Transformation 18
19 Replacing Trails Trails that use replace instead of union semantics Query U U (3) Merging weather yesterday weather //*[date = today() 1] T 2 matches Trail T 2 : yesterday //*[date = today() 1] (1) Matching (2) Transformation 19
20 Problem: Recursive Matches (1/2) U weather yesterday T 2 matches U //*[date = today() 1] New query still matches T 2, so T 2 could be applied again U T 2 : yesterday //*[date = today() 1] T 2 matches weather U //*[date = today() 1] U //*[date = today() 1]... U yesterday //*[date = today() 1] U... //*[date = today() 1] Infinite recursion 20
21 Problem: Recursive Matches (2/2) U weather yesterday T 3 matches U Trails may be //*[date = today() 1] mutually recursive U T 3 : //*.tuple.date //*.tuple.modified weather U yesterday U //*[modified = today() 1] //*[date = today() 1] T 10 matches T 10 : //*.tuple.modified //*.tuple.date U weather U yesterday U and enter an infinite loop //*[date = today() 1] U //*[date = today() 1] //*[modified = today() 1] We again match T 3 21
22 Solution: Multiple Match Coloring Algorithm U weather yesterday T 1 matches First Level T 2 matches weather weather T 1 : weather //Temperatures/* T 2 : yesterday //*[date = today() 1] T 3 : //*.tuple.date //*.tuple.modified T 4 : //*.tuple.date //*.tuple.received U U U //*[date = today() 1] yesterday //Temperatures/* U U Second Level U yesterday //Temperatures/* U U //*[modified = today() 1] T 3, T 4 match //*[date = today() 1] //*[received = today() 1] 22
23 Multiple Match Coloring Algorithm Analysis Problem: MMCA is exponential in number of levels Solution: Trail Pruning Prune by number of levels Prune by top-k trails matched in each level Prune by both top-k trails and number of levels 23
24 Outline Motivation itrails Experiments Conclusion and Future Work 24
25 itrails Evaluation in imemex imemex Dataspace System: Open-source prototype available at Main Questions in Evaluation Quality: Top-K Precision and Recall Performance: Use of Materialization Scalability: Query-rewrite Time vs. Number of Trails 25
26 itrails Evaluation in imemex Scenario 1: Few High-quality Trails Closer to information integration use cases Obtained real datasets and indexed them 18 hand-crafted trails 14 hand-crafted queries Scenario 2: Many Low-quality Trails Closer to search use cases Generated up to 10,000 trails 26
27 itrails Evaluation in imemex: Scenario 1 Configured imemex to act in three modes Baseline: Graph / IR search engine itrails: Rewrite search queries with trails Perfect Query: Semantics-aware query Data: shipped to central index sizes in MB Laptop Web Server Server DB Server 27
28 Quality: Top-K Precision and Recall K = 20 Scenario 1: few high-quality trails (18 trails) Search Engine misses relevant results Queries perfect query Perfect Query always has precision and recall equal to 1 Search Query is partially semantics-aware Q3: pdf yesterday Q13: to = raimund.grube@ enron.com 28
29 Performance: Use of Materialization Scenario 1: few high-quality trails (18 trails) Trail merging adds overhead to query execution Trail Materialization provides interactive times for all queries response times in sec. 29
30 Scalability: Query-rewrite Time vs. Number of Trails Scenario 2: many low-quality trails Query-rewrite time can be controlled with pruning 30
31 Conclusion: Pay-as-you-go Information Integration Step 1: Provide a search service over all the data Step 2: Add integration semantics via trails global warming zurich text, links Data Sources Step 3: If more semantics needed, go back to step 2 Our Contributions itrails: generic method to model semantic relationships (e.g. implicit meaning, bookmarks, dictionaries, thesauri, attribute matches,...) We propose a framework and algorithms for Pay-as-yougo Information Integration Smooth transition between search and data integration Dataspace System 31
32 Future Work Trail Creation Use collections (ontologies, thesauri, wikipedia) Work on automatic mining of trails from the dataspace Other types of trails Associations Lineage 32
33 Questions? Thanks in advance for your feedback! 33
34 Backup Slides 34
35 Problem: Global Warming in Zurich Query: What is the impact of global warming in Zurich? Search for: global warming zurich Meaning of keyword query global warming should lead to query on Temperatures zurich should lead to a query for a city 35
36 Problem: PDF Yesterday Query: Retrieve all PDF documents added/modified yesterday Search for: pdf yesterday Meaning of keywords pdf and yesterday Different sources, different schemas: Laptop: modified received DBMS: changed 36
37 Related Work: Search vs. Data Integration vs. Dataspaces Integration Solution Search Dataspaces Data Integration Features Integration Effort Low Pay-as-yougo High Query Semantics Precision / Recall Precision / Recall Precise Need for Schema Schemalater Schemanever Schemafirst 37
38 Personal Dataspaces Literature Dittrich, Salles, Kossmann, Blunschi. imemex: Escapes from the Personal Information Jungle (Demo Paper). VLDB, September Dittrich, Salles. idm: A Unified and Versatile Data Model for Personal Dataspace Management. VLDB, September 2006 Dittrich. imemex: A Platform for Personal Dataspace Management. SIGIR PIM, August Blunschi, Dittrich, Girard, Karakashian, Salles. A Dataspace Odyssey: The imemex Personal Dataspace Management System (Demo Paper). CIDR, January Dittrich, Blunschi, Färber, Girard, Karakashian, Salles. From Personal Desktops to Personal Dataspaces: A Report on Building the imemex Personal Dataspace Management System. BTW 2007, March 2007 Salles, Dittrich, Karakashian, Girard, Blunschi. itrails: Pay-as-yougo Information Integration in Dataspaces. VLDB, September
39 idm: imemex Data Model Our approach: get the data model closer to personal information not the other way around Supports: Unstructured, semi-structured and structured data, e.g., files&folders, XML, relations Clearly separation of logical and physical representation of data Arbitrary directed graph structures, e.g., section references in LaTeX documents, links in filesystems, etc Lazily computed data, e.g., ActiveXML (Abiteboul et. al.) Infinite data, e.g., media and data streams See VLDB
40 Data Model Options Bag of Words Data Models Relational XML idm Nonschematic data Support for Personal Data Serialization independent Support for Graph data Support for Lazy Computation Specific schema View mechanism Extension: XLink/ XPointer Extension: ActiveXML Support for Infinite data Extension: Document streams Extension: Relational streams Extension: XML streams 40
41 Data Models for Personal Information lower Abstraction Level higher Physical Level Relational idm Personal Information XML Document / Bag of Words 41
42 Architectural Perspective of imemex Complex operators (query algebra)... Search & Browse iql Query Processor Application Layer imemex PDSMS Operators Catalog Office Tools... Data Store Physical Algebra Result Cache Indexes&Replicas access (warehousing) idm Query Processor Operators Catalog Data Store Data Cleaning Indexes & Replicas Data source access (mediation) Data Source Query Processor Data Source Plugins Operators Catalog Content Converters... File System Data Source Layer IMAP DBMS... 42
imemex: From Search to Information Integration and Back
imemex: From Search to Information Integration and Back Jens Dittrich Marcos Antonio Vaz Salles Lukas Blunschi Saarland University Cornell University ETH Zurich Abstract In this paper, we report on lessons
More informationA Dataspace Odyssey: The imemex Personal Dataspace Management System
A Dataspace Odyssey: The imemex Personal Dataspace Management System Lukas Blunschi Jens-Peter Dittrich Olivier René Girard Shant Kirakos Karakashian ETH Zurich, Switzerland dbis.ethz.ch imemex.org Marcos
More informationDataspaces: The Tutorial. Day 2
Dataspaces: The Tutorial Day 2 Alon Halevy, David Maier VLDB 28 Auckland, New Zealand Outline Dataspaces: why? What are they? Dataspaces: why? What are they? Examples and motivation Dataspace techniques:
More informationPrinciples of Dataspaces
Principles of Dataspaces Seminar From Databases to Dataspaces Summer Term 2007 Monika Podolecheva University of Konstanz Department of Computer and Information Science Tutor: Prof. M. Scholl, Alexander
More informationDataspaces: A New Abstraction for Data Management. Mike Franklin, Alon Halevy, David Maier, Jennifer Widom
Dataspaces: A New Abstraction for Data Management Mike Franklin, Alon Halevy, David Maier, Jennifer Widom Today s Agenda Why databases are great. What problems people really have Why databases are not
More informationFlexible Dataspace Management Through Model Management
Flexible Dataspace Management Through Model Management Cornelia Hedeler, Khalid Belhajjame, Lu Mao, Norman W. Paton, Alvaro A.A. Fernandes, Chenjuan Guo, and Suzanne M. Embury School of Computer Science,
More informationComputer-based Tracking Protocols: Improving Communication between Databases
Computer-based Tracking Protocols: Improving Communication between Databases Amol Deshpande Database Group Department of Computer Science University of Maryland Overview Food tracking and traceability
More informationOKKAM-based instance level integration
OKKAM-based instance level integration Paolo Bouquet W3C RDF2RDB This work is co-funded by the European Commission in the context of the Large-scale Integrated project OKKAM (GA 215032) RoadMap Using the
More informationExtracting and Querying Probabilistic Information From Text in BayesStore-IE
Extracting and Querying Probabilistic Information From Text in BayesStore-IE Daisy Zhe Wang, Michael J. Franklin, Minos Garofalakis 2, Joseph M. Hellerstein University of California, Berkeley Technical
More informationResearch Collection. Indexed semantic mapping rules. Master Thesis. ETH Library. Author(s): Zhang, Haoning. Publication Date: 2009
Research Collection Master Thesis Indexed semantic mapping rules Author(s): Zhang, Haoning Publication Date: 29 Permanent Link: https://doi.org/1.3929/ethz-a-577358 Rights / License: In Copyright - Non-Commercial
More informationIntuitive and Interactive Query Formulation to Improve the Usability of Query Systems for Heterogeneous Graphs
Intuitive and Interactive Query Formulation to Improve the Usability of Query Systems for Heterogeneous Graphs Nandish Jayaram University of Texas at Arlington PhD Advisors: Dr. Chengkai Li, Dr. Ramez
More informationData Integration and Data Warehousing Database Integration Overview
Data Integration and Data Warehousing Database Integration Overview Sergey Stupnikov Institute of Informatics Problems, RAS ssa@ipi.ac.ru Outline Information Integration Problem Heterogeneous Information
More informationDatabases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016
+ Databases and Information Retrieval Integration TIETS42 Autumn 2016 Kostas Stefanidis kostas.stefanidis@uta.fi http://www.uta.fi/sis/tie/dbir/index.html http://people.uta.fi/~kostas.stefanidis/dbir16/dbir16-main.html
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs
More informationJennifer Widom. Stanford University
Principled Research in Database Systems Stanford University What Academics Give Talks About Other people s papers Thesis and new results Significant research projects The research field BIG VISION Other
More informationShrapnels in Baghdad. Dataspaces. Google Base. Data is the plural of anecdote. The Web is Getting Semantic. + Data Integration & The Web
Shrapnels in Baghdad Dataspaces + Data Integration & The Web Story courtesy of Phil Bernstein Personal Information Management AttachedTo [Semex: Dong et al.] Google Base Recipient ExperimentOf Cites ArticleAbout
More informationOn the Use of Query-driven XML Auto-Indexing
On the Use of Query-driven XML Auto-Indexing Karsten Schmidt and Theo Härder SMDB'10 (ICDE), Long Beach March, 1 Motivation Self-Tuning '10 The last 10+ years Index tuning What-if Wizards, Guides, Druids
More informationAdvances in Data Management - Web Data Integration A.Poulovassilis
Advances in Data Management - Web Data Integration A.Poulovassilis 1 1 Integrating Deep Web Data Traditionally, the web has made available vast amounts of information in unstructured form (i.e. text).
More informationSupporting Fuzzy Keyword Search in Databases
I J C T A, 9(24), 2016, pp. 385-391 International Science Press Supporting Fuzzy Keyword Search in Databases Jayavarthini C.* and Priya S. ABSTRACT An efficient keyword search system computes answers as
More informationSemantic Web Mining and its application in Human Resource Management
International Journal of Computer Science & Management Studies, Vol. 11, Issue 02, August 2011 60 Semantic Web Mining and its application in Human Resource Management Ridhika Malik 1, Kunjana Vasudev 2
More information(Big Data Integration) : :
(Big Data Integration) : : 3 # $%&'! ()* +$,- 2/30 ()* + # $%&' = 3 : $ 2 : 17 ;' $ # < 2 6 ' $%&',# +'= > 0 - '? @0 A 1 3/30 3?. - B 6 @* @(C : E6 - > ()* (C :(C E6 1' +'= - ''3-6 F :* 2G '> H-! +'-?
More informationImproving Precision of Office Document Search by Metadata-based Queries
Improving Precision of Office Document Search by Metadata-based Queries Somchai Chatvichienchai Dept. of Info-Media, Siebold University of Nagasaki, 1-1-1 Manabino, Nagayo Cho, Nishisonogi Gun, Nagasaki,
More informationIntensional Associations in Dataspaces
Intensional Associations in Dataspaces Marcos Antonio Vaz Salles #1, Jens Dittrich, Lukas Blunschi +3 # Cornell niversity Saarland niversity + ETH Zurich 1 vmarcos@cs.cornell.edu jens.dittrich@cs.uni-sb.de
More informationAsking the Right Questions in Crowd Data Sourcing
MoDaS Mob Data Sourcing Asking the Right Questions in Crowd Data Sourcing Tova Milo Tel Aviv University Outline Introduction to crowd (data) sourcing Databases and crowds Declarative is good How to best
More informationEMC Documentum xdb. High-performance native XML database optimized for storing and querying large volumes of XML content
DATA SHEET EMC Documentum xdb High-performance native XML database optimized for storing and querying large volumes of XML content The Big Picture Ideal for content-oriented applications like dynamic publishing
More informationProbabilistic/Uncertain Data Management
Probabilistic/Uncertain Data Management 1. Dalvi, Suciu. Efficient query evaluation on probabilistic databases, VLDB Jrnl, 2004 2. Das Sarma et al. Working models for uncertain data, ICDE 2006. Slides
More informationModule 4. Implementation of XQuery. Part 0: Background on relational query processing
Module 4 Implementation of XQuery Part 0: Background on relational query processing The Data Management Universe Lecture Part I Lecture Part 2 2 What does a Database System do? Input: SQL statement Output:
More informationEvaluation of Keyword Search System with Ranking
Evaluation of Keyword Search System with Ranking P.Saranya, Dr.S.Babu UG Scholar, Department of CSE, Final Year, IFET College of Engineering, Villupuram, Tamil nadu, India Associate Professor, Department
More informationProcessing Structural Constraints
SYNONYMS None Processing Structural Constraints Andrew Trotman Department of Computer Science University of Otago Dunedin New Zealand DEFINITION When searching unstructured plain-text the user is limited
More informationOntology Based Prediction of Difficult Keyword Queries
Ontology Based Prediction of Difficult Keyword Queries Lubna.C*, Kasim K Pursuing M.Tech (CSE)*, Associate Professor (CSE) MEA Engineering College, Perinthalmanna Kerala, India lubna9990@gmail.com, kasim_mlp@gmail.com
More informationIntroduction to Information Retrieval. Hongning Wang
Introduction to Information Retrieval Hongning Wang CS@UVa What is information retrieval? 2 Why information retrieval Information overload It refers to the difficulty a person can have understanding an
More informationIndexing Bi-temporal Windows. Chang Ge 1, Martin Kaufmann 2, Lukasz Golab 1, Peter M. Fischer 3, Anil K. Goel 4
Indexing Bi-temporal Windows Chang Ge 1, Martin Kaufmann 2, Lukasz Golab 1, Peter M. Fischer 3, Anil K. Goel 4 1 2 3 4 Outline Introduction Bi-temporal Windows Related Work The BiSW Index Experiments Conclusion
More informationConceptual Database Modeling
Course A7B36DBS: Database Systems Lecture 01: Conceptual Database Modeling Martin Svoboda Irena Holubová Tomáš Skopal Faculty of Electrical Engineering, Czech Technical University in Prague Course Plan
More informationInformation Management (IM)
1 2 3 4 5 6 7 8 9 Information Management (IM) Information Management (IM) is primarily concerned with the capture, digitization, representation, organization, transformation, and presentation of information;
More informationData Integration with Uncertainty
Noname manuscript No. (will be inserted by the editor) Xin Dong Alon Halevy Cong Yu Data Integration with Uncertainty the date of receipt and acceptance should be inserted later Abstract This paper reports
More informationRank-aware XML Data Model and Algebra: Towards Unifying Exact Match and Similar Match in XML
Proceedings of the 7th WSEAS International Conference on Multimedia, Internet & Video Technologies, Beijing, China, September 15-17, 2007 253 Rank-aware XML Data Model and Algebra: Towards Unifying Exact
More informationChapter 27 Introduction to Information Retrieval and Web Search
Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval
More informationChapter 6: Entity-Relationship Model. The Next Step: Designing DB Schema. Identifying Entities and their Attributes. The E-R Model.
Chapter 6: Entity-Relationship Model The Next Step: Designing DB Schema Our Story So Far: Relational Tables Databases are structured collections of organized data The Relational model is the most common
More informationCSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores Announcements Shumo office hours change See website for details HW2 due next Thurs
More informationLimitations of XPath & XQuery in an Environment with Diverse Schemes
Exploiting Structure, Annotation, and Ontological Knowledge for Automatic Classification of XML-Data Martin Theobald, Ralf Schenkel, and Gerhard Weikum Saarland University Saarbrücken, Germany 23.06.2003
More informationWill Dataspaces Make Data Integration Obsolete?
DBKDA 2011, January 23-28, 2011 St. Maarten, The Netherlands Antilles DBKDA 2011 Panel Discussion: Will Dataspaces Make Data Integration Obsolete? Moderator: Fritz Laux, Reutlingen Univ., Germany Panelists:
More informationThe Next Step: Designing DB Schema. Chapter 6: Entity-Relationship Model. The E-R Model. Identifying Entities and their Attributes.
Chapter 6: Entity-Relationship Model Our Story So Far: Relational Tables Databases are structured collections of organized data The Relational model is the most common data organization model The Relational
More informationLearning mappings and queries
Learning mappings and queries Marie Jacob University Of Pennsylvania DEIS 2010 1 Schema mappings Denote relationships between schemas Relates source schema S and target schema T Defined in a query language
More informationKeyword search in relational databases. By SO Tsz Yan Amanda & HON Ka Lam Ethan
Keyword search in relational databases By SO Tsz Yan Amanda & HON Ka Lam Ethan 1 Introduction Ubiquitous relational databases Need to know SQL and database structure Hard to define an object 2 Query representation
More informationPresented by: Dimitri Galmanovich. Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu
Presented by: Dimitri Galmanovich Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu 1 When looking for Unstructured data 2 Millions of such queries every day
More informationSemantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.
Semantic Web Company PoolParty - Server PoolParty - Technical White Paper http://www.poolparty.biz Table of Contents Introduction... 3 PoolParty Technical Overview... 3 PoolParty Components Overview...
More informationThe challenge of reasoning for OLF s s IO G2
The challenge of reasoning for OLF s s IO G2 Arild Waaler Department of Informatics University of Oslo March 25, 2007 The challenge of reasoning for IO G2 1 OLF s vision of IO G2 Potential Integration
More informationCSE 444: Database Internals. Lecture 22 Distributed Query Processing and Optimization
CSE 444: Database Internals Lecture 22 Distributed Query Processing and Optimization CSE 444 - Spring 2014 1 Readings Main textbook: Sections 20.3 and 20.4 Other textbook: Database management systems.
More informationOpen Data Integration. Renée J. Miller
Open Data Integration Renée J. Miller miller@northeastern.edu !2 Open Data Principles Timely & Comprehensive Accessible and Usable Complete - All public data is made available. Public data is data that
More information#mstrworld. Analyzing Multiple Data Sources with Multisource Data Federation and In-Memory Data Blending. Presented by: Trishla Maru.
Analyzing Multiple Data Sources with Multisource Data Federation and In-Memory Data Blending Presented by: Trishla Maru Agenda Overview MultiSource Data Federation Use Cases Design Considerations Data
More informationAnswering Queries Using Cooperative Semantic Caching
Answering Queries Using Cooperative Caching Andrei Vancea 1, Prof. Dr. Burkhard Stiller 1,2 1 Department of Informatics IFI, Communication Systems Group CSG, University of Zürich 2 associated with the
More informationII. Data Models. Importance of Data Models. Entity Set (and its attributes) Data Modeling and Data Models. Data Model Basic Building Blocks
Data Modeling and Data Models II. Data Models Model: Abstraction of a real-world object or event Data modeling: Iterative and progressive process of creating a specific data model for a specific problem
More informationLOCAL STRUCTURE AND DETERMINISM IN PROBABILISTIC DATABASES. Theodoros Rekatsinas, Amol Deshpande, Lise Getoor
LOCAL STRUCTURE AND DETERMINISM IN PROBABILISTIC DATABASES Theodoros Rekatsinas, Amol Deshpande, Lise Getoor Motivation Probabilistic databases store, manage and query uncertain data Numerous applications
More informationAn Ameliorated Methodology to Eliminate Redundancy in Databases Using SQL
An Ameliorated Methodology to Eliminate Redundancy in Databases Using SQL Praveena M V 1, Dr. Ajeet A. Chikkamannur 2 1 Department of CSE, Dr Ambedkar Institute of Technology, VTU, Karnataka, India 2 Department
More informationData Modeling with the Entity Relationship Model. CS157A Chris Pollett Sept. 7, 2005.
Data Modeling with the Entity Relationship Model CS157A Chris Pollett Sept. 7, 2005. Outline Conceptual Data Models and Database Design An Example Application Entity Types, Sets, Attributes and Keys Relationship
More informationColumbia University High-Level Feature Detection: Parts-based Concept Detectors
TRECVID 2005 Workshop Columbia University High-Level Feature Detection: Parts-based Concept Detectors Dong-Qing Zhang, Shih-Fu Chang, Winston Hsu, Lexin Xie, Eric Zavesky Digital Video and Multimedia Lab
More informationData about data is database Select correct option: True False Partially True None of the Above
Within a table, each primary key value. is a minimal super key is always the first field in each table must be numeric must be unique Foreign Key is A field in a table that matches a key field in another
More informationTERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES
TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.
More informationChapter 1: Introduction
Chapter 1: Introduction Chapter 2: Intro. To the Relational Model Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Database Management System (DBMS) DBMS is Collection of
More informationCollage: A Declarative Programming Model for Compositional Development and Evolution of Cross-Organizational Applications
Collage: A Declarative Programming Model for Compositional Development and Evolution of Cross-Organizational Applications Bruce Lucas, IBM T J Watson Research Center (bdlucas@us.ibm.com) Charles F Wiecha,
More informationNatural Language Processing
Natural Language Processing Information Retrieval Potsdam, 14 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the course book Outline 2 1 Introduction 2 Indexing Block Document
More informationIntroduction Data Integration Summary. Data Integration. COCS 6421 Advanced Database Systems. Przemyslaw Pawluk. CSE, York University.
COCS 6421 Advanced Database Systems CSE, York University March 20, 2008 Agenda 1 Problem description Problems 2 3 Open questions and future work Conclusion Bibliography Problem description Problems Why
More informationInformation Retrieval
Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,
More informationCopyright 2012, Oracle and/or its affiliates. All rights reserved.
1 Oracle NoSQL Database and Oracle Relational Database - A Perfect Fit Dave Rubin Director NoSQL Database Development 2 The following is intended to outline our general product direction. It is intended
More informationMobile Query Interfaces
Mobile Query Interfaces Matthew Krog Abstract There are numerous alternatives to the application-oriented mobile interfaces. Since users use their mobile devices to manage personal information, a PIM interface
More informationMultiple Graphs Updatable views & Choices
CIP2017-06-18 and related Multiple Graphs Updatable views & Choices Presentation at ocim 4 on May 22-24, 2018 Stefan Plantikow, Andrés Taylor, Petra Selmer (Neo4j) History Started working on multiple graphs
More informationCSE-6490B Final Exam
February 2009 CSE-6490B Final Exam Fall 2008 p 1 CSE-6490B Final Exam In your submitted work for this final exam, please include and sign the following statement: I understand that this final take-home
More informationWrap-Up: Data Sharing and the Web
Wrap-Up: Data Sharing and the Web Zachary G. Ives University of Pennsylvania April 16, 2003 Administrivia I Reminder: Monday is your project presentation About 5-7 minutes each Slides are allowed (but
More informationCSE4701 Principles of Databases Midterm Exam
CSE4701 Principles of Databases Midterm Exam Name: Problem Points Score 1 15 2 15 3 15 4 15 5 15 Total 75 Please show all work to receive ANY credit!!!! Roughly equate 15 points to fifteen (15) minutes
More informationNotes. These slides are based on slide sets provided by M. T. Öszu and by R. Ramakrishnan and J. Gehrke. CS 640 Views Winter / 15.
Views and View Management Olaf Hartig David R. Cheriton School of Computer Science University of Waterloo CS 640 Principles of Database Management and Use Winter 2013 These slides are based on slide sets
More informationSelf-Tuning Database Systems: The AutoAdmin Experience
Self-Tuning Database Systems: The AutoAdmin Experience Surajit Chaudhuri Data Management and Exploration Group Microsoft Research http://research.microsoft.com/users/surajitc surajitc@microsoft.com 5/10/2002
More informationKeyword Search on Form Results
Keyword Search on Form Results Aditya Ramesh (Stanford) * S Sudarshan (IIT Bombay) Purva Joshi (IIT Bombay) * Work done at IIT Bombay Keyword Search on Structured Data Allows queries to be specified without
More informationCS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University
CS377: Database Systems Data Warehouse and Data Mining Li Xiong Department of Mathematics and Computer Science Emory University 1 1960s: Evolution of Database Technology Data collection, database creation,
More informationEmpowering People with Knowledge the Next Frontier for Web Search. Wei-Ying Ma Assistant Managing Director Microsoft Research Asia
Empowering People with Knowledge the Next Frontier for Web Search Wei-Ying Ma Assistant Managing Director Microsoft Research Asia Important Trends for Web Search Organizing all information Addressing user
More informationProgressive Data Integration and Semantic Enrichment Based on LinkedScales and Trails
Progressive Data Integration and Semantic Enrichment Based on LinkedScales and Trails Matheus Silva Mota, Fagner Leal Pantoja, Julio Cesar dos Reis, and André Santanchè Institute of Computing, University
More informationNoSQL Schema Evolution and Big Data Migration at Scale
NoSQL Schema Evolution and Big Data Migration at Scale Uta Störl Darmstadt University of Applied Sciences, Germany April 2018 Joint work with Meike Klettke (University of Rostock, Germany) and Stefanie
More informationLearning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search
1 / 33 Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search Bernd Wittefeld Supervisor Markus Löckelt 20. July 2012 2 / 33 Teaser - Google Web History http://www.google.com/history
More informationMassive Online Analysis - Storm,Spark
Massive Online Analysis - Storm,Spark presentation by R. Kishore Kumar Research Scholar Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur Kharagpur-721302, India (R
More informationProf. Dr. Christian Bizer
STI Summit July 6 th, 2011, Riga, Latvia Global Data Integration and Global Data Mining Prof. Dr. Christian Bizer Freie Universität ität Berlin Germany Outline 1. Topology of the Web of Data What data
More informationABSTRACT. expanding data ensures that all of it cannot be fit in one conventional database management
ABSTRACT Nowadays, data is available everywhere in every field and the abundance of this rapidly expanding data ensures that all of it cannot be fit in one conventional database management system. In general
More informationSemantic Document Architecture for Desktop Data Integration and Management
Semantic Document Architecture for Desktop Data Integration and Management Saša Nešić 1, Dragan Gašević 2, Mehdi Jazayeri 1 1 Faculty of Informatics, University of Lugano, Lugano, Switzerland 2 School
More informationADT 2009 Other Approaches to XQuery Processing
Other Approaches to XQuery Processing Stefan Manegold Stefan.Manegold@cwi.nl http://www.cwi.nl/~manegold/ 12.11.2009: Schedule 2 RDBMS back-end support for XML/XQuery (1/2): Document Representation (XPath
More informationSEMANTIC WEB POWERED PORTAL INFRASTRUCTURE
SEMANTIC WEB POWERED PORTAL INFRASTRUCTURE YING DING 1 Digital Enterprise Research Institute Leopold-Franzens Universität Innsbruck Austria DIETER FENSEL Digital Enterprise Research Institute National
More informationState of the Art and Trends in Search Engine Technology. Gerhard Weikum
State of the Art and Trends in Search Engine Technology Gerhard Weikum (weikum@mpi-inf.mpg.de) Commercial Search Engines Web search Google, Yahoo, MSN simple queries, chaotic data, many results key is
More informationInformation Retrieval
Information Retrieval CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have
More informationDeep Character-Level Click-Through Rate Prediction for Sponsored Search
Deep Character-Level Click-Through Rate Prediction for Sponsored Search Bora Edizel - Phd Student UPF Amin Mantrach - Criteo Research Xiao Bai - Oath This work was done at Yahoo and will be presented as
More informationStructural characterizations of schema mapping languages
Structural characterizations of schema mapping languages Balder ten Cate INRIA and ENS Cachan (research done while visiting IBM Almaden and UC Santa Cruz) Joint work with Phokion Kolaitis (ICDT 09) Schema
More informationData Integration: Schema Mapping
Data Integration: Schema Mapping Jan Chomicki University at Buffalo and Warsaw University March 8, 2007 Jan Chomicki (UB/UW) Data Integration: Schema Mapping March 8, 2007 1 / 13 Data integration Data
More informationData Integration: Schema Mapping
Data Integration: Schema Mapping Jan Chomicki University at Buffalo and Warsaw University March 8, 2007 Jan Chomicki (UB/UW) Data Integration: Schema Mapping March 8, 2007 1 / 13 Data integration Jan Chomicki
More informationCIP Multiple Graphs. Presentation at ocig 8 on April 11, Stefan Plantikow, Andrés Taylor, Petra Selmer (Neo4j)
CIP2017-06-18 Multiple Graphs Presentation at ocig 8 on April 11, 2018 Stefan Plantikow, Andrés Taylor, Petra Selmer (Neo4j) History Started working on multiple graphs early 2017 Parallel work in LDBC
More informationResilient Distributed Datasets
Resilient Distributed Datasets A Fault- Tolerant Abstraction for In- Memory Cluster Computing Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin,
More informationKikori-KS: An Effective and Efficient Keyword Search System for Digital Libraries in XML
Kikori-KS An Effective and Efficient Keyword Search System for Digital Libraries in XML Toshiyuki Shimizu 1, Norimasa Terada 2, and Masatoshi Yoshikawa 1 1 Graduate School of Informatics, Kyoto University
More informationAMAχOS Abstract Machine for Xcerpt
AMAχOS Abstract Machine for Xcerpt Principles Architecture François Bry, Tim Furche, Benedikt Linse PPSWR 06, Budva, Montenegro, June 11th, 2006 Abstract Machine(s) Definition and Variants abstract machine
More informationInformation Integration. Mediators Warehousing Answering Queries Using Views
Information Integration Mediators Warehousing Answering Queries Using Views 1 Example Applications 1. Enterprise Information Integration: making separate DB s, all owned by one company, work together.
More informationKeywords Data alignment, Data annotation, Web database, Search Result Record
Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Annotating Web
More informationChapter 1 Overview. data, documents, multi-media objects, accessible in computer networks content is published
Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 1 Overview Overview "Content" data, documents, multi-media objects,
More informationA Data Cube Model for Analysis of High Volumes of Ambient Data
A Data Cube Model for Analysis of High Volumes of Ambient Data Hao Gui and Mark Roantree March 25, 2013 1 Introduction The concept of the smart city [9] of which there are many initiatives, projects and
More informationDatabase Group Research Overview. Immanuel Trummer
Database Group Research Overview Immanuel Trummer Talk Overview User Query Data Analysis Result Processing Talk Overview Fact Checking Query User Data Vocalization Data Analysis Result Processing Query
More informationDatabases Lectures 1 and 2
Databases Lectures 1 and 2 Timothy G. Griffin Computer Laboratory University of Cambridge, UK Databases, Lent 2009 T. Griffin (cl.cam.ac.uk) Databases Lectures 1 and 2 DB 2009 1 / 36 Re-ordered Syllabus
More informationReinforcement learning for intensional data management
Reinforcement learning for intensional data management Pierre Senellart ÉCOLE NORMALE SUPÉRIEURE RESEARCH UNIVERSITY PARIS 2 April 218, SEQUEL Seminar 1/44 Uncertain data is everywhere Numerous sources
More information