Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016


+ Databases and Information Retrieval Integration
TIETS42, Autumn 2016
Kostas Stefanidis
kostas.stefanidis@uta.fi
http://www.uta.fi/sis/tie/dbir/index.html
http://people.uta.fi/~kostas.stefanidis/dbir16/dbir16-main.html

+ DB & IR Integration
Databases and information retrieval are two areas that have been developed separately!
- They have focused on different areas of application
- They have emphasized different methodologies

+ DB & IR Integration
In databases:
- We pose queries over data with a particular schema
- We use an algebra
- We care about the accuracy of the query results
In information retrieval:
- We focus on queries expressed with keywords
- Queries are applied to free-text documents
- We care about how to rank the query results, based on statistics and probabilities
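The contrast between the two query styles can be sketched in a few lines. This is only an illustration under stated assumptions: the `book` table, the three documents, and the `rank` function are hypothetical, and raw term counts stand in for the statistical ranking models that IR systems actually use.

```python
import sqlite3

# DB style: an exact query over data with a schema (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [("Database Systems", 2008), ("Information Retrieval", 2008), ("Web Search", 2015)],
)
# Every returned row satisfies the condition; no row is "more right" than another.
exact = conn.execute(
    "SELECT title FROM book WHERE year = 2008 ORDER BY title"
).fetchall()

# IR style: keyword query over free text, answered with a ranked list.
docs = {
    "d1": "database systems store structured data",
    "d2": "search engines rank text documents",
    "d3": "keyword search over database text",
}

def rank(query, docs):
    """Order documents by how often they contain the query terms (assumed score)."""
    terms = query.lower().split()
    scores = {d: sum(text.split().count(t) for t in terms) for d, text in docs.items()}
    matching = [d for d in scores if scores[d] > 0]
    # Higher score first; ties broken by document id for a stable order.
    return sorted(matching, key=lambda d: (-scores[d], d))

ranked = rank("database search", docs)
```

The SQL query returns a set of rows that all satisfy the predicate, while `rank` returns every partially matching document in an order that reflects a score.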

+ DB & IR Integration
Nowadays, many applications require managing structured and unstructured data together, so integrating these two worlds has become necessary

+ Adding Ranking to DB OR Adding Semantics to IR

                                  structured data (records)       unstructured data (documents)
unstructured search (keywords)    [keyword search on databases]   IR systems, search engines
structured search (SQL, XQuery)   database systems                [querying entities]

+ DB & IR Differences
Databases:
- Structured data
- Structured querying
- Soundness & completeness
- User is expected to be aware of the underlying structure of the data or of a query language
IR:
- Unstructured data
- Unstructured querying
- High precision & recall
- No such expectations of the user

+ Why DB & IR Integration?
DB and IR have evolved as separate communities
Their focus is on very different application areas, e.g.:
- (DB) accounting and reservation systems
- (IR) library and patent information
So, they have different methodological paradigms:
- (DB) precise querying over schematized data, based on logic and algebra
- (IR) keyword search and ranking over text and uncertain data, based on statistics and probability theory

+ Why DB & IR Integration?
TODAY: many applications require managing both structured and unstructured data
- Considerations on how to integrate the DB and IR worlds at both the foundational and the software-system level
In the next slides: tenets, from different viewpoints, on why DB & IR integration is desirable

+ too-many-answers
Example: searches over travel portals or product catalogs
- Too-many-answers problem
- What if we tighten the query conditions? This may produce too few or even no results
- Note also: interactive reformulation and browsing is time-consuming and may irritate customers/users
For large result sets: ranking!
- Data and/or workload statistics
- User profiles
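One way to picture ranking as the answer to too many results is a top-k selection over a scored result set. The sketch below is an assumption-laden illustration: the hotel rows, the equal-weight `score` function, and the `max_price` normalizer are all hypothetical, not something the slides prescribe.

```python
import heapq

# Hypothetical catalog rows: (hotel name, price per night, user rating out of 5).
hotels = [
    ("Aurora", 120, 4.5),
    ("Birch", 80, 3.9),
    ("Cedar", 95, 4.7),
    ("Dune", 60, 3.2),
    ("Elm", 150, 4.8),
]

def score(row, max_price=200):
    """Illustrative score: equal weight on high rating and low price."""
    _, price, rating = row
    return 0.5 * (rating / 5.0) + 0.5 * (1 - price / max_price)

def top_k(rows, k):
    """Return the k best rows by score without sorting the full result set."""
    return heapq.nlargest(k, rows, key=score)

best = top_k(hotels, 3)
```

Instead of forcing the user to tighten conditions until the result set is small, the query stays broad and only the k highest-scoring answers are shown.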

+ text-matching
Because of misspellings, spelling variants, etc., DB systems need text-matching functionality
- Need for approximate matching
- E.g., record linkage for matching entities: reconcile "Hector Garcia-Molina" and "Garcia-Molina, H."
Intuitively, approximate matching by similarity measures requires ranking!
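The name-reconciliation example can be sketched with Python's standard `difflib` module. The `normalize` step (reducing a name to surname plus first initial) and the helper names are assumptions made for illustration; real record-linkage systems use richer similarity measures.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Reduce a name to 'surname initial', e.g. 'garcia-molina h' (assumed convention)."""
    name = name.strip().lower().replace(".", "")
    if "," in name:
        # 'Surname, Given' form
        surname, given = [part.strip() for part in name.split(",", 1)]
    else:
        # 'Given Surname' form
        parts = name.split()
        given, surname = " ".join(parts[:-1]), parts[-1]
    return f"{surname} {given[:1]}".strip()

def similarity(a, b):
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def rank_matches(query, candidates):
    """Rank candidate records by decreasing similarity to the query."""
    return sorted(candidates, key=lambda c: similarity(query, c), reverse=True)

candidates = ["Jeffrey Ullman", "Garcia-Molina, H.", "H. Garcia"]
ranked = rank_matches("Hector Garcia-Molina", candidates)
```

Because similarity is a degree rather than a yes/no answer, the natural output is a ranked list of candidate matches, which is the point the slide makes.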

+ heterogeneity
Typically, applications access multiple databases
- Often with a run-time choice of the data sources
- No unified global schema
Even if the sources contain structured, exact data records and have an explicit schema:
- The application has to cope with the heterogeneity of the underlying schema names, XML tags, or RDF properties
- Queries need to be schema-agnostic or tolerant to schema relaxation
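A very small sketch of a query that tolerates differing attribute names follows. The `SYNONYMS` table, the two sources, and the helper names are hypothetical; in practice such mappings might come from schema matching rather than a hand-written dictionary.

```python
# Hypothetical attribute synonyms across sources (an assumption for illustration).
SYNONYMS = {
    "author": {"author", "creator", "writer"},
    "title": {"title", "name"},
}

# Two sources that describe the same kind of record with different schemas.
source_a = [{"author": "Widom", "title": "Database Systems"}]
source_b = [{"Creator": "Widom", "Name": "Data Streams"}]

def get_attr(record, attr):
    """Look up attr in a record, accepting any known synonym of its name."""
    accepted = SYNONYMS.get(attr, {attr})
    for key, value in record.items():
        if key.lower() in accepted:
            return value
    return None

def select(records, attr, value):
    """Schema-relaxed selection: attr = value under any synonymous attribute name."""
    return [r for r in records if get_attr(r, attr) == value]

matches = select(source_a + source_b, "author", "Widom")
```

The query names the attribute once, and the relaxation step decides which concrete field in each source it refers to.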

+ information-extraction
Textual information (natural-language sentences) contains named entities and relationships between them
- Information-extraction techniques (pattern matching, statistical learning) locate the entities
- Potentially, this yields large knowledge bases whose facts carry some uncertainty
Querying the extracted facts: need for ranking!

+ information-extraction
Querying the extracted facts:
- Use keywords rather than sophisticated expressions in SQL or XQuery
If the extracted data are organized in graph structures:
- Determine when keyword occurrences are interconnected in a meaningful way
- Efficiently compute answers in ranked order
(new, or not so new, research problems)
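As a rough illustration of keyword search over graph-structured data, the sketch below treats two keyword matches as "interconnected" when a path links them, and ranks answers by path length. The graph, the substring matching of keywords against node names, and the hop-count score are all assumptions; published approaches use more refined notions of meaningful connection.

```python
from collections import deque

# Hypothetical graph of extracted facts: node -> neighbors (undirected edges).
graph = {
    "paper1": ["alice", "vldb"],
    "paper2": ["bob", "vldb"],
    "alice": ["paper1"],
    "bob": ["paper2"],
    "vldb": ["paper1", "paper2"],
    "carol": [],
}

def distances_from(start):
    """Breadth-first search: hop distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def connect(keyword_a, keyword_b):
    """Rank pairs of matching nodes by how closely they are linked (fewest hops first)."""
    a_nodes = [n for n in graph if keyword_a in n]
    b_nodes = [n for n in graph if keyword_b in n]
    answers = []
    for a in a_nodes:
        dist = distances_from(a)
        answers += [(dist[b], a, b) for b in b_nodes if b in dist]
    return sorted(answers)
```

For example, `connect("alice", "bob")` finds that the two authors are linked through their papers and a shared venue, while `connect("alice", "carol")` returns no answer because no path exists.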

+ structured IR
Structured IR: go beyond keyword search by understanding attributes, XML tags and metadata
- Digital libraries, enterprise intranets, e-science portals, and business-oriented Web sites
Example: faceted search paradigm
- Access information organized according to multiple dimensions (ranking in multiple ways)
- Allow users to explore a collection of information by applying multiple filters
- Internet merchant sites for product search, result refinement, interactive exploration
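The faceted search paradigm can be sketched as two operations: filtering by the facet values chosen so far, and counting the remaining values of another facet to guide the next refinement. The product records and function names below are hypothetical.

```python
from collections import Counter

# Hypothetical product collection with three facets.
products = [
    {"brand": "Acme", "color": "red", "category": "shoes"},
    {"brand": "Acme", "color": "blue", "category": "shoes"},
    {"brand": "Bolt", "color": "red", "category": "bags"},
    {"brand": "Bolt", "color": "red", "category": "shoes"},
]

def apply_filters(items, filters):
    """Keep the items that match every selected facet value."""
    return [it for it in items if all(it.get(f) == v for f, v in filters.items())]

def facet_counts(items, facet):
    """Count how many items fall under each value of a facet."""
    return Counter(it[facet] for it in items)

red_shoes = apply_filters(products, {"color": "red", "category": "shoes"})
brands_left = facet_counts(red_shoes, "brand")
```

After the user selects red shoes, `brands_left` tells the interface which brands remain and how many items each would yield, which is what drives interactive result refinement.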

+ search-result personalization
Take the user's information needs into account
- Better search precision/recall, higher user satisfaction
Exploit:
- User preferences
- Profiling: the user's long-term history of queries, clicks and data usage
- Contextual profiling: the user's short-term behavior in the context of the current task
Personalization is already used in Web, news and blog search
- Enormous potential for individualizing

+ Different Views of the Coin
About the need for structure:
- DB emphasizes relaxation of structure
- IR emphasizes adding structure to information
- (The Web community takes a mix of structured and unstructured data for granted)
About the need for named entities:
- DB emphasizes approximate matching and ranking
- IR emphasizes adding relationships between entities

+ DB & IR Integration
Learning outcomes
After completing the course, the students are expected to:
- know the basic concepts and techniques for the integration of databases and information retrieval
- be able to handle contemporary research issues and problems on the topic
- be able to perform a comparative assessment of existing works

+ DB & IR Integration
24 Oct - 16 Dec (8 weeks)
Two parts:
- 1st part (4 weeks): all lectures will be given by the instructor
- 2nd part (4 weeks): lectures (in the form of assignments) will be mostly given by the students

+ DB & IR Integration
1st part (4 weeks): all lectures will be given by the instructor
- Introduction to big data and the need for data exploration, to the techniques that will be presented in the lectures, and to the structure/organization of the course
- For this part, algorithmic exercises or extensions of the presented approaches will be given to the students on a weekly basis (each student will work on his/her own)

+ DB & IR Integration
1st part (4 weeks): all lectures will be given by the instructor
- Top-k and skyline queries: rank aggregation, top-k algorithms, skylines
- Keyword-based search: schema-based & graph-based approaches in databases
- Preferential search: preference representation and composition, preferential query processing
- Recommender systems: collaborative filtering, content-based recommendations
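To give a flavor of the first lecture topic, here is a naive skyline computation. It assumes that lower values are better in every dimension and uses a simple quadratic scan; the hotel data are hypothetical, and the algorithms covered in the lectures may differ.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly better in one.

    Assumes lower values are better in all dimensions.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better

def skyline(points):
    """Return the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical hotels as (price, distance to the beach in km).
hotels = [(50, 8), (60, 3), (80, 1), (90, 2), (70, 9)]
result = skyline(hotels)
```

Unlike a top-k query, the skyline needs no scoring function: it keeps every hotel that is a sensible trade-off between price and distance and drops those that are worse on both counts.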

+ DB & IR Integration
2nd part (4 weeks): lectures (in the form of assignments) will be mostly given by the students
- Students will form groups (at most 4 students per group: TBD)
- Each group will be assigned a project
- Each project will be associated with two research papers
- Each week, each group will make a short presentation

+ DB & IR Integration
2nd part (4 weeks): lectures (in the form of assignments) will be mostly given by the students
Each week, each group will make a short presentation (~10-15 mins)
- 1st week: briefly describe the topic and the solutions of the project's papers
- 2nd week: describe the main disadvantages/drawbacks of the solutions given by the original authors
- 3rd week: present ideas from other related papers published after the project's papers (search for subsequent papers related to the project)
- 4th week: extend the ideas of the project (students' contributions)

+ DB & IR Integration
2nd part (4 weeks): lectures (in the form of assignments) will be mostly given by the students
- Plus 1 assignment from the instructor each week, related to one of the projects

+ DB & IR Integration
Grades
The final grade will be determined:
- 30% by the assignments of the first part
- 20% by the assignments of the second part, and
- 50% by the presentations of the project
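The weighting above amounts to a simple weighted sum. The sketch below assumes that all three components are scored on the same scale; the component names and the sample scores are made up for illustration.

```python
# Weights taken from the grading scheme above; the key names are hypothetical.
WEIGHTS = {
    "part1_assignments": 0.30,
    "part2_assignments": 0.20,
    "project_presentations": 0.50,
}

def final_grade(scores):
    """Weighted sum of component scores (all assumed to share one scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Made-up component scores for one student.
example = final_grade(
    {"part1_assignments": 4, "part2_assignments": 3, "project_presentations": 5}
)
```

With these sample scores the result is 0.30 * 4 + 0.20 * 3 + 0.50 * 5 = 4.3, so the project presentations carry half of the final grade.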

+ Course Projects
- Project 1: Top-k join tuples
- Project 2: Preference integration in databases
- Project 3: Personalized keyword search
- Project 4: Contextual recommendations
- Project 5: Recommend packages
- Project 6: Recommendations for groups
- Project 7: Diversity in recommender systems
- Project 8: Efficient diverse search
- Project 9: Frameworks based on different definitions of diversity
- Project 10: Tags for search
- Project 11: Interactive data exploration

+ Where, When
When: Monday, Thursday, Friday, 10.00-12.00 (24 Oct 2016 - 16 Dec 2016)
Where: Pinni B0016
Instructor: Kostas Stefanidis
E-mail: kostas.stefanidis@uta.fi
Course web page:
http://www.uta.fi/sis/tie/dbir/index.html
http://people.uta.fi/~kostas.stefanidis/dbir16/dbir16-main.html