A Keyword-based Structured Query Language

1 Expressive and Flexible Access to Web-Extracted Data: A Keyword-based Structured Query Language
Department of Computer Science and Engineering, Indian Institute of Technology Delhi
22nd September 2011

2 Outline 1 2 Entity Search Semantic Search and Knowledge Base Preliminaries 3 4 Disambiguation Task Entity Search Task Document Retrieval Task

3 Data Sets
Unstructured data: data can be of any type and does not follow any rules or sequence, e.g. text or video data.
Structured data: data is organised (entities), and similar entities are grouped together (classes, relations).

5 Information Overload Problem
Structured data sets generally have schema items numbering in the millions. Problem?
Writing structured queries is tedious because of the sheer amount of available information. This is the Information Overload Problem. Solutions?

9 Information Need
Find all people of German nationality who have won a Nobel award.
Query: q(x) :- GERMAN PEOPLE(x), haswonprize(x, y), NOBEL PRIZE(y).
Highly expressive, but not very flexible.

13 Keyword Query
german has won nobel award.
Highly flexible, but less expressive.
Our Goal: retain the expressive structure of structured queries while incorporating the flexibility of keyword queries.

16 New Query
german, has won(nobel award).
This approach is flexible and structured at the same time: keywords are replaced by candidate schema items.
Problem: the number of possible queries is exponential in the size of the query.

17 An Example
Disambiguation steps must be applied before query evaluation to avoid the exponential blow-up.
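To make the blow-up concrete, the sketch below counts the structured queries obtained when every keyword is replaced by each of its candidate schema items. The candidate lists are invented for illustration and are not taken from any real knowledge base.

```python
from itertools import product
from math import prod

# Hypothetical candidate schema items per keyword (illustrative only).
candidates = {
    "german": ["GERMAN PEOPLE", "GERMAN LANGUAGE", "Germany"],
    "has won": ["haswonprize", "haswonelection"],
    "nobel award": ["NOBEL PRIZE", "Nobel Peace Prize", "NOBEL LAUREATE"],
}

def count_interpretations(cands):
    """Number of ways to pick one schema item per keyword."""
    return prod(len(items) for items in cands.values())

def enumerate_interpretations(cands):
    """Yield every keyword -> schema item assignment (exponential in the number of keywords)."""
    keywords = list(cands)
    for combo in product(*(cands[k] for k in keywords)):
        yield dict(zip(keywords, combo))

total = count_interpretations(candidates)  # 3 * 2 * 3 = 18
```

With m candidates for each of n keywords the count is m^n, which is why evaluating every candidate query is not practical for real schemas.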

19 Entity Search
Entities form the most basic data-level construct, e.g. Los Angeles, Albert Einstein.
Searching for entities reduces ambiguity, e.g. the entity Max Planck is different from the entity Max Planck Institute or the Max Planck Medal.

22 Semantic Search and Knowledge Base
Semantic Search: entities and relations form the base-level data constructs, and the semantics defined by knowledge bases (KBs) are taken into consideration, e.g. a search for GUITARIST would return documents about B. B. King and Jimi Hendrix.
Knowledge Base: a general term for a domain ontology together with instance data, e.g. NOBEL LAUREATE is-a PERSON that has a haswonprize relation to some NOBEL PRIZE; Albert Einstein is-a PHYSICIST.

24 Preliminaries
Entities denote constant terms such as people and places.
Primitive concepts are types or classes to which entities may belong, e.g. SCIENTIST, COUNTRY.
Relations are binary relationships that can exist between entities and/or primitive concepts, e.g. bornin.
A concept is any number of entities or primitive concepts joined by logical connectives such as conjunctions and relations, e.g. (SCIENTIST, haswonprize(Nobel Prize in Physics)).
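One possible way to hold these four notions in code is sketched below. The class names (`Entity`, `PrimitiveConcept`, `RelationExpr`, `Conjunction`) and the `render` helper are assumptions made for illustration; the slides do not prescribe a representation.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Entity:
    name: str                      # a constant term, e.g. "Albert Einstein"

@dataclass(frozen=True)
class PrimitiveConcept:
    name: str                      # a type or class, e.g. "SCIENTIST"

@dataclass(frozen=True)
class RelationExpr:
    relation: str                  # a binary relation name, e.g. "haswonprize"
    argument: "Concept"            # what the relation points to

@dataclass(frozen=True)
class Conjunction:
    parts: Tuple["Concept", ...]   # concepts joined by conjunction

Concept = Union[Entity, PrimitiveConcept, RelationExpr, Conjunction]

def render(c):
    """Print a concept in the comma/parenthesis notation used on the slide."""
    if isinstance(c, (Entity, PrimitiveConcept)):
        return c.name
    if isinstance(c, RelationExpr):
        return f"{c.relation}({render(c.argument)})"
    return ", ".join(render(p) for p in c.parts)

# The slide's example: (SCIENTIST, haswonprize(Nobel Prize in Physics))
example = Conjunction((
    PrimitiveConcept("SCIENTIST"),
    RelationExpr("haswonprize", Entity("Nobel Prize in Physics")),
))
```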

27 Structured Keyword Query
Document Query: $DQ ::= Q \mid (Q_1), (Q_2), \ldots, (Q_n)$, where each $Q_i$ is a structured keyword query.
Structured Keyword Query: $Q ::= k \mid k(Q) \mid Q_1, Q_2, \ldots, Q_n$, where $k$ is a keyword phrase, e.g. german, scientists, have won(nobel award).
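A small recursive-descent parser can illustrate the structured keyword query grammar. This is a sketch, assuming the three alternatives are a bare keyword phrase, a keyword phrase applied to a nested query, and a comma-separated conjunction; the function names and the list-of-pairs output format are hypothetical.

```python
def parse_query(text):
    """Parse 'k1, k2(k3, k4)' into a list of (keyword, subquery-or-None) pairs."""
    items, pos = _parse_conjunction(text, 0)
    if pos != len(text):
        raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
    return items

def _parse_conjunction(text, pos):
    """Q1, Q2, ..., Qn: one or more items separated by commas."""
    items = []
    while True:
        item, pos = _parse_item(text, pos)
        items.append(item)
        if pos < len(text) and text[pos] == ",":
            pos += 1
            continue
        return items, pos

def _parse_item(text, pos):
    """k or k(Q): a keyword phrase, optionally applied to a nested query."""
    start = pos
    while pos < len(text) and text[pos] not in ",()":
        pos += 1
    keyword = text[start:pos].strip()
    if not keyword:
        raise ValueError(f"expected a keyword phrase at position {start}")
    if pos < len(text) and text[pos] == "(":
        sub, pos = _parse_conjunction(text, pos + 1)
        if pos >= len(text) or text[pos] != ")":
            raise ValueError("missing closing parenthesis")
        pos += 1
        while pos < len(text) and text[pos] == " ":  # skip blanks after ')'
            pos += 1
        return (keyword, sub), pos
    return (keyword, None), pos

parsed = parse_query("german, scientists, have won(nobel award)")
```

Here `parsed` is `[("german", None), ("scientists", None), ("have won", [("nobel award", None)])]`, so nesting in the query becomes nesting in the list.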

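28 Disambiguation Graph
Let $Q$ be a structured keyword query. Then a disambiguation graph $G = (V, E)$, and a set $P$ of disjoint partitions of the nodes in $V$, are given by the following, where $M(k)$ is read as the set of candidate schema items for keyword $k$:
$$V = \bigcup_{k \in Q} M(k), \qquad P = \bigcup_{k \in Q} \{M(k)\}, \qquad E = \mathrm{edges}(Q).$$

29 Disambiguation Graph: Edge Generating Function
$$\mathrm{edges}(k) = \{\}$$
$$\mathrm{edges}(k(Q)) = \bigcup_{n_1 \in M(k),\, n_2 \in \mathrm{root}(Q)} (n_1, n_2) \;\cup\; \mathrm{edges}(Q)$$
$$\mathrm{edges}(Q_1, Q_2) = \bigcup_{n_1 \in \mathrm{root}(Q_1),\, n_2 \in \mathrm{root}(Q_2)} (n_1, n_2) \;\cup\; \mathrm{edges}(Q_1) \;\cup\; \mathrm{edges}(Q_2)$$

30 Disambiguation Graph Example
$$\mathrm{root}(k) = M(k)$$
$$\mathrm{root}(k(Q)) = M(k)$$
$$\mathrm{root}(Q_1, Q_2) = \mathrm{root}(Q_1) \cup \mathrm{root}(Q_2)$$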
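The definitions of V, P, E, edges and root can be sketched directly as recursive functions. The sketch assumes a query is a list of (keyword, subquery-or-None) pairs, that `M` is a plain dictionary from keyword to candidate schema items, and that an n-ary conjunction connects the roots of every pair of conjuncts (the binary rule applied repeatedly). Nodes are tagged with their keyword so the partitions stay disjoint. The candidate items are invented.

```python
from itertools import combinations

def nodes_of(keyword, M):
    """M(k) as graph nodes, tagged with the keyword to keep partitions disjoint."""
    return {(keyword, item) for item in M[keyword]}

def root(query, M):
    """root(k) = root(k(Q)) = M(k); root(Q1, Q2) = root(Q1) | root(Q2)."""
    result = set()
    for keyword, _sub in query:
        result |= nodes_of(keyword, M)
    return result

def edges(query, M):
    result = set()
    # edges(Q1, Q2): connect the roots of every pair of conjuncts.
    for left, right in combinations(query, 2):
        result |= {(a, b) for a in root([left], M) for b in root([right], M)}
    for keyword, sub in query:
        if sub is not None:
            # edges(k(Q)): connect M(k) to root(Q), then recurse into Q.
            result |= {(a, b) for a in nodes_of(keyword, M) for b in root(sub, M)}
            result |= edges(sub, M)
    return result

def disambiguation_graph(query, M):
    """Return (V, E, P) with P given as a dict keyword -> partition."""
    partitions = {}
    def collect(q):
        for keyword, sub in q:
            partitions[keyword] = nodes_of(keyword, M)
            if sub is not None:
                collect(sub)
    collect(query)
    V = set().union(*partitions.values())
    return V, edges(query, M), partitions

# Hypothetical candidates for: german, has won(nobel award)
M = {
    "german": ["GERMAN PEOPLE", "GERMAN LANGUAGE"],
    "has won": ["haswonprize"],
    "nobel award": ["NOBEL PRIZE", "Nobel Peace Prize"],
}
query = [("german", None), ("has won", [("nobel award", None)])]
V, E, P = disambiguation_graph(query, M)
```

In this toy instance the graph has five nodes in three partitions and four edges: two between the "german" and "has won" partitions and two between "has won" and "nobel award".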

31 Semantic Similarity
$$\mathrm{Sim}_{Lin}(A, B) = \frac{2 \log P(LCS(A, B))}{\log P(A) + \log P(B)}$$
$$\mathrm{Sim}_{Jaccard}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
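Both measures are short enough to write out. The sketch below assumes LCS stands for the least common subsumer of A and B in the concept hierarchy, and that the probabilities P(.) are supplied by the caller; how they are estimated from the knowledge base is not stated on the slide.

```python
import math

def sim_lin(p_a, p_b, p_lcs):
    """Lin similarity from the probabilities of A, B and their least common subsumer.

    Undefined when p_a == p_b == 1, because the denominator is then zero.
    """
    return 2 * math.log(p_lcs) / (math.log(p_a) + math.log(p_b))

def sim_jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)
```

For instance, `sim_jaccard({1, 2, 3}, {2, 3, 4})` is 0.5, and `sim_lin` returns 1 when A, B and their subsumer all have the same probability.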

32 Knowledge Base Support
Let $C$ be a general concept expressed using the primitive concepts, relations, and entities in the knowledge base $KB$. Then the support of $C$ by $KB$ is given by
$$\mathrm{support}(C, KB) = \begin{cases} 0 & \text{if } C \text{ is primitive,} \\ \mathrm{semanticsim}(C_1, C_2, \ldots, C_n) + \sum_i \mathrm{support}(C_i, KB) & \text{otherwise,} \end{cases}$$
where each $C_i$ is the $i$-th concept expression in a conjunction or relation occurring in $C$.

33 Knowledge Base Support (continued)
A concept $C$: GERMAN PEOPLE, SCIENTIST, haswonprize(NOBEL PRIZE), livesin(COUNTRY, hasofficiallanguage(English Language)).
Let $C_1$, $C_2$, $C_3$ be defined as follows:
$C_1$ = haswonprize(NOBEL PRIZE)
$C_2$ = livesin(COUNTRY, $C_3$)
$C_3$ = hasofficiallanguage(English Language)
So $C$ = GERMAN PEOPLE, SCIENTIST, $C_1$, $C_2$.

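34 Support Of A Concept
$$\begin{aligned} \mathrm{support}(C, KB) ={} & \mathrm{semanticsim}(\text{GERMAN PEOPLE}, \text{SCIENTIST}, C_1, C_2) \\ & + \mathrm{semanticsim}(\text{haswonprize}, \text{NOBEL PRIZE}) \\ & + \mathrm{semanticsim}(\text{livesin}, \text{COUNTRY}, C_3) \\ & + \mathrm{semanticsim}(\text{hasofficiallanguage}, \text{English Language}). \end{aligned}$$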
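The expansion of support for this concept can be reproduced with a short recursion. In this sketch a primitive item (concept, relation name, or entity) is a string and a compound expression is a tuple of its parts; the knowledge base is hidden behind a caller-supplied `semantic_sim` function. `toy_sim` is a stand-in that returns 1.0 for every call so the four terms can be counted.

```python
def support(concept, semantic_sim):
    """support(C, KB): 0 for primitives, else semantic_sim of the parts plus their supports."""
    if isinstance(concept, str):   # primitive concept, relation name, or entity
        return 0.0
    return semantic_sim(concept) + sum(support(part, semantic_sim) for part in concept)

# The concept from the slides, built bottom-up.
C3 = ("hasofficiallanguage", "English Language")
C2 = ("livesin", "COUNTRY", C3)
C1 = ("haswonprize", "NOBEL PRIZE")
C = ("GERMAN PEOPLE", "SCIENTIST", C1, C2)

calls = []

def toy_sim(parts):
    """Placeholder similarity: records its argument and returns a constant."""
    calls.append(parts)
    return 1.0

total = support(C, toy_sim)
```

The recursion evaluates `semantic_sim` exactly four times, once each for C, C1, C2 and C3, matching the four summands in the expansion.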

35 Syntactic Similarity and Score Aggregation
Syntactic Similarity: syntactic matching of the concept label (or one of its synonyms) to the keyword. We use edit distance or q-gram distance.
Score Aggregation: the semantic and syntactic similarities are aggregated into a single score. The aggregation function should be monotonic.
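Both string distances named on the slide can be sketched in a few lines. The padding character `#`, the default q = 2, and the normalisation in `syntax_sim` are assumptions of this sketch; the slides do not say how a distance is turned into a similarity.

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic programme (two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def qgrams(s, q=2):
    """All q-grams of s, padded with '#' so that prefixes and suffixes count."""
    padded = "#" * (q - 1) + s + "#" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]

def qgram_distance(a, b, q=2):
    """Number of q-grams (with multiplicity) that the two strings do not share."""
    ca, cb = Counter(qgrams(a, q)), Counter(qgrams(b, q))
    return sum(((ca - cb) + (cb - ca)).values())

def syntax_sim(label, keyword):
    """Edit distance mapped to [0, 1]; this normalisation is an assumption."""
    label, keyword = label.lower(), keyword.lower()
    longest = max(len(label), len(keyword))
    return 1.0 if longest == 0 else 1.0 - edit_distance(label, keyword) / longest
```

A misspelt keyword such as "guiterist" is one edit away from the label "guitarist", so it still scores close to 1 under `syntax_sim`.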

36 Concept Score
Let $Q$ be a structured keyword query, $C$ a concept, $\oplus$ a binary aggregation function, and $KB$ a knowledge base. Then
$$\mathrm{score}(C, Q, KB) = \mathrm{support}(C, KB) \oplus \mathrm{syntaxsim}(C, Q).$$
We define a new weight function as
$$w(v) = \mathrm{syntaxsim}(\mathrm{label}(v), k), \quad \text{where } v \in M(k),$$
$$w((v_1, v_2)) = \mathrm{support}((\mathrm{item}(v_1), \mathrm{item}(v_2)), KB).$$

37 Approximate Score
Let $Q$ be a structured keyword query, $KB$ a knowledge base, $\oplus$ a binary aggregation function, and $G$ a subgraph of the disambiguation graph of $Q$ representing a concept $C$, with vertex set $V$ and edge set $E$. Then
$$\mathrm{score}(G, Q, KB) = \Big(\sum_{(v_1, v_2) \in E} w((v_1, v_2))\Big) \oplus \Big(\sum_{v \in V} w(v)\Big).$$
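As a rough illustration, the approximate score can be computed from precomputed weights. The sketch assumes that edge weights and vertex weights are each summed and then combined by the aggregation function, and it uses addition as one monotonic choice of aggregation; the slide fixes neither. All weights below are invented.

```python
def approximate_score(vertices, edges, vertex_weight, edge_weight,
                      aggregate=lambda a, b: a + b):
    """Combine total edge weight (support) with total vertex weight (syntactic match)."""
    edge_total = sum(edge_weight[e] for e in edges)
    vertex_total = sum(vertex_weight[v] for v in vertices)
    return aggregate(edge_total, vertex_total)

# One candidate interpretation of: german, has won(nobel award)
vertices = ["GERMAN PEOPLE", "haswonprize", "NOBEL PRIZE"]
edges = [("GERMAN PEOPLE", "haswonprize"), ("haswonprize", "NOBEL PRIZE")]
vertex_weight = {"GERMAN PEOPLE": 0.9, "haswonprize": 1.0, "NOBEL PRIZE": 0.8}
edge_weight = {edges[0]: 0.5, edges[1]: 0.7}

score = approximate_score(vertices, edges, vertex_weight, edge_weight)
```

Because the weights are attached to individual vertices and edges, the score of a subgraph can be assembled incrementally, which is what makes sorted access useful.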

39 Disambiguation Problem
Find the top-k maximally scoring subgraphs corresponding to concept interpretations of the original query.
This can be implemented with a rank-join algorithm, assuming that the scoring function is monotonic and that sorted access to vertices and edges is possible.
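The problem statement can be illustrated with an exhaustive baseline: pick one candidate per partition, score the resulting subgraph, and keep the k best. This is not the rank-join algorithm the slide refers to; it is the exponential enumeration that a rank-join over sorted inputs is meant to avoid. The additive score, the treatment of missing edges as weight 0, and all weights are assumptions of the sketch.

```python
import heapq
from itertools import product

def top_k_interpretations(partitions, query_edges, vertex_weight, edge_weight, k):
    """Brute-force top-k.

    partitions:  keyword -> list of candidate schema items
    query_edges: pairs of keywords that are connected in the query
    """
    keywords = list(partitions)
    scored = []
    for combo in product(*(partitions[kw] for kw in keywords)):
        choice = dict(zip(keywords, combo))
        score = sum(vertex_weight[(kw, item)] for kw, item in choice.items())
        # A pair of items with no knowledge-base support contributes nothing.
        score += sum(edge_weight.get((choice[a], choice[b]), 0.0) for a, b in query_edges)
        scored.append((score, combo))
    return heapq.nlargest(k, scored)

partitions = {
    "german": ["GERMAN PEOPLE", "GERMAN LANGUAGE"],
    "has won": ["haswonprize"],
    "nobel award": ["NOBEL PRIZE", "Nobel Peace Prize"],
}
query_edges = [("german", "has won"), ("has won", "nobel award")]
vertex_weight = {
    ("german", "GERMAN PEOPLE"): 0.6,
    ("german", "GERMAN LANGUAGE"): 0.5,
    ("has won", "haswonprize"): 0.8,
    ("nobel award", "NOBEL PRIZE"): 0.7,
    ("nobel award", "Nobel Peace Prize"): 0.6,
}
edge_weight = {
    ("GERMAN PEOPLE", "haswonprize"): 0.9,
    ("haswonprize", "NOBEL PRIZE"): 0.9,
    ("haswonprize", "Nobel Peace Prize"): 0.8,
}

best = top_k_interpretations(partitions, query_edges, vertex_weight, edge_weight, k=2)
```

With these weights the two best interpretations both read "german" as GERMAN PEOPLE, because GERMAN LANGUAGE has no supported edge to haswonprize.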

40 Partial Interpretation
If no subgraph can be found that spans all partitions of the disambiguation graph for some query, we build partial interpretations.
We retrieve the documents and favor those that contain the additional keywords.
Alternatively, we can use the document index to look up the keywords and then boost the documents from entity search that occur in this set.

41 Ranking
The entities can be ranked by their relationships to the query in the KB, or based on statistics over the document corpus.
Alternatively, the result may be a document list, ranked by the relevance of the corresponding entities to each document.
We may also rank groups of documents per entity.

43 Disambiguation Task

45 Entity Search Task

47 Document Retrieval Task

48 System Performance

49 Thank You


More information

NATURAL LANGUAGE PROCESSING

NATURAL LANGUAGE PROCESSING NATURAL LANGUAGE PROCESSING LESSON 9 : SEMANTIC SIMILARITY OUTLINE Semantic Relations Semantic Similarity Levels Sense Level Word Level Text Level WordNet-based Similarity Methods Hybrid Methods Similarity

More information

MIA - Master on Artificial Intelligence

MIA - Master on Artificial Intelligence MIA - Master on Artificial Intelligence 1 Hierarchical Non-hierarchical Evaluation 1 Hierarchical Non-hierarchical Evaluation The Concept of, proximity, affinity, distance, difference, divergence We use

More information

Graph-Based Entity Linking

Graph-Based Entity Linking Graph-Based Entity Linking A. Naderi, J. Turmo, H. Rodríguez TALP Research Center Universitat Politecnica de Catalunya February 20, 2014 1 / 29 Table of contents 1 Introduction Entity Linking EL applications

More information

NERD workshop. Luca ALMAnaCH - Inria Paris. Berlin, 18/09/2017

NERD workshop. Luca ALMAnaCH - Inria Paris. Berlin, 18/09/2017 NERD workshop Luca Foppiano @ ALMAnaCH - Inria Paris Berlin, 18/09/2017 Agenda Introducing the (N)ERD service NERD REST API Usages and use cases Entities Rigid textual expressions corresponding to certain

More information

CHAPTER 3 INFORMATION RETRIEVAL BASED ON QUERY EXPANSION AND LATENT SEMANTIC INDEXING

CHAPTER 3 INFORMATION RETRIEVAL BASED ON QUERY EXPANSION AND LATENT SEMANTIC INDEXING 43 CHAPTER 3 INFORMATION RETRIEVAL BASED ON QUERY EXPANSION AND LATENT SEMANTIC INDEXING 3.1 INTRODUCTION This chapter emphasizes the Information Retrieval based on Query Expansion (QE) and Latent Semantic

More information

Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task

Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task Yunqing Xia 1 and Sen Na 2 1 Tsinghua University 2 Canon Information Technology (Beijing) Co. Ltd. Before we start Who are we? THUIS is

More information

RPI INSIDE DEEPQA INTRODUCTION QUESTION ANALYSIS 11/26/2013. Watson is. IBM Watson. Inside Watson RPI WATSON RPI WATSON ??? ??? ???

RPI INSIDE DEEPQA INTRODUCTION QUESTION ANALYSIS 11/26/2013. Watson is. IBM Watson. Inside Watson RPI WATSON RPI WATSON ??? ??? ??? @ INSIDE DEEPQA Managing complex unstructured data with UIMA Simon Ellis INTRODUCTION 22 nd November, 2013 WAT SON TECHNOLOGIES AND OPEN ARCHIT ECT URE QUEST ION ANSWERING PROFESSOR JIM HENDLER S IMON

More information

Design and Implementation of Search Engine Using Vector Space Model for Personalized Search

Design and Implementation of Search Engine Using Vector Space Model for Personalized Search Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 1, January 2014,

More information

ER to Relational Mapping

ER to Relational Mapping ER to Relational Mapping 1 / 19 ER to Relational Mapping Step 1: Strong Entities Step 2: Weak Entities Step 3: Binary 1:1 Relationships Step 4: Binary 1:N Relationships Step 5: Binary M:N Relationships

More information

Scalable Knowledge Harvesting with High Precision and High Recall. Ndapa Nakashole Martin Theobald Gerhard Weikum

Scalable Knowledge Harvesting with High Precision and High Recall. Ndapa Nakashole Martin Theobald Gerhard Weikum Scalable Knowledge Harvesting with High Precision and High Recall Ndapa Nakashole Martin Theobald Gerhard Weikum Web Knowledge Harvesting Goal: To organise text into precise facts Alex Rodriguez A.Rodriguez

More information

Text Mining: A Burgeoning technology for knowledge extraction

Text Mining: A Burgeoning technology for knowledge extraction Text Mining: A Burgeoning technology for knowledge extraction 1 Anshika Singh, 2 Dr. Udayan Ghosh 1 HCL Technologies Ltd., Noida, 2 University School of Information &Communication Technology, Dwarka, Delhi.

More information

Information Retrieval. (M&S Ch 15)

Information Retrieval. (M&S Ch 15) Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion

More information

Semi-Automatic Conceptual Data Modeling Using Entity and Relationship Instance Repositories

Semi-Automatic Conceptual Data Modeling Using Entity and Relationship Instance Repositories Semi-Automatic Conceptual Data Modeling Using Entity and Relationship Instance Repositories Ornsiri Thonggoom, Il-Yeol Song, Yuan An The ischool at Drexel Philadelphia, PA USA Outline Long Term Research

More information

Implementing Data Models and Reports with Microsoft SQL Server Exam Summary Syllabus Questions

Implementing Data Models and Reports with Microsoft SQL Server Exam Summary Syllabus Questions 70-466 Implementing Data Models and Reports with Microsoft SQL Server Exam Summary Syllabus Questions Table of Contents Introduction to 70-466 Exam on Implementing Data Models and Reports with Microsoft

More information

Unstructured Data. CS102 Winter 2019

Unstructured Data. CS102 Winter 2019 Winter 2019 Big Data Tools and Techniques Basic Data Manipulation and Analysis Performing well-defined computations or asking well-defined questions ( queries ) Data Mining Looking for patterns in data

More information

Information Retrieval (IR) through Semantic Web (SW): An Overview

Information Retrieval (IR) through Semantic Web (SW): An Overview Information Retrieval (IR) through Semantic Web (SW): An Overview Gagandeep Singh 1, Vishal Jain 2 1 B.Tech (CSE) VI Sem, GuruTegh Bahadur Institute of Technology, GGS Indraprastha University, Delhi 2

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

WebSAIL Wikifier at ERD 2014

WebSAIL Wikifier at ERD 2014 WebSAIL Wikifier at ERD 2014 Thanapon Noraset, Chandra Sekhar Bhagavatula, Doug Downey Department of Electrical Engineering & Computer Science, Northwestern University {nor.thanapon, csbhagav}@u.northwestern.edu,ddowney@eecs.northwestern.edu

More information

A Short Introduction to CATMA

A Short Introduction to CATMA A Short Introduction to CATMA Outline: I. Getting Started II. Analyzing Texts - Search Queries in CATMA III. Annotating Texts (collaboratively) with CATMA IV. Further Search Queries: Analyze Your Annotations

More information

Outline. Eg. 1: DBLP. Motivation. Eg. 2: ACM DL Portal. Eg. 2: DBLP. Digital Libraries (DL) often have many errors that negatively affect:

Outline. Eg. 1: DBLP. Motivation. Eg. 2: ACM DL Portal. Eg. 2: DBLP. Digital Libraries (DL) often have many errors that negatively affect: Outline Effective and Scalable Solutions for Mixed and Split Citation Problems in Digital Libraries Dongwon Lee, Byung-Won On Penn State University, USA Jaewoo Kang North Carolina State University, USA

More information

Ranking Objects by Exploiting Relationships: Computing Top-K over Aggregation

Ranking Objects by Exploiting Relationships: Computing Top-K over Aggregation Ranking Objects by Exploiting Relationships: Computing Top-K over Aggregation Kaushik Chakrabarti Venkatesh Ganti Jiawei Han Dong Xin* Microsoft Research Microsoft Research University of Illinois University

More information

Incremental Pseudo Rectangular Organization of Information Relative to a Domain RAMiCS 13 Cambridge, UK September 17-20

Incremental Pseudo Rectangular Organization of Information Relative to a Domain RAMiCS 13 Cambridge, UK September 17-20 Incremental Pseudo Rectangular Organization of Information Relative to a Domain RAMiCS 13 Cambridge, UK September 17-20 Sahar Ismail & Ali Jaoua 1 Agenda 1. Motivation and Objectives 2. Background 3. Solution

More information

Semantic Web. Ontology Alignment. Morteza Amini. Sharif University of Technology Fall 95-96

Semantic Web. Ontology Alignment. Morteza Amini. Sharif University of Technology Fall 95-96 ه عا ی Semantic Web Ontology Alignment Morteza Amini Sharif University of Technology Fall 95-96 Outline The Problem of Ontologies Ontology Heterogeneity Ontology Alignment Overall Process Similarity (Matching)

More information

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: 2.114

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: 2.114 [Saranya, 4(3): March, 2015] ISSN: 2277-9655 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY A SURVEY ON KEYWORD QUERY ROUTING IN DATABASES N.Saranya*, R.Rajeshkumar, S.Saranya

More information

Keyword search in relational databases. By SO Tsz Yan Amanda & HON Ka Lam Ethan

Keyword search in relational databases. By SO Tsz Yan Amanda & HON Ka Lam Ethan Keyword search in relational databases By SO Tsz Yan Amanda & HON Ka Lam Ethan 1 Introduction Ubiquitous relational databases Need to know SQL and database structure Hard to define an object 2 Query representation

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction Inverted index Processing Boolean queries Course overview Introduction to Information Retrieval http://informationretrieval.org IIR 1: Boolean Retrieval Hinrich Schütze Institute for Natural

More information

A SEMANTIC MATCHMAKER SERVICE ON THE GRID

A SEMANTIC MATCHMAKER SERVICE ON THE GRID DERI DIGITAL ENTERPRISE RESEARCH INSTITUTE A SEMANTIC MATCHMAKER SERVICE ON THE GRID Andreas Harth Yu He Hongsuda Tangmunarunkit Stefan Decker Carl Kesselman DERI TECHNICAL REPORT 2004-05-18 MAY 2004 DERI

More information

Tansu Alpcan C. Bauckhage S. Agarwal

Tansu Alpcan C. Bauckhage S. Agarwal 1 / 16 C. Bauckhage S. Agarwal Deutsche Telekom Laboratories GBR 2007 2 / 16 Outline 3 / 16 Overview A novel expert peering system for community-based information exchange A graph-based scheme consisting

More information

Discovering Semantic Similarity between Words Using Web Document and Context Aware Semantic Association Ranking

Discovering Semantic Similarity between Words Using Web Document and Context Aware Semantic Association Ranking Discovering Semantic Similarity between Words Using Web Document and Context Aware Semantic Association Ranking P.Ilakiya Abstract The growth of information in the web is too large, so search engine come

More information

IE in Context. Machine Learning Problems for Text/Web Data

IE in Context. Machine Learning Problems for Text/Web Data Machine Learning Problems for Text/Web Data Lecture 24: Document and Web Applications Sam Roweis Document / Web Page Classification or Detection 1. Does this document/web page contain an example of thing

More information

COMP718: Ontologies and Knowledge Bases

COMP718: Ontologies and Knowledge Bases 1/35 COMP718: Ontologies and Knowledge Bases Lecture 9: Ontology/Conceptual Model based Data Access Maria Keet email: keet@ukzn.ac.za home: http://www.meteck.org School of Mathematics, Statistics, and

More information

Outline. Quick Introduction to Database Systems. Data Manipulation Tasks. What do they all have in common? CSE142 Wi03 G-1

Outline. Quick Introduction to Database Systems. Data Manipulation Tasks. What do they all have in common? CSE142 Wi03 G-1 Outline Quick Introduction to Database Systems Why do we need a different kind of system? What is a database system? Separating the what the how: The relational data model Querying the databases: SQL May

More information