6LPSOLI\LQJ'DWD$FFHVV 7KH(QHUJ\'DWD&ROOHFWLRQ3URMHFW. José Luis Ambite University of Southern California Information Science Institute

Size: px
Start display at page:

Download "6LPSOLI\LQJ'DWD$FFHVV 7KH(QHUJ\'DWD&ROOHFWLRQ3URMHFW. José Luis Ambite University of Southern California Information Science Institute"

Transcription

1 6LPSOLI\LQJ'DWD$FFHVV 7KH(QHUJ\'DWD&ROOHFWLRQ3URMHFW José Luis Ambite University of Southern California Information Science Institute

2 7KH9LVLRQ $VNWKH*RYHUQPHQW We re thinking of moving to Denver...What are the schools like there? Query results How many people had breast cancer in the area over the past 30 years? Is there an orchestra? An art gallery? How far are the nightclubs? How have property values in the area changed over the past decade? Census Labor Stats

3 7KHSUREOHPDQGWKHVROXWLRQ Q Problem: FedStats has thousands of databases in over 70 Government agencies: X data is overlapping, sometimes near-duplicated X even Government officials cannot find it! Q Solution: Create a system to provide easy standardized access: X need multi-database access engine, X need powerful user interface, X need terminology standardization mechanism.

4 5HVHDUFKFKDOOHQJHV Q How do you scale to incorporate many databases? try to build data models automatically Q How do you integrate their data models, to allow querying across sources and agencies? take a large ontology, link the models into it automatically, and provide query reformulator Q How do you incorporate additional information that is available from text sources? use language processing tools to extract it Q How do you handle footnotes in the databases? extract them from the tables automatically

5 6\VWHP$UFKLWHFWXUH Construction phase: Deploy DBs Extend ontol. Integrated ontology - global terminology - source descriptions - integration axioms User Interface - ontology browser - query constructor Domain modeling - DB analysis - text analysis Query processor - reformulation - cost optimization R S T σ User phase: Compose query Sources Access phase: Create DB query Retrieve data

6 Construction phase: Deploy DBs Extend ontol. Integrated ontology - global terminology - source descriptions - integration axioms User Interface - ontology browser - query constructor Domain modeling - DB analysis - text analysis Query processor - reformulation - cost optimization R S T σ User phase: Compose query Sources Access phase: Create DB query Retrieve data

7 ,QIRUPDWLRQ,QWHJUDWLRQLQ6,06 To enable query access, SIMS needs to: [Arens+96] [Ambite+00] Q address semantic heterogeneity: => describe sources in common domain model Q address syntactic (format) heterogeneity: => standardize access to sources: X Structured (DBMS): Oracle, MS Access X Semistructured: wrappers for html, text, pdf

8 'RPDLQ0RGHOIRU7LPH6HULHV 8QLW 3RLQWRI6DOH 3URGXFW <HDU 3HULRG 0RQWK $UHD 86$ &$ 1< :HHN 0HDVXUHPHQW 'DWH 6XEFODVV 3DUWRI *HQHUDO5HODWLRQ 6RXUFH0DSSLQJ 9DOXH 7LPH 6HULHV )RRWQRWH 7DJ *5HJXODU 7H[W *3UHPLXP *DVROLQH &3, *3UHPLXP 8QOHDGHG 4XDOLW\ 3ULFH */HDGHG *8QOHDGHG 33, 9ROXPH

9 <HDU 'RPDLQ0RGHO 3HULRG 0RQWK $UHD 86$ 1< &$ :HHN 8QLW &HQWVJDOORQ (,$7 0HDVXUHPHQW 'DWH 6XEFODVV 3DUWRI *HQHUDO5HODWLRQ 6RXUFH0DSSLQJ (,$73ULFHVRI5HJXODU*DVROLQHLQ&$ 6DOHVWRHQGXVHUVWKURXJKUHWDLORXWOHWV 0HDVXUHGLQFHQWVSHUJDOORQ 7LPH 6HULHV 9DOXH 3RLQWRI6DOH 5HWDLORXWOHWV )RRWQRWH 7DJ (,$0XOWLVWDWH :UDSSHU *5HJXODU *3UHPLXP 7H[W 3URGXFW *DVROLQH *3UHPLXP 8QOHDGHG 4XDOLW\ 3ULFH */HDGHG *8QOHDGHG 9ROXPH

10 :UDSSHUV provide uniform mechanism for extracting data from semi-structured sources (HTML, text, ) 5HVWDXUDQWVLQ 6DQWD0RQLFD" 1DPH $GGUHVV &KLQRLVRQ0DLQ 0DLQ6W &KDR'DUD 8QLRQ6T «Wrapper

11 :UDSSHU%XLOGLQJ7RROV Q Creating Wrappers (semi-)automatically: X Demonstration-oriented interface enables users to show system what to extract by example X System automatically induces extraction rules X Common extraction engine Q Benefits: X Rapid wrapper creation X Simplified wrapper maintenance Q Fetch.com X Start-up that comercializes the technology [Muslea+99]

12 :UDSSHU%XLOGLQJ7RROV

13 Construction phase: Deploy DBs Extend ontol. Integrated ontology - global terminology - source descriptions - integration axioms User Interface - ontology browser - query constructor Domain modeling - DB analysis - text analysis Query processor - reformulation - cost optimization R S T σ User phase: Compose query Sources Access phase: Create DB query Retrieve data

14 The Core Large ontology (SENSUS) Concepts from glossaries (LKB) Linguistic mapping Logical mapping Domain-specific ontologies (SIMS models) Data sources

15 2QWRORJ\PRGHOVGDWD Q SENSUS ontology (90000 concepts) Q LKB glossary extracts (6000 concepts) X Purpose: provide rich domain knowledge X Sources: Census, EIA, EPA texts Q SIMS domain model (500 dimension conce, series) X Purpose: describe data, query planning and optimization X Sources: database metadata, extracted by hand Q Database sources (25000 time series) X EIA OGIRS: CD with MS Access database; no footnotes Q Web data sources (25000 time series) X EIA Multi-State: time series; formatted text X BLS: 60 time series; web form; 40 have footnotes X EIA Petroleum Supply Monthly: 20 time series in PDF; converted to HTML and then wrapped; all with footnotes X CEC: 12 series; static HTML tables; no footnotes

16 ,6, V 6(1686RQWRORJ\ [Knight+95] [Hovy+00] Q Q Q Q Q Taxonomy, multiple superclass links Approx. 90,000 concepts Top level: Penman Upper Model (ISI) Body: WordNet (Princeton), rearranged KWWSDULDGQHLVLHGXVHQVXVHGF Used at ISI for machine translation, text summarization, database access

17 ([WUDFWLQJ PHWDGDWD IURPWH[W [Klavans+01] Problems: X Proliferation of terms in domain X Agencies define terms differently X Many refer to the same or related entity X Lengthy term definitions often contain important information which is buried Example input: Motor Gasoline Blending Components: Naphthas (e.g., straight-run gasoline, alkylate, reformate, benzene, toluene, xylene) used for blending or compounding into finished motor gasoline. These components include reformulated gasoline blendstock for oxygenate blending (RBOB) but exclude oxygenates (alcohols, ethers), butane, and pentanes plus. Note: Oxygenates are reported as individual components and are included in the total for other hydrocarbons, hydrogens, and oxygenates.

18 /H[LFDO.QRZOHGJH%DVH/.%7RRO Combines statistical and linguistic methods: Q Q Q Q Q trainable Finite State Recognizer identifies topics with high accuracy provides complete coverage useful for any subject area produced over 6,000 gasoline concepts from EIA and BLS

19 /LQNLQJ'RPDLQ0RGHOVWR2QWRORJ\ Match DM concepts against SENSUS concepts 1. Name Match Variations: substrings; reward for contiguous endings; stemming; variant word forms; case sensitivity; etc. Implemented string match from molecular biology for DNA matching 2. Definition Match Variations: stemming; stop words; words or letter trigrams; word weights (itf, tf.idf ), etc. Used IR vector space; cosine measure 3. Dispersal Match Def: Given a cluster of concepts, match them all and then select one match for each concept to form tightest cluster in SENSUS Variations: greedy search; number of candidates; weightings Name match (6 hrs) k-means clustering (5 hrs) Def match (4 hrs) Dispersal match (incompl.) [Hovy+01]

20 Construction phase: Deploy DBs Extend ontol. Integrated ontology - global terminology - source descriptions - integration axioms User Interface - ontology browser - query constructor Domain modeling - DB analysis - text analysis Query processor - reformulation - cost optimization R S T σ User phase: Compose query Sources Access phase: Create DB query Retrieve data

21

22

23

24

25

26

27

28

29 &RQFOXVLRQ Large-scale information integration Q Major research challenges: X Semi-automated domain modeling, information extraction, linkage to ontology X Efficient query processing X Footnote extraction and display Q Major practical challenges: X Understanding users needs: interface, information... X Getting a lot of data into the system

Chapter 2. Architecture of a Search Engine

Chapter 2. Architecture of a Search Engine Chapter 2 Architecture of a Search Engine Search Engine Architecture A software architecture consists of software components, the interfaces provided by those components and the relationships between them

More information

0.1 Knowledge Organization Systems for Semantic Web

0.1 Knowledge Organization Systems for Semantic Web 0.1 Knowledge Organization Systems for Semantic Web 0.1 Knowledge Organization Systems for Semantic Web 0.1.1 Knowledge Organization Systems Why do we need to organize knowledge? Indexing Retrieval Organization

More information

Database Foundations. 1-2 Introduction to Databases. Copyright 2015, Oracle and/or its affiliates. All rights reserved.

Database Foundations. 1-2 Introduction to Databases. Copyright 2015, Oracle and/or its affiliates. All rights reserved. Database Foundations 1-2 Roadmap You are here About the Course Introduction to Databases Types of Database Models Relational Databases Database Storage Structures Understanding Business Requirements 3

More information

Searching the Deep Web

Searching the Deep Web Searching the Deep Web 1 What is Deep Web? Information accessed only through HTML form pages database queries results embedded in HTML pages Also can included other information on Web can t directly index

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval Mohsen Kamyar چهارمین کارگاه ساالنه آزمایشگاه فناوری و وب بهمن ماه 1391 Outline Outline in classic categorization Information vs. Data Retrieval IR Models Evaluation

More information

Semantic Web. Ontology Alignment. Morteza Amini. Sharif University of Technology Fall 95-96

Semantic Web. Ontology Alignment. Morteza Amini. Sharif University of Technology Fall 95-96 ه عا ی Semantic Web Ontology Alignment Morteza Amini Sharif University of Technology Fall 95-96 Outline The Problem of Ontologies Ontology Heterogeneity Ontology Alignment Overall Process Similarity (Matching)

More information

Semantic Web. Ontology Alignment. Morteza Amini. Sharif University of Technology Fall 94-95

Semantic Web. Ontology Alignment. Morteza Amini. Sharif University of Technology Fall 94-95 ه عا ی Semantic Web Ontology Alignment Morteza Amini Sharif University of Technology Fall 94-95 Outline The Problem of Ontologies Ontology Heterogeneity Ontology Alignment Overall Process Similarity Methods

More information

Powering Knowledge Discovery. Insights from big data with Linguamatics I2E

Powering Knowledge Discovery. Insights from big data with Linguamatics I2E Powering Knowledge Discovery Insights from big data with Linguamatics I2E Gain actionable insights from unstructured data The world now generates an overwhelming amount of data, most of it written in natural

More information

ISSUES IN INFORMATION RETRIEVAL Brian Vickery. Presentation at ISKO meeting on June 26, 2008 At University College, London

ISSUES IN INFORMATION RETRIEVAL Brian Vickery. Presentation at ISKO meeting on June 26, 2008 At University College, London ISSUES IN INFORMATION RETRIEVAL Brian Vickery Presentation at ISKO meeting on June 26, 2008 At University College, London NEEDLE IN HAYSTACK MY BACKGROUND Plant chemist, then reports librarian Librarian,

More information

Knowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A.

Knowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. Knowledge Retrieval Franz J. Kurfess Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. 1 Acknowledgements This lecture series has been sponsored by the European

More information

Searching the Deep Web

Searching the Deep Web Searching the Deep Web 1 What is Deep Web? Information accessed only through HTML form pages database queries results embedded in HTML pages Also can included other information on Web can t directly index

More information

Chapter 12 DATA AND KNOWLEDGE INTEGRATION FOR EGOVERNMENT CHAPTER OVERVIEW. Eduard Hovy

Chapter 12 DATA AND KNOWLEDGE INTEGRATION FOR EGOVERNMENT CHAPTER OVERVIEW. Eduard Hovy Chapter 12 DATA AND KNOWLEDGE INTEGRATION FOR EGOVERNMENT Eduard Hovy Information Sciences Institute, University of Southern California, Marina del Rey, California, U.S.A. (hovy@isi.edu) CHAPTER OVERVIEW

More information

Information Discovery, Extraction and Integration for the Hidden Web

Information Discovery, Extraction and Integration for the Hidden Web Information Discovery, Extraction and Integration for the Hidden Web Jiying Wang Department of Computer Science University of Science and Technology Clear Water Bay, Kowloon Hong Kong cswangjy@cs.ust.hk

More information

MULTIMEDIA DATABASES OVERVIEW

MULTIMEDIA DATABASES OVERVIEW MULTIMEDIA DATABASES OVERVIEW Recent developments in information systems technologies have resulted in computerizing many applications in various business areas. Data has become a critical resource in

More information

Source Modeling. Kristina Lerman University of Southern California. Based on slides by Mark Carman, Craig Knoblock and Jose Luis Ambite

Source Modeling. Kristina Lerman University of Southern California. Based on slides by Mark Carman, Craig Knoblock and Jose Luis Ambite Source Modeling Kristina Lerman University of Southern California Based on slides by Mark Carman, Craig Knoblock and Jose Luis Ambite Source Modeling vs. Schema Matching Schema Matching Align schemas between

More information

Everyday Activity. Course Content. Objectives of Lecture 13 Search Engine

Everyday Activity. Course Content. Objectives of Lecture 13 Search Engine Web Technologies and Applications Winter 2001 CMPUT 499: Search Engines Dr. Osmar R. Zaïane University of Alberta Everyday Activity We use search engines whenever we look for resources on the Internet

More information

Data Mining. Associate Professor Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology

Data Mining. Associate Professor Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Data Mining Associate Professor Dr. Raed Ibraheem Hamed University of Human Development, College of Science and Technology (1) 2016 2017 Department of CS- DM - UHD 1 Points to Cover Why Do We Need Data

More information

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion Sara Lana-Serrano 1,3, Julio Villena-Román 2,3, José C. González-Cristóbal 1,3 1 Universidad Politécnica de Madrid 2 Universidad

More information

Interoperability and Semantics in Use- Application of UML, XMI and MDA to Precision Medicine and Cancer Research

Interoperability and Semantics in Use- Application of UML, XMI and MDA to Precision Medicine and Cancer Research Interoperability and Semantics in Use- Application of UML, XMI and MDA to Precision Medicine and Cancer Research Ian Fore, D.Phil. Associate Director, Biorepository and Pathology Informatics Senior Program

More information

UNIT-V WEB MINING. 3/18/2012 Prof. Asha Ambhaikar, RCET Bhilai.

UNIT-V WEB MINING. 3/18/2012 Prof. Asha Ambhaikar, RCET Bhilai. UNIT-V WEB MINING 1 Mining the World-Wide Web 2 What is Web Mining? Discovering useful information from the World-Wide Web and its usage patterns. 3 Web search engines Index-based: search the Web, index

More information

Emerging Technologies in Knowledge Management By Ramana Rao, CTO of Inxight Software, Inc.

Emerging Technologies in Knowledge Management By Ramana Rao, CTO of Inxight Software, Inc. Emerging Technologies in Knowledge Management By Ramana Rao, CTO of Inxight Software, Inc. This paper provides an overview of a presentation at the Internet Librarian International conference in London

More information

Oracle Database 10G. Lindsey M. Pickle, Jr. Senior Solution Specialist Database Technologies Oracle Corporation

Oracle Database 10G. Lindsey M. Pickle, Jr. Senior Solution Specialist Database Technologies Oracle Corporation Oracle 10G Lindsey M. Pickle, Jr. Senior Solution Specialist Technologies Oracle Corporation Oracle 10g Goals Highest Availability, Reliability, Security Highest Performance, Scalability Problem: Islands

More information

Conceptual Integration of Genome Databases via Reduced Autonomy and Domain-Specific Data Models

Conceptual Integration of Genome Databases via Reduced Autonomy and Domain-Specific Data Models Conceptual Integration of Genome Databases via Reduced Autonomy and Domain-Specific Data Models Mark Graves Dept of Cell Biology Houston, TX USA email: mgraves@bcm.tmc.edu Ellen Bergeman Charles Lawrence

More information

Text Mining: A Burgeoning technology for knowledge extraction

Text Mining: A Burgeoning technology for knowledge extraction Text Mining: A Burgeoning technology for knowledge extraction 1 Anshika Singh, 2 Dr. Udayan Ghosh 1 HCL Technologies Ltd., Noida, 2 University School of Information &Communication Technology, Dwarka, Delhi.

More information

Database Heterogeneity

Database Heterogeneity Database Heterogeneity Lecture 13 1 Outline Database Integration Wrappers Mediators Integration Conflicts 2 1 1. Database Integration Goal: providing a uniform access to multiple heterogeneous information

More information

Acquiring Experience with Ontology and Vocabularies

Acquiring Experience with Ontology and Vocabularies Acquiring Experience with Ontology and Vocabularies Walt Melo Risa Mayan Jean Stanford The author's affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended

More information

Get the most value from your surveys with text analysis

Get the most value from your surveys with text analysis SPSS Text Analysis for Surveys 3.0 Specifications Get the most value from your surveys with text analysis The words people use to answer a question tell you a lot about what they think and feel. That s

More information

Semantic Interoperability. Being serious about the Semantic Web

Semantic Interoperability. Being serious about the Semantic Web Semantic Interoperability Jérôme Euzenat INRIA & LIG France Natasha Noy Stanford University USA 1 Being serious about the Semantic Web It is not one person s ontology It is not several people s common

More information

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and

More information

Information Integration

Information Integration .. Dennis Sun DATA 401: Data Science Alexander Dekhtyar.. Information Integration Data Integration. Data Integration is the process of combining data residing in different sources and providing the user

More information

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,

More information

CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL

CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LVII, Number 4, 2012 CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL IOAN BADARINZA AND ADRIAN STERCA Abstract. In this paper

More information

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How

More information

{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Components of the Same Challenge

{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Components of the Same Challenge Wright State University CORE Scholar Kno.e.sis Publications The Ohio Center of Excellence in Knowledge- Enabled Computing (Kno.e.sis) 11-5-2006 {Ontology: Resource} x {Matching : Mapping} x {Schema : Instance}

More information

<is web> Information Systems & Semantic Web University of Koblenz Landau, Germany

<is web> Information Systems & Semantic Web University of Koblenz Landau, Germany Information Systems & University of Koblenz Landau, Germany Semantic Search examples: Swoogle and Watson Steffen Staad credit: Tim Finin (swoogle), Mathieu d Aquin (watson) and their groups 2009-07-17

More information

ACCELERATE YOUR SHAREPOINT ADOPTION AND ROI WITH CONTENT INTELLIGENCE

ACCELERATE YOUR SHAREPOINT ADOPTION AND ROI WITH CONTENT INTELLIGENCE June 30, 2012 San Diego Convention Center ACCELERATE YOUR SHAREPOINT ADOPTION AND ROI WITH CONTENT INTELLIGENCE Stuart Laurie, Senior Consultant #SPSSAN Agenda 1. Challenges 2. What comes out of the box

More information

Using Ontologies for Data and Semantic Integration

Using Ontologies for Data and Semantic Integration Using Ontologies for Data and Semantic Integration Monica Crubézy Stanford Medical Informatics, Stanford University ~~ November 4, 2003 Ontologies Conceptualize a domain of discourse, an area of expertise

More information

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.

More information

MSc Advanced Computer Science School of Computer Science The University of Manchester

MSc Advanced Computer Science School of Computer Science The University of Manchester PROGRESS REPORT Ontology-Based Technical Document Retrieval System Ruvin Yusubov Supervisor: Professor Ulrike Sattler MSc Advanced Computer Science School of Computer Science The University of Manchester

More information

Craig Knoblock University of Southern California Joint work with Rahul Parundekar and José Luis Ambite

Craig Knoblock University of Southern California Joint work with Rahul Parundekar and José Luis Ambite Building Semantic Descriptions of Linked Data Craig Knoblock University of Southern California Joint work with Rahul Parundekar and José Luis Ambite Linked Open Data and Services Vast collection of interlinked

More information

Introduction to Text Mining. Hongning Wang

Introduction to Text Mining. Hongning Wang Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:

More information

Information Retrieval

Information Retrieval Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,

More information

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent

More information

Oracle Insurance IStream

Oracle Insurance IStream Oracle Insurance IStream IStream Document Manager Glossary Release 6.3 E15015-01 June 2009 Copyright Copyright 2009, Oracle and/or its affiliates. All rights reserved. Primary Authors: Andrew Brooke and

More information

Lecture 8 May 7, Prabhakar Raghavan

Lecture 8 May 7, Prabhakar Raghavan Lecture 8 May 7, 2001 Prabhakar Raghavan Clustering documents Given a corpus, partition it into groups of related docs Recursively, can induce a tree of topics Given the set of docs from the results of

More information

An Algorithm for Merging and Aligning Ontologies: Automation and Tool Support

An Algorithm for Merging and Aligning Ontologies: Automation and Tool Support An Algorithm for Merging and Aligning Ontologies: Automation and Tool Support Natalya Fridman Noy and Mark A. Musen From: AAAI Technical Report WS-99-13. Compilation copyright 1999, AAAI (www.aaai.org).

More information

Content Interoperability Strategy

Content Interoperability Strategy Content Interoperability Strategy 28th September 2005 Antoine Rizk 1 Presentation plan Introduction Context Objectives Input sources Semantic interoperability EIF Definition Semantic assets The European

More information

THE ARIADNE APPROACH TO WEB-BASED INFORMATION INTEGRATION

THE ARIADNE APPROACH TO WEB-BASED INFORMATION INTEGRATION International Journal of Cooperative Information Systems c World Scientific Publishing Company THE ARIADNE APPROACH TO WEB-BASED INFORMATION INTEGRATION CRAIG A. KNOBLOCK, STEVEN MINTON, JOSE LUIS AMBITE,

More information

Re-designing Online Terminology Resources for German Grammar

Re-designing Online Terminology Resources for German Grammar Re-designing Online Terminology Resources for German Grammar Project Report Karolina Suchowolec, Christian Lang, and Roman Schneider Institut für Deutsche Sprache (IDS), Mannheim, Germany {suchowolec,

More information

Xyleme Studio Data Sheet

Xyleme Studio Data Sheet XYLEME STUDIO DATA SHEET Xyleme Studio Data Sheet Rapid Single-Source Content Development Xyleme allows you to streamline and scale your content strategy while dramatically reducing the time to market

More information

Text Mining. Representation of Text Documents

Text Mining. Representation of Text Documents Data Mining is typically concerned with the detection of patterns in numeric data, but very often important (e.g., critical to business) information is stored in the form of text. Unlike numeric data,

More information

BIBL NEEDS REVISION INTRODUCTION

BIBL NEEDS REVISION INTRODUCTION BIBL NEEDS REVISION FCSM Statistical Policy Seminar Session Title: Integrating Electronic Systems for Disseminating Statistics Paper Title: Interacting with Tabular Data Through the World Wide Web Paper

More information

An overview of Graph Categories and Graph Primitives

An overview of Graph Categories and Graph Primitives An overview of Graph Categories and Graph Primitives Dino Ienco (dino.ienco@irstea.fr) https://sites.google.com/site/dinoienco/ Topics I m interested in: Graph Database and Graph Data Mining Social Network

More information

Snehal Thakkar, Craig A. Knoblock, Jose Luis Ambite, Cyrus Shahabi

Snehal Thakkar, Craig A. Knoblock, Jose Luis Ambite, Cyrus Shahabi From: AAAI Technical Report WS-02-07. Compilation copyright 2002, AAAI (www.aaai.org). All rights reserved. Dynamically Composing Web Services from On-line Sources Snehal Thakkar, Craig A. Knoblock, Jose

More information

Text Document Clustering Using DPM with Concept and Feature Analysis

Text Document Clustering Using DPM with Concept and Feature Analysis Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 10, October 2013,

More information

Ontology Extraction from Heterogeneous Documents

Ontology Extraction from Heterogeneous Documents Vol.3, Issue.2, March-April. 2013 pp-985-989 ISSN: 2249-6645 Ontology Extraction from Heterogeneous Documents Kirankumar Kataraki, 1 Sumana M 2 1 IV sem M.Tech/ Department of Information Science & Engg

More information

Today s topic CS347. Results list clustering example. Why cluster documents. Clustering documents. Lecture 8 May 7, 2001 Prabhakar Raghavan

Today s topic CS347. Results list clustering example. Why cluster documents. Clustering documents. Lecture 8 May 7, 2001 Prabhakar Raghavan Today s topic CS347 Clustering documents Lecture 8 May 7, 2001 Prabhakar Raghavan Why cluster documents Given a corpus, partition it into groups of related docs Recursively, can induce a tree of topics

More information

DEC Computer Technology LESSON 6: DATABASES AND WEB SEARCH ENGINES

DEC Computer Technology LESSON 6: DATABASES AND WEB SEARCH ENGINES DEC. 1-5 Computer Technology LESSON 6: DATABASES AND WEB SEARCH ENGINES Monday Overview of Databases A web search engine is a large database containing information about Web pages that have been registered

More information

CSE 344 MAY 7 TH EXAM REVIEW

CSE 344 MAY 7 TH EXAM REVIEW CSE 344 MAY 7 TH EXAM REVIEW EXAMINATION STATIONS Exam Wednesday 9:30-10:20 One sheet of notes, front and back Practice solutions out after class Good luck! EXAM LENGTH Production v. Verification Practice

More information

Text mining tools for semantically enriching the scientific literature

Text mining tools for semantically enriching the scientific literature Text mining tools for semantically enriching the scientific literature Sophia Ananiadou Director National Centre for Text Mining School of Computer Science University of Manchester Need for enriching the

More information

National Association of Regional Councils: SICoP DRM 2.0 Pilot

National Association of Regional Councils: SICoP DRM 2.0 Pilot National Association of Regional Councils: SICoP DRM 2.0 Pilot Brand Niemann (US EPA), Chair, Semantic Interoperability Community of Practice (SICoP) Best Practices Committee (BPC), Federal CIO Council

More information

STS Infrastructural considerations. Christian Chiarcos

STS Infrastructural considerations. Christian Chiarcos STS Infrastructural considerations Christian Chiarcos chiarcos@uni-potsdam.de Infrastructure Requirements Candidates standoff-based architecture (Stede et al. 2006, 2010) UiMA (Ferrucci and Lally 2004)

More information

Answer Extraction. NLP Systems and Applications Ling573 May 13, 2014

Answer Extraction. NLP Systems and Applications Ling573 May 13, 2014 Answer Extraction NLP Systems and Applications Ling573 May 13, 2014 Answer Extraction Goal: Given a passage, find the specific answer in passage Go from ~1000 chars -> short answer span Example: Q: What

More information

A Survey on Unsupervised Extraction of Product Information from Semi-Structured Sources

A Survey on Unsupervised Extraction of Product Information from Semi-Structured Sources A Survey on Unsupervised Extraction of Product Information from Semi-Structured Sources Abhilasha Bhagat, ME Computer Engineering, G.H.R.I.E.T., Savitribai Phule University, pune PUNE, India Vanita Raut

More information

Taxonomy Tools: Collaboration, Creation & Integration. Dow Jones & Company

Taxonomy Tools: Collaboration, Creation & Integration. Dow Jones & Company Taxonomy Tools: Collaboration, Creation & Integration Dave Clarke Global Taxonomy Director dave.clarke@dowjones.com Dow Jones & Company Introduction Software Tools for Taxonomy 1. Collaboration 2. Creation

More information

P to ~o, c E E' D' INl -a~s

P to ~o, c E E' D' INl -a~s 02- -4S The 6th World Muiticonference on Sy,sttemics, Cyib.emetics and [nformatics July 14-18,2002 - Orlando, Florida, USA P to ~o, c E E' D' INl -a~s ~ - - \.. t ~ 1. \.. t. ~ \ Volume VII Information

More information

Learning High Accuracy Rules for Object Identification

Learning High Accuracy Rules for Object Identification Learning High Accuracy Rules for Object Identification Sheila Tejada Wednesday, December 12, 2001 Committee Chair: Craig A. Knoblock Committee: Dr. George Bekey, Dr. Kevin Knight, Dr. Steven Minton, Dr.

More information

Wide Area Query Systems The Hydra of Databases

Wide Area Query Systems The Hydra of Databases Wide Area Query Systems The Hydra of Databases Stonebraker et al. 96 Gribble et al. 02 Zachary G. Ives University of Pennsylvania January 21, 2003 CIS 650 Data Sharing and the Web The Vision A World Wide

More information

Dataspaces: A New Abstraction for Data Management. Mike Franklin, Alon Halevy, David Maier, Jennifer Widom

Dataspaces: A New Abstraction for Data Management. Mike Franklin, Alon Halevy, David Maier, Jennifer Widom Dataspaces: A New Abstraction for Data Management Mike Franklin, Alon Halevy, David Maier, Jennifer Widom Today s Agenda Why databases are great. What problems people really have Why databases are not

More information

management, and enables businesses to effectively create, deploy, and manage Internet

management, and enables businesses to effectively create, deploy, and manage Internet Public Page 1 of 11 The EchoBus CMS enables your staff to create dynamic websites by providing with extensive features for website development and content management, and enables businesses to effectively

More information

Large Scale Semantic Annotation, Indexing, and Search at The National Archives Diana Maynard Mark Greenwood

Large Scale Semantic Annotation, Indexing, and Search at The National Archives Diana Maynard Mark Greenwood Large Scale Semantic Annotation, Indexing, and Search at The National Archives Diana Maynard Mark Greenwood University of Sheffield, UK 1 Burning questions you may have... In the last 3 years, which female

More information

1. CONCEPTUAL MODEL 1.1 DOMAIN MODEL 1.2 UML DIAGRAM

1. CONCEPTUAL MODEL 1.1 DOMAIN MODEL 1.2 UML DIAGRAM 1 1. CONCEPTUAL MODEL 1.1 DOMAIN MODEL In the context of federation of repositories of Semantic Interoperability s, a number of entities are relevant. The primary entities to be described by ADMS are the

More information

AAAI 2018 Tutorial Building Knowledge Graphs. Craig Knoblock University of Southern California

AAAI 2018 Tutorial Building Knowledge Graphs. Craig Knoblock University of Southern California AAAI 2018 Tutorial Building Knowledge Graphs Craig Knoblock University of Southern California Wrappers for Web Data Extraction Extracting Data from Semistructured Sources NAME Casablanca Restaurant STREET

More information

Information Retrieval Strategies. Presented by Patrick Hoey

Information Retrieval Strategies. Presented by Patrick Hoey Information Retrieval Strategies Presented by Patrick Hoey 1 Information Retrieval Strategies For Keyword-Based Searches Information is stored in various types of media, ranging from customer account databases

More information

Data Mining & Data Warehouse

Data Mining & Data Warehouse Data Mining & Data Warehouse Asso. Profe. Dr. Raed Ibraheem Hamed University of Human Development, College of Science and Technology Department of Information Technology 2016 2017 (1) Points to Cover Problem:

More information

Lightweight Transformation of Tabular Open Data to RDF

Lightweight Transformation of Tabular Open Data to RDF Proceedings of the I-SEMANTICS 2012 Posters & Demonstrations Track, pp. 38-42, 2012. Copyright 2012 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes.

More information

8/19/13. Computational problems. Introduction to Algorithm

8/19/13. Computational problems. Introduction to Algorithm I519, Introduction to Introduction to Algorithm Yuzhen Ye (yye@indiana.edu) School of Informatics and Computing, IUB Computational problems A computational problem specifies an input-output relationship

More information

Presented by Kit Na Goh

Presented by Kit Na Goh Developing A Geo-Spatial Search Tool Using A Relational Database Implementation of the FGDC CSDGM Model Presented by Kit Na Goh Introduction Executive Order 12906 was issued on April 13, 1994 with the

More information

Publishing Linked Statistical Data: Aragón, a case study.

Publishing Linked Statistical Data: Aragón, a case study. Publishing Linked Statistical Data: Aragón, a case study. Oscar Corcho 1, Idafen Santana-Pérez 1, Hugo Lafuente 2, David Portolés 3, César Cano 4, Alfredo Peris 4, and José María Subero 4 1 Ontology Engineering

More information

Craig Knoblock University of Southern California. These slides are based in part on slides from Sheila Tejada and Misha Bilenko

Craig Knoblock University of Southern California. These slides are based in part on slides from Sheila Tejada and Misha Bilenko Record Linkage Craig Knoblock University of Southern California These slides are based in part on slides from Sheila Tejada and Misha Bilenko Craig Knoblock University of Southern California 1 Record Linkage

More information

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009 Maximizing the Value of STM Content through Semantic Enrichment Frank Stumpf December 1, 2009 What is Semantics and Semantic Processing? Content Knowledge Framework Technology Framework Search Text Images

More information

Governmental Statistical Data on the Web: A case study of FedStats

Governmental Statistical Data on the Web: A case study of FedStats Governmental Statistical Data on the Web: A case study of FedStats Irina Ceaparu University of Maryland, Department of Computer Science, Human Computer Interaction Laboratory Revised: December st, 00 Introduction

More information

RiMOM Results for OAEI 2008

RiMOM Results for OAEI 2008 RiMOM Results for OAEI 2008 Xiao Zhang 1, Qian Zhong 1, Juanzi Li 1, Jie Tang 1, Guotong Xie 2 and Hanyu Li 2 1 Department of Computer Science and Technology, Tsinghua University, China {zhangxiao,zhongqian,ljz,tangjie}@keg.cs.tsinghua.edu.cn

More information

Publishing CitiSense Data: Privacy Concerns and Remedies

Publishing CitiSense Data: Privacy Concerns and Remedies Publishing CitiSense Data: Privacy Concerns and Remedies Kapil Gupta Advisor : Prof. Bill Griswold 1 Location Based Services Great utility of location based services data traffic control, mobility management,

More information

The InfoLibrarian Metadata Appliance Automated Cataloging System for your IT infrastructure.

The InfoLibrarian Metadata Appliance Automated Cataloging System for your IT infrastructure. Metadata Integration Appliance Times have changed and here is modern solution that delivers instant return on your investment. The InfoLibrarian Metadata Appliance Automated Cataloging System for your

More information

Opus: University of Bath Online Publication Store

Opus: University of Bath Online Publication Store Patel, M. (2004) Semantic Interoperability in Digital Library Systems. In: WP5 Forum Workshop: Semantic Interoperability in Digital Library Systems, DELOS Network of Excellence in Digital Libraries, 2004-09-16-2004-09-16,

More information

MetaNews: An Information Agent for Gathering News Articles On the Web

MetaNews: An Information Agent for Gathering News Articles On the Web MetaNews: An Information Agent for Gathering News Articles On the Web Dae-Ki Kang 1 and Joongmin Choi 2 1 Department of Computer Science Iowa State University Ames, IA 50011, USA dkkang@cs.iastate.edu

More information

Chapter 4 Research Prototype

Chapter 4 Research Prototype Chapter 4 Research Prototype According to the research method described in Chapter 3, a schema and ontology-assisted heterogeneous information integration prototype system is implemented. This system shows

More information

The New Document Digital Polymorphic Ubiquitous Actionable Patrick P. Bergmans University of Ghent

The New Document Digital Polymorphic Ubiquitous Actionable Patrick P. Bergmans University of Ghent X X The New Document Digital Polymorphic Ubiquitous Actionable Patrick P. Bergmans University of Ghent The Traditional Document Documents have been around for thousands of years The Bible is a document

More information

Linked Data in Translation-Kits

Linked Data in Translation-Kits Linked Data in Translation-Kits FEISGILTT Dublin June 2014 Yves Savourel ENLASO Corporation Slides and code are available at: https://copy.com/ngbco13l9yls This presentation was made possible by The main

More information

Part I: Data Mining Foundations

Part I: Data Mining Foundations Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?

More information

Exploiting Semantics Where We Find Them

Exploiting Semantics Where We Find Them Vrije Universiteit Amsterdam 19/06/2018 Exploiting Semantics Where We Find Them A Bottom-up Approach to the Semantic Web Prof. Dr. Christian Bizer Bizer: Exploiting Semantics Where We Find Them. VU Amsterdam,

More information

ER/Studio Enterprise Portal User Guide

ER/Studio Enterprise Portal User Guide ER/Studio Enterprise Portal 1.1.1 User Guide Copyright 1994-2009 Embarcadero Technologies, Inc. Embarcadero Technologies, Inc. 100 California Street, 12th Floor San Francisco, CA 94111 U.S.A. All rights

More information

Ontology based Web Page Topic Identification

Ontology based Web Page Topic Identification Ontology based Web Page Topic Identification Abhishek Singh Rathore Department of Computer Science & Engineering Maulana Azad National Institute of Technology Bhopal, India Devshri Roy Department of Computer

More information

RDF for Life Sciences

RDF for Life Sciences RDF for Life Sciences Presentation to Oracle Life Sciences User Group June 23, 2004 John Wilbanks World Wide Web Consortium (W3C) What is the W3C? Founded in 1994 by Tim Berners-Lee Develops common protocols

More information

IJCSC Volume 5 Number 1 March-Sep 2014 pp ISSN

IJCSC Volume 5 Number 1 March-Sep 2014 pp ISSN Movie Related Information Retrieval Using Ontology Based Semantic Search Tarjni Vyas, Hetali Tank, Kinjal Shah Nirma University, Ahmedabad tarjni.vyas@nirmauni.ac.in, tank92@gmail.com, shahkinjal92@gmail.com

More information

Databases available to ISU researchers:

Databases available to ISU researchers: Databases available to ISU researchers: Table of Contents Web of Knowledge Overview 3 Web of Science 4 Cited Reference Searching 5 Secondary Cited Author Searching 8 Eliminating Self-Citations 9 Saving

More information

Methodologies for Ontology-Based Semantic Translation

Methodologies for Ontology-Based Semantic Translation Methodologies for Ontology-Based Semantic Translation H. Stuckenschmidt, H. Wache, U. Visser, G. Schuster The BUSTER Project: TZI, Group Outline! A Survey of Existing Approaches! The Role on Ontologies!

More information

Web Data Extraction. Craig Knoblock University of Southern California. This presentation is based on slides prepared by Ion Muslea and Kristina Lerman

Web Data Extraction. Craig Knoblock University of Southern California. This presentation is based on slides prepared by Ion Muslea and Kristina Lerman Web Data Extraction Craig Knoblock University of Southern California This presentation is based on slides prepared by Ion Muslea and Kristina Lerman Extracting Data from Semistructured Sources NAME Casablanca

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information