6LPSOLI\LQJ'DWD$FFHVV 7KH(QHUJ\'DWD&ROOHFWLRQ3URMHFW. José Luis Ambite University of Southern California Information Science Institute
|
|
- Ethelbert Cooper
- 5 years ago
- Views:
Transcription
1 6LPSOLI\LQJ'DWD$FFHVV 7KH(QHUJ\'DWD&ROOHFWLRQ3URMHFW José Luis Ambite University of Southern California Information Science Institute
2 7KH9LVLRQ $VNWKH*RYHUQPHQW We re thinking of moving to Denver...What are the schools like there? Query results How many people had breast cancer in the area over the past 30 years? Is there an orchestra? An art gallery? How far are the nightclubs? How have property values in the area changed over the past decade? Census Labor Stats
3 7KHSUREOHPDQGWKHVROXWLRQ Q Problem: FedStats has thousands of databases in over 70 Government agencies: X data is overlapping, sometimes near-duplicated X even Government officials cannot find it! Q Solution: Create a system to provide easy standardized access: X need multi-database access engine, X need powerful user interface, X need terminology standardization mechanism.
4 5HVHDUFKFKDOOHQJHV Q How do you scale to incorporate many databases? try to build data models automatically Q How do you integrate their data models, to allow querying across sources and agencies? take a large ontology, link the models into it automatically, and provide query reformulator Q How do you incorporate additional information that is available from text sources? use language processing tools to extract it Q How do you handle footnotes in the databases? extract them from the tables automatically
5 6\VWHP$UFKLWHFWXUH Construction phase: Deploy DBs Extend ontol. Integrated ontology - global terminology - source descriptions - integration axioms User Interface - ontology browser - query constructor Domain modeling - DB analysis - text analysis Query processor - reformulation - cost optimization R S T σ User phase: Compose query Sources Access phase: Create DB query Retrieve data
6 Construction phase: Deploy DBs Extend ontol. Integrated ontology - global terminology - source descriptions - integration axioms User Interface - ontology browser - query constructor Domain modeling - DB analysis - text analysis Query processor - reformulation - cost optimization R S T σ User phase: Compose query Sources Access phase: Create DB query Retrieve data
7 ,QIRUPDWLRQ,QWHJUDWLRQLQ6,06 To enable query access, SIMS needs to: [Arens+96] [Ambite+00] Q address semantic heterogeneity: => describe sources in common domain model Q address syntactic (format) heterogeneity: => standardize access to sources: X Structured (DBMS): Oracle, MS Access X Semistructured: wrappers for html, text, pdf
8 'RPDLQ0RGHOIRU7LPH6HULHV 8QLW 3RLQWRI6DOH 3URGXFW <HDU 3HULRG 0RQWK $UHD 86$ &$ 1< :HHN 0HDVXUHPHQW 'DWH 6XEFODVV 3DUWRI *HQHUDO5HODWLRQ 6RXUFH0DSSLQJ 9DOXH 7LPH 6HULHV )RRWQRWH 7DJ *5HJXODU 7H[W *3UHPLXP *DVROLQH &3, *3UHPLXP 8QOHDGHG 4XDOLW\ 3ULFH */HDGHG *8QOHDGHG 33, 9ROXPH
9 <HDU 'RPDLQ0RGHO 3HULRG 0RQWK $UHD 86$ 1< &$ :HHN 8QLW &HQWVJDOORQ (,$7 0HDVXUHPHQW 'DWH 6XEFODVV 3DUWRI *HQHUDO5HODWLRQ 6RXUFH0DSSLQJ (,$73ULFHVRI5HJXODU*DVROLQHLQ&$ 6DOHVWRHQGXVHUVWKURXJKUHWDLORXWOHWV 0HDVXUHGLQFHQWVSHUJDOORQ 7LPH 6HULHV 9DOXH 3RLQWRI6DOH 5HWDLORXWOHWV )RRWQRWH 7DJ (,$0XOWLVWDWH :UDSSHU *5HJXODU *3UHPLXP 7H[W 3URGXFW *DVROLQH *3UHPLXP 8QOHDGHG 4XDOLW\ 3ULFH */HDGHG *8QOHDGHG 9ROXPH
10 :UDSSHUV provide uniform mechanism for extracting data from semi-structured sources (HTML, text, ) 5HVWDXUDQWVLQ 6DQWD0RQLFD" 1DPH $GGUHVV &KLQRLVRQ0DLQ 0DLQ6W &KDR'DUD 8QLRQ6T «Wrapper
11 :UDSSHU%XLOGLQJ7RROV Q Creating Wrappers (semi-)automatically: X Demonstration-oriented interface enables users to show system what to extract by example X System automatically induces extraction rules X Common extraction engine Q Benefits: X Rapid wrapper creation X Simplified wrapper maintenance Q Fetch.com X Start-up that comercializes the technology [Muslea+99]
12 :UDSSHU%XLOGLQJ7RROV
13 Construction phase: Deploy DBs Extend ontol. Integrated ontology - global terminology - source descriptions - integration axioms User Interface - ontology browser - query constructor Domain modeling - DB analysis - text analysis Query processor - reformulation - cost optimization R S T σ User phase: Compose query Sources Access phase: Create DB query Retrieve data
14 The Core Large ontology (SENSUS) Concepts from glossaries (LKB) Linguistic mapping Logical mapping Domain-specific ontologies (SIMS models) Data sources
15 2QWRORJ\PRGHOVGDWD Q SENSUS ontology (90000 concepts) Q LKB glossary extracts (6000 concepts) X Purpose: provide rich domain knowledge X Sources: Census, EIA, EPA texts Q SIMS domain model (500 dimension conce, series) X Purpose: describe data, query planning and optimization X Sources: database metadata, extracted by hand Q Database sources (25000 time series) X EIA OGIRS: CD with MS Access database; no footnotes Q Web data sources (25000 time series) X EIA Multi-State: time series; formatted text X BLS: 60 time series; web form; 40 have footnotes X EIA Petroleum Supply Monthly: 20 time series in PDF; converted to HTML and then wrapped; all with footnotes X CEC: 12 series; static HTML tables; no footnotes
16 ,6, V 6(1686RQWRORJ\ [Knight+95] [Hovy+00] Q Q Q Q Q Taxonomy, multiple superclass links Approx. 90,000 concepts Top level: Penman Upper Model (ISI) Body: WordNet (Princeton), rearranged KWWSDULDGQHLVLHGXVHQVXVHGF Used at ISI for machine translation, text summarization, database access
17 ([WUDFWLQJ PHWDGDWD IURPWH[W [Klavans+01] Problems: X Proliferation of terms in domain X Agencies define terms differently X Many refer to the same or related entity X Lengthy term definitions often contain important information which is buried Example input: Motor Gasoline Blending Components: Naphthas (e.g., straight-run gasoline, alkylate, reformate, benzene, toluene, xylene) used for blending or compounding into finished motor gasoline. These components include reformulated gasoline blendstock for oxygenate blending (RBOB) but exclude oxygenates (alcohols, ethers), butane, and pentanes plus. Note: Oxygenates are reported as individual components and are included in the total for other hydrocarbons, hydrogens, and oxygenates.
18 /H[LFDO.QRZOHGJH%DVH/.%7RRO Combines statistical and linguistic methods: Q Q Q Q Q trainable Finite State Recognizer identifies topics with high accuracy provides complete coverage useful for any subject area produced over 6,000 gasoline concepts from EIA and BLS
19 /LQNLQJ'RPDLQ0RGHOVWR2QWRORJ\ Match DM concepts against SENSUS concepts 1. Name Match Variations: substrings; reward for contiguous endings; stemming; variant word forms; case sensitivity; etc. Implemented string match from molecular biology for DNA matching 2. Definition Match Variations: stemming; stop words; words or letter trigrams; word weights (itf, tf.idf ), etc. Used IR vector space; cosine measure 3. Dispersal Match Def: Given a cluster of concepts, match them all and then select one match for each concept to form tightest cluster in SENSUS Variations: greedy search; number of candidates; weightings Name match (6 hrs) k-means clustering (5 hrs) Def match (4 hrs) Dispersal match (incompl.) [Hovy+01]
20 Construction phase: Deploy DBs Extend ontol. Integrated ontology - global terminology - source descriptions - integration axioms User Interface - ontology browser - query constructor Domain modeling - DB analysis - text analysis Query processor - reformulation - cost optimization R S T σ User phase: Compose query Sources Access phase: Create DB query Retrieve data
21
22
23
24
25
26
27
28
29 &RQFOXVLRQ Large-scale information integration Q Major research challenges: X Semi-automated domain modeling, information extraction, linkage to ontology X Efficient query processing X Footnote extraction and display Q Major practical challenges: X Understanding users needs: interface, information... X Getting a lot of data into the system
Chapter 2. Architecture of a Search Engine
Chapter 2 Architecture of a Search Engine Search Engine Architecture A software architecture consists of software components, the interfaces provided by those components and the relationships between them
More information0.1 Knowledge Organization Systems for Semantic Web
0.1 Knowledge Organization Systems for Semantic Web 0.1 Knowledge Organization Systems for Semantic Web 0.1.1 Knowledge Organization Systems Why do we need to organize knowledge? Indexing Retrieval Organization
More informationDatabase Foundations. 1-2 Introduction to Databases. Copyright 2015, Oracle and/or its affiliates. All rights reserved.
Database Foundations 1-2 Roadmap You are here About the Course Introduction to Databases Types of Database Models Relational Databases Database Storage Structures Understanding Business Requirements 3
More informationSearching the Deep Web
Searching the Deep Web 1 What is Deep Web? Information accessed only through HTML form pages database queries results embedded in HTML pages Also can included other information on Web can t directly index
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval Mohsen Kamyar چهارمین کارگاه ساالنه آزمایشگاه فناوری و وب بهمن ماه 1391 Outline Outline in classic categorization Information vs. Data Retrieval IR Models Evaluation
More informationSemantic Web. Ontology Alignment. Morteza Amini. Sharif University of Technology Fall 95-96
ه عا ی Semantic Web Ontology Alignment Morteza Amini Sharif University of Technology Fall 95-96 Outline The Problem of Ontologies Ontology Heterogeneity Ontology Alignment Overall Process Similarity (Matching)
More informationSemantic Web. Ontology Alignment. Morteza Amini. Sharif University of Technology Fall 94-95
ه عا ی Semantic Web Ontology Alignment Morteza Amini Sharif University of Technology Fall 94-95 Outline The Problem of Ontologies Ontology Heterogeneity Ontology Alignment Overall Process Similarity Methods
More informationPowering Knowledge Discovery. Insights from big data with Linguamatics I2E
Powering Knowledge Discovery Insights from big data with Linguamatics I2E Gain actionable insights from unstructured data The world now generates an overwhelming amount of data, most of it written in natural
More informationISSUES IN INFORMATION RETRIEVAL Brian Vickery. Presentation at ISKO meeting on June 26, 2008 At University College, London
ISSUES IN INFORMATION RETRIEVAL Brian Vickery Presentation at ISKO meeting on June 26, 2008 At University College, London NEEDLE IN HAYSTACK MY BACKGROUND Plant chemist, then reports librarian Librarian,
More informationKnowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A.
Knowledge Retrieval Franz J. Kurfess Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. 1 Acknowledgements This lecture series has been sponsored by the European
More informationSearching the Deep Web
Searching the Deep Web 1 What is Deep Web? Information accessed only through HTML form pages database queries results embedded in HTML pages Also can included other information on Web can t directly index
More informationChapter 12 DATA AND KNOWLEDGE INTEGRATION FOR EGOVERNMENT CHAPTER OVERVIEW. Eduard Hovy
Chapter 12 DATA AND KNOWLEDGE INTEGRATION FOR EGOVERNMENT Eduard Hovy Information Sciences Institute, University of Southern California, Marina del Rey, California, U.S.A. (hovy@isi.edu) CHAPTER OVERVIEW
More informationInformation Discovery, Extraction and Integration for the Hidden Web
Information Discovery, Extraction and Integration for the Hidden Web Jiying Wang Department of Computer Science University of Science and Technology Clear Water Bay, Kowloon Hong Kong cswangjy@cs.ust.hk
More informationMULTIMEDIA DATABASES OVERVIEW
MULTIMEDIA DATABASES OVERVIEW Recent developments in information systems technologies have resulted in computerizing many applications in various business areas. Data has become a critical resource in
More informationSource Modeling. Kristina Lerman University of Southern California. Based on slides by Mark Carman, Craig Knoblock and Jose Luis Ambite
Source Modeling Kristina Lerman University of Southern California Based on slides by Mark Carman, Craig Knoblock and Jose Luis Ambite Source Modeling vs. Schema Matching Schema Matching Align schemas between
More informationEveryday Activity. Course Content. Objectives of Lecture 13 Search Engine
Web Technologies and Applications Winter 2001 CMPUT 499: Search Engines Dr. Osmar R. Zaïane University of Alberta Everyday Activity We use search engines whenever we look for resources on the Internet
More informationData Mining. Associate Professor Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology
Data Mining Associate Professor Dr. Raed Ibraheem Hamed University of Human Development, College of Science and Technology (1) 2016 2017 Department of CS- DM - UHD 1 Points to Cover Why Do We Need Data
More informationMIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion
MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion Sara Lana-Serrano 1,3, Julio Villena-Román 2,3, José C. González-Cristóbal 1,3 1 Universidad Politécnica de Madrid 2 Universidad
More informationInteroperability and Semantics in Use- Application of UML, XMI and MDA to Precision Medicine and Cancer Research
Interoperability and Semantics in Use- Application of UML, XMI and MDA to Precision Medicine and Cancer Research Ian Fore, D.Phil. Associate Director, Biorepository and Pathology Informatics Senior Program
More informationUNIT-V WEB MINING. 3/18/2012 Prof. Asha Ambhaikar, RCET Bhilai.
UNIT-V WEB MINING 1 Mining the World-Wide Web 2 What is Web Mining? Discovering useful information from the World-Wide Web and its usage patterns. 3 Web search engines Index-based: search the Web, index
More informationEmerging Technologies in Knowledge Management By Ramana Rao, CTO of Inxight Software, Inc.
Emerging Technologies in Knowledge Management By Ramana Rao, CTO of Inxight Software, Inc. This paper provides an overview of a presentation at the Internet Librarian International conference in London
More informationOracle Database 10G. Lindsey M. Pickle, Jr. Senior Solution Specialist Database Technologies Oracle Corporation
Oracle 10G Lindsey M. Pickle, Jr. Senior Solution Specialist Technologies Oracle Corporation Oracle 10g Goals Highest Availability, Reliability, Security Highest Performance, Scalability Problem: Islands
More informationConceptual Integration of Genome Databases via Reduced Autonomy and Domain-Specific Data Models
Conceptual Integration of Genome Databases via Reduced Autonomy and Domain-Specific Data Models Mark Graves Dept of Cell Biology Houston, TX USA email: mgraves@bcm.tmc.edu Ellen Bergeman Charles Lawrence
More informationText Mining: A Burgeoning technology for knowledge extraction
Text Mining: A Burgeoning technology for knowledge extraction 1 Anshika Singh, 2 Dr. Udayan Ghosh 1 HCL Technologies Ltd., Noida, 2 University School of Information &Communication Technology, Dwarka, Delhi.
More informationDatabase Heterogeneity
Database Heterogeneity Lecture 13 1 Outline Database Integration Wrappers Mediators Integration Conflicts 2 1 1. Database Integration Goal: providing a uniform access to multiple heterogeneous information
More informationAcquiring Experience with Ontology and Vocabularies
Acquiring Experience with Ontology and Vocabularies Walt Melo Risa Mayan Jean Stanford The author's affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended
More informationGet the most value from your surveys with text analysis
SPSS Text Analysis for Surveys 3.0 Specifications Get the most value from your surveys with text analysis The words people use to answer a question tell you a lot about what they think and feel. That s
More informationSemantic Interoperability. Being serious about the Semantic Web
Semantic Interoperability Jérôme Euzenat INRIA & LIG France Natasha Noy Stanford University USA 1 Being serious about the Semantic Web It is not one person s ontology It is not several people s common
More informationCS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University
CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and
More informationInformation Integration
.. Dennis Sun DATA 401: Data Science Alexander Dekhtyar.. Information Integration Data Integration. Data Integration is the process of combining data residing in different sources and providing the user
More informationA Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2
A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,
More informationCLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL
STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LVII, Number 4, 2012 CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL IOAN BADARINZA AND ADRIAN STERCA Abstract. In this paper
More informationIntroduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.
Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How
More information{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Components of the Same Challenge
Wright State University CORE Scholar Kno.e.sis Publications The Ohio Center of Excellence in Knowledge- Enabled Computing (Kno.e.sis) 11-5-2006 {Ontology: Resource} x {Matching : Mapping} x {Schema : Instance}
More information<is web> Information Systems & Semantic Web University of Koblenz Landau, Germany
Information Systems & University of Koblenz Landau, Germany Semantic Search examples: Swoogle and Watson Steffen Staad credit: Tim Finin (swoogle), Mathieu d Aquin (watson) and their groups 2009-07-17
More informationACCELERATE YOUR SHAREPOINT ADOPTION AND ROI WITH CONTENT INTELLIGENCE
June 30, 2012 San Diego Convention Center ACCELERATE YOUR SHAREPOINT ADOPTION AND ROI WITH CONTENT INTELLIGENCE Stuart Laurie, Senior Consultant #SPSSAN Agenda 1. Challenges 2. What comes out of the box
More informationUsing Ontologies for Data and Semantic Integration
Using Ontologies for Data and Semantic Integration Monica Crubézy Stanford Medical Informatics, Stanford University ~~ November 4, 2003 Ontologies Conceptualize a domain of discourse, an area of expertise
More informationTERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES
TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.
More informationMSc Advanced Computer Science School of Computer Science The University of Manchester
PROGRESS REPORT Ontology-Based Technical Document Retrieval System Ruvin Yusubov Supervisor: Professor Ulrike Sattler MSc Advanced Computer Science School of Computer Science The University of Manchester
More informationCraig Knoblock University of Southern California Joint work with Rahul Parundekar and José Luis Ambite
Building Semantic Descriptions of Linked Data Craig Knoblock University of Southern California Joint work with Rahul Parundekar and José Luis Ambite Linked Open Data and Services Vast collection of interlinked
More informationIntroduction to Text Mining. Hongning Wang
Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:
More informationInformation Retrieval
Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,
More informationShrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent
More informationOracle Insurance IStream
Oracle Insurance IStream IStream Document Manager Glossary Release 6.3 E15015-01 June 2009 Copyright Copyright 2009, Oracle and/or its affiliates. All rights reserved. Primary Authors: Andrew Brooke and
More informationLecture 8 May 7, Prabhakar Raghavan
Lecture 8 May 7, 2001 Prabhakar Raghavan Clustering documents Given a corpus, partition it into groups of related docs Recursively, can induce a tree of topics Given the set of docs from the results of
More informationAn Algorithm for Merging and Aligning Ontologies: Automation and Tool Support
An Algorithm for Merging and Aligning Ontologies: Automation and Tool Support Natalya Fridman Noy and Mark A. Musen From: AAAI Technical Report WS-99-13. Compilation copyright 1999, AAAI (www.aaai.org).
More informationContent Interoperability Strategy
Content Interoperability Strategy 28th September 2005 Antoine Rizk 1 Presentation plan Introduction Context Objectives Input sources Semantic interoperability EIF Definition Semantic assets The European
More informationTHE ARIADNE APPROACH TO WEB-BASED INFORMATION INTEGRATION
International Journal of Cooperative Information Systems c World Scientific Publishing Company THE ARIADNE APPROACH TO WEB-BASED INFORMATION INTEGRATION CRAIG A. KNOBLOCK, STEVEN MINTON, JOSE LUIS AMBITE,
More informationRe-designing Online Terminology Resources for German Grammar
Re-designing Online Terminology Resources for German Grammar Project Report Karolina Suchowolec, Christian Lang, and Roman Schneider Institut für Deutsche Sprache (IDS), Mannheim, Germany {suchowolec,
More informationXyleme Studio Data Sheet
XYLEME STUDIO DATA SHEET Xyleme Studio Data Sheet Rapid Single-Source Content Development Xyleme allows you to streamline and scale your content strategy while dramatically reducing the time to market
More informationText Mining. Representation of Text Documents
Data Mining is typically concerned with the detection of patterns in numeric data, but very often important (e.g., critical to business) information is stored in the form of text. Unlike numeric data,
More informationBIBL NEEDS REVISION INTRODUCTION
BIBL NEEDS REVISION FCSM Statistical Policy Seminar Session Title: Integrating Electronic Systems for Disseminating Statistics Paper Title: Interacting with Tabular Data Through the World Wide Web Paper
More informationAn overview of Graph Categories and Graph Primitives
An overview of Graph Categories and Graph Primitives Dino Ienco (dino.ienco@irstea.fr) https://sites.google.com/site/dinoienco/ Topics I m interested in: Graph Database and Graph Data Mining Social Network
More informationSnehal Thakkar, Craig A. Knoblock, Jose Luis Ambite, Cyrus Shahabi
From: AAAI Technical Report WS-02-07. Compilation copyright 2002, AAAI (www.aaai.org). All rights reserved. Dynamically Composing Web Services from On-line Sources Snehal Thakkar, Craig A. Knoblock, Jose
More informationText Document Clustering Using DPM with Concept and Feature Analysis
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 10, October 2013,
More informationOntology Extraction from Heterogeneous Documents
Vol.3, Issue.2, March-April. 2013 pp-985-989 ISSN: 2249-6645 Ontology Extraction from Heterogeneous Documents Kirankumar Kataraki, 1 Sumana M 2 1 IV sem M.Tech/ Department of Information Science & Engg
More informationToday s topic CS347. Results list clustering example. Why cluster documents. Clustering documents. Lecture 8 May 7, 2001 Prabhakar Raghavan
Today s topic CS347 Clustering documents Lecture 8 May 7, 2001 Prabhakar Raghavan Why cluster documents Given a corpus, partition it into groups of related docs Recursively, can induce a tree of topics
More informationDEC Computer Technology LESSON 6: DATABASES AND WEB SEARCH ENGINES
DEC. 1-5 Computer Technology LESSON 6: DATABASES AND WEB SEARCH ENGINES Monday Overview of Databases A web search engine is a large database containing information about Web pages that have been registered
More informationCSE 344 MAY 7 TH EXAM REVIEW
CSE 344 MAY 7 TH EXAM REVIEW EXAMINATION STATIONS Exam Wednesday 9:30-10:20 One sheet of notes, front and back Practice solutions out after class Good luck! EXAM LENGTH Production v. Verification Practice
More informationText mining tools for semantically enriching the scientific literature
Text mining tools for semantically enriching the scientific literature Sophia Ananiadou Director National Centre for Text Mining School of Computer Science University of Manchester Need for enriching the
More informationNational Association of Regional Councils: SICoP DRM 2.0 Pilot
National Association of Regional Councils: SICoP DRM 2.0 Pilot Brand Niemann (US EPA), Chair, Semantic Interoperability Community of Practice (SICoP) Best Practices Committee (BPC), Federal CIO Council
More informationSTS Infrastructural considerations. Christian Chiarcos
STS Infrastructural considerations Christian Chiarcos chiarcos@uni-potsdam.de Infrastructure Requirements Candidates standoff-based architecture (Stede et al. 2006, 2010) UiMA (Ferrucci and Lally 2004)
More informationAnswer Extraction. NLP Systems and Applications Ling573 May 13, 2014
Answer Extraction NLP Systems and Applications Ling573 May 13, 2014 Answer Extraction Goal: Given a passage, find the specific answer in passage Go from ~1000 chars -> short answer span Example: Q: What
More informationA Survey on Unsupervised Extraction of Product Information from Semi-Structured Sources
A Survey on Unsupervised Extraction of Product Information from Semi-Structured Sources Abhilasha Bhagat, ME Computer Engineering, G.H.R.I.E.T., Savitribai Phule University, pune PUNE, India Vanita Raut
More informationTaxonomy Tools: Collaboration, Creation & Integration. Dow Jones & Company
Taxonomy Tools: Collaboration, Creation & Integration Dave Clarke Global Taxonomy Director dave.clarke@dowjones.com Dow Jones & Company Introduction Software Tools for Taxonomy 1. Collaboration 2. Creation
More informationP to ~o, c E E' D' INl -a~s
02- -4S The 6th World Muiticonference on Sy,sttemics, Cyib.emetics and [nformatics July 14-18,2002 - Orlando, Florida, USA P to ~o, c E E' D' INl -a~s ~ - - \.. t ~ 1. \.. t. ~ \ Volume VII Information
More informationLearning High Accuracy Rules for Object Identification
Learning High Accuracy Rules for Object Identification Sheila Tejada Wednesday, December 12, 2001 Committee Chair: Craig A. Knoblock Committee: Dr. George Bekey, Dr. Kevin Knight, Dr. Steven Minton, Dr.
More informationWide Area Query Systems The Hydra of Databases
Wide Area Query Systems The Hydra of Databases Stonebraker et al. 96 Gribble et al. 02 Zachary G. Ives University of Pennsylvania January 21, 2003 CIS 650 Data Sharing and the Web The Vision A World Wide
More informationDataspaces: A New Abstraction for Data Management. Mike Franklin, Alon Halevy, David Maier, Jennifer Widom
Dataspaces: A New Abstraction for Data Management Mike Franklin, Alon Halevy, David Maier, Jennifer Widom Today s Agenda Why databases are great. What problems people really have Why databases are not
More informationmanagement, and enables businesses to effectively create, deploy, and manage Internet
Public Page 1 of 11 The EchoBus CMS enables your staff to create dynamic websites by providing with extensive features for website development and content management, and enables businesses to effectively
More informationLarge Scale Semantic Annotation, Indexing, and Search at The National Archives Diana Maynard Mark Greenwood
Large Scale Semantic Annotation, Indexing, and Search at The National Archives Diana Maynard Mark Greenwood University of Sheffield, UK 1 Burning questions you may have... In the last 3 years, which female
More information1. CONCEPTUAL MODEL 1.1 DOMAIN MODEL 1.2 UML DIAGRAM
1 1. CONCEPTUAL MODEL 1.1 DOMAIN MODEL In the context of federation of repositories of Semantic Interoperability s, a number of entities are relevant. The primary entities to be described by ADMS are the
More informationAAAI 2018 Tutorial Building Knowledge Graphs. Craig Knoblock University of Southern California
AAAI 2018 Tutorial Building Knowledge Graphs Craig Knoblock University of Southern California Wrappers for Web Data Extraction Extracting Data from Semistructured Sources NAME Casablanca Restaurant STREET
More informationInformation Retrieval Strategies. Presented by Patrick Hoey
Information Retrieval Strategies Presented by Patrick Hoey 1 Information Retrieval Strategies For Keyword-Based Searches Information is stored in various types of media, ranging from customer account databases
More informationData Mining & Data Warehouse
Data Mining & Data Warehouse Asso. Profe. Dr. Raed Ibraheem Hamed University of Human Development, College of Science and Technology Department of Information Technology 2016 2017 (1) Points to Cover Problem:
More informationLightweight Transformation of Tabular Open Data to RDF
Proceedings of the I-SEMANTICS 2012 Posters & Demonstrations Track, pp. 38-42, 2012. Copyright 2012 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes.
More information8/19/13. Computational problems. Introduction to Algorithm
I519, Introduction to Introduction to Algorithm Yuzhen Ye (yye@indiana.edu) School of Informatics and Computing, IUB Computational problems A computational problem specifies an input-output relationship
More informationPresented by Kit Na Goh
Developing A Geo-Spatial Search Tool Using A Relational Database Implementation of the FGDC CSDGM Model Presented by Kit Na Goh Introduction Executive Order 12906 was issued on April 13, 1994 with the
More informationPublishing Linked Statistical Data: Aragón, a case study.
Publishing Linked Statistical Data: Aragón, a case study. Oscar Corcho 1, Idafen Santana-Pérez 1, Hugo Lafuente 2, David Portolés 3, César Cano 4, Alfredo Peris 4, and José María Subero 4 1 Ontology Engineering
More informationCraig Knoblock University of Southern California. These slides are based in part on slides from Sheila Tejada and Misha Bilenko
Record Linkage Craig Knoblock University of Southern California These slides are based in part on slides from Sheila Tejada and Misha Bilenko Craig Knoblock University of Southern California 1 Record Linkage
More informationMaximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009
Maximizing the Value of STM Content through Semantic Enrichment Frank Stumpf December 1, 2009 What is Semantics and Semantic Processing? Content Knowledge Framework Technology Framework Search Text Images
More informationGovernmental Statistical Data on the Web: A case study of FedStats
Governmental Statistical Data on the Web: A case study of FedStats Irina Ceaparu University of Maryland, Department of Computer Science, Human Computer Interaction Laboratory Revised: December st, 00 Introduction
More informationRiMOM Results for OAEI 2008
RiMOM Results for OAEI 2008 Xiao Zhang 1, Qian Zhong 1, Juanzi Li 1, Jie Tang 1, Guotong Xie 2 and Hanyu Li 2 1 Department of Computer Science and Technology, Tsinghua University, China {zhangxiao,zhongqian,ljz,tangjie}@keg.cs.tsinghua.edu.cn
More informationPublishing CitiSense Data: Privacy Concerns and Remedies
Publishing CitiSense Data: Privacy Concerns and Remedies Kapil Gupta Advisor : Prof. Bill Griswold 1 Location Based Services Great utility of location based services data traffic control, mobility management,
More informationThe InfoLibrarian Metadata Appliance Automated Cataloging System for your IT infrastructure.
Metadata Integration Appliance Times have changed and here is modern solution that delivers instant return on your investment. The InfoLibrarian Metadata Appliance Automated Cataloging System for your
More informationOpus: University of Bath Online Publication Store
Patel, M. (2004) Semantic Interoperability in Digital Library Systems. In: WP5 Forum Workshop: Semantic Interoperability in Digital Library Systems, DELOS Network of Excellence in Digital Libraries, 2004-09-16-2004-09-16,
More informationMetaNews: An Information Agent for Gathering News Articles On the Web
MetaNews: An Information Agent for Gathering News Articles On the Web Dae-Ki Kang 1 and Joongmin Choi 2 1 Department of Computer Science Iowa State University Ames, IA 50011, USA dkkang@cs.iastate.edu
More informationChapter 4 Research Prototype
Chapter 4 Research Prototype According to the research method described in Chapter 3, a schema and ontology-assisted heterogeneous information integration prototype system is implemented. This system shows
More informationThe New Document Digital Polymorphic Ubiquitous Actionable Patrick P. Bergmans University of Ghent
X X The New Document Digital Polymorphic Ubiquitous Actionable Patrick P. Bergmans University of Ghent The Traditional Document Documents have been around for thousands of years The Bible is a document
More informationLinked Data in Translation-Kits
Linked Data in Translation-Kits FEISGILTT Dublin June 2014 Yves Savourel ENLASO Corporation Slides and code are available at: https://copy.com/ngbco13l9yls This presentation was made possible by The main
More informationPart I: Data Mining Foundations
Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?
More informationExploiting Semantics Where We Find Them
Vrije Universiteit Amsterdam 19/06/2018 Exploiting Semantics Where We Find Them A Bottom-up Approach to the Semantic Web Prof. Dr. Christian Bizer Bizer: Exploiting Semantics Where We Find Them. VU Amsterdam,
More informationER/Studio Enterprise Portal User Guide
ER/Studio Enterprise Portal 1.1.1 User Guide Copyright 1994-2009 Embarcadero Technologies, Inc. Embarcadero Technologies, Inc. 100 California Street, 12th Floor San Francisco, CA 94111 U.S.A. All rights
More informationOntology based Web Page Topic Identification
Ontology based Web Page Topic Identification Abhishek Singh Rathore Department of Computer Science & Engineering Maulana Azad National Institute of Technology Bhopal, India Devshri Roy Department of Computer
More informationRDF for Life Sciences
RDF for Life Sciences Presentation to Oracle Life Sciences User Group June 23, 2004 John Wilbanks World Wide Web Consortium (W3C) What is the W3C? Founded in 1994 by Tim Berners-Lee Develops common protocols
More informationIJCSC Volume 5 Number 1 March-Sep 2014 pp ISSN
Movie Related Information Retrieval Using Ontology Based Semantic Search Tarjni Vyas, Hetali Tank, Kinjal Shah Nirma University, Ahmedabad tarjni.vyas@nirmauni.ac.in, tank92@gmail.com, shahkinjal92@gmail.com
More informationDatabases available to ISU researchers:
Databases available to ISU researchers: Table of Contents Web of Knowledge Overview 3 Web of Science 4 Cited Reference Searching 5 Secondary Cited Author Searching 8 Eliminating Self-Citations 9 Saving
More informationMethodologies for Ontology-Based Semantic Translation
Methodologies for Ontology-Based Semantic Translation H. Stuckenschmidt, H. Wache, U. Visser, G. Schuster The BUSTER Project: TZI, Group Outline! A Survey of Existing Approaches! The Role on Ontologies!
More informationWeb Data Extraction. Craig Knoblock University of Southern California. This presentation is based on slides prepared by Ion Muslea and Kristina Lerman
Web Data Extraction Craig Knoblock University of Southern California This presentation is based on slides prepared by Ion Muslea and Kristina Lerman Extracting Data from Semistructured Sources NAME Casablanca
More informationChapter 27 Introduction to Information Retrieval and Web Search
Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval
More information