Limitations of XPath & XQuery in an Environment with Diverse Schemes
Exploiting Structure, Annotation, and Ontological Knowledge for Automatic Classification of XML-Data
Martin Theobald, Ralf Schenkel, and Gerhard Weikum
Saarland University, Saarbrücken, Germany
6/23/2003, Martin Theobald: Automatic Classification of XML Data

Limitations of XPath & XQuery in an Environment with Diverse Schemes

Example record (DBLP):

    <inproceedings key="conf/icde/bargalw02">
      <author>Roger S. Barga</author>
      <author>David B. Lomet</author>
      <author>Gerhard Weikum</author>
      <title>Recovery Guarantees for General Multi-Tier Applications.</title>
      <year>2002</year>
      <booktitle>icde</booktitle>
    </inproceedings>

Query 1: //proceedings[contains(., "icde")]/title[contains(., "Recovery")]/parent::*
0 Results. (The record's element is named inproceedings, not proceedings, so the path never matches.)

Query 2: //title[contains(., "Recovery")]/parent::*
Results.
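The mismatch between the two queries can be reproduced with a small sketch. Python's standard `xml.etree.ElementTree` does not support `contains()` or the `parent::` axis, so the predicates are emulated in plain Python here. The `<dblp>` wrapper element and the helper names `text_of` and `parents_of_matching_titles` are assumptions made for illustration, not part of the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper <dblp> around the record shown on the slide.
DOC = """
<dblp>
  <inproceedings key="conf/icde/bargalw02">
    <author>Roger S. Barga</author>
    <author>David B. Lomet</author>
    <author>Gerhard Weikum</author>
    <title>Recovery Guarantees for General Multi-Tier Applications.</title>
    <year>2002</year>
    <booktitle>icde</booktitle>
  </inproceedings>
</dblp>
"""


def text_of(elem):
    """String value of an element: all descendant text concatenated."""
    return "".join(elem.itertext())


def parents_of_matching_titles(root, parent_tag, parent_needle, title_needle):
    """Emulate //parent_tag[contains(., parent_needle)]
               /title[contains(., title_needle)]/parent::*
    parent_tag=None matches any element; parent_needle=None skips that test."""
    hits = []
    for elem in root.iter(parent_tag):
        if parent_needle is not None and parent_needle not in text_of(elem):
            continue
        if any(title_needle in text_of(t) for t in elem.findall("title")):
            hits.append(elem)
    return hits


root = ET.fromstring(DOC)
# Query 1: requires the element name "proceedings", which this record lacks.
strict = parents_of_matching_titles(root, "proceedings", "icde", "Recovery")
# Query 2: drops the element-name constraint and finds the record.
relaxed = parents_of_matching_titles(root, None, None, "Recovery")
print(len(strict), [e.tag for e in relaxed])
```

The relaxed query finds the record but no longer says anything about what kind of publication it is, which is the gap the classification approach is meant to close.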
Automatic Classification helps.

[Figure: classified query results. Visible labels: Proceedings, 12 Results: SIGMOD, VLDB, ICDE, P2P, XML, Databases, Recovery.]

    <inproceedings key="conf/icde/bargalw02">
      <author>Roger S. Barga</author>
      <author>David B. Lomet</author>
      <author>Gerhard Weikum</author>
      <title>Recovery Guarantees for General Multi-Tier Applications.</title>
      <year>2002</year>
      <booktitle>icde</booktitle>
    </inproceedings>

Challenges in XML Classification
- Exploit annotation and structure
- Exploit ontological knowledge on sparse and/or heterogeneous training data
- Mapping of tags (and text terms) to semantic concepts
- In-document word sense disambiguation
- Quantification of concept similarities
Using Structure and Ontological Knowledge for Classification

[Figure: system architecture. Component labels: XML Training Documents; XML Test Document; Structure-aware Document Analyzer; Tokens with Context Nodes; Structural Features; Tag-Term Pairs; Element Paths & Twigs; Ontology Service; Disambiguation and Mapping onto Concepts; Incremental Mapping; Ontology Database with Dice Similarities (based on WordNet); Feature Selection using MI; Topic-Specific Feature Spaces; Feature Vectors; SVM Classifier; Large Document Collection (Focused Crawling) as Basis for Concept Similarity Estimation wrt. Natural Term Correlations.]

Feature-Selection & Term Weighting

[Figure: topic tree with binary yes/no decisions. Node labels: TOPICS, Database Core, Web IR, Semistr. Data, Data Mining, XML.]

- Linear Support Vector Machines for binary classifications in the topic tree
- Topic-specific feature spaces to support binary classification steps
- Mutual Information (MI) yields a ranking of the most discriminating features per topic (aka Kullback-Leibler divergence):

$$MI(X_i, c_j) := P[X_i \wedge c_j] \, \log_2 \frac{P[X_i \wedge c_j]}{P[X_i] \, P[c_j]}$$

- Term weights in classic TF*IDF
- IDF computed on element frequencies
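The MI ranking above can be sketched in a few lines. This is a minimal illustration, assuming probabilities are estimated as relative document frequencies; the toy documents, the feature names, and the function names `mutual_information` and `top_features` are hypothetical and not taken from the slides.

```python
import math
from collections import Counter


def mutual_information(n_joint, n_feature, n_topic, n_docs):
    """MI(X_i, c_j) = P[X_i and c_j] * log2(P[X_i and c_j] / (P[X_i] * P[c_j])).
    Probabilities are estimated as document counts divided by n_docs."""
    if n_joint == 0:
        return 0.0  # convention: a feature never seen in the topic scores 0
    p_joint = n_joint / n_docs
    p_feature = n_feature / n_docs
    p_topic = n_topic / n_docs
    return p_joint * math.log2(p_joint / (p_feature * p_topic))


def top_features(docs, topic, k):
    """docs is a list of (feature_set, topic_label) pairs.
    Returns the k features with the highest MI for the given topic."""
    n_docs = len(docs)
    n_topic = sum(1 for _, label in docs if label == topic)
    df, joint = Counter(), Counter()
    for feats, label in docs:
        for f in feats:
            df[f] += 1
            if label == topic:
                joint[f] += 1
    scored = {f: mutual_information(joint[f], df[f], n_topic, n_docs) for f in df}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scored, key=lambda f: (-scored[f], f))[:k]


# Hypothetical training data: four documents, two topics.
docs = [
    ({"genre$western", "title$ride", "cast$actor"}, "Western"),
    ({"genre$western", "title$sheriff", "cast$actor"}, "Western"),
    ({"genre$action", "title$ride", "cast$actor"}, "Action"),
    ({"genre$action", "title$heist", "cast$actor"}, "Action"),
]
print(top_features(docs, "Western", 2))
```

Features that occur in every document (here `cast$actor`) score zero because they carry no information about the topic, which is why per-topic selection keeps the feature space small.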
Exploiting Annotation: Tag-Term Pairs
- Structure-aware features for more precise document representation
- Interpret (tag, term) pairs as concept-value pairs in the spirit of a database schema

    <car>
      <make>audi</make>
      <type>a4</type>
      <year>98</year>
      <price>10.000</price>
    </car>

Tag-term pairs: make$audi, type$a4, year$98, price$
With parent tag: car$make$audi, car$type$a4, car$year$98, car$price$

Exploiting Structure: Element Paths and Twigs

[Figure: element tree with root car and children make, year, price.]

- Extension of the feature space by structural patterns: paths & twigs
- Preserve or disregard element ordering
- Different feature types (tag-term pairs & twigs) are mapped to distinct dimensions in the vector space
- Example twigs: car$make$year, car$year$price, car$make$price
- Scalability and noise reduction through feature selection (MI) under an integrated SVM model
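A rough sketch of how such features could be extracted from the car example follows. The whitespace tokenization, the lowercasing, and the function names `tag_term_pairs` and `twigs` are assumptions for illustration; the slides do not specify the analyzer's exact rules, and the price feature is truncated in the transcription, so the value produced here comes only from the XML snippet.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

CAR = (
    "<car><make>audi</make><type>a4</type>"
    "<year>98</year><price>10.000</price></car>"
)


def tag_term_pairs(elem, with_parent=False):
    """Emit tag$term features for every text token below elem.
    With with_parent=True the immediate parent tag is prepended."""
    feats = []
    for child in elem:
        prefix = f"{elem.tag}${child.tag}" if with_parent else child.tag
        for term in (child.text or "").split():
            feats.append(f"{prefix}${term.lower()}")
        # Recurse so that deeper elements contribute features as well.
        feats.extend(tag_term_pairs(child, with_parent))
    return feats


def twigs(elem, tags=None):
    """Emit parent$child1$child2 twigs for pairs of children in document
    order. tags optionally restricts which child tags are considered."""
    children = [c.tag for c in elem if tags is None or c.tag in tags]
    return [f"{elem.tag}${a}${b}" for a, b in combinations(children, 2)]


car = ET.fromstring(CAR)
print(tag_term_pairs(car))
print(tag_term_pairs(car, with_parent=True))
print(twigs(car, {"make", "year", "price"}))
```

This variant preserves element ordering in the twigs; sorting `children` before pairing would give the order-insensitive alternative the slide mentions.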
Exploiting Ontological Knowledge
- WordNet: directed and weighted ontology graph capturing hypernyms, hyponyms, and holonyms

[Figure: ontology graph fragment. Nodes: s1 [wheeled vehicle], s2 [motor vehicle], s3 [car, automobile, wagon, motorcar], s4 [truck, motortruck], s5 [wheel]. Edge weight shown: 0.8. Annotation: sim(s3, s4) = ½( )?]

- Quantified relationships based on estimated concept similarities
- Dice coefficient:

$$dice(s_1, s_2) = \frac{2 \, df(senses(s_1) \cap senses(s_2))}{df(senses(s_1)) + df(senses(s_2))}$$

Word Sense Disambiguation
- Compare term context con(t_k) with synset context con(s_j) using the cosine measure
- Synset context includes hypernyms, hyponyms, and holonyms plus WordNet descriptions
- Infer semantics from current context rather than stipulate it
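Both measures are simple to sketch. The following is a minimal illustration, assuming document frequencies are already available as plain counts and contexts are bags of words; all counts, context words, and the function names `dice`, `cosine`, and `disambiguate` are hypothetical.

```python
import math
from collections import Counter


def dice(df_s1, df_s2, df_both):
    """Dice coefficient from document frequencies:
    2 * df(s1 and s2 co-occur) / (df(s1) + df(s2))."""
    total = df_s1 + df_s2
    return 2.0 * df_both / total if total else 0.0


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(term_context, synset_contexts):
    """Return the synset whose context con(s_j) is closest to the term
    context con(t_k) under the cosine measure."""
    con_t = Counter(term_context)
    return max(
        synset_contexts,
        key=lambda s: cosine(con_t, Counter(synset_contexts[s])),
    )


# Hypothetical counts: s1 occurs in 40 documents, s2 in 60, both in 25.
print(dice(40, 60, 25))

# Hypothetical contexts for two senses of the term "wagon".
synsets = {
    "car": ["motor", "vehicle", "engine", "road", "wheel"],
    "cart": ["horse", "farm", "wheel", "hay"],
}
print(disambiguate(["engine", "road", "driver"], synsets))
```

In the setting described on the slides, the synset contexts would be assembled from hypernyms, hyponyms, holonyms, and WordNet descriptions rather than written by hand.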
Incremental Mapping for Classification
For any unknown concept s in a test document d do:
- Replace s with its closest match s' from the training data
- Adjust the term weight of s' in d by the concept similarity sim(s, s')

[Figure: mapping between two concepts. Labels: [sport utility vehicle, S.U.V.]; [jeep, landrover]; Training Data; Test doc; Feature Selection using MI; similarity 0.21.]

Problem:
- Possible loss of feature correlations that the SVM has learned
- No feature independence for SVM
- Reconsider dice(s, s') with a restrictive threshold
- Replace concept s only if s' is strongly correlated to s, otherwise skip s

Experimental Evaluation: Internet Movie Database (IMDB)
- Training with very few features for Action vs. Western
- Homogeneous, but rich structure with varying amounts of content
- Tag-term pairs (95%) plus twigs (5%) using MI
- Ontology lookups on tags only

[Figure: F-measure plotted against the number of features per topic. Series: Tag-Term Pairs & Twigs using tf*idf for Elements; Text Features using tf*idf for Documents.]

$$F = \frac{2}{\frac{1}{precision} + \frac{1}{recall}}$$
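The thresholded mapping step can be sketched as follows. This is an illustration under stated assumptions: the similarity function is passed in as a parameter (in the slides it would be the Dice-based concept similarity), the threshold value of 0.2 is arbitrary, and the concept names, weights, and the function name `map_unknown_concepts` are hypothetical. The similarity value 0.21 echoes the number in the figure, but its exact role there is an assumption.

```python
def map_unknown_concepts(test_weights, training_concepts, sim, threshold=0.2):
    """Map concepts of a test document onto the training feature space.

    Known concepts keep their weight. An unknown concept s is replaced by
    its closest training concept s' with weight scaled by sim(s, s'), but
    only if the similarity reaches the threshold; otherwise s is skipped."""
    mapped = {}
    for concept, weight in test_weights.items():
        if concept in training_concepts:
            mapped[concept] = mapped.get(concept, 0.0) + weight
            continue
        best = max(training_concepts, key=lambda s: sim(concept, s), default=None)
        if best is not None and sim(concept, best) >= threshold:
            mapped[best] = mapped.get(best, 0.0) + weight * sim(concept, best)
        # Weakly related unknown concepts are dropped rather than mapped.
    return mapped


# Hypothetical similarity table.
SIMS = {
    ("jeep", "suv"): 0.21,
    ("jeep", "truck"): 0.10,
    ("yacht", "suv"): 0.02,
    ("yacht", "truck"): 0.03,
}


def table_sim(s, s_prime):
    """Look up a similarity, defaulting to 0.0 for unlisted pairs."""
    return SIMS.get((s, s_prime), 0.0)


training = ["suv", "truck"]
test_doc = {"jeep": 2.0, "truck": 1.0, "yacht": 1.5}
result = map_unknown_concepts(test_doc, training, table_sim)
print(result)
```

Raising the threshold trades coverage for safety: fewer unknown concepts are mapped, but those that are mapped are less likely to distort the feature correlations the SVM learned during training.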
Summary
- Concept-based classification boosts classification results
- Detection of synonyms
- Incremental mapping of unknown concepts
- Structure-aware features offer a more precise document representation for XML
- Application area: training on small, user-specific topic directories, e.g., bookmarks
- Classification of heterogeneous data sources

Future Work
- More robust term-to-sense mapping
- Improved disambiguation of word senses
- Better awareness of feature correlations (in incremental term-to-concept mapping)
- Topic-specific ontologies
- Is-instance-of relationships
- Integration into large web applications, e.g., focused crawling
Questions?