Semantic Annotation of Web Resources Using IdentityRank and Wikipedia
Norberto Fernández, José M. Blázquez, Luis Sánchez, and Vicente Luque
Telematic Engineering Department, Carlos III University of Madrid

K.M. Węgrzyn-Wolska and P.S. Szczepaniak (Eds.): Adv. in Intel. Web, ASC 43, springerlink.com © Springer-Verlag Berlin Heidelberg 2007

Summary. In this paper we introduce the IdentityRank algorithm, developed to address the problem of named entity disambiguation. It is used for the semantic annotation of Web resources, taking Wikipedia as the knowledge source.

1 Introduction

In order to make the Semantic Web [1] vision become a reality, the semantics of the data need to be described in a computer-understandable manner. This process is known in the literature as semantic annotation. In [2] we introduced a system that exploited user queries to generate annotations and used the information generated and maintained by Wikipedia editors as the knowledge source for the annotation process. As indicated in [2], that system had some limitations; for instance, it is a manual system, so it requires user collaboration to gather metadata. More automated semantic annotation approaches seem more appropriate for annotating high volumes of information, for scalability reasons.

To deal with these limitations, we have extended our initial proposal by including an information extraction tool, ANNIE, in our system. With this tool we process the textual contents of the Web resources to be annotated, extracting occurrences of named entities: persons, locations and organizations. Once we have these entities, in order to generate semantic annotations we need to link each entity (e.g. the person Alonso) with its Wikipedia page. There are usually several instances that can be associated with a certain entity (for instance, for the person Alonso the candidates include, among others, Fernando Alonso, a Formula 1 driver, and José Antonio Alonso, a Spanish minister), so we need to disambiguate that entity by selecting the Wikipedia page that best represents it in the context of the document being annotated. To address this need, we have developed an algorithm for named entity disambiguation based on Google's PageRank [3], which we name IdentityRank (a.k.a. IdRank).
The rest of this paper is organized as follows: section 2 describes the IdRank algorithm and shows some results of an initial evaluation, section 3 elaborates on related work and, finally, section 4 ends the paper with concluding remarks and future lines.

2 IdRank

In this section we describe in detail the IdRank algorithm and the results of an initial evaluation of that algorithm. Due to lack of space we do not describe the PageRank algorithm here. The interested reader can find a comprehensive description of that algorithm in [3].

2.1 IdRank Process

The manual annotation process described in [2] required the user to annotate or disambiguate the terms in his/her query using concepts represented by Wikipedia pages, and to use relevance feedback to indicate that a certain Web resource was relevant for that query. By doing so, a new annotation was generated linking the Web resource with the person who generated the annotation and with the concepts in the annotated query.

The system has now been extended so that, when a user provides a manual annotation as described above, an automatic process starts. This process consists of downloading the Web resource to be annotated and automatically extracting from its contents the entities mentioned there using ANNIE. For each entity, the entity text and the entity type (person, location, organization) are provided. Additionally, the links to Wikipedia pages in the contents are also extracted, because they can be considered annotations introduced by the page author at authoring time. Now IdRank can run, using the information already available: the manually generated annotation, the links to Wikipedia pages mentioned by the Web resource and the entities. The IdRank process consists of the following steps:

Candidate finding. The system finds the URLs of the Wikipedia pages that are candidates to represent each of the input entities. To do so, the system uses the Yahoo APIs to query Yahoo with a site restriction to wikipedia.org, once for each different entity (each different entity text/entity type pair) that needs to be disambiguated. The resulting set of Wikipedia URLs is extended with the Wikipedia URLs extracted from the Web resource content and the ones used in the manual annotation. For URLs obtained from queries, we also store the position of each URL in the original Yahoo result set for later use.

Duplicate removal. The algorithm processes the Wikipedia URL set to filter duplicates. One of the difficulties of this filtering process is that several Wikipedia URLs can represent the same concept (pages in different languages, redirections). Because of this, the filtering process requires downloading and processing the candidate Wikipedia pages, extracting the language links from those pages and detecting HTTP redirections when downloading a page. Once we know the different Wikipedia URLs that can represent the same concept, we can assign a unique identifier (a URI) to the concept and store the mapping between that URI and the original Wikipedia URLs. So, given the original Wikipedia URL set, we obtain a set of unique URIs in which each URI, that is, each concept, appears only once. In this page-processing step we also extract the links between Wikipedia pages, which are used in the next step.

Ranking computation. A semantic network is built with the URIs that result from the duplicate removal process. In this network, nodes are concepts represented by URIs. There can be two kinds of links between those nodes:

1. A bidirectional anchor link between node u and node v appears if there is an HTML link from any of the Wikipedia pages that represent the concept u to any of the Wikipedia pages that represent the concept v, or vice versa.
2. A bidirectional cooccurrence link between nodes u and v appears if there are former manual annotations, defined by this or other users, that use the concept u and the concept v in annotating the same Web resource (this exploits the information about cooccurrence of concepts in Web resources).

We give weights to these links. The anchor links are handled in the same way as in the original PageRank, that is, each node gives the same weight to all of its forward links. The weight of the cooccurrence links, which are not included in PageRank, is computed using the cooccurrence frequency of the linked concepts.
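The duplicate removal step and the construction of anchor links described above can be sketched as follows. This is only a minimal illustration in Python, not the authors' implementation: the function names, the `urn:concept:N` URI scheme and the input shapes (groups of equivalent URLs, pairs of linked URLs) are assumptions made for the example, and fetching pages, language links and redirections is left out.

```python
from collections import defaultdict


def canonical_uris(alias_groups):
    """Map each Wikipedia URL to one concept URI (hypothetical helper).

    alias_groups: iterable of URL collections known to denote the same
    concept (e.g. language links or HTTP redirections). Groups that share
    a URL are merged with a small union-find structure.
    """
    parent = {}

    def find(url):
        parent.setdefault(url, url)
        while parent[url] != url:
            parent[url] = parent[parent[url]]  # path halving
            url = parent[url]
        return url

    for group in alias_groups:
        urls = list(group)
        for url in urls:
            find(url)  # register every URL, even in single-URL groups
        for other in urls[1:]:
            root_first = find(urls[0])
            parent[find(other)] = root_first

    # The "urn:concept:N" scheme is an assumption made for this sketch.
    roots = sorted({find(url) for url in list(parent)})
    uri_of_root = {root: f"urn:concept:{i}" for i, root in enumerate(roots)}
    return {url: uri_of_root[find(url)] for url in list(parent)}


def anchor_links(page_links, url_to_uri):
    """Build bidirectional anchor links between concepts.

    page_links: iterable of (source_url, target_url) HTML links.
    Links between two URLs of the same concept are ignored.
    """
    neighbours = defaultdict(set)
    for source, target in page_links:
        u, v = url_to_uri.get(source), url_to_uri.get(target)
        if u is None or v is None or u == v:
            continue
        neighbours[u].add(v)
        neighbours[v].add(u)
    return dict(neighbours)
```

With this representation, the number of anchor links of a node is simply the size of its neighbour set, which is the quantity the ranking step needs.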
Mathematically, the weight of a cooccurrence link can be expressed as:

$$\alpha_{uv} = \frac{f_{uv}}{\sum_{k \in C_v} f_{kv}} \qquad (1)$$

where $f_{uv}$ is the cooccurrence frequency of concepts u and v, that is, the number of Web resources annotated with both u and v divided by the number of Web resources annotated with v. $C_v$ is the set of concepts in the semantic network that cooccur with v in the annotations of at least one Web resource apart from the one being analyzed.

Apart from link information, the original PageRank algorithm includes a vector E used for ranking personalization, giving more weight to certain nodes in the network. In IdRank, the values of this vector are computed taking into account the recent use of the concept u in the annotations of the same user who is defining the current annotation. In that sense the algorithm learns from past user annotations. In practice, the value of the u component of the vector E, E(u), is directly proportional to the number of times the concept u has been used in the last M annotations performed by the user, M being a parameter of the algorithm.
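To make equation (1) and the personalization vector concrete, the following Python sketch computes both from stored annotations. It is an illustration under stated assumptions rather than the authors' code: the function names and data shapes are hypothetical, and normalising E so that its components sum to 1 is an assumption (the text only says E(u) is proportional to the usage count).

```python
from collections import Counter, defaultdict


def cooccurrence_weights(annotations, current_resource=None):
    """Compute alpha[(u, v)] following equation (1).

    annotations: dict mapping a resource id to the set of concept URIs
    used to annotate it. The resource being analyzed is excluded.
    """
    annotated_with = Counter()  # number of resources annotated with v
    annotated_both = Counter()  # number of resources annotated with u and v
    for resource, concepts in annotations.items():
        if resource == current_resource:
            continue
        concepts = set(concepts)
        for v in concepts:
            annotated_with[v] += 1
            for u in concepts:
                if u != v:
                    annotated_both[(u, v)] += 1

    # f_uv: resources with both u and v, divided by resources with v.
    f = {(u, v): n / annotated_with[v] for (u, v), n in annotated_both.items()}

    # Denominator of equation (1): sum of f_kv over k in C_v.
    total = defaultdict(float)
    for (k, v), value in f.items():
        total[v] += value
    return {(u, v): value / total[v] for (u, v), value in f.items()}


def personalization(history, m=None):
    """Compute E(u) from the user's last m annotations.

    history: list of concept-URI collections, oldest first.
    m=None stands for "use the whole history".
    """
    recent = history if m is None else history[max(0, len(history) - m):]
    counts = Counter(u for annotation in recent for u in set(annotation))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {u: c / total for u, c in counts.items()}
```

Because every $f_{kv}$ shares the same denominator, $\alpha_{uv}$ reduces to the share of v's cooccurrences that involve u; the sketch keeps the two-step form only to mirror the definition in the text.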
Taking all these contributions into account, we obtain the following equation, an adaptation of the original PageRank equation in [3]:

$$R(u) = k_A \sum_{v \in S} \left[ \frac{1}{N_v} \left( \beta_1 + \beta_2\,\alpha_{uv} \right) R(v) \right] + k_E\,E(u, M) \qquad (2)$$

where R(u) is the ranking of node u, S is the set of nodes in the semantic network, $\beta_1 = 1/2$ if there is an anchor link between v and u and 0 otherwise, $\beta_2 = 1/2$ if there is a cooccurrence link between u and v ($u \neq v$) and 0 otherwise, and $N_v$ is the number of anchor links of v. To control the influence of each component of the algorithm on the final results, we use two constants $k_A$ and $k_E$ such that $k_A + k_E = 1$.

We solve this set of equations for each value of u using appropriate numerical methods, such as the one described in [3], obtaining a weight for each of the candidate concepts in the semantic network. We then translate the URIs of the concepts back to Wikipedia page URLs using the table generated in the duplicate removal step. Each URL associated with a certain URI is assigned the same weight that the algorithm gives to the URI. For each of the original entities, the algorithm assigns as its Wikipedia representation the candidate whose URL has the highest weight. If an entity has more than one candidate with maximal weight, the algorithm uses the original Yahoo ranking to decide.

2.2 Evaluation

We have carried out a basic experiment to test the behavior of the IdRank algorithm. In that experiment we use a corpus of ten documents obtained by querying a repository of news items for Alonso and randomly selecting some of the documents. The entities in these documents were automatically detected but, to avoid the noise that errors in the entity extraction process would introduce into the evaluation of the disambiguation algorithm, the entities were reviewed by two human users. In the end we obtained 118 entities, 65 of them unique.
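As a rough illustration of the ranking step from section 2.1 that this experiment exercises, the IdRank equation can be solved by plain fixed-point iteration, after which the best candidate is picked for each entity. The Python sketch below is an assumption-laden example, not the authors' implementation: the function names and data shapes are hypothetical, the iteration scheme is a simple substitute for the numerical method of [3], and skipping nodes with no anchor links (where $N_v = 0$) is a choice made here to avoid division by zero.

```python
def id_rank(nodes, anchors, alpha, e, k_a=0.7, k_e=0.3,
            max_iterations=100, tolerance=1e-9):
    """Iterate the IdRank equation until the ranks stop changing.

    nodes:   iterable of concept URIs (the set S).
    anchors: dict node -> set of anchor-linked neighbours.
    alpha:   dict (u, v) -> cooccurrence weight alpha_uv.
    e:       dict node -> personalization value E(u).
    """
    nodes = list(nodes)
    rank = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(max_iterations):
        new_rank = {}
        for u in nodes:
            total = 0.0
            for v in nodes:
                n_v = len(anchors.get(v, ()))
                if v == u or n_v == 0:
                    continue  # assumption: nodes without anchor links are skipped
                beta_1 = 0.5 if u in anchors[v] else 0.0
                beta_2 = 0.5 if (u, v) in alpha else 0.0
                weight = (beta_1 + beta_2 * alpha.get((u, v), 0.0)) / n_v
                total += weight * rank[v]
            new_rank[u] = k_a * total + k_e * e.get(u, 0.0)
        change = max(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if change < tolerance:
            break
    return rank


def pick_candidate(candidates, rank, search_position):
    """Choose the highest-ranked candidate for one entity.

    Ties are broken by the position in the original search result set
    (lower is better); candidates without a position come last.
    """
    return min(
        candidates,
        key=lambda c: (-rank.get(c, 0.0), search_position.get(c, float("inf"))),
    )
```

The default values of `k_a` and `k_e` are the ones reported for the experiment; with $k_A < 1$ and at least one anchor link per contributing node, each iteration shrinks the difference between successive rank vectors, so the loop terminates well before `max_iterations` on small networks.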
For each entity, we looked for the entity text in Yahoo with a literal query (in quotes) and a site:wikipedia.org restriction in order to find its candidates. We limited the number of results returned by the search engine to ten and filtered special Wikipedia pages (such as user pages and talk pages) out of the result set. Additionally, we manually reviewed the candidate information to check whether ten results per entity were enough for the process, and found that in only 7 cases (4 different entities) the result set contained no Wikipedia page that could be used to represent the real meaning of the entity.

We compared our algorithm with two naive algorithms. The first simply assigns to each entity the first result obtained from Yahoo when looking for the entity text in Wikipedia. The second computes the Levenshtein distance between the entity text and the Wikipedia page title using the SimMetrics library and assigns to each entity the Wikipedia page whose title is most similar to the entity text. Additionally, we tested our algorithm in two modes: with user history (past annotations) and without user history (that is, using only the anchor link information in the disambiguation process). We built the history by randomly selecting two entities in each document and manually annotating them. The parameters of the algorithm were $k_A = 0.7$, $k_E = 0.3$ and $M = \infty$, that is, we use all the annotations in the history as context for the disambiguation process.

We ran the different algorithms over the corpus and then manually checked the correctness of the entity/Wikipedia page assignments. The results of these experiments are shown in table 1, which gives the number of right entity/page assignments. First represents the first-result algorithm, Sim the text similarity algorithm, Links IdRank using only the anchor link information, and All IdRank taking all the information into account.

Table 1. Evaluation results
First   Sim   Links   All

3 Related Work

There are several approaches in the state of the art dealing with named entity disambiguation. These approaches can be characterized according to a number of criteria. One of these criteria is the context used to disambiguate the entity. Some approaches use the complete document as context [5]. Others use a number of words before and after the entity, as in [9, 10]. Although some approaches use both common words and named entities as context [10], others suggest that better results can be obtained using only other named entities as context [9]. Another criterion is the use of knowledge sources such as lexical databases, ontologies, etc. There are approaches that make use of such knowledge sources [4, 8] and approaches that try to cluster the named entities without any reference to an available list of possible instances [9, 10].
Finally, we can further classify the approaches with respect to the disambiguation algorithms used: statistical procedures [7, 9, 10], morphosyntactic analysis [5, 9], algorithms exploiting the information and structure provided by an ontology [8], etc. The use of a semantic network ranking algorithm that also takes into account the temporal component of users' interests is the main difference between our approach and those in the state of the art.

4 Conclusions and Future Lines

In this paper we introduced the IdRank algorithm, developed to address the problem of named entity disambiguation. It is used for the semantic annotation of Web resources, taking Wikipedia as the knowledge source. Though some initial evaluation results are reported, more intensive tests need to be run, for instance to measure the influence of the parameters of the algorithm on the final results.

Acknowledgements

This work has been partially funded by the Spanish Ministry of Education and Science under contract ITACA (TSI C02-01).

References

1. Berners-Lee T, Hendler J, Lassila O (2001) The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American, May 2001.
2. Fernández N, Blázquez JM, Sánchez L, Luque V (2006) Exploiting Wikipedia in Integrating Semantic Annotation with Information Retrieval. In 4th Atlantic Web Intelligence Conference, AWIC. Israel, June 2006.
3. Page L, Brin S, Motwani R, Winograd T (1999) The PageRank Citation Ranking: Bringing Order to the Web. Stanford Technical Report, available online.
4. Aswani N, Bontcheva K, Cunningham H (2006) Mining Information for Instance Unification. In 5th International Semantic Web Conference. Ed. Springer, LNCS 4273. USA, November 2006.
5. Bagga A, Baldwin B (1998) Entity-Based Cross-Document Coreferencing Using the Vector Space Model. In 17th International Conference on Computational Linguistics. Canada, August 1998.
6. Ginter F, Boberg J, Järvinen J, Salakoski T (2004) New Techniques for Disambiguation in Natural Language and their Applications to Biological Text. Journal of Machine Learning Research, 5, 2004.
7. Han H, Giles L, Zha H, Li C, Tsioutsiouliklis K (2004) Two Supervised Learning Approaches for Name Disambiguation in Author Citations. In Joint ACM/IEEE Conference on Digital Libraries. USA, June 2004.
8. Hassell J, Aleman-Meza B, Arpinar IB (2006) Ontology-Driven Automatic Entity Disambiguation in Unstructured Text. In 5th International Semantic Web Conference. Ed. Springer, LNCS 4273. USA, November 2006.
9. Mann GS, Yarowsky D (2003) Unsupervised Personal Name Disambiguation. In 7th Conference on Natural Language Learning. Canada, June 2003.
10. Pedersen T, Purandare A, Kulkarni A (2005) Name Discrimination by Clustering Similar Contexts. In 6th International Conference on Computational Linguistics and Intelligent Text Processing. Ed. Springer, LNCS. Mexico, February 2005.
More informationTime-Surfer: Time-Based Graphical Access to Document Content
Time-Surfer: Time-Based Graphical Access to Document Content Hector Llorens 1,EstelaSaquete 1,BorjaNavarro 1,andRobertGaizauskas 2 1 University of Alicante, Spain {hllorens,stela,borja}@dlsi.ua.es 2 University
More informationThe Application Research of Semantic Web Technology and Clickstream Data Mart in Tourism Electronic Commerce Website Bo Liu
International Conference on Education Technology, Management and Humanities Science (ETMHS 2015) The Application Research of Semantic Web Technology and Clickstream Data Mart in Tourism Electronic Commerce
More informationAnnotating Spatio-Temporal Information in Documents
Annotating Spatio-Temporal Information in Documents Jannik Strötgen University of Heidelberg Institute of Computer Science Database Systems Research Group http://dbs.ifi.uni-heidelberg.de stroetgen@uni-hd.de
More informationA Survey on Postive and Unlabelled Learning
A Survey on Postive and Unlabelled Learning Gang Li Computer & Information Sciences University of Delaware ligang@udel.edu Abstract In this paper we survey the main algorithms used in positive and unlabeled
More informationOntology-Driven Automatic Entity Disambiguation in Unstructured Text
Ontology-Driven Automatic Entity Disambiguation in Unstructured Text Joseph Hassell, Boanerges Aleman-Meza & I. Budak Arpinar Large Scale Distributed Information Systems (LSDIS) Lab Computer Science Department,
More informationInformation Dissemination in Socially Aware Networks Under the Linear Threshold Model
Information Dissemination in Socially Aware Networks Under the Linear Threshold Model Srinivasan Venkatramanan and Anurag Kumar Department of Electrical Communication Engineering, Indian Institute of Science,
More informationPayola: Collaborative Linked Data Analysis and Visualization Framework
Payola: Collaborative Linked Data Analysis and Visualization Framework Jakub Klímek 1,2,Jiří Helmich 1, and Martin Nečaský 1 1 Charles University in Prague, Faculty of Mathematics and Physics Malostranské
More informationEFFICIENT ALGORITHM FOR MINING ON BIO MEDICAL DATA FOR RANKING THE WEB PAGES
International Journal of Mechanical Engineering and Technology (IJMET) Volume 8, Issue 8, August 2017, pp. 1424 1429, Article ID: IJMET_08_08_147 Available online at http://www.iaeme.com/ijmet/issues.asp?jtype=ijmet&vtype=8&itype=8
More informationAn Entity Name Systems (ENS) for the [Semantic] Web
An Entity Name Systems (ENS) for the [Semantic] Web Paolo Bouquet University of Trento (Italy) Coordinator of the FP7 OKKAM IP LDOW @ WWW2008 Beijing, 22 April 2008 An ordinary day on the [Semantic] Web
More informationAn Ontology Based Approach for Finding Semantic Similarity between Web Documents
Research Article International Journal of Current Engineering and Technology ISSN 2277-406 203 INPRESSCO. All Rights Reserved. Available at http://inpressco.com/category/ijcet An Ontology Based Approach
More informationSemantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.
Semantic Web Company PoolParty - Server PoolParty - Technical White Paper http://www.poolparty.biz Table of Contents Introduction... 3 PoolParty Technical Overview... 3 PoolParty Components Overview...
More informationWeb Structure Mining using Link Analysis Algorithms
Web Structure Mining using Link Analysis Algorithms Ronak Jain Aditya Chavan Sindhu Nair Assistant Professor Abstract- The World Wide Web is a huge repository of data which includes audio, text and video.
More informationA Language Independent Author Verifier Using Fuzzy C-Means Clustering
A Language Independent Author Verifier Using Fuzzy C-Means Clustering Notebook for PAN at CLEF 2014 Pashutan Modaresi 1,2 and Philipp Gross 1 1 pressrelations GmbH, Düsseldorf, Germany {pashutan.modaresi,
More informationTHE GETTY VOCABULARIES TECHNICAL UPDATE
AAT TGN ULAN CONA THE GETTY VOCABULARIES TECHNICAL UPDATE International Working Group Meetings January 7-10, 2013 Joan Cobb Gregg Garcia Information Technology Services J. Paul Getty Trust International
More informationA Machine Learning Approach for Displaying Query Results in Search Engines
A Machine Learning Approach for Displaying Query Results in Search Engines Tunga Güngör 1,2 1 Boğaziçi University, Computer Engineering Department, Bebek, 34342 İstanbul, Turkey 2 Visiting Professor at
More informationEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm Rekha Jain 1, Sulochana Nathawat 2, Dr. G.N. Purohit 3 1 Department of Computer Science, Banasthali University, Jaipur, Rajasthan ABSTRACT
More informationUniversity of Amsterdam at INEX 2010: Ad hoc and Book Tracks
University of Amsterdam at INEX 2010: Ad hoc and Book Tracks Jaap Kamps 1,2 and Marijn Koolen 1 1 Archives and Information Studies, Faculty of Humanities, University of Amsterdam 2 ISLA, Faculty of Science,
More informationKnowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A.
Knowledge Retrieval Franz J. Kurfess Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. 1 Acknowledgements This lecture series has been sponsored by the European
More informationCIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets
CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets Arjumand Younus 1,2, Colm O Riordan 1, and Gabriella Pasi 2 1 Computational Intelligence Research Group,
More informationPattern Recognition Using Graph Theory
ISSN: 2278 0211 (Online) Pattern Recognition Using Graph Theory Aditya Doshi Department of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India Manmohan Jangid Department of
More informationNYU CSCI-GA Fall 2016
1 / 45 Information Retrieval: Personalization Fernando Diaz Microsoft Research NYC November 7, 2016 2 / 45 Outline Introduction to Personalization Topic-Specific PageRank News Personalization Deciding
More informationA Modified Algorithm to Handle Dangling Pages using Hypothetical Node
A Modified Algorithm to Handle Dangling Pages using Hypothetical Node Shipra Srivastava Student Department of Computer Science & Engineering Thapar University, Patiala, 147001 (India) Rinkle Rani Aggrawal
More informationPersonalized Document Rankings by Incorporating Trust Information From Social Network Data into Link-Based Measures
Personalized Document Rankings by Incorporating Trust Information From Social Network Data into Link-Based Measures Claudia Hess, Klaus Stein Laboratory for Semantic Information Technology Bamberg University
More informationINFORMATION MANAGEMENT FOR SEMANTIC REPRESENTATION IN RANDOM FOREST
International Journal of Computer Engineering and Applications, Volume IX, Issue VIII, August 2015 www.ijcea.com ISSN 2321-3469 INFORMATION MANAGEMENT FOR SEMANTIC REPRESENTATION IN RANDOM FOREST Miss.Priyadarshani
More informationSemantically Enhanced Hypermedia: A First Step
Semantically Enhanced Hypermedia: A First Step I. Alfaro, M. Zancanaro, A. Cappelletti, M. Nardon, A. Guerzoni ITC-irst Via Sommarive 18, Povo TN 38050, Italy {alfaro, zancana, cappelle, nardon, annaguer}@itc.it
More informationIntroduction & Administrivia
Introduction & Administrivia Information Retrieval Evangelos Kanoulas ekanoulas@uva.nl Section 1: Unstructured data Sec. 8.1 2 Big Data Growth of global data volume data everywhere! Web data: observation,
More informationThe PageRank Citation Ranking: Bringing Order to the Web
The PageRank Citation Ranking: Bringing Order to the Web Marlon Dias msdias@dcc.ufmg.br Information Retrieval DCC/UFMG - 2017 Introduction Paper: The PageRank Citation Ranking: Bringing Order to the Web,
More informationUniversity of Alicante at NTCIR-9 GeoTime
University of Alicante at NTCIR-9 GeoTime Fernando S. Peregrino fsperegrino@dlsi.ua.es David Tomás dtomas@dlsi.ua.es Department of Software and Computing Systems University of Alicante Carretera San Vicente
More informationBioNav: An Ontology-Based Framework to Discover Semantic Links in the Cloud of Linked Data
BioNav: An Ontology-Based Framework to Discover Semantic Links in the Cloud of Linked Data María-Esther Vidal 1, Louiqa Raschid 2, Natalia Márquez 1, Jean Carlo Rivera 1, and Edna Ruckhaus 1 1 Universidad
More informationSearching the Deep Web
Searching the Deep Web 1 What is Deep Web? Information accessed only through HTML form pages database queries results embedded in HTML pages Also can included other information on Web can t directly index
More informationNews Filtering and Summarization System Architecture for Recognition and Summarization of News Pages
Bonfring International Journal of Data Mining, Vol. 7, No. 2, May 2017 11 News Filtering and Summarization System Architecture for Recognition and Summarization of News Pages Bamber and Micah Jason Abstract---
More informationScholarly Big Data: Leverage for Science
Scholarly Big Data: Leverage for Science C. Lee Giles The Pennsylvania State University University Park, PA, USA giles@ist.psu.edu http://clgiles.ist.psu.edu Funded in part by NSF, Allen Institute for
More informationWeb Crawling As Nonlinear Dynamics
Progress in Nonlinear Dynamics and Chaos Vol. 1, 2013, 1-7 ISSN: 2321 9238 (online) Published on 28 April 2013 www.researchmathsci.org Progress in Web Crawling As Nonlinear Dynamics Chaitanya Raveendra
More informationCollaborative Content-Based Method for Estimating User Reputation in Online Forums
Collaborative Content-Based Method for Estimating User Reputation in Online Forums Amine Abdaoui 1, Jérôme Azé 1, Sandra Bringay 1 and Pascal Poncelet 1 1 LIRMM B5 UM CNRS, UMR 5506, 161 Rue Ada, 34095
More informationACE 2008: Cross-Document Annotation Guidelines (XDOC)
ACE 2008: Cross-Document Annotation Guidelines (XDOC) Version 1.6 Linguistic Data Consortium http://projects.ldc.upenn.edu/ace/ Overview The objective of the Automatic Content Extraction (ACE) series of
More informationMIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion
MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion Sara Lana-Serrano 1,3, Julio Villena-Román 2,3, José C. González-Cristóbal 1,3 1 Universidad Politécnica de Madrid 2 Universidad
More informationCompetitive Intelligence and Web Mining:
Competitive Intelligence and Web Mining: Domain Specific Web Spiders American University in Cairo (AUC) CSCE 590: Seminar1 Report Dr. Ahmed Rafea 2 P age Khalid Magdy Salama 3 P age Table of Contents Introduction
More informationNUS-I2R: Learning a Combined System for Entity Linking
NUS-I2R: Learning a Combined System for Entity Linking Wei Zhang Yan Chuan Sim Jian Su Chew Lim Tan School of Computing National University of Singapore {z-wei, tancl} @comp.nus.edu.sg Institute for Infocomm
More information