A Semantic Similarity Measure for Linked Data: An Information Content-Based Approach


Rouzbeh Meymandpour*, Joseph G. Davis

School of Information Technologies, The University of Sydney, Sydney, Australia

Abstract. Linked Data allows structured data to be published in a standard way such that datasets from various domains can be interlinked. By leveraging Semantic Web standards and technologies, a growing amount of semantic content has been published on the Web as Linked Open Data (LOD). The LOD cloud has made available a large volume of structured data in a range of domains via liberal licenses. The semantic content of LOD, in conjunction with the advanced searching and querying mechanisms provided by SPARQL, has opened up unprecedented opportunities not only for enhancing existing applications, but also for developing new and innovative intelligent semantic applications. However, query-based information retrieval techniques are inadequate for functionalities such as comparing, prioritizing, and ranking search results, which are fundamental to some of the more innovative applications of Linked Data such as recommendation provision, matchmaking, social network analysis, visualization, and data clustering. This paper addresses this problem by building a systematic and accurate measurement model of semantic similarity between resources. Drawing extensively on a feature-based definition of Linked Data, it proposes an information content-based approach that improves on previous methods, which are restricted to specific application domains and are generally less relevant in the context of Linked Data. It is validated and evaluated for measuring item similarity in recommender systems. The experimental evaluation of the proposed measure shows that it outperforms comparable recommender systems that use conventional similarity measures based on collaborative and content-based filtering.
Keywords: Linked Data, Linked Open Data, Similarity Measures, Semantic Similarity, Information Content, Ranking, Recommender Systems, Collaborative Filtering, Content-Based Filtering.

1. Introduction

The rapid development of Semantic Web technologies such as the Resource Description Framework (RDF) [1] has enabled the publication of structured data in a standard way that can be readily consumed and reused by machines and shared across diverse applications. This has transformed the conventional Web of Documents, associated with Web 1.0, into the Web of Data (also referred to as Linked Data), which publishes and interlinks structured data on the Web. Linked Data can be private or public. It can be used inside organizations and enterprises, and shared among business partners to provide easier integration and to facilitate interoperability. Linked Data can also be open. Linked Open Data (LOD) is a recent community-driven effort that provides access to a large and increasing amount of diverse structured data using open Semantic Web standards and through liberal licenses [2]. The LOD cloud provides free access to 570 datasets (as of October 2014) in areas such as media, geography, government, publications and life sciences. Using Semantic Web standards and LOD protocols (see Berners-Lee [3]), these datasets are publicly available for machine and human consumption. This not only offers unprecedented opportunities for developing novel and innovative applications, but also makes application development more efficient and cost-effective.

* Corresponding author. rouzbeh.meymandpour@sydney.edu.au

In order for Semantic Web-based applications to be able to systematically search, retrieve and analyze Linked Data, specific tools and technologies are required. Semantic Web crawlers and search engines are useful tools for browsing and searching semantic data (e.g. the Swoogle search engine [4] and the Semantic Web Search Engine [SWSE] [5]). In addition, SPARQL [6] enables querying the Semantic Web and Linked Data. However, it is unable to deal with issues such as prioritizing and ranking search results. A limitation associated with query-based information retrieval techniques is that they cannot answer questions regarding which of the retrieved results better match the reference query. This is fundamental to some of the interesting applications of Linked Data such as recommendation provision, matchmaking, social network analysis, visualization, semantic navigation, and data clustering. These require specific measures to analyze and compare entities in Linked Data. This paper addresses this problem based on a systematic assessment of semantic similarity between entities. Similarity measures evaluate the degree of overlap between entities based on a set of pre-defined factors such as taxonomic relationships, particular characteristics of the entities, or statistical information derived from the underlying knowledge base. They have been proposed and used in diverse areas such as cognitive psychology, computational linguistics, artificial intelligence (AI), and natural language processing (NLP) to assess the similarity (or dissimilarity) between domain concepts or entities. However, besides the fact that each similarity measure is dependent on the implicit or explicit assumptions in its design and formulation, they are largely limited to the specifications and knowledge representation models of particular application domains. These assumptions make them less applicable in the Linked Data context.
This paper first provides an overview of previous approaches to semantic similarity measurement (Section 2) and describes their limitations in the new context of Linked Data (Section 3). Drawing on a formal, mathematical definition of Linked Data, we proceed to present our LOD-based semantic similarity measure (Section 4). In order to validate the proposed measure and to demonstrate its applicability and value, it is applied to developing a LOD-based recommender system (Section 5). We compare the performance of our recommender system with that of conventional and state-of-the-art systems. Finally, we conclude this paper by discussing the limitations of this study (Section 6), providing a review of the related work on LOD-based similarity measurement and recommendation provision (Section 7) and outlining the future research directions that can be built upon this work (Section 8).

2. Semantic Similarity Measurement

As a fundamental basis for theories of perception, behavior, social bonding, learning and judgment, the notion of similarity has been studied extensively for several decades. It has been discussed by famous philosophers such as Plato and Aristotle (the law of similarity in associationism; Aristotle, 350 B.C.E.) and investigated by distinguished psychologists such as Shepard [7], Tversky [8, 9], and Nosofsky [10, 11]. In cognitive psychology, similarity is defined as the degree of resemblance between two perceptual or conceptual objects. People tend to perceive objects that are similar as a group (the law of similarity in Gestalt psychology). Many researchers have endeavored to understand and mirror the way humans judge the similarity of two or more objects. Drawing on extensive studies of the factors related to the human perception of similarity in psychology (e.g. see Goldstone [12] and Decock and Douven [13]), computer scientists have developed systematic methods to evaluate the level of similarity among various objects of interest.
Semantic similarity reflects the relationship between the meanings of two concepts, entities, terms, sentences or documents. It is computed based on a set of factors derived typically from a knowledge representation model. Thesauri, taxonomies, and ontologies are among the main models used for representing domain knowledge. Therefore, similarity measures are generally domain-dependent: depending on the structure of the application context and its knowledge representation model, various measures have been proposed. Semantic similarity measures can be classified into four main categories: 1) distance-based models that are based on the structural representation of the underlying context, 2) feature-based models that define concepts or entities as sets of features, 3) statistical methods that consider statistics derived from the underlying context, and 4) hybrid models that comprise combinations of the three basic categories. The following sections review the main approaches to semantic similarity measurement. The related work section (Section 7) also provides a detailed review of the key methods for similarity measurement as they relate to Linked Data.

2.1. Distance-based Similarity Measures

Distance-based models, also referred to as geometric models or mental distance approaches in psychology [7] and edge-counting or path-based methods in graph-based representations [14], define similarity as a function of the distance between concepts. Distance metrics satisfy the following mathematical properties:

1) Non-negativity: d(a, b) ≥ 0
2) Coincidence axiom: d(a, b) = 0 if and only if a = b
3) Symmetry: d(a, b) = d(b, a)
4) Triangle inequality: d(a, c) ≤ d(a, b) + d(b, c)

2.1.1. Distance in Multidimensional Spaces

Shepard [7] proposed a geometric model for similarity assessment where objects are represented as points in a multidimensional similarity space (e.g. size, color, and shape are among the dimensions of real-world objects). The more proximate two concepts are in the underlying similarity space, the more similar they are. Given a multidimensional space and points that represent objects of interest, various distance functions such as the Euclidean, Manhattan, and Minkowski distance metrics can be employed to measure the distance between them. When objects in a particular domain of interest are not explicitly represented in a multidimensional space, various techniques can be applied to derive the spaces. For example, in information retrieval (IR), the vector space model (VSM) [15] is a common way of representing documents based on their terms. The frequency of terms in a document (term frequency [TF]) is a simple way to represent each document: each term corresponds to a dimension, and a vector of term frequency values denotes each document.
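The TF representation just described, together with the cosine comparison defined next, can be sketched in a few lines of Python. This is a minimal illustration; the vocabulary, sample documents, and function names are assumptions for the example, not part of the original paper.

```python
from collections import Counter
from math import sqrt

def tf_vector(document, vocabulary):
    """Term-frequency (TF) vector of a document over a fixed vocabulary."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

vocab = ["semantic", "web", "linked", "data"]
d1 = "linked data semantic web linked data"
d2 = "semantic web semantic web"
similarity = cosine(tf_vector(d1, vocab), tf_vector(d2, vocab))
```

Each document becomes a point in a four-dimensional term space, and the angle between the two points serves as their similarity.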
The similarity between two documents can be calculated based on the cosine of the angle between their vectors:

Cosine(A, B) = (A · B) / (‖A‖ ‖B‖) = Σ_{i=1..n} tf(i, A) · tf(i, B) / (√(Σ_{i=1..n} tf(i, A)²) · √(Σ_{i=1..n} tf(i, B)²))   (1)

where A and B are the vectors representing the two documents in an n-dimensional space (n is the number of terms) and tf(i, A) and tf(i, B) are the term frequency values of the term i in the documents. A · B is the dot product (intersection) of the two vectors and ‖A‖ and ‖B‖ are the norms of the vectors. Other related methods for obtaining vector spaces include term frequency-inverse document frequency (TF-IDF), multidimensional scaling (as used by Shepard [7] and Micko [16], among others; also see Nosofsky [10]), latent semantic analysis (LSA) (also known as latent semantic indexing [LSI]) [17, 18], and topic models [19, 20].

2.1.2. Distance in Semantic Networks

According to the semantic network model proposed by Quillian [21], concepts and the relationships among them can be denoted as nodes and links, respectively. In this model, is-a (superordinate and subordinate) relations play a more important role than other types of relations. In taxonomies, where information is structured in a hierarchical manner using is-a relations (see Figure 1), the distance between nodes (the number of edges separating them) can provide an estimate of their mutual similarity. These distance functions are also called edge counting or path-based methods. For example, in Figure 1, the node i is closer to e than to c; thus, i tends to be more similar to e. The logic is that the lower the hierarchical distance, the higher the similarity:

Similarity ∝ 1 / Distance   (2)

Figure 1. Link structure of a sample is-a taxonomy

An extensive body of literature has explored the measurement of semantic similarity using the ontological knowledge of WordNet, a lexical taxonomy of English words widely used in computational linguistics, AI and NLP [22]. It provides definitions as well as several semantic relations between words. It categorizes words into nouns, verbs, adjectives and adverbs, and groups them into synonym sets, called synsets. Synsets are connected using two main types of semantic relations: is-a and part-of (member-of) relations. Is-a relations include hypernym (superclass) and hyponym (subclass) relations. For example, eagle is a hyponym of bird ("eagle is a bird", or "eagle is a subclass of bird") and animal is a hypernym of bird ("animal is a superclass of bird"). Part-of relations include holonym and meronym relations. As an example of the latter, hand is a meronym of (part of, member of) body, and body is a holonym of hand. However, most edge counting methods consider only is-a links, which define the concepts, whereas part-of relations characterize them [14]. A basic technique for measuring the semantic distance between pairs of words represented in an is-a taxonomy was proposed by Rada et al. [14]. It describes the conceptual distance as the shortest path length connecting any two nodes:

RadaDistance(a, b) = δ(a, b)   (3)

where δ(a, b) is the shortest path length between a and b. It is based on the idea that, in is-a hierarchies, terms that are close together are more similar to each other. Other path-based methods incorporate the relative depth of concepts in a given taxonomy into semantic similarity assessment [23, 24, 25]. For example, Wu and Palmer [25] presented the notion of conceptual similarity for verb selection. In this approach, the depth is calculated by counting the edges that separate terms from their least common subsumer (LCS) (also known as the most recent common ancestor [MRCA]), the nearest superclass that both concepts share. For example, in Figure 1, the LCS of nodes i and h, denoted lcs(i, h), is f, while lcs(i, c) = root.
Wu and Palmer's [25] metric relies on the fact that in is-a hierarchies, concepts that are more distant from the root are more specific than those near the root:

WuPalmer(a, b) = 2 · δ(lcs(a, b), ρ) / (δ(a, lcs(a, b)) + δ(b, lcs(a, b)) + 2 · δ(lcs(a, b), ρ))   (4)

where the function δ(·, ·) calculates the number of edges separating two nodes, lcs(a, b) is the nearest common superclass of a and b, and ρ is the root of the taxonomy. For example, in Figure 1, the nodes i and h are more similar to each other (WuPalmer(i, h) = 2/3) than f and g (WuPalmer(f, g) = 1/2). A related similarity metric designed for directed graphs is SimRank, proposed by Jeh and Widom [26]. It has been widely used for finding similar Web pages connected by hyperlinks. In this method, the similarity of two nodes is computed based on the similarity between their neighbors by considering the number of outgoing or incoming links. SimRank [26] and other widely used methods such as PageRank [27], HITS [28], Co-citation [29] and SALSA [30] are designed for link-based graphs such as the Web or citation networks. In these cases, nodes (i.e. Web pages or academic papers) are connected to each other using one type of link (i.e. hyperlinks in the case of the Web and citations in citation networks). Some authors have also attempted to adapt the link-based methods to the WordNet graph. For example, Personalized PageRank [31] has been used for word sense disambiguation (WSD) using WordNet [32, 33]. However, these methods do not explicitly consider the type of the links; all link types have the same weight. Although they can be applied in more complex semantic networks such as Linked Data, where nodes are connected using multiple types of semantic relations, different types of links will be regarded as the same. As a result, the semantics of the specific relations will be partially discounted.
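The path-based logic of Eqs. (3) and (4) can be sketched on a toy is-a hierarchy. The tree below is an illustrative assumption loosely following Figure 1 (only the branches needed for the worked examples are reproduced), and the helper names are ours, not the paper's.

```python
# Toy is-a hierarchy (child -> parent); "root" is the top of the taxonomy.
PARENT = {"a": "root", "b": "root", "c": "a", "f": "b", "g": "b", "h": "f", "i": "f"}

def ancestors(node):
    """Path from a node up to the root, nearest ancestors first."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def lcs(a, b):
    """Least common subsumer: the nearest superclass shared by a and b."""
    anc_a = set(ancestors(a))
    for node in ancestors(b):          # walk b upward; the first hit is the LCS
        if node in anc_a:
            return node

def delta(x, anc):
    """Number of edges separating x from its ancestor anc."""
    return ancestors(x).index(anc)

def wu_palmer(a, b):
    """Eq. (4): Wu and Palmer's conceptual similarity."""
    c = lcs(a, b)
    depth = delta(c, "root")           # edges from the LCS down from the root
    return 2 * depth / (delta(a, c) + delta(b, c) + 2 * depth)
```

With this tree, wu_palmer("i", "h") yields 2/3 and wu_palmer("f", "g") yields 1/2, matching the worked values in the text.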
As discussed in the foregoing, the applicability of the presented edge counting-based metrics is restricted to is-a hierarchies or networks with a single link type. Other approaches have attempted to handle the variety in link types, for example, by weighting the links based on their characteristics in order to reflect their relative importance in similarity evaluation [34, 35, 36]. However, they are largely limited to the relations in the WordNet lexical database.

2.2. Feature-Based Similarity Measures

Tversky [9] discussed the limitations of distance-based methods and conducted empirical studies from a psychological point of view. One of the limitations is the symmetry assumption: if a is similar to b, then b is also similar to a. In multidimensional similarity spaces and hierarchical structures, the distance between points or nodes is always the same regardless of the point from which the measurement starts. Tversky [9] argued that psychological concepts and, therefore, human similarity judgments are not always symmetrical; the direction of the similarity statements is essential. For example, we usually say "the son resembles the father" instead of "the father resembles the son" [9:328]. Based on this, Tversky [9] proposed a feature-based model of similarity as the solution. As introduced by Tversky [9], feature-based methods assume that concepts can be represented as sets of features. They assess the similarity of concepts based on the commonalities among their feature sets: any increase in common features among concepts results in a higher similarity score, and any decrease in shared features results in lower levels of similarity. Based on this, set-based indices such as the Jaccard [37] and Dice [38] coefficients can be adopted for similarity assessment. For example, the Jaccard index of two sets is the ratio of shared features to all features:

Jaccard(A, B) = |A ∩ B| / |A ∪ B|   (5)

such that A and B denote the sets of features corresponding to the concepts a and b. In addition to common features, the Tversky [9] ratio model, which is a generalization of the Jaccard and Dice models, also considers the distinctive characteristics of each concept (the features of one concept which are not part of the other):

Tversky(A, B) = |A ∩ B| / (|A ∩ B| + α|A − B| + β|B − A|), for α, β > 0   (6)

where α and β represent the relative contributions of the unique features of A and B to the similarity value, respectively. For example, for α > β, the distinctive features of A are weighted higher than those of B. The α and β parameters can be used to reflect the symmetric or asymmetric nature of a given context: if α = β, then Tversky(A, B) = Tversky(B, A) and the similarity comparison is symmetric; otherwise, it is asymmetric (Tversky(A, B) ≠ Tversky(B, A)). Feature-based models are applicable in contexts in which entities are or can be represented as sets of features. In other situations, such as hierarchical structures of information or ontologies, features need to be explicitly defined for domain concepts or entities. For example, when working with taxonomies, some authors define the features of an entity as the set of its super- or sub-classes and employ the Jaccard or Tversky indices to determine the overall similarity value [39, 40, 41].
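Because Eqs. (5) and (6) operate on plain sets, they translate directly into code. The feature sets below are hypothetical, chosen only to show the symmetric and asymmetric cases.

```python
def jaccard(a, b):
    """Eq. (5): shared features over the union of all features."""
    return len(a & b) / len(a | b)

def tversky(a, b, alpha, beta):
    """Eq. (6): common features weighed against each set's distinctive features."""
    common = len(a & b)
    return common / (common + alpha * len(a - b) + beta * len(b - a))

# Hypothetical feature sets for two concepts.
A = {"wings", "beak", "feathers", "flies"}
B = {"wings", "flies", "stinger"}

sym = tversky(A, B, 1, 1)          # alpha = beta: symmetric, equals Jaccard
asym_ab = tversky(A, B, 0.8, 0.2)  # alpha > beta: A's unique features weigh more
asym_ba = tversky(B, A, 0.8, 0.2)  # swapping the arguments changes the score
```

With α = β = 1 the Tversky index reduces to the Jaccard index; unequal weights make the comparison direction-dependent, mirroring Tversky's asymmetry argument.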
The simplicity and flexibility of feature-based methods enable them to be easily combined with other approaches. In Section 2.4 below, we review a number of hybrid approaches that merge the benefits of feature-based models with other methods.

2.3. Statistical Similarity Measures

Statistical similarity measures incorporate statistics derived from various aspects of the underlying domain into the similarity computation. Several approaches use the frequency of terms in a document as a measure of their informativeness, also known as information content (IC) (see Section 4), and use that as a basis for measuring similarity [42, 43, 44]. For example, Resnik [42] considers the popularity of the LCS of two terms as a measure of their similarity: two terms that share an LCS that is more popular in a corpus (3) are considered less similar than two terms that share a less frequent LCS. It is based on the assumption that terms that are more frequently used (such as I, me, the, etc.) are more general and less informative than less common words. These methods tend to show better results compared to feature-based and edge counting-based semantic similarity measures [42, 43, 44]; they showed a higher level of correlation with human judgments. As described, a large proportion of the approaches proposed for measuring semantic similarity have used the WordNet lexical database as the main knowledge base. However, there are several problems associated with using WordNet, such as its limited applicability and the lack of technical, domain-specific terms. To overcome these issues, many authors have employed large text corpora for measuring semantic similarity. For example, when working with a text corpus, pointwise mutual information (PMI) [46] can be applied for measuring the semantic similarity between words [47]. PMI is based on the ratio of the number of co-occurrences of two terms together to their individual occurrences in a document.
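The PMI ratio just described can be sketched over a toy corpus. The documents below are hypothetical, and the base-2 logarithm is our assumption (any base preserves the ordering).

```python
from math import log2

# Toy corpus of documents as bags of words -- hypothetical data for illustration.
docs = [
    {"doctor", "nurse", "hospital"},
    {"doctor", "hospital", "treatment"},
    {"nurse", "treatment"},
    {"car", "engine"},
]

def p(term):
    """Fraction of documents containing the term."""
    return sum(term in d for d in docs) / len(docs)

def p_joint(t1, t2):
    """Fraction of documents containing both terms."""
    return sum(t1 in d and t2 in d for d in docs) / len(docs)

def pmi(t1, t2):
    """Pointwise mutual information: log of observed co-occurrence over chance."""
    joint = p_joint(t1, t2)
    if joint == 0:
        return float("-inf")           # the terms never co-occur
    return log2(joint / (p(t1) * p(t2)))
```

Terms that co-occur more often than their individual frequencies predict (doctor/hospital) score above zero; unrelated terms (doctor/car) score low or negatively infinite.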
For example, words such as doctors, dentists, nurses, treating, and hospitals are highly associated because they often appear together in the same document [47]. Although PMI can be applied using a document or a set of documents, several approaches (e.g. PMI-IR [48], Etzioni et al. [49], SOC-PMI [50], and Newman et al. [51]) have used online sources such as the search results of Google and Wikipedia as the main corpus. Vector-based methods such as latent semantic analysis (LSA) [17] and explicit semantic analysis (ESA) [52] can also be classified as statistical semantic similarity measures. LSA applies singular value decomposition (SVD) to term-document matrices where each cell contains the frequency of the corresponding word in a document. ESA represents words as weighted vectors of concepts derived from Wikipedia articles. Compared to LSA, ESA showed a higher correlation with human judgment for estimating the relatedness of words [52].

(3) Resnik [42] calculated the frequency of concepts using the Brown Corpus of American English (also referred to as the Brown Corpus), a large collection of text compiled from works published in the United States in 1961, ranging from news articles to science fiction [45].

2.4. Hybrid Similarity Measures

This section gives an overview of a number of approaches that can be classified as hybrid methods: they are based on combinations of the three main methods. Hu et al. [53] combine feature-based methods with distance functions by representing the features of entities in ontologies using description logic and measuring the similarity using a vector-based cosine similarity measure. A number of approaches combine feature-based and statistical methods. For example, in is-a taxonomies, intrinsic information content (IIC) [54] incorporates the number of subclasses of a concept for estimating the information content: the higher the number of subclasses of a term, the lower its informativeness. IIC has also been combined with feature-based [39, 40, 55] and edge counting [56] methods. For example, instead of WordNet, WikiRelate! [56] applies IIC on the category hierarchy of Wikipedia for estimating the semantic relatedness of a pair of words. Milne [57] used the Wikipedia link structure to create a vector model for computing relatedness. WikiWalk [58] is another Wikipedia-based relatedness measure that employs a combination of Personalized PageRank [31] and ESA [59]. It showed better results compared to WordNet-based and other Wikipedia-based methods such as the Wikipedia link measure (WLM) [60] and WikiRelate! [56]. In another study [61], the authors presented an approach for computing semantic relatedness using multilingual semantic graphs created by integrating concepts from WordNet and Wikipedia.

3. Limitations of Previous Approaches for Semantic Similarity Measurement on Linked Data

In the previous section, we reviewed some of the existing similarity measures proposed in the literature in various domains. We studied three main categories of approaches, namely, distance-based metrics, feature-based models and statistical methods, as well as hybrid approaches. However, owing to the existence of heterogeneous relationships between resources and the unique graph structure of Linked Data, our contention is that the existing similarity measures and metrics developed primarily for taxonomies such as WordNet are not the most suitable measures in this new context. The Linked Data graph is a complex semantic network in which information resources (nodes) are connected by a wide range of semantic relations (edges). Unlike WordNet, Linked Data has a wide range of relations, of which is-a and part-of are two particular types. Therefore, any measure of semantic similarity for Linked Data has to consider its particular characteristics, such as the variety in link types and the direction of the relations. Distance-based metrics deal only with is-a relations, while Linked Data is characterized by many different kinds of links, of which the is-a relation (expressed by the rdf:type (4) and rdfs:subClassOf (5) properties) is only one type; such metrics are therefore unable to describe the resources adequately. Moreover, approaches that consider various types of links [34, 35, 36] are limited to the relations in WordNet. In addition, although methods such as SimRank [26] and PageRank [27] can be applied based on the link structure of Linked Data, they do not consider the semantics represented using various types of relations, and all link types have the same weight in the similarity measurement.

A wide range of approaches based on WordNet, such as the metrics proposed by Wu and Palmer [25] and Leacock [23, 24], and the statistical methods proposed by Resnik [42], Jiang and Conrath [43], and Lin [44], determine the semantic similarity based on the least common subsumer (LCS). Their applicability to the Linked Data graph structure (see Figure 2 (b) and (c)) is limited. In the tree-based structure of lexical taxonomies such as WordNet, concepts or entities are connected in a hierarchical manner (Figure 2 (a)), while the is-a link structure of Linked Data is different: resources can be subsumed by multiple classes. Therefore, a resource can have multiple parents (see Figure 2 (b) and refer to Section for an example). Moreover, in the Linked Data graph, resources are linked via multiple incoming and outgoing edges (Figure 2 (c)). This graph structure makes the LCS less relevant for the Linked Data context.

(4) rdf: is the prefix for the RDF namespace. (5) rdfs: is the prefix for the RDF Schema namespace.

Figure 2. Difference between (a) a hierarchical taxonomy, (b) the is-a link structure of Linked Data and (c) a sample Linked Data graph

Finally, the Linked Data graph is a complex network in which information about resources is not explicitly represented as sets of features. Therefore, despite their simplicity and flexibility, feature-based measures such as Tversky [9], Jaccard [37], and Dice [38] cannot be readily adopted. In the next section, we propose a feature-based definition of Linked Data which will be combined with statistical models to develop a semantic similarity measure that considers the particular characteristics of Linked Data.

4. A Hybrid Semantic Similarity Measure for Linked Data

Similarity measurement is fundamental to a wide range of Linked Data applications, including entity comparison and ranking, ranking of search results, recommender systems, and data clustering and visualization. Having reviewed previous approaches to similarity measurement and discussed their limitations in the context of Linked Data, this section presents a feature-based definition of Linked Data that considers its specific characteristics and proposes a semantic similarity measure which is a hybrid of feature-based and statistical approaches.

4.1. Formal Definition of Linked Data

Tim Berners-Lee, inventor of the Semantic Web and the World Wide Web (WWW), used the term Giant Global Graph (GGG) on his blog in 2007 to refer to the new environment enabled by Semantic Web technologies [62]. Similar to the social graph of social networks, where people are connected based on their relationships and interests, in the GGG information resources are linked based on the semantic relations among them. Linked Data is a massive collection of RDF statements related to various entities of interest such as movies, artists, actors, cities, etc., known as information resources, or resources for short.
Each RDF statement (also known as a triple) is in the form of subject-predicate-object. In RDF, subjects, predicates and objects are uniquely identifiable using URIs (uniform resource identifiers). RDF statements can be represented as nodes and edges, where the subject and the object are the nodes and the relations (predicates) between them are the edges connecting the nodes. The edges are directed, meaning that the direction of the links is part of the definition of the relation. Moreover, considering that a subject can be connected to several objects to express various statements, and that the object of one statement can also be the subject of another statement, the graph representation of the RDF statements describing Linked Data forms a massive graph of interconnected nodes, referred to as the Giant Global Graph. Thereby, we describe Linked Data as a graph of resources and the relations among them:

Definition 1. (Linked Data): Linked Data (LD) is a labelled directed graph, defined as ⟨R, L, T⟩, such that R = {r1, r2, …, r|R|} is a set of resources (nodes, vertices), L = {l1, l2, …, l|L|} is a set of links (edges, relations, predicates) and T = {t1, t2, …, t|T|} is a set of triples (statements) of the form ⟨r1, l1, r2⟩, where l1 ∈ L is a link from r1 ∈ R to r2 ∈ R.

Based on this definition, resources can be defined according to their neighbors, that is, their relationships with other resources in the Linked Data graph. We define a resource in Linked Data as the set of its features: the statements in which the resource participates as the subject or the object:

Definition 2. (Features in Linked Data): A feature f of the resource r ∈ R in Linked Data (LD) is denoted as a triple ⟨l, rt, D⟩, where rt ∈ R is the (target) resource directly connected to r via the link l ∈ L and D is the direction of the link (In/Out).

Hence, we define resources based on the notion of features in Linked Data:

Definition 3. (A Resource in Linked Data): A resource r ∈ R in Linked Data (LD) is denoted as the set of its features Fr, defined as follows:

Fr = FrOut ∪ FrIn   (7)

FrOut = {⟨li, ri, Out⟩ | li ∈ L, ri ∈ R, ⟨r, li, ri⟩ ∈ LD}   (8)

FrIn = {⟨li, ri, In⟩ | li ∈ L, ri ∈ R, ⟨ri, li, r⟩ ∈ LD}   (9)

In this definition, the incoming and outgoing relations of the resource, the type of the relation, the direction of the relation and the target node (the node connected to the other end of the relation) are considered in the description of the resource. As a simple illustration, the features of the nodes r and s in Figure 3 below are the sets Fr and Fs, respectively:

Fr = {⟨l1, a, Out⟩, ⟨l2, b, In⟩, ⟨l3, c, Out⟩, ⟨l4, d, Out⟩}
Fs = {⟨l2, b, In⟩, ⟨l4, c, Out⟩, ⟨l4, d, Out⟩, ⟨l5, e, Out⟩}

Also, Fr ∩ Fs = {⟨l2, b, In⟩, ⟨l4, d, Out⟩}.

In this section, we have presented a mathematical definition of Linked Data and of resources in Linked Data. In this definition, resources are defined based on their features, drawn from their incoming and outgoing relations. This definition provides us with a simple yet flexible basis for developing Linked Data-based measures in the following sections.

Figure 3. An example of resources and features in the Linked Data graph (r, s, a, b, …, e are sample resources and l1, l2, …, l5 are sample links)

4.2. A Hybrid Approach for Semantic Similarity Measurement

As discussed earlier, Tversky [9] characterized concepts as representable as sets of features, and the commonalities between the features of two concepts can be used as a measure of their similarity (see Section 2.2).
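Definitions 2 and 3 can be made concrete with a short sketch. The triple list below reproduces the Figure 3 example from the text (resource names and link labels as given there); the function name is ours.

```python
# Triples (subject, link, object) reproducing the Figure 3 example.
triples = [
    ("r", "l1", "a"), ("b", "l2", "r"), ("r", "l3", "c"), ("r", "l4", "d"),
    ("b", "l2", "s"), ("s", "l4", "c"), ("s", "l4", "d"), ("s", "l5", "e"),
]

def features(resource, triples):
    """Feature set of a resource per Definitions 2-3: (link, target, direction)."""
    out = {(l, o, "Out") for s, l, o in triples if s == resource}
    inc = {(l, s, "In") for s, l, o in triples if o == resource}
    return out | inc

F_r = features("r", triples)
F_s = features("s", triples)
shared = F_r & F_s   # the common features of r and s
```

Plain set intersection recovers exactly the shared features given in the text, ⟨l2, b, In⟩ and ⟨l4, d, Out⟩, which is what makes the Tversky-style set indices directly applicable in the next section.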
However, a main drawback of feature-based methods is that they treat all features alike: every feature receives the same weight in the similarity evaluation. Several empirical studies have shown that the factors influencing human similarity judgments have varying levels of importance [63, 64, 65]. These studies showed that the level of importance varies according to the psychological stimuli of the comparison-maker and contextual variables. Statistical models of similarity incorporate statistics about the underlying context into the semantic similarity comparison in order to reflect the relative importance of the influencing factors. Our approach is based on the information content (IC) of features, that is, their relative informativeness. The importance of the factors influencing the similarity judgment (i.e. features) is therefore derived from their informativeness, that is, the amount of information conveyed by their presence.

Information Content Measurement in Linked Data

Information theory, as proposed by Shannon [66], describes the mathematical foundations of communication: transmitting information over communication channels by means of coding schemes [67, 68]. Building on earlier work by Hartley [69], Shannon's key idea was to define information as a measurable mathematical quantity, information content (IC). Shannon [66] presents information content as a measure of the information conveyed by the occurrence of a random event chosen from a set of possible events. IC is defined as the logarithm of the inverse of the event's probability:

IC(x) = log(1/π(x)) = −log(π(x)) (10)

IC(x) is the amount of information produced by the occurrence of the event x, based on its probability π(x): the higher the probability of an event, the lower its information content.
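Equation (10) is a one-liner in code; the choice of logarithm base only fixes the unit of measurement (this is a sketch for illustration, not part of the paper's implementation):

```python
import math

# Equation (10): IC(x) = -log(pi(x)); the log base determines the unit.
def ic_bits(p):
    return -math.log2(p)    # base 2: bits

def ic_nats(p):
    return -math.log(p)     # base e: nats

def ic_bans(p):
    return -math.log10(p)   # base 10: bans (hartleys)

p = 0.25  # an event with probability 1/4
print(ic_bits(p))  # 2.0 (bits)

# Base conversion: log_b(a) = log_2(a) / log_2(b)
assert math.isclose(ic_bans(p), ic_bits(p) / math.log2(10))
```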
The logic generalizes to various domains: common symbols in a message, frequent messages in a collection of possible messages, or frequent terms in a textual document carry less information than less frequent ones.

In other words, the occurrence of less frequent events conveys more information; therefore, they are more informative. The logarithm in Equation (10) is usually taken to base two, in which case information is measured in units called bits. Other bases can also be used: for base 10 and for natural logarithms, the units of information are called bans (decimal digits, or hartleys) and nats (natural units), respectively. Bases can easily be converted to each other (log_b(a) = log_2(a) / log_2(b)); however, base two is the most common case.

The concept of information content has been widely used in a number of areas. For example, in data compression, the more frequent terms in a corpus are considered to be less informative; therefore, they can be stored using fewer bits. Similarly, in variable-length source coding, symbols in the source message that are more common are sent using fewer bits and those that are less frequent are transmitted using more bits. In other domains, the probability of events may not be measured based on their frequency. For example, in hierarchical taxonomies of nouns (such as WordNet), the terms with a higher number of subclasses (children) are considered to be less informative [54].

In the next sections, we extend the notion of information content to Linked Data. We first propose a measure, derived from the formal definition of Linked Data and the principles of information content measurement, to assess the value of information associated with features in Linked Data. Based on this, we proceed to define the aggregate information content of resources according to their sets of features.
The measure of the information content of resources will be used as a basis for semantic similarity measurement using Linked Data.

Information Content of Features in Linked Data

Based on Definition 3, a resource in Linked Data can be described using its set of features, that is, by having its characteristics defined as a collection of its incoming and outgoing relations. As explained previously, the type of the relation, the target node and the direction of the relation are considered in the definition of features (Definition 2). Based on the probability theory foundations of information content, we define the IC value associated with a feature in Linked Data as follows:

Definition 4. (The Information Content of Features in Linked Data): Let π(f) be the probability of the feature f in Linked Data (LD). The information content of f is defined as:

IC(f) = log(1/π(f)) = −log(π(f)) (11)

The probability of a feature can be computed based on its relative frequency: the ratio of the number of resources with the feature to the total number of resources:

π(f_i) = φ(f_i) / N (12)

where φ(f_i) is the frequency of the feature f_i and N is the total number of resources. Similar to Shannon's [66] and Hartley's [69] logarithmic measures of information content (Equation (10)), the proposed measure of the information content of features in Linked Data (computed using Equations (11) and (12)) satisfies the following properties (see Mézard and Montanari [70]). For a feature f:

1) IC(f) ≥ 0.

2) IC(f) = 0 if and only if the feature f is certain, that is, its probability is one (π(f) = 1). In other words, for a feature f that is shared by all resources in the underlying Linked Data, the amount of information conveyed by its occurrence is zero.

3) IC(f) is maximum when the feature f occurs only once. Thus, for a feature f where φ(f) = 1 and π(f) = 1/N, IC(f) is maximum and equal to log(N). Hence, the information content of a feature is always less than or equal to log(N).
4) Additivity: for any two mutually independent features in Linked Data (i.e. the occurrence of one feature does not depend on the occurrence of the other), the information content associated with the occurrence of both features is equal to the sum of the IC values of the two features:

IC(f_1, f_2) = −log(π(f_1, f_2)) = −log(π(f_1) · π(f_2)) = −log(π(f_1)) − log(π(f_2)) = IC(f_1) + IC(f_2) (13)
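Equations (11)-(13) and properties 2-4 can be checked directly on a toy dataset; the feature frequencies below are hypothetical, chosen only to make the arithmetic visible:

```python
import math

# Hypothetical feature frequencies in a toy dataset of N = 8 resources.
N = 8
freq = {
    ("rdf:type", "owl:Thing", "Out"): 8,  # shared by every resource
    ("l2", "b", "In"): 2,
    ("l4", "d", "Out"): 1,                # occurs only once
}

def ic(feature):
    """IC(f) = -log2(phi(f) / N), Equations (11) and (12)."""
    return -math.log2(freq[feature] / N)

# Property 2: a certain feature (pi = 1) carries zero information.
print(ic(("rdf:type", "owl:Thing", "Out")))  # 0.0

# Property 3: a unique feature reaches the maximum, log2(N) = 3 bits.
print(ic(("l4", "d", "Out")))  # 3.0

# Property 4 (additivity): for independent features, the IC of the joint
# occurrence equals the sum of the individual IC values.
f1, f2 = ("l2", "b", "In"), ("l4", "d", "Out")
joint_prob = (freq[f1] / N) * (freq[f2] / N)  # independence assumption
assert math.isclose(-math.log2(joint_prob), ic(f1) + ic(f2))
```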

These mathematical properties of the proposed information content measure of features in Linked Data can be used to further study and extend the measure. As an example of the second property, all resources in DBpedia are an owl:Thing, which is expressed using the feature ⟨rdf:type, owl:Thing, Out⟩.6 Therefore, its IC value is zero; no information is produced by its occurrence. The fourth property can be extended to a set of features in order to compute the IC value of resources in Linked Data. In the next section, we employ this property and introduce the partitioned information content (PIC) of resources in Linked Data.

Partitioned Information Content of Resources in Linked Data

The proposed logarithmic computation of the information content of features (Equations (11) and (12)) has the additivity property, which implies that the information content of two independent features is equal to the sum of their IC values. This section extends this property to develop a measure of the information content of resources in Linked Data:

Definition 5. (The Probability of a Set of Features in Linked Data): Let π(f) be the probability of the feature f in Linked Data (LD). For the resource r ∈ R, represented as a set of (mutually independent) features F_r = {f_1, f_2, ..., f_|F_r|}, the probability of the set F_r is defined as

π(F_r) = π(f_1) · π(f_2) ⋯ π(f_|F_r|) = ∏_{f_i ∈ F_r} π(f_i) (14)

Having defined the information content of a feature (Equation (11)) and the probability of a set of features (Equation (14)), we can measure the information content of a resource r based on the set of its features F_r:

−log(π(F_r)) = −log(∏_{f_i ∈ F_r} π(f_i)) = −∑_{f_i ∈ F_r} log(π(f_i)) (15)

We express this measure as the partitioned information content (PIC) of a resource in Linked Data [71]:

Definition 6. (Partitioned Information Content in Linked Data): The information content of a resource in Linked Data is defined as the sum of the information content values of its features:

PIC(r) = ∑_{f_i ∈ F_r} IC(f_i) (16)

In this definition, IC(f_i) is computed by Equations (11) and (12). The PIC measure can be summarized as follows:

PIC(r) = −∑_{f_i ∈ F_r} log(φ(f_i) / N) (17)

The partitioned information content (PIC) of resources in Linked Data is the aggregate amount of information content conveyed by a given resource (PIC(r) ≥ 0) and is based on the information content of the resource's features. Owing to the use of base two for the logarithm function, PIC is measured in units of information, that is, bits. The characteristics of PIC are derived from its information theory fundamentals. Equations (11) and (12) are premised on the notion that highly probable features are general and less informative, while distinctive features, that is, features with a low number of occurrences, are more specific and convey more information. For example, based on the frequency of features, the fact that all actors are a Person (specified using the feature ⟨rdf:type, foaf:Person, Out⟩) is substantially more popular than the fact that a particular actor starred in a movie (specified using the feature ⟨starring, movieURI, In⟩). The former applies to millions of resources in DBpedia that describe a person, while the latter is only used when representing the actors of the movie (specified with movieURI). The frequency of the latter is equal to the number of actors who starred in the movie; therefore, it is more informative than the former. The information content of features determines their contribution to the partitioned information content of the resource to which they belong. The PIC measure, computed using Equation (17), implies that popular features, shared by a large number of resources in Linked Data (e.g. being a person), contribute less to the PIC of resources than infrequent ones (e.g. the actors of a particular movie). As a result, resources with more distinctive features are more informative.

6 owl: is the prefix for
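Equations (16)-(17) reduce PIC to a sum over a resource's feature set. A minimal sketch of the actor example, with assumed (hypothetical) feature frequencies:

```python
import math

N = 1000  # assumed total number of resources in a toy dataset

# Hypothetical frequencies: one generic feature vs. two distinctive ones.
freq = {
    ("rdf:type", "foaf:Person", "Out"): 1000,  # every resource is a Person
    ("starring", "movieA", "In"): 10,
    ("starring", "movieB", "In"): 4,
}

def pic(features):
    """PIC(r) = -sum(log2(phi(f)/N)) over the feature set, Equation (17)."""
    return -sum(math.log2(freq[f] / N) for f in features)

actor = {("rdf:type", "foaf:Person", "Out"),
         ("starring", "movieA", "In"),
         ("starring", "movieB", "In")}

# The generic rdf:type feature contributes 0 bits; the two rare "starring"
# features dominate the resource's information content.
print(round(pic(actor), 3))
```

As expected, the certain feature adds nothing, while each rare feature contributes log2(N/φ(f)) bits.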

Partitioned Information Content across Datasets on the LOD Cloud

The Linked Open Data (LOD) cloud provides free access to more than 570 datasets in various areas such as media, geography, publications, life sciences and government. However, entities are often described in multiple datasets. For example, LinkedMDB (the Linked Movie Database) is a part of the LOD cloud that provides structured information on over 85,600 movies [72]. These movies are also described in other LOD datasets such as DBpedia and Freebase. Another example is GeoNames, through which detailed semantics on geographic information are provided. These datasets are connected using owl:sameAs relations. The giant graph formed by linking these datasets is referred to as the Linked Open Data cloud.

In order to leverage the potential power of Linked Open Data, we extend PIC to include semantics about resources from various datasets. As resources are often described in multiple datasets in the LOD cloud, valuable information can be obtained from a variety of sources. Therefore, the partitioned information content (PIC) of a resource is the aggregation of its PIC values over all datasets in the whole LOD cloud:

PIC_LOD(r) = ∑_{LD ∈ LOD} PIC_LD(r) (18)

where PIC_LD(r) is computed separately for each dataset in LOD using Equation (17). The datasets considered in the LOD cloud-based PIC computation can be datasets such as DBpedia, Freebase, LinkedMDB, MusicBrainz, etc., or datasets that provide semantics in multiple languages. For example, localized editions of DBpedia 3.8 are published in 111 languages.
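Equation (18) is a plain sum of per-dataset PIC values. A sketch with assumed scores (the numbers are hypothetical; in practice each value comes from Equation (17) applied to one dataset):

```python
# Hypothetical per-dataset PIC values (in bits) for one resource.
pic_per_dataset = {
    "DBpedia": 120.5,
    "LinkedMDB": 34.2,
    "Freebase": 58.1,
}

# Equation (18): PIC_LOD(r) is the sum over all datasets describing r.
pic_lod = sum(pic_per_dataset.values())
print(round(pic_lod, 1))  # 212.8 (bits)
```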
These datasets can be included to compensate for missing information or to add information on entities that are described better in languages other than English.

PICSS: Partitioned Information Content (PIC)-Based Semantic Similarity Measure

We propose our partitioned information content (PIC)-based semantic similarity measure, called PICSS, which is a combination of feature-based and information content-based approaches. This measure not only takes into account all types of relations but also adjusts the influence of features on the similarity value based on their informativeness. Given the notion of the information content of features in Linked Data presented earlier and the partitioned information content (PIC) proposed in Section 4.2.3, we employ the Tversky [9] ratio model and propose PICSS, a PIC-based semantic similarity measure for Linked Data:

Definition 7. (PICSS, a PIC-based Semantic Similarity Measure for Linked Data): The similarity of two resources r, s ∈ R, represented as the sets of their features F_r and F_s, respectively, is defined as:

PICSS(r, s) = PIC(F_r ∩ F_s) / (PIC(F_r ∩ F_s) + PIC(F_r − F_s) + PIC(F_s − F_r)) (19)

The similarity scores computed by PICSS are normalized between zero and one, where a score of zero represents no similarity between resources (perfectly dissimilar resources) and one represents perfect similarity (identical resources).11 Based on the information theoretic foundations of the measure, as less frequent features are considered to be more informative, the fact that they are shared between the resources is more influential than would be the case for frequent, less informative features. An important characteristic of PICSS is that the similarity value increases with more shared features and decreases with differences between resources.
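Putting Equations (17) and (19) together on the Figure 3 toy example, with assumed (hypothetical) feature frequencies, PICSS can be sketched as:

```python
import math

N = 100  # assumed total number of resources in the toy dataset

# Hypothetical frequencies for the features of r and s from Figure 3.
freq = {
    ("l1", "a", "Out"): 5, ("l2", "b", "In"): 20, ("l3", "c", "Out"): 2,
    ("l4", "d", "Out"): 4, ("l4", "c", "Out"): 8, ("l5", "e", "Out"): 10,
}

def pic(features):
    # Equation (17): sum of -log2(phi(f)/N) over the feature set
    return -sum(math.log2(freq[f] / N) for f in features)

F_r = {("l1", "a", "Out"), ("l2", "b", "In"),
       ("l3", "c", "Out"), ("l4", "d", "Out")}
F_s = {("l2", "b", "In"), ("l4", "c", "Out"),
       ("l4", "d", "Out"), ("l5", "e", "Out")}

def picss(fr, fs):
    # Equation (19): Tversky ratio model weighted by information content
    common = pic(fr & fs)
    return common / (common + pic(fr - fs) + pic(fs - fr))

score = picss(F_r, F_s)
assert 0.0 <= score <= 1.0
assert math.isclose(picss(F_r, F_r), 1.0)  # identical resources
print(round(score, 3))
```

Note how the rarer shared feature ⟨l_4, d, Out⟩ contributes more bits to the numerator than the common ⟨l_2, b, In⟩, which is exactly the informativeness weighting PICSS adds over a plain Tversky ratio.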
Tversky [9:330] used block letters to illustrate the importance of considering differences as well as commonalities in similarity assessment. Assume that each block letter can be represented as a set of straight lines: for example, the only feature of 'I' is one vertical line, while the features of 'E' are one vertical and three horizontal lines. Under this assumption, 'I' is more similar to 'F' than to 'E': despite the fact that the same feature (one vertical line) is shared between 'I' and both of the others, as 'I' and 'F' have fewer distinctive features (three), they are considered to be more similar (in contrast to the four distinctive features of 'I' and 'E'). However, if distinctive features were not considered in the similarity measure, 'I' would be equally similar to both 'F' and 'E'. Therefore, based on Tversky's theory of similarity, any increase in commonalities and/or decrease in differences between entities leads to a higher similarity. In order to compute the similarity accurately, PICSS considers both the shared features and the distinctive features (i.e. the features of one resource which are not part of the other) of resources in the similarity computation. PICSS combines the advantages of feature-based and information content-based measures. It enables applications to perform in-depth semantic analysis of entities based on structured data acquired from Linked Open Data. In the following sections, we explain how PIC and PICSS can be implemented using SPARQL queries and present examples demonstrating their performance.

Implementation

In order to compute our proposed semantic similarity measure, PICSS, a number of SPARQL queries need to be executed to measure the partitioned information content (PIC) of resources using Linked Data. To begin the analysis of resources in a particular domain of interest, we need to retrieve all resources of a certain type, using a SPARQL query such as the one shown in Listing 1.

SELECT ?resource
WHERE {
  ?resource rdf:type ?resourceType .
}

Listing 1. A SPARQL query to retrieve instances (resources) of a certain type (?resourceType has to be replaced with a particular type from the DBpedia ontology depending on the domain, e.g. dbo:Film,12 dbo:MusicalArtist, etc.)

In order to calculate PIC (Equation (17)), we first need to compute the total number of resources (N). It is calculated depending on the ontological structure of the underlying Linked Data. For example, in DBpedia, all resources are a Thing, expressed using the feature ⟨rdf:type, owl:Thing, Out⟩. The total number of resources can be counted using the SPARQL query shown in Listing 2 below. In our experimental dataset (see Section 5.2), it was equal to 2,350,906.

SELECT (COUNT(?resource) AS ?N)
WHERE {
  ?resource rdf:type owl:Thing .
}

Listing 2. A SPARQL query to retrieve the total number of resources in DBpedia

We also need to extract the features of resources. Based on the definition of features in Linked Data (refer to Definition 2, Section 4.1), features can be retrieved using two simple SPARQL queries. Listing 3 shows an example of retrieving the outgoing relations of a given resource. A similar query can be executed to extract the incoming relations. Next, the information content of each feature needs to be computed based on its frequency. The SPARQL query presented in Listing 4 is used to retrieve the frequency of an outgoing feature. Finally, by aggregating the IC values of all features of a given resource, its PIC is computed. The same calculations and queries can be applied to compute the PIC of the shared or distinctive features of two resources when computing our semantic similarity measure, PICSS (Equation (19)).

SELECT DISTINCT ?linkType ?targetResource ("Out" AS ?linkDirection)
WHERE {
  <resource> ?linkType ?targetResource .
  FILTER (!isLiteral(?targetResource))
}

Listing 3. A SPARQL query to retrieve the outgoing edges of a resource

SELECT (COUNT(?resource) AS ?freq)
WHERE {
  ?resource ?linkType ?targetResource .
  FILTER (!isLiteral(?targetResource))
}

Listing 4. A SPARQL query to retrieve the frequency of an outgoing feature

For better performance, these queries can be combined and executed in parallel.

Sample Output

This section presents the results of our exploratory analysis of applying PICSS in a number of domains, namely, Films; Music, which is a collection of musical

11 In this model, we assume that the similarity is symmetric.
12 dbo: is the prefix for
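The per-feature workflow (Listing 4 plus Equation (17)) can be sketched offline. Here, `fetch_frequency` and `FREQ_TABLE` are hypothetical names standing in for executing the Listing 4 query against a SPARQL endpoint:

```python
import math

def outgoing_frequency_query(link_type: str, target: str) -> str:
    """Build a Listing 4-style query for one concrete outgoing feature."""
    return (
        "SELECT (COUNT(?resource) AS ?freq) WHERE { "
        f"?resource <{link_type}> <{target}> . }}"
    )

# Stand-in for running the query against an endpoint: we look the
# frequency up in a hypothetical, pre-fetched table instead.
FREQ_TABLE = {("dbo:starring", "db:SomeMovie"): 12,
              ("rdf:type", "owl:Thing"): 2350906}
N = 2350906  # total number of resources (the Listing 2 count reported above)

def fetch_frequency(link_type, target):
    return FREQ_TABLE[(link_type, target)]

def pic(features):
    """Equation (17): aggregate -log2(freq/N) over a resource's features."""
    return -sum(math.log2(fetch_frequency(l, t) / N) for l, t in features)

# A feature shared by every resource contributes 0 bits.
assert pic([("rdf:type", "owl:Thing")]) == 0.0
print(round(pic([("dbo:starring", "db:SomeMovie")]), 2))
```

In a real deployment the lookup would be replaced by query execution against the endpoint, with the per-feature queries batched or issued in parallel, as noted above.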


More information

Linked Data. Department of Software Enginnering Faculty of Information Technology Czech Technical University in Prague Ivo Lašek, 2011

Linked Data. Department of Software Enginnering Faculty of Information Technology Czech Technical University in Prague Ivo Lašek, 2011 Linked Data Department of Software Enginnering Faculty of Information Technology Czech Technical University in Prague Ivo Lašek, 2011 Semantic Web, MI-SWE, 11/2011, Lecture 9 Evropský sociální fond Praha

More information

Discovering Semantic Similarity between Words Using Web Document and Context Aware Semantic Association Ranking

Discovering Semantic Similarity between Words Using Web Document and Context Aware Semantic Association Ranking Discovering Semantic Similarity between Words Using Web Document and Context Aware Semantic Association Ranking P.Ilakiya Abstract The growth of information in the web is too large, so search engine come

More information

A Linguistic Approach for Semantic Web Service Discovery

A Linguistic Approach for Semantic Web Service Discovery A Linguistic Approach for Semantic Web Service Discovery Jordy Sangers 307370js jordysangers@hotmail.com Bachelor Thesis Economics and Informatics Erasmus School of Economics Erasmus University Rotterdam

More information

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and

More information

Information Retrieval and Web Search

Information Retrieval and Web Search Information Retrieval and Web Search Relevance Feedback. Query Expansion Instructor: Rada Mihalcea Intelligent Information Retrieval 1. Relevance feedback - Direct feedback - Pseudo feedback 2. Query expansion

More information

Clustering Web Documents using Hierarchical Method for Efficient Cluster Formation

Clustering Web Documents using Hierarchical Method for Efficient Cluster Formation Clustering Web Documents using Hierarchical Method for Efficient Cluster Formation I.Ceema *1, M.Kavitha *2, G.Renukadevi *3, G.sripriya *4, S. RajeshKumar #5 * Assistant Professor, Bon Secourse College

More information

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent

More information

<is web> Information Systems & Semantic Web University of Koblenz Landau, Germany

<is web> Information Systems & Semantic Web University of Koblenz Landau, Germany Information Systems & University of Koblenz Landau, Germany Semantic Search examples: Swoogle and Watson Steffen Staad credit: Tim Finin (swoogle), Mathieu d Aquin (watson) and their groups 2009-07-17

More information

DBpedia-An Advancement Towards Content Extraction From Wikipedia

DBpedia-An Advancement Towards Content Extraction From Wikipedia DBpedia-An Advancement Towards Content Extraction From Wikipedia Neha Jain Government Degree College R.S Pura, Jammu, J&K Abstract: DBpedia is the research product of the efforts made towards extracting

More information

VALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK VII SEMESTER

VALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK VII SEMESTER VALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur 603 203 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK VII SEMESTER CS6007-INFORMATION RETRIEVAL Regulation 2013 Academic Year 2018

More information

WordNet-based User Profiles for Semantic Personalization

WordNet-based User Profiles for Semantic Personalization PIA 2005 Workshop on New Technologies for Personalized Information Access WordNet-based User Profiles for Semantic Personalization Giovanni Semeraro, Marco Degemmis, Pasquale Lops, Ignazio Palmisano LACAM

More information

An Improving for Ranking Ontologies Based on the Structure and Semantics

An Improving for Ranking Ontologies Based on the Structure and Semantics An Improving for Ranking Ontologies Based on the Structure and Semantics S.Anusuya, K.Muthukumaran K.S.R College of Engineering Abstract Ontology specifies the concepts of a domain and their semantic relationships.

More information

Citation for published version (APA): He, J. (2011). Exploring topic structure: Coherence, diversity and relatedness

Citation for published version (APA): He, J. (2011). Exploring topic structure: Coherence, diversity and relatedness UvA-DARE (Digital Academic Repository) Exploring topic structure: Coherence, diversity and relatedness He, J. Link to publication Citation for published version (APA): He, J. (211). Exploring topic structure:

More information

A Semantic Web-Based Approach for Harvesting Multilingual Textual. definitions from Wikipedia to support ICD-11 revision

A Semantic Web-Based Approach for Harvesting Multilingual Textual. definitions from Wikipedia to support ICD-11 revision A Semantic Web-Based Approach for Harvesting Multilingual Textual Definitions from Wikipedia to Support ICD-11 Revision Guoqian Jiang 1,* Harold R. Solbrig 1 and Christopher G. Chute 1 1 Department of

More information

Putting ontologies to work in NLP

Putting ontologies to work in NLP Putting ontologies to work in NLP The lemon model and its future John P. McCrae National University of Ireland, Galway Introduction In natural language processing we are doing three main things Understanding

More information

A service based on Linked Data to classify Web resources using a Knowledge Organisation System

A service based on Linked Data to classify Web resources using a Knowledge Organisation System A service based on Linked Data to classify Web resources using a Knowledge Organisation System A proof of concept in the Open Educational Resources domain Abstract One of the reasons why Web resources

More information

Semantically Driven Snippet Selection for Supporting Focused Web Searches

Semantically Driven Snippet Selection for Supporting Focused Web Searches Semantically Driven Snippet Selection for Supporting Focused Web Searches IRAKLIS VARLAMIS Harokopio University of Athens Department of Informatics and Telematics, 89, Harokopou Street, 176 71, Athens,

More information

OWLIM Reasoning over FactForge

OWLIM Reasoning over FactForge OWLIM Reasoning over FactForge Barry Bishop, Atanas Kiryakov, Zdravko Tashev, Mariana Damova, Kiril Simov Ontotext AD, 135 Tsarigradsko Chaussee, Sofia 1784, Bulgaria Abstract. In this paper we present

More information

Papers for comprehensive viva-voce

Papers for comprehensive viva-voce Papers for comprehensive viva-voce Priya Radhakrishnan Advisor : Dr. Vasudeva Varma Search and Information Extraction Lab, International Institute of Information Technology, Gachibowli, Hyderabad, India

More information

Getting to Know Your Data

Getting to Know Your Data Chapter 2 Getting to Know Your Data 2.1 Exercises 1. Give three additional commonly used statistical measures (i.e., not illustrated in this chapter) for the characterization of data dispersion, and discuss

More information

Text Mining. Munawar, PhD. Text Mining - Munawar, PhD

Text Mining. Munawar, PhD. Text Mining - Munawar, PhD 10 Text Mining Munawar, PhD Definition Text mining also is known as Text Data Mining (TDM) and Knowledge Discovery in Textual Database (KDT).[1] A process of identifying novel information from a collection

More information

INTRODUCTION. Chapter GENERAL

INTRODUCTION. Chapter GENERAL Chapter 1 INTRODUCTION 1.1 GENERAL The World Wide Web (WWW) [1] is a system of interlinked hypertext documents accessed via the Internet. It is an interactive world of shared information through which

More information

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti

More information

Effective Latent Space Graph-based Re-ranking Model with Global Consistency

Effective Latent Space Graph-based Re-ranking Model with Global Consistency Effective Latent Space Graph-based Re-ranking Model with Global Consistency Feb. 12, 2009 1 Outline Introduction Related work Methodology Graph-based re-ranking model Learning a latent space graph A case

More information

Bruno Martins. 1 st Semester 2012/2013

Bruno Martins. 1 st Semester 2012/2013 Link Analysis Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 4

More information

Feature selection. LING 572 Fei Xia

Feature selection. LING 572 Fei Xia Feature selection LING 572 Fei Xia 1 Creating attribute-value table x 1 x 2 f 1 f 2 f K y Choose features: Define feature templates Instantiate the feature templates Dimensionality reduction: feature selection

More information

Tree Models of Similarity and Association. Clustering and Classification Lecture 5

Tree Models of Similarity and Association. Clustering and Classification Lecture 5 Tree Models of Similarity and Association Clustering and Lecture 5 Today s Class Tree models. Hierarchical clustering methods. Fun with ultrametrics. 2 Preliminaries Today s lecture is based on the monograph

More information

DBPedia (dbpedia.org)

DBPedia (dbpedia.org) Matt Harbers Databases and the Web April 22 nd, 2011 DBPedia (dbpedia.org) What is it? DBpedia is a community whose goal is to provide a web based open source data set of RDF triples based on Wikipedia

More information

Linked Data and RDF. COMP60421 Sean Bechhofer

Linked Data and RDF. COMP60421 Sean Bechhofer Linked Data and RDF COMP60421 Sean Bechhofer sean.bechhofer@manchester.ac.uk Building a Semantic Web Annotation Associating metadata with resources Integration Integrating information sources Inference

More information

Linked Open Data: a short introduction

Linked Open Data: a short introduction International Workshop Linked Open Data & the Jewish Cultural Heritage Rome, 20 th January 2015 Linked Open Data: a short introduction Oreste Signore (W3C Italy) Slides at: http://www.w3c.it/talks/2015/lodjch/

More information

Semantic Similarity Measures in MeSH Ontology and their application to Information Retrieval on Medline. Angelos Hliaoutakis

Semantic Similarity Measures in MeSH Ontology and their application to Information Retrieval on Medline. Angelos Hliaoutakis Semantic Similarity Measures in MeSH Ontology and their application to Information Retrieval on Medline Angelos Hliaoutakis November 1, 2005 Contents List of Tables List of Figures Abstract Acknowledgements

More information

Semantics and Ontologies for Geospatial Information. Dr Kristin Stock

Semantics and Ontologies for Geospatial Information. Dr Kristin Stock Semantics and Ontologies for Geospatial Information Dr Kristin Stock Introduction The study of semantics addresses the issue of what data means, including: 1. The meaning and nature of basic geospatial

More information

Ranking Clustered Data with Pairwise Comparisons

Ranking Clustered Data with Pairwise Comparisons Ranking Clustered Data with Pairwise Comparisons Alisa Maas ajmaas@cs.wisc.edu 1. INTRODUCTION 1.1 Background Machine learning often relies heavily on being able to rank the relative fitness of instances

More information

Week 7 Picturing Network. Vahe and Bethany

Week 7 Picturing Network. Vahe and Bethany Week 7 Picturing Network Vahe and Bethany Freeman (2005) - Graphic Techniques for Exploring Social Network Data The two main goals of analyzing social network data are identification of cohesive groups

More information

Nearest Neighbor Search by Branch and Bound

Nearest Neighbor Search by Branch and Bound Nearest Neighbor Search by Branch and Bound Algorithmic Problems Around the Web #2 Yury Lifshits http://yury.name CalTech, Fall 07, CS101.2, http://yury.name/algoweb.html 1 / 30 Outline 1 Short Intro to

More information

A Study of Future Internet Applications based on Semantic Web Technology Configuration Model

A Study of Future Internet Applications based on Semantic Web Technology Configuration Model Indian Journal of Science and Technology, Vol 8(20), DOI:10.17485/ijst/2015/v8i20/79311, August 2015 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 A Study of Future Internet Applications based on

More information

Automatically Annotating Text with Linked Open Data

Automatically Annotating Text with Linked Open Data Automatically Annotating Text with Linked Open Data Delia Rusu, Blaž Fortuna, Dunja Mladenić Jožef Stefan Institute Motivation: Annotating Text with LOD Open Cyc DBpedia WordNet Overview Related work Algorithms

More information

Where Should the Bugs Be Fixed?

Where Should the Bugs Be Fixed? Where Should the Bugs Be Fixed? More Accurate Information Retrieval-Based Bug Localization Based on Bug Reports Presented by: Chandani Shrestha For CS 6704 class About the Paper and the Authors Publication

More information

DBpedia Extracting structured data from Wikipedia

DBpedia Extracting structured data from Wikipedia DBpedia Extracting structured data from Wikipedia Anja Jentzsch, Freie Universität Berlin Köln. 24. November 2009 DBpedia DBpedia is a community effort to extract structured information from Wikipedia

More information

Enhanced Image Retrieval using Distributed Contrast Model

Enhanced Image Retrieval using Distributed Contrast Model Enhanced Image Retrieval using Distributed Contrast Model Mohammed. A. Otair Faculty of Computer Sciences & Informatics Amman Arab University Amman, Jordan Abstract Recent researches about image retrieval

More information

Clustering. Bruno Martins. 1 st Semester 2012/2013

Clustering. Bruno Martins. 1 st Semester 2012/2013 Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 Motivation Basic Concepts

More information

A Distributional Approach for Terminological Semantic Search on the Linked Data Web

A Distributional Approach for Terminological Semantic Search on the Linked Data Web A Distributional Approach for Terminological Semantic Search on the Linked Data Web André Freitas Digital Enterprise Research Institute (DERI) National University of Ireland, Galway andre.freitas@deri.org

More information

Cluster Analysis. Angela Montanari and Laura Anderlucci

Cluster Analysis. Angela Montanari and Laura Anderlucci Cluster Analysis Angela Montanari and Laura Anderlucci 1 Introduction Clustering a set of n objects into k groups is usually moved by the aim of identifying internally homogenous groups according to a

More information

Big Mathematical Ideas and Understandings

Big Mathematical Ideas and Understandings Big Mathematical Ideas and Understandings A Big Idea is a statement of an idea that is central to the learning of mathematics, one that links numerous mathematical understandings into a coherent whole.

More information

Improving Difficult Queries by Leveraging Clusters in Term Graph

Improving Difficult Queries by Leveraging Clusters in Term Graph Improving Difficult Queries by Leveraging Clusters in Term Graph Rajul Anand and Alexander Kotov Department of Computer Science, Wayne State University, Detroit MI 48226, USA {rajulanand,kotov}@wayne.edu

More information

A Novel PAT-Tree Approach to Chinese Document Clustering

A Novel PAT-Tree Approach to Chinese Document Clustering A Novel PAT-Tree Approach to Chinese Document Clustering Kenny Kwok, Michael R. Lyu, Irwin King Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin, N.T., Hong Kong

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval Mohsen Kamyar چهارمین کارگاه ساالنه آزمایشگاه فناوری و وب بهمن ماه 1391 Outline Outline in classic categorization Information vs. Data Retrieval IR Models Evaluation

More information

Fast Contextual Preference Scoring of Database Tuples

Fast Contextual Preference Scoring of Database Tuples Fast Contextual Preference Scoring of Database Tuples Kostas Stefanidis Department of Computer Science, University of Ioannina, Greece Joint work with Evaggelia Pitoura http://dmod.cs.uoi.gr 2 Motivation

More information

Online Social Networks and Media

Online Social Networks and Media Online Social Networks and Media Absorbing Random Walks Link Prediction Why does the Power Method work? If a matrix R is real and symmetric, it has real eigenvalues and eigenvectors: λ, w, λ 2, w 2,, (λ

More information

Query Difficulty Prediction for Contextual Image Retrieval

Query Difficulty Prediction for Contextual Image Retrieval Query Difficulty Prediction for Contextual Image Retrieval Xing Xing 1, Yi Zhang 1, and Mei Han 2 1 School of Engineering, UC Santa Cruz, Santa Cruz, CA 95064 2 Google Inc., Mountain View, CA 94043 Abstract.

More information

Falcon-AO: Aligning Ontologies with Falcon

Falcon-AO: Aligning Ontologies with Falcon Falcon-AO: Aligning Ontologies with Falcon Ningsheng Jian, Wei Hu, Gong Cheng, Yuzhong Qu Department of Computer Science and Engineering Southeast University Nanjing 210096, P. R. China {nsjian, whu, gcheng,

More information

June 15, Abstract. 2. Methodology and Considerations. 1. Introduction

June 15, Abstract. 2. Methodology and Considerations. 1. Introduction Organizing Internet Bookmarks using Latent Semantic Analysis and Intelligent Icons Note: This file is a homework produced by two students for UCR CS235, Spring 06. In order to fully appreacate it, it may

More information

ONTOPARK: ONTOLOGY BASED PAGE RANKING FRAMEWORK USING RESOURCE DESCRIPTION FRAMEWORK

ONTOPARK: ONTOLOGY BASED PAGE RANKING FRAMEWORK USING RESOURCE DESCRIPTION FRAMEWORK Journal of Computer Science 10 (9): 1776-1781, 2014 ISSN: 1549-3636 2014 doi:10.3844/jcssp.2014.1776.1781 Published Online 10 (9) 2014 (http://www.thescipub.com/jcs.toc) ONTOPARK: ONTOLOGY BASED PAGE RANKING

More information

Linked Data and RDF. COMP60421 Sean Bechhofer

Linked Data and RDF. COMP60421 Sean Bechhofer Linked Data and RDF COMP60421 Sean Bechhofer sean.bechhofer@manchester.ac.uk Building a Semantic Web Annotation Associating metadata with resources Integration Integrating information sources Inference

More information