CHAPTER 6 SEMANTIC ASSOCIATION THROUGH CONTEXT CLOSENESS



6.1 INTRODUCTION

The precision of semantic association ranking can be improved by filtering out irrelevant paths, as discussed in the previous chapter. This chapter discusses context closeness and how it is used to improve precision in finding semantic associations. The proposed work assumes that the properties in the RDF graph are bidirectional, i.e. every relationship has a corresponding inverse relationship. This assumption is necessary because two resources may not be connected by a directed path but may still be connected by a path that contains inverse relations.

6.2 MOTIVATION

Assume that there are two persons X and Y. Suppose the user knows something about person X or Y, for example that person X is mostly involved in financial activities or that person Y is involved in politics. In that case, the user may be interested in finding these types of relationships between X and Y. The relationships between the two entities are ranked according to the user's expectation. Consider, for example, the relationships between the person John and the movie Slumdog_Millionaire in the RDF graph, shown in Table 6.1.

Table 6.1 shows the possible relationships between the entities John and Slumdog_Millionaire. There are four intermediate entities between John and Slumdog_Millionaire.

Table 6.1 Relationships between person John and movie Slumdog_Millionaire

1. John -edit_music- Human_Movie -support_fund- Tara_Funding_Agencies -support_fund- Ragam_Music -member_of- ARRahman -provides_music- Slumdog_Millionaire
2. John -member_of- Tara_Funding_Agency -supports_fund- Human_Movie -associated_with- Ragam_Music -member_of- ARRahman -provides_music- Slumdog_Millionaire
3. John -edits_music- Human_Movie -associated_with- Ragam_Music -support_fund- Tara_Funding_Agencies -member_of- ARRahman -provides_music- Slumdog_Millionaire

These intermediate entities are scattered over the paths in different combinations. If the user is interested in Music and Finance, the method of Anyanwu et al (2005) produces the same weight for all paths, since the component values are calculated independently and then summed. All the paths therefore have equal weights and are ranked arbitrarily, and the user has to go through this subset of paths to find the relevant ones.

Suppose a user wants to find and rank relevant associations according to a domain of interest that is closer to either the left entity John or the right entity Slumdog_Millionaire. The existing system lets the user choose between favoring long or short paths, favoring popular or unpopular entities, and favoring rarity, but it offers no way to choose context closeness. The proposed method ranks the semantic association paths between two entities according to the user's needs, with support for context closeness.

Definition (Context closeness): In an RDF graph, consider the sequence $e_1, P_1, e_2, P_2, e_3, P_3, \ldots, e_{n-1}, P_{n-1}, e_n$, where the $e_i$ ($1 \le i \le n$) are called entities and the $P_j$ ($1 \le j < n$) are called properties. If the sum of the context weights of the user's entities in the first half of the sequence is greater than that in the second half, the context is said to be closer to the left entity $e_1$. Otherwise, it is said to be closer to the right entity $e_n$.

To calculate the context value according to the user's choice of context closeness, Equation (4.3) has been modified based on the above definition as follows:

$$C_i = cv_i \mid c_i \in D \tag{6.1}$$

$$MC_p = \frac{1}{c}\left(\sum_{i=1}^{c/2} C_i + 0.1 \sum_{i=c/2+1}^{n} C_i\right) \times \left(1 - \frac{\#\,c \notin D}{c}\right) \tag{6.2}$$

$$MC_p = \frac{1}{c}\left(0.1 \sum_{i=1}^{c/2} C_i + \sum_{i=c/2+1}^{n} C_i\right) \times \left(1 - \frac{\#\,c \notin D}{c}\right) \tag{6.3}$$

where c is the total number of components in the path (excluding the start and end entities). Equations (6.2) and (6.3) are used to calculate the context weight closer to the left entity and to the right entity respectively. The context weight $MC_p$ is used as a parameter in calculating the weight of the semantic association paths.
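The left/right weighting in Equations (6.2) and (6.3) can be sketched in Python as follows. This is a minimal illustration, not the thesis implementation: the function name `context_weight`, the list representation of the component context values, the use of integer division for the midpoint, and the convention that a context value of 0.0 marks a component outside the domain D are all assumptions made for the example.

```python
from typing import Sequence


def context_weight(values: Sequence[float], closer_to: str = "left",
                   damping: float = 0.1) -> float:
    """Context weight MC_p of a single association path.

    values    -- context value C_i of each intermediate component in path
                 order (start and end entities excluded). A value of 0.0 is
                 treated as "not in the user's domain D" (assumption).
    closer_to -- "left" for Equation (6.2), "right" for Equation (6.3).
    damping   -- factor applied to the half that is far from the chosen
                 end point (0.1 in the chapter).
    """
    c = len(values)
    if c == 0:
        return 0.0

    half = c // 2  # assumption: integer midpoint for odd c
    left_sum = sum(values[:half])
    right_sum = sum(values[half:])
    outside = sum(1 for v in values if v == 0.0)  # components not in D

    if closer_to == "left":
        total = left_sum + damping * right_sum
    elif closer_to == "right":
        total = damping * left_sum + right_sum
    else:
        raise ValueError("closer_to must be 'left' or 'right'")

    # Normalise by the number of components and penalise the fraction of
    # components that fall outside the domain of interest.
    return (1.0 / c) * total * (1.0 - outside / c)


# Hypothetical path whose two relevant components sit next to the left entity.
made_up_values = [1.0, 1.0, 0.0, 0.0]
print(context_weight(made_up_values, "left"))
print(context_weight(made_up_values, "right"))
```

With these made-up values the path scores higher when the user asks for context closer to the left entity than when the user asks for context closer to the right entity, which is the behaviour the definition is meant to capture.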

6.3 PSEUDO CODE FOR RANKING THE SEMANTIC ASSOCIATION

Figure 6.1 shows the pseudo code to find the semantic association paths between two entities. For each resource (entity or relation) in an association path, the subsumption weight, popularity weight, rarity weight, trust weight and association length are calculated as discussed in Chapter 2. The context weight is calculated using the equations given in Section 6.2. Finally, the overall weight is calculated by adding the weights of all the resources in the association path. One path in the RDF graph is considered in each iteration, and weights are calculated for all the resources in this path. The algorithm terminates when all the paths have been considered.

Input  : RDF graph and two entities
Output : Semantic association paths ranked according to the user's interest

/* Initialization */
// ws - Subsumption weight
// wc - Context weight
// wr - Rarity weight
// wp - Popularity weight
// wt - Trust weight
// wl - Association length weight
Rank, ws, wc, wr, wp, wt, wl = 0
wcl, wcr, notcontextl, notcontextr, TwP, TwR = 0

for each resource in association_path    /* Loop through the association */
{
    /* Trust calculation */
    if wt < resource.trust
        wt = resource.trust

    /* Popularity and rarity totals */
    TwP = TwP + (No of incoming and outgoing relationships of entity)
                / max(No of incoming and outgoing relationships among all entities)
    TwR = TwR + (No of instances and relationships in knowledgebase
                 - instances and relationships with same type)
                / (No of instances and relationships in knowledgebase)

    /* Context weight of the resource */
    cw = context.relevance(resource)
    if resource.num <= association.length()/2
    {
        if cw == 0.0
            notcontextl++
        wcl = wcl + cw
    }
    else
    {
        if cw == 0.0
            notcontextr++
        wcr = wcr + cw
    }
}

/* Subsumption Calculation */
ws = ws / association.length()

/* Context Calculation */
// context weight for left entity
wcl = (1.0/association.length()) * (wcl * (1.0 - (notcontextl/association.length())))
// context weight for right entity
wcr = (1.0/association.length()) * (wcr * (1.0 - (notcontextr/association.length())))
if context closer to left
    wc = wcl + (0.1 * wcr)
else
    wc = (0.1 * wcl) + wcr

/* Rarity Calculation */
if favor rare associations
    wr = TwR / association.length()
else
    wr = 1.0 - (TwR / association.length())

/* Popularity Calculation */
if favor popular associations
    wp = TwP / association.length()
else
    wp = 1.0 - (TwP / association.length())

/* Association Length Calculation */
if favor long associations
    wl = 1.0 - (1.0/association.length())
else
    wl = 1.0/association.length()

/* Overall ranking */
Rank = (k1*wc) + (k2*ws) + (k3*wp) + (k4*wr) + (k5*wl) + (k6*wt)

Figure 6.1 Pseudo code to find the semantic association between two entities
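The final steps of Figure 6.1, where the user's preferences select the rarity, popularity and length weights and the six weights are combined into one rank, could look roughly like the Python sketch below. The class `PathTotals`, the function `rank_path`, the keyword flags and the coefficient values are hypothetical names and numbers introduced for illustration; the chapter does not state the values of k1 to k6.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PathTotals:
    """Per-path values gathered by the loop in Figure 6.1 (names assumed)."""
    length: int   # association length
    wc: float     # context weight
    ws: float     # subsumption weight, already averaged over the path
    wt: float     # trust weight
    twp: float    # accumulated popularity ratios (TwP)
    twr: float    # accumulated rarity ratios (TwR)


def rank_path(t: PathTotals, k: Sequence[float], *, favor_rare: bool = True,
              favor_popular: bool = True, favor_long: bool = False) -> float:
    """Combine the component weights into one rank value."""
    if len(k) != 6:
        raise ValueError("expected six coefficients k1..k6")
    if t.length <= 0:
        raise ValueError("association length must be positive")

    # Rarity: use the averaged ratio, or its complement when rare
    # associations are not favored.
    wr = t.twr / t.length
    if not favor_rare:
        wr = 1.0 - wr

    # Popularity: same pattern as rarity.
    wp = t.twp / t.length
    if not favor_popular:
        wp = 1.0 - wp

    # Length: short paths score 1/length, long paths the complement.
    wl = 1.0 / t.length
    if favor_long:
        wl = 1.0 - wl

    k1, k2, k3, k4, k5, k6 = k
    return k1 * t.wc + k2 * t.ws + k3 * wp + k4 * wr + k5 * wl + k6 * t.wt


# Made-up totals and equal coefficients, purely for demonstration.
example = PathTotals(length=4, wc=0.25, ws=0.5, wt=0.8, twp=2.0, twr=1.0)
equal_k = (1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
print(rank_path(example, equal_k))
print(rank_path(example, equal_k, favor_long=True))
```

Keeping the preference flags separate from the accumulated totals means the same path totals can be re-ranked under a different user choice without traversing the path again.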

6.4 EXPERIMENTAL EVALUATION

For finding semantic association paths, the proposed system used an RDF graph consisting of 52 classes, 70 properties and 3000 entities covering various domains such as Music, Finance, Terrorism and Sports. To test the performance of the system, 40 pairs of entities were selected from the RDF graph.

Figure 6.2 Paths between two entities John and Slumdog Millionaire with context closer to John

Semantic association paths were generated and ranked under various criteria: favor short or long associations, favor popular or unpopular entities, favor rarity, and context closer to the right entity or to the left entity. The criteria were selected through the user interface. Ranking of the semantic association paths was carried out by 50 users, both through the system and manually.

Figure 6.2 shows the paths between the two entities John and Slumdog Millionaire with context closer to John, and Figure 6.3 shows the paths between the same two entities with context closer to Slumdog Millionaire.

Figure 6.3 Paths between two entities John and Slumdog Millionaire with context closer to Slumdog Millionaire

According to our experiments, the average correlation coefficient between the proposed system's ranking and the human users' ranking is greater than 0.5, so the two rankings are highly correlated. Figure 6.4 shows the correlation between the human ranking and the proposed system's ranking for the top-k semantic association paths. Figure 6.5 compares the correlation of human ranking with the proposed system and with other existing association ranking methods. It shows that the correlation between human ranking and the proposed approach is higher than that of the other existing methods.

Figure 6.4 Comparison of human and proposed system ranking with top-k results

Precision is evaluated for the top-k semantic association paths from the ranked results of the proposed system and of the other existing methods. Precision is the fraction of relevant paths among the top-k semantic association paths. Figure 6.6 compares the precision of the proposed method with that of the existing methods. For all five methods, precision increases or decreases as k changes. Among these methods, which show the same behaviour, the proposed method provides a higher rate of precision.
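As an illustration of the precision measure described above, the short Python sketch below computes the fraction of relevant paths among the top-k ranked paths. The function name `precision_at_k`, the path labels and the relevance judgements are made up for the example and are not taken from the experiments.

```python
from typing import Hashable, Sequence, Set


def precision_at_k(ranked_paths: Sequence[Hashable],
                   relevant: Set[Hashable], k: int) -> float:
    """Fraction of the top-k ranked paths that are judged relevant.

    The count is divided by k, so a result list shorter than k is
    penalised (a design choice of this sketch).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for path in ranked_paths[:k] if path in relevant)
    return hits / k


# Hypothetical ranked output and relevance judgements.
ranked = ["path1", "path2", "path3", "path4"]
judged_relevant = {"path1", "path3"}
for k in (1, 2, 4):
    print(k, precision_at_k(ranked, judged_relevant, k))
```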

Figure 6.5 Comparison of correlation between human ranking and existing association ranking methods

Figure 6.6 Comparison of precision between proposed method and existing methods

6.5 SUMMARY AND DISCUSSION

Semantic data contains entities and heterogeneous relationships among them. The number of relationships between entities may be much greater than the number of entities. Ranking these relationship paths is required to find the relationships between entities that are relevant to the user's domain of interest. In some situations, users may expect relationships between two entities in which their context is closer to one of the end points, either the left entity or the right entity. The proposed method gives the user the flexibility to find and rank the semantic association paths whose components from the domain of interest are closer to either the left entity or the right entity.

The proposed method is compared with existing methods through the Spearman correlation coefficient and precision. The average correlation coefficients with human ranking for the proposed system, Aleman-Meza et al (2005), Anyanwu et al (2005), Lee et al (2009) and Vidal et al (2010) are 0.70, 0.61, 0.58, and 0.6 respectively. It is evident that the proposed system's ranking is highly correlated with human ranking. The precision of the top-k ranking is also evaluated for the proposed system and the other existing methods. According to the experiments, the proposed system achieves higher precision in the top-k ranking than the other methods.
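For reference, the Spearman correlation coefficient between a system ranking and a human ranking of the same paths can be computed as in the Python sketch below. It uses the standard formula for rankings without ties, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between the two ranks of a path. The function name and the two example rankings are invented for illustration and are not the experimental data.

```python
from typing import Sequence


def spearman_rho(system_ranks: Sequence[int],
                 human_ranks: Sequence[int]) -> float:
    """Spearman rank correlation for two tie-free rankings of the same items.

    system_ranks[i] and human_ranks[i] are the ranks (1..n) given to path i
    by the system and by the human judge.
    """
    n = len(system_ranks)
    if n != len(human_ranks):
        raise ValueError("both rankings must cover the same paths")
    if n < 2:
        raise ValueError("at least two paths are needed")

    # Sum of squared rank differences over all paths.
    d_squared = sum((s - h) ** 2 for s, h in zip(system_ranks, human_ranks))
    return 1.0 - (6.0 * d_squared) / (n * (n * n - 1))


# Made-up ranks for five paths.
system = [1, 2, 3, 4, 5]
human = [2, 1, 3, 5, 4]
print(spearman_rho(system, human))
```

A value close to 1 means the two rankings order the paths almost identically, and a value close to -1 means they order them in nearly opposite ways.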
