Web Science. Introduction to Information Integration

1 Web Science Introduction to Information Integration Julien Gaugaz, October 26, 2010

9 Topics
1. Information Integration
2. Web Information Retrieval
3. Entity Search
4. Web Usage
5. Collaborative Web
6. Web Archiving
7. Medical Social Web

10 Why Integrate Information? Scenarios

11 Company Mergers

15 Travelling Agent (figure label: Agent)

16 Booking Flights (figure label: Agent)

17 Leveraging Wikipedia Infoboxes (figure labels: Query, Data Contribution)

18 Evolution (figure: number of sources on a logarithmic axis, up to 1E+06, across three eras: Beginning of Databases; Rise of Internet & Wrapping Websites; Wikipedia & Social Web)

19 What is the Problem? Kinds of discrepancies

20 Wikipedia Infoboxes
German infobox (Berlin):
  [[(...) Reg. Bürgermeister]]: [[Klaus Wowereit]]
  [[Höhe]]: m ü. NN
  [[Einwohner]]: {{Metadaten Einwohnerzahl DE-BE Berlin}}[...] (rendered as: (31. Mai 2010))
English infobox (Berlin):
  leader_title = [[List of mayors of Berlin|Governing Mayor]]
  leader = Klaus Wowereit
  elevation =
  pop_date =
  population =
  pop_metro =

27 Wikipedia Infoboxes
Berlin:
  leader_title = [[List of mayors of Berlin|Governing Mayor]]
  leader = Klaus Wowereit
  elevation =
  pop_date =
  population =
  pop_metro =
San Francisco:
  leader_title = [[Mayor of San Francisco|Mayor]]
  leader_name = [[Gavin Newsom]] ([[Democratic [...] D]])
  elevation_ft = 52
  elevation_max_ft = 925
  elevation_min_ft = 0
  population_as_of = 2008
  population_total =
  population_metro =
  population_urban =

33 Causes of Discrepancies
- Information sources are diverse
  - Different cultural background
  - Different domain of activity
  - Different model of information
- Typos and other kinds of errors
- Evolution over time
  - Use, usage and users of a source may change over time

39 Places of Discrepancies
Information level where discrepancies appear:
- Semantic: meaning, sense
- Representational
  - Lexical: word or term representing the meaning
  - Structural: how the terms are arranged to represent the meaning
- Syntactic: how the lexical and structural levels are encoded into characters (and bits)
Discrepancies may concern:
- Schema elements (properties and structure)
- Values

44 Schema Discrepancies
- Semantic: Einstein's full name is Albert Einstein
- Representational:
  - Einstein: name (first: Albert, last: Einstein)
  - Einstein: full_name: Albert Einstein
- Syntactic:
  - <Einstein> <full_name> Albert Einstein .
  - <Einstein> <full_name>Albert Einstein</full_name> </Einstein>

47 Schema Ambiguity
- Representational: the same property name is used twice
  - xyz title The Theory of Relativity
  - xyz title Prof. Dr. techn.
- Semantic: two different meanings
  - Article title
  - Person title

48 Value Discrepancies
- Semantic: Einstein's full name is Albert Einstein
- Representational: Einstein full_name, with differing values
  - Albert Einstein
  - Albert Einstin
  - A. Einstein
  - Einstein, Albert

49 Syntactic Level Where discrepancies are addressed with standards

52 Encoding Bytes
- Basic unit
  - Universal standard: bit (binary digit)
  - Ternary digit (base 3, USSR, 1950s, out of use)
- Bits into bytes
  - Big- or little-endian
  - System-wide convention, easily convertible, defined in communication protocols
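Byte order is easiest to see on a multi-byte integer. The sketch below uses Python's standard `struct` module to pack the same 32-bit value in both conventions; the value 1 is an arbitrary choice for illustration.

```python
import struct

value = 1

# Pack the same unsigned 32-bit integer with both byte orders.
big = struct.pack(">I", value)     # most significant byte first
little = struct.pack("<I", value)  # least significant byte first

# Converting between the two conventions is a plain byte reversal.
reversed_big = big[::-1]

# Communication protocols fix the order; "!" selects network order (big-endian).
network = struct.pack("!I", value)

# Reading the bytes back only works if the reader knows the convention used.
as_big = int.from_bytes(little, byteorder="big")
as_little = int.from_bytes(little, byteorder="little")
```

Interpreting little-endian bytes as big-endian yields a different number, which is why the convention has to be agreed on, for example in the protocol specification.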

56 Encoding Characters
- De facto standards: UTF-8/16
- Many others exist: ASCII, the ISO-8859 family, KOI-8, ...
- Trivial dictionary-based translation
  - When the corresponding code exists in the target character map...
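A short sketch of what translation between character maps looks like in practice, using only Python's built-in codecs. The word "Höhe" is borrowed from the infobox example; the choice of ISO-8859-1 and ASCII as the other encodings is an assumption made for illustration.

```python
text = "Höhe"  # contains one non-ASCII character

# The same text yields different byte sequences under different encodings.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")

# Translation: decode with the source map, then encode with the target map.
transcoded = latin1_bytes.decode("iso-8859-1").encode("utf-8")

# The translation fails when the target map has no code for a character.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

The failure case is the caveat on the slide: the dictionary lookup is trivial only as long as the target map contains every character of the source text.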

58 Encoding Lexico-Structural
- XML, XML Schema
  - Structured document serialization format
  - Base for:
    - (X)HTML
    - SVG: Scalable Vector Graphics
    - DOCX: Microsoft Office Word

59 RDF Resource Description Framework Encoding information

64 RDF statement: <subject> <property> <object>
- <subject>: URI or blank node
- <property>: URI
- <object>: URI, blank node, or (typed) literal

67 URI
- URI: Uniform Resource Identifier
  - URLs are URIs
  - scheme:scheme-specific-part
  - RDF encourages using URLs
- URL
  - scheme://usr:passwd@domain:port/path?query_string#anchor
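The URL pattern above can be taken apart with `urlsplit` from Python's standard library. The concrete URL and the `mailto:` address below are hypothetical, chosen only to fill every slot of the pattern.

```python
from urllib.parse import urlsplit

# Hypothetical URL that fills every slot of the pattern on the slide.
url = "https://usr:passwd@example.org:8080/path?query_string#anchor"
parts = urlsplit(url)

components = {
    "scheme": parts.scheme,
    "usr": parts.username,
    "passwd": parts.password,
    "domain": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query_string": parts.query,
    "anchor": parts.fragment,
}

# A URI that is not a URL still has the scheme:scheme-specific-part shape.
mail = urlsplit("mailto:someone@example.org")
mail_scheme, mail_specific_part = mail.scheme, mail.path
```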

71 RDF
- Resource Description Framework
- Data model specialized in conceptual information modeling
- Supported by various serialization formats:
  - XML
  - Notation3 (N3)
  - Turtle
  - ...
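To make the data model concrete independently of any serialization format, the sketch below stores statements as plain (subject, property, object) tuples and queries them by pattern. It is a toy illustration, not an RDF library: the `example.org` URIs are hypothetical and literals are simplified to plain strings.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical URIs; the objects here are literals written as plain strings.
triples = [
    ("http://example.org/Einstein", "http://example.org/full_name", "Albert Einstein"),
    ("http://example.org/Einstein", "http://example.org/title", "Prof. Dr. techn."),
    ("http://example.org/Relativity", "http://example.org/title", "The Theory of Relativity"),
]

def match(store: Iterable[Triple],
          subject: Optional[str] = None,
          prop: Optional[str] = None,
          obj: Optional[str] = None) -> Iterator[Triple]:
    """Yield the triples that fit the pattern; None acts as a wildcard."""
    for s, p, o in store:
        if subject in (None, s) and prop in (None, p) and obj in (None, o):
            yield (s, p, o)

# All values of the "title" property, whatever kind of thing they describe.
titles = [o for _, _, o in match(triples, prop="http://example.org/title")]

# Everything stated about one subject.
about_einstein = list(match(triples, subject="http://example.org/Einstein"))
```

Because every statement is a self-contained triple, a new property can be added to one subject without touching any table definition, which is why the model copes well with unknown or changing schemas.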

76 RDF Schema (RDF/S)
- Expressed in RDF
- Types subjects and objects with classes
  - Class hierarchy (with multiple inheritance)
  - Type of properties of a class
- Types properties
  - Domain: type of the property's subject
  - Range: type of the property's object
- OWL 2 is more expressive: cardinality, etc.

79 When to use RDF?
- RDF is good at
  - Modeling information
  - Especially when the schema is unknown or changing
  - When there are multiple schemas
- RDF is not for
  - Representing documents (XHTML, CSS)
  - Internal data management when the schema is known and fixed (relational databases)

80 Schema Matching Discrepancies between the representational and semantic levels in the schema

83 Schema matching: input and output
- Boxer schema: name, boxer id, weight, birthdate, total fights, residence; also Trainer, ...
- Taxpayer schema: first name, last name, age, address (street, city), tax id; also Company, Tax Office, ...
- Input:
  - Schemas to match
  - Possibly data instantiating those schemas
- Output:
  - Mappings between schema elements
  - Possibly with confidence values and alternatives
  - Possibly with value conversion rules (matchings)
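The input/output contract can be sketched with a deliberately naive matcher that looks only at element names. It uses `difflib.SequenceMatcher` from the standard library as the similarity measure, which is an assumption for illustration; the function names and the `top_k` parameter are made up here, and a real matcher would combine several techniques.

```python
from difflib import SequenceMatcher

# Element names of the two schemas from the slide.
boxer = ["name", "boxer id", "weight", "birthdate", "total fights", "residence"]
taxpayer = ["first name", "last name", "age", "address", "street", "city", "tax id"]

def name_similarity(a: str, b: str) -> float:
    """Confidence in [0, 1] based on the element names alone."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_schemas(source, target, top_k=2):
    """For each source element, return the best target candidates with confidences."""
    mappings = {}
    for element in source:
        scored = sorted(
            ((name_similarity(element, candidate), candidate) for candidate in target),
            reverse=True,
        )
        mappings[element] = [(candidate, score) for score, candidate in scored[:top_k]]
    return mappings

mappings = match_schemas(boxer, taxpayer)
best_for_name = mappings["name"][0][0]
```

Each source element gets a ranked list of alternatives with a confidence value, mirroring the "possibly with confidence values and alternatives" output. Name similarity alone says nothing about "residence" versus "address", which is where linguistic resources or instance data come in.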

86 Mappings or Matching?
- Schema mapping identifies correspondences between schema elements
- Schema matching actually transforms an instance of one schema into an instance of another schema

87 How to Use Mappings? General architectures

96 Mediated Schemas (figure labels: Query, Mediated Schema, Schema1, Schema2, Schema3, Query Schema x)

97 Peer Data Management (figure labels: Local Source, Local Mapping, Local Schema, Peer Schema, Peer Mapping)

99 Why not by hand?
- Size and complexity of source schemas
- Number of schema sources
- Leveraging data instance values
- Schemas not known in advance

106 Schema Matching Features
- Schema-only vs schema & instances
- Representational: lexical vs structural
- Internal vs external
More in:
- Rahm E, Bernstein PA. A survey of approaches to automatic schema matching. The VLDB Journal. 2001;10(4):
- Shvaiko P, Euzenat J. A Survey of Schema-Based Matching Approaches. Journal on Data Semantics IV. 2005;3730:

120 Schema Matching Techniques
- String-based
- Language-based
- Linguistic resources
- Constraint-based
- Alignment reuse
- Upper-level formal ontologies
- Graph-based
- Taxonomy-based
- Repository of structures
- Model-based

121 A String-Based Technique Leveraging lexical features

126 Edit Distance
- String distance: measures the distance between two strings
- Edit distance: number of operations needed to transform one string into the other
- Common basic operations:
  - Insert, delete or substitute one character
  - Possibly with different weights depending on the operation and characters involved
- Java libraries: SecondString, SimMetrics

147 Levenshtein Distance
- Edit operations: insert, delete, substitute
- Each has a weight of 1
- Example: transforming "Sundays" into "Saturday"
  - Sundays -> Satundays: insert "a" and "t" (2 operations)
  - Satundays -> Saturdays: substitute "n" with "r" (1 operation)
  - Saturdays -> Saturday: delete the final "s" (1 operation)
  - Levenshtein distance: 4
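A minimal sketch of the usual dynamic-programming computation with unit weights, applied to the two strings spelled out on the slide ("Sundays" and "Saturday"). The function name is chosen here for illustration; libraries such as the ones named earlier offer tuned implementations.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of unit-cost inserts, deletes and substitutions."""
    # previous[j] = distance between the processed prefix of source and target[:j]
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            substitution_cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,                      # delete s_char
                current[j - 1] + 1,                   # insert t_char
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]

distance = levenshtein("Sundays", "Saturday")
```

Only two rows of the table are kept in memory, so space is linear in the length of the target while time stays proportional to the product of the two lengths.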

148 A Linguistic Resource WordNet

152 WordNet
- Fundamental components: Synonym Sets (Synsets)
  - {car, auto, automobile, machine, motorcar}: a motor vehicle with four wheels; usually propelled by an internal combustion engine
  - {car, railcar, railway car, railroad car}: a wheeled vehicle adapted to the rails of railroad

153 Hypernyms / Hyponyms
- Hypernyms: superordinates, is-a relationships. A synset may have more than one hypernym.
- Hyponyms: subordinates
- Example for {car, auto, automobile, machine, motorcar}:
  - Hypernym: {motor vehicle, automotive vehicle}
  - Hyponyms: {cab, hack, taxi, taxicab}, {ambulance}

154 Holonym / Meronym
- Meronym: name of a constituent part of, the substance of, or a member of something. X is a meronym of Y if X is a part of Y.
- Holonym: name of the whole of which the meronym names a part. Y is a holonym of X if X is a part of Y.
- Example:
  - Holonym: {car, auto, automobile, machine, motorcar}
  - Meronym: {accelerator, accelerator pedal, gas pedal, gas, throttle, gun}

158 Other relationships in WordNet
- Antonym
- Entailment (for verbs): a verb X entails Y if X cannot be done unless Y is, or has been, done.
- Attribute (for adjectives): a noun for which adjectives express values. The noun "weight" is an attribute, for which the adjectives "light" and "heavy" express values.

159 A Graph-Matching Technique Leveraging structure

163 Similarity Flooding
- Uses the structure of the data to help match schemas
- Similarity Flooding in Melnik et al. (2002)
  - First maps schema elements using lexical similarity
  - Then improves the matching by assuming that if two elements are similar, the elements adjacent to them are more likely to be similar
Selected paper 1: Melnik S, Garcia-Molina H, Rahm E. Similarity flooding: a versatile graph matching algorithm and its application to schema matching. IEEE Comput. Soc; 2002:
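The propagation idea can be sketched in a few lines. This is a strongly simplified illustration, not the algorithm as published: edges are undirected and unlabeled, every neighbor pair contributes with the same weight, and the initial similarities are handed in directly rather than computed lexically. All names in the example (function, parameters, the tiny graphs) are assumptions made here.

```python
from itertools import product

def flood_similarity(nodes_a, edges_a, nodes_b, edges_b, initial, iterations=10):
    """Iteratively let each node pair absorb the similarity of its neighbor pairs."""
    adj_a = {n: set() for n in nodes_a}
    for u, v in edges_a:
        adj_a[u].add(v)
        adj_a[v].add(u)
    adj_b = {n: set() for n in nodes_b}
    for u, v in edges_b:
        adj_b[u].add(v)
        adj_b[v].add(u)

    # One similarity value per pair (element of A, element of B).
    sigma = {pair: initial.get(pair, 0.0) for pair in product(nodes_a, nodes_b)}
    for _ in range(iterations):
        updated = {}
        for (a, b), score in sigma.items():
            # Similarity flowing in from all pairs of adjacent elements.
            inflow = sum(sigma[(na, nb)] for na in adj_a[a] for nb in adj_b[b])
            updated[(a, b)] = score + inflow
        top = max(updated.values())
        if top == 0:
            break
        # Normalize so that the best pair always has similarity 1.
        sigma = {pair: value / top for pair, value in updated.items()}
    return sigma

# Tiny example: only Boxer/Taxpayer start out as similar.
scores = flood_similarity(
    nodes_a=["Boxer", "residence"], edges_a=[("Boxer", "residence")],
    nodes_b=["Taxpayer", "address", "Company"], edges_b=[("Taxpayer", "address")],
    initial={("Boxer", "Taxpayer"): 1.0},
)
```

In this example "residence" and "address" share no characters worth matching, yet their pair gains similarity because each is adjacent to an element of the similar pair, while the pair (residence, Company) gains nothing.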

164 Deduplication Detecting duplicate entries

165 Why are there Duplicates?
- Sport Authorities record:
  - name: Muhammad Ali
  - boxer id:
  - weight: 200 lb
  - total fights: 61
  - residence: 17, Nicestreet Louisville, KY
- Tax Authorities record:
  - first name: Mohamed
  - last name: Ali
  - age: 68
  - address: street: Nicestreet 17, city: Wondercity
  - tax id: #
- Both records end up in an administration-wide database

166 Duplicate detection: input and output
- Input: 2 entities with matched attributes
- Output: M for matched or U for unmatched. Possibly R for reject, between M and U, for cases where a supervised decision is necessary.
- Example records:
  - name: Muhammad Ali; boxer id: ; weight: 200 lb; total fights: 61; residence: 17, Nicestreet Louisville, KY
  - first name: Mohamed; last name: Ali; age: 68; address: street: Nicestreet 17, city: Wondercity; tax id: #
  - name: Muhammad Ali; address: city: Cairo, country: Egypt; tax id: #
- Figure labels on the record pairs: R, M, U

167 Deduplication Features

173 Field Distance Metrics
- Value metrics
  - Character-based: string-based metrics seen for schema matching
  - Token-based: similar to Information Retrieval techniques (Topic 2 next week)
  - Phonetic
  - Numeric: few techniques other than treating the values as strings or taking the direct difference

180 Soundex
1. First letter as prefix
2. Encode non-prefix consonants
3. Remove duplicate adjacent codes not separated by a vowel
4. Drop vowels and truncate to the prefix and at most 3 codes, or pad with zeros if necessary

consonant                 code
b, f, p, v                1
c, g, j, k, q, s, x, z    2
d, t                      3
l                         4
m, n                      5
r                         6
h, w                      dropped

Examples (result after each step):
- Ashcraftson: 1. Ashcraftson  2. A2 26a132o5  3. A26a132o5  4. A261
- Robert: 1. Robert  2. Ro1e63  3. Ro1e63  4. R163
- Rupert: 1. Rupert  2. Ru1e63  3. Ru1e63  4. R163
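The four steps and the code table above correspond to the classic Soundex coding scheme. A minimal sketch that follows the steps as listed on the slide; the function name is made up here, and unlike some Soundex variants it never compares the code of the prefix letter with the code that follows it.

```python
# Consonant codes from the table; h and w are dropped, vowels are not coded.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for letter in letters:
        CODES[letter] = digit

def phonetic_code(name: str) -> str:
    """Prefix letter followed by three digits, padded with zeros."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    prefix = letters[0].upper()        # step 1: first letter as prefix
    digits = []
    previous = None                    # last code seen since the latest vowel
    for letter in letters[1:]:
        if letter in "hw":
            continue                   # dropped; does not separate duplicates
        digit = CODES.get(letter)      # step 2: encode consonants
        if digit is None:
            previous = None            # a vowel separates duplicate codes
            continue
        if digit != previous:          # step 3: skip adjacent duplicates
            digits.append(digit)
        previous = digit
    # step 4: vowels were never stored; truncate or pad to three digits
    return (prefix + "".join(digits) + "000")[:4]

codes = {name: phonetic_code(name) for name in ["Ashcraftson", "Robert", "Rupert"]}
```

Robert and Rupert receive the same code, which is the point of a phonetic metric: names that sound alike compare as equal even though their spelling differs.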

182 Other Phonetic Codes
- NYSIIS
  - Developed and still in use at the New York State Division of Criminal Justice Services
  - Encodes vowels (mostly to A)
  - Codes are letters instead of digits
  - Longer codes (6 instead of 4)

185 Other Phonetic Codes
- Metaphone
  - Codes are letters instead of digits
  - No maximum code length
  - More elaborate coding rules
- Double Metaphone
  - Returns a secondary code to help disambiguate

186 Detecting Duplicates

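191 Bayes Decision Rule
M: match, U: unmatch. For a comparison vector x, decide
\[ \text{M if } p(M \mid x) \ge p(U \mid x), \qquad \text{U otherwise.} \]
Using Bayes' rule:
\[ p(M \mid x) \ge p(U \mid x) \]
\[ \iff p(M \mid x)\,p(x) \ge p(U \mid x)\,p(x) \]
\[ \iff p(M)\,p(x \mid M) \ge p(U)\,p(x \mid U) \]
\[ \iff l(x) = \frac{p(x \mid M)}{p(x \mid U)} \ge \frac{p(U)}{p(M)} \]
Decision rule: likelihood ratio
\[ l(x) = \frac{p(x \mid M)}{p(x \mid U)} \ge \frac{p(U)}{p(M)} \]
Using the independence assumption:
\[ p(x \mid M) = \prod_i p(x_i \mid M), \qquad p(x \mid U) = \prod_i p(x_i \mid U) \]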

194 Bayes Decision Rule
- The conditional probabilities p(x_i | M) and p(x_i | U) can be learned on a training set
- Other methods based on the Expectation-Maximisation (EM) algorithm can estimate them without a training set
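A small sketch of the likelihood-ratio decision under the independence assumption. It assumes that each component x_i of the comparison vector is binary (the two records agree on field i or they do not); the three fields and all probability values below are invented for illustration, not learned from data.

```python
from math import prod

def likelihood_ratio(agreements, m_probs, u_probs):
    """l(x) = p(x|M) / p(x|U), each side a product over independent fields."""
    p_x_given_m = prod(m if agrees else 1 - m for agrees, m in zip(agreements, m_probs))
    p_x_given_u = prod(u if agrees else 1 - u for agrees, u in zip(agreements, u_probs))
    return p_x_given_m / p_x_given_u

def decide(agreements, m_probs, u_probs, p_match):
    """Return "M" if l(x) >= p(U)/p(M), otherwise "U"."""
    threshold = (1 - p_match) / p_match
    return "M" if likelihood_ratio(agreements, m_probs, u_probs) >= threshold else "U"

# Hypothetical probabilities of agreement on (name, city, tax id).
m_probs = [0.9, 0.8, 0.95]   # p(fields agree | records match)
u_probs = [0.1, 0.2, 0.05]   # p(fields agree | records do not match)
p_match = 0.01               # assumed prior p(M): matches are rare

all_agree = decide([True, True, True], m_probs, u_probs, p_match)
name_only = decide([True, False, False], m_probs, u_probs, p_match)
```

Because matches are rare, the threshold p(U)/p(M) is high (99 with the numbers above), so agreement on the name alone is not enough to declare a match.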

196 Clustering-Based Decision
- Using clustering techniques with appropriate parameters
  - X-Means: variant of K-Means without a fixed K
- Chaudhuri et al. observed that duplicates tend
  1. to have small distances from each other (compact set property), and
  2. to have only a small number of other neighbors within a small distance (sparse neighborhood property).
Selected paper 2: Chaudhuri S, Ganti V, Motwani R. Robust Identification of Fuzzy Duplicates. ICDE :

197 Dealing with O(n²) (figure: number of comparisons as a function of the number of entities in the repository, up to 1'000'000 entities)

200 Canopies
- Create canopies using a cheap similarity metric
  - Overlapping clusters
- Compare entities pairwise, within each canopy only, using a more expensive similarity metric
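A sketch of how canopies cut down the n(n-1)/2 pairwise comparisons. The two-threshold construction (a loose threshold to join a canopy, a tight one to stop being a future canopy center) is the usual formulation and an assumption with respect to the slide; token overlap as the cheap metric, the threshold values and the four records are illustrative choices.

```python
from itertools import combinations

def token_jaccard(a: str, b: str) -> float:
    """Cheap similarity: overlap of whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0

def build_canopies(records, loose=0.2, tight=0.6):
    """Group record indices into possibly overlapping canopies."""
    candidates = list(range(len(records)))
    canopies = []
    while candidates:
        center = candidates[0]
        sims = [token_jaccard(records[center], record) for record in records]
        # Everything loosely similar to the center joins its canopy.
        canopies.append([i for i, s in enumerate(sims) if s >= loose])
        # The center and anything tightly similar can no longer become a center.
        candidates = [i for i in candidates if i != center and sims[i] < tight]
    return canopies

def candidate_pairs(canopies):
    """Only pairs sharing a canopy go on to the expensive comparison."""
    return {pair for canopy in canopies for pair in combinations(canopy, 2)}

records = ["Muhammad Ali", "Mohamed Ali", "Albert Einstein", "A. Einstein"]
pairs = candidate_pairs(build_canopies(records))
all_pairs = len(records) * (len(records) - 1) // 2
```

In this toy example the expensive metric, for instance an edit distance, runs on 2 pairs instead of all 6; on large repositories the saving depends on how well the cheap metric separates unrelated records.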

201 Dataspaces Pay-as-you-go Information Integration

205 Dataspaces
- Not a data integration approach per se
- Data co-existence approach
- Pay-as-you-go data integration
  - Leveraging human contributions for data integration in a non-invasive manner
Selected paper 3: Halevy AY, Franklin M, Maier D. Principles of dataspace systems. In: PODS '06. New York, NY, USA; 2006:

210 Relationship between Schema Matching and Deduplication
Are they duplicates?
To compare field values we need schema matches
To find schema matches we need duplicates
etc.
Record A: first name: Mohamed; last name: Ali; age: 68; address: (street: Nicestreet 17; city: Wondercity); tax id: #
Record B: name: Muhammad Ali; boxer id: ; weight: 200 lb; total fights: 61; residence: 17, Nicestreet Louisville, KY
Selected paper 4: Zhou X, Gaugaz J, Balke W-T, Nejdl W. Query relaxation using malleable schemas. SIGMOD Beijing, China; 2007:
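One round of this circular dependency can be sketched in code. The example below is a hypothetical illustration and not the method of the selected paper: it assumes the two records are duplicates, proposes field matches from value similarity, and then scores the records using those matches. The function names and the threshold are assumptions, `difflib.SequenceMatcher` stands in for any value matcher, and the elided "tax id" and "boxer id" values are left out.

```python
from difflib import SequenceMatcher
from typing import Dict

# Records from the slide, flattened; elided id values are omitted.
record_a = {"first name": "Mohamed", "last name": "Ali", "age": "68",
            "street": "Nicestreet 17", "city": "Wondercity"}
record_b = {"name": "Muhammad Ali", "weight": "200 lb", "total fights": "61",
            "residence": "17, Nicestreet Louisville, KY"}


def value_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def propose_field_matches(a: Dict[str, str], b: Dict[str, str],
                          threshold: float = 0.45) -> Dict[str, str]:
    """'To find schema matches we need duplicates': treating a and b as the
    same entity, align each field of a with the most similar-valued field of b."""
    matches = {}
    for field_a, value_a in a.items():
        field_b, score = max(
            ((fb, value_similarity(value_a, vb)) for fb, vb in b.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            matches[field_a] = field_b
    return matches


def record_similarity(a: Dict[str, str], b: Dict[str, str],
                      matches: Dict[str, str]) -> float:
    """'To compare field values we need schema matches': average value
    similarity over the matched field pairs."""
    if not matches:
        return 0.0
    scores = [value_similarity(a[fa], b[fb]) for fa, fb in matches.items()]
    return sum(scores) / len(scores)


matches = propose_field_matches(record_a, record_b)
print(matches)
print(round(record_similarity(record_a, record_b, matches), 2))
```

Purely value-based matching is noisy: short numeric values such as an age and a fight count can align by chance, so in practice the proposed matches would be checked against further suspected duplicates before being trusted, and the cycle repeats.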

211 Selected Topic Papers
1. Schema Matching: Melnik S, Garcia-Molina H, Rahm E. Similarity flooding: a versatile graph matching algorithm and its application to schema matching. IEEE Comput. Soc; 2002:
2. Deduplication: Chaudhuri S, Ganti V, Motwani R. Robust Identification of Fuzzy Duplicates. ICDE :
3. Dataspaces: Halevy AY, Franklin M, Maier D. Principles of dataspace systems. In: PODS 06. New York, NY, USA; 2006:
4. Interdependence between schema matching and deduplication: Zhou X, Gaugaz J, Balke W-T, Nejdl W. Query relaxation using malleable schemas. SIGMOD Beijing, China; 2007:


More information

Knowledge Representation in Social Context. CS227 Spring 2011

Knowledge Representation in Social Context. CS227 Spring 2011 7. Knowledge Representation in Social Context CS227 Spring 2011 Outline Vision for Social Machines From Web to Semantic Web Two Use Cases Summary The Beginning g of Social Machines Image credit: http://www.lifehack.org

More information

Named Entity Detection and Entity Linking in the Context of Semantic Web

Named Entity Detection and Entity Linking in the Context of Semantic Web [1/52] Concordia Seminar - December 2012 Named Entity Detection and in the Context of Semantic Web Exploring the ambiguity question. Eric Charton, Ph.D. [2/52] Concordia Seminar - December 2012 Challenge

More information

SCALABLE MATCHING OF ONTOLOGY GRAPHS USING PARTITIONING

SCALABLE MATCHING OF ONTOLOGY GRAPHS USING PARTITIONING SCALABLE MATCHING OF ONTOLOGY GRAPHS USING PARTITIONING by RAVIKANTH KOLLI (Under the Direction of Prashant Doshi) ABSTRACT The problem of ontology matching is crucial due to decentralized development

More information

Conceptual Data Models for Database Design

Conceptual Data Models for Database Design Conceptual Data Models for Database Design Entity Relationship (ER) Model The most popular high-level conceptual data model is the ER model. It is frequently used for the conceptual design of database

More information

A Rule-Based Approach for the Recognition of Similarities and Differences in the Integration of Structural Karlstad Enterprise Modeling Schemata

A Rule-Based Approach for the Recognition of Similarities and Differences in the Integration of Structural Karlstad Enterprise Modeling Schemata A Rule-Based Approach for the Recognition of Similarities and Differences in the Integration of Structural Karlstad Enterprise Modeling Schemata Peter Bellström Department of Information Systems, Karlstad

More information

IBM Research Report. Model-Driven Business Transformation and Semantic Web

IBM Research Report. Model-Driven Business Transformation and Semantic Web RC23731 (W0509-110) September 30, 2005 Computer Science IBM Research Report Model-Driven Business Transformation and Semantic Web Juhnyoung Lee IBM Research Division Thomas J. Watson Research Center P.O.

More information

a paradigm for the Introduction to Semantic Web Semantic Web Angelica Lo Duca IIT-CNR Linked Open Data:

a paradigm for the Introduction to Semantic Web Semantic Web Angelica Lo Duca IIT-CNR Linked Open Data: Introduction to Semantic Web Angelica Lo Duca IIT-CNR angelica.loduca@iit.cnr.it Linked Open Data: a paradigm for the Semantic Web Course Outline Introduction to SW Give a structure to data (RDF Data Model)

More information

Overview of Record Linkage Techniques

Overview of Record Linkage Techniques Overview of Record Linkage Techniques Record linkage or data matching refers to the process used to identify records which relate to the same entity (e.g. patient, customer, household) in one or more data

More information

UNIK Multiagent systems Lecture 3. Communication. Jonas Moen

UNIK Multiagent systems Lecture 3. Communication. Jonas Moen UNIK4950 - Multiagent systems Lecture 3 Communication Jonas Moen Highlights lecture 3 Communication* Communication fundamentals Reproducing data vs. conveying meaning Ontology and knowledgebase Speech

More information

Semantic Interoperability. Being serious about the Semantic Web

Semantic Interoperability. Being serious about the Semantic Web Semantic Interoperability Jérôme Euzenat INRIA & LIG France Natasha Noy Stanford University USA 1 Being serious about the Semantic Web It is not one person s ontology It is not several people s common

More information