Web Science. Introduction to Information Integration
Julien Gaugaz, October 26, 2010
Topics
1. Information Integration
2. Web Information Retrieval
3. Entity Search
4. Web Usage
5. Collaborative Web
6. Web Archiving
7. Medical Social Web
Why Integrate Information? Scenarios
Company Mergers
Travelling Agent
Booking Flights
Leveraging Wikipedia Infoboxes (figure labels: Query, Data Contribution)
Evolution
[Figure: number of sources on a logarithmic axis (1E+01 to 1E+06) across three eras: Beginning of Databases; Rise of Internet & Wrapping Websites; Wikipedia & Social Web]
What is the Problem? Kinds of discrepancies
Wikipedia Infoboxes
German infobox source (Berlin):
  [[(...) Reg. Bürgermeister]]: [[Klaus Wowereit]]
  [[Höhe]] : m ü. NN
  [[Einwohner]] : {{Metadaten Einwohnerzahl DE-BE Berlin}}[...] (rendered as: (31. Mai 2010))
English infobox source (Berlin):
  leader_title = [[List of mayors of Berlin Governing Mayor]]
  leader = Klaus Wowereit
  elevation =
  pop_date =
  population =
  pop_metro =
Wikipedia Infoboxes
Berlin:
  leader_title = [[List of mayors of Berlin Governing Mayor]]
  leader = Klaus Wowereit
  elevation =
  pop_date =
  population =
  pop_metro =
San Francisco:
  leader_title = [[Mayor of San Francisco Mayor]]
  leader_name = [[Gavin Newsom]] ([[Democratic [...] D]])
  elevation_ft = 52
  elevation_max_ft = 925
  elevation_min_ft = 0
  population_as_of = 2008
  population_total =
  population_metro =
  population_urban =
Causes of Discrepancies
Information sources are diverse
  Different cultural background
  Different domain of activity
  Different model of information
Typos and other kinds of errors
Evolution over time
  Use, usage and users of one source may change over time
Places of Discrepancies
Information levels where discrepancies appear:
  Semantic: meaning, sense
  Representational
    Lexical: the word or term representing the meaning
    Structural: how the terms are arranged to represent the meaning
  Syntactic: how the lexical and structural levels are encoded into characters (and bits)
Discrepancies may concern schema elements (properties and structure) and values
Schema Discrepancies
Semantic: Einstein's full name is Albert Einstein
Representational:
  Einstein, name: first = Albert, last = Einstein
  Einstein, full_name = Albert Einstein
Syntactic:
  <Einstein> <full_name> Albert Einstein.
  <Einstein> <full_name>Albert Einstein</full_name> </Einstein>
Schema Ambiguity
Semantic: Person title, Article title
Representational:
  xyz title The Theory of Relativity
  xyz title Prof. Dr. techn.
Value Discrepancies
Semantic: Einstein's full name is Albert Einstein
Representational: Einstein full_name, with the values
  Albert Einstein
  Albert Einstin
  A. Einstein
  Einstein, Albert
Syntactic Level
Where discrepancies are addressed with standards
Encoding Bytes
Basic unit
  Universal standard: bit (binary digit)
  Ternary digit (base 3, USSR, 1950s, out of use)
Bits into bytes
  Big or little endian
  System-wide convention, easily convertible, defined in communication protocols
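As a small illustration of why endianness is "easily convertible", the sketch below serializes one integer in both byte orders using only the Python standard library. The value 0x0A0B0C0D is an arbitrary example, not taken from the slides; endianness is shown here in its most common sense, the order of bytes within a multi-byte value.

```python
import struct

value = 0x0A0B0C0D  # arbitrary example value

# The same integer serialized with the most significant byte first (big endian)
# and with the least significant byte first (little endian).
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")

# Converting between the two conventions is just a byte reversal.
converted = big[::-1]

# Reading bytes with the wrong convention silently yields a different number.
misread = int.from_bytes(big, byteorder="little")

# Communication protocols fix the convention: "!" in struct means network
# byte order, which is big endian.
network = struct.pack("!I", value)
```

Both sides only need to agree on the convention; no information is lost in either direction.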
Encoding Characters
De facto standards: UTF-8/16
Many others exist: ASCII, the ISO-8859 family, KOI-8, ...
Trivial dictionary-based translation
  When the corresponding code exists in the target character map...
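The following sketch shows the "dictionary-based translation" between character encodings, and what happens when the target character map has no corresponding code. It uses only Python's built-in codecs; the sample word is borrowed from the infobox example earlier in the deck.

```python
text = "Bürgermeister"

# The same characters, two different byte sequences.
as_utf8 = text.encode("utf-8")         # "ü" takes two bytes
as_latin1 = text.encode("iso-8859-1")  # "ü" takes one byte

# Translation: decode with the source encoding, encode with the target one.
transcoded = as_latin1.decode("iso-8859-1").encode("utf-8")

# When the target character map lacks the character, translation fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Decoding bytes with the wrong source encoding usually does not raise an error at all; it just produces wrong characters, which is why the encoding has to be known or declared.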
Encoding Lexico-Structural
XML, XML Schema
  Structured document serialization format
  Base for:
    (X)HTML
    SVG: Scalable Vector Graphics
    DOCX: Microsoft Office Word
RDF: Resource Description Framework
Encoding information
<subject> <property> <object>
  <subject>: URI or blank node
  <property>: URI
  <object>: URI, blank node, or (typed) literal
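A minimal sketch of the triple model, assuming a plain Python tuple type rather than any RDF library. The `http://example.org/` URIs are hypothetical, and the `to_ntriples` helper is an illustration only: it prints an N-Triples-like line and does not handle blank nodes, datatypes or escaping of quotes inside literals.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str            # URI (blank nodes are not modelled in this sketch)
    prop: str               # URI
    obj: str                # URI or literal value
    obj_is_literal: bool = False


def to_ntriples(triple: Triple) -> str:
    """Render one triple as an N-Triples-like line (no escaping, no datatypes)."""
    obj = f'"{triple.obj}"' if triple.obj_is_literal else f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.prop}> {obj} ."


# Hypothetical URIs, for illustration only.
fact = Triple(
    "http://example.org/Einstein",
    "http://example.org/full_name",
    "Albert Einstein",
    obj_is_literal=True,
)
line = to_ntriples(fact)
```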
URI
URI: Uniform Resource Identifier
  URLs are URIs
  scheme:scheme-specific-part
  RDF encourages using URLs
URL
  scheme://usr:passwd@domain:port/path?query_string#anchor
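The URL pattern above can be taken apart with `urllib.parse.urlsplit` from the Python standard library. The host `example.org` and port 8080 are made-up values filling in the `domain` and `port` placeholders.

```python
from urllib.parse import urlsplit

# Placeholder URL following scheme://usr:passwd@domain:port/path?query_string#anchor
parts = urlsplit("http://usr:passwd@example.org:8080/path?query_string#anchor")

components = {
    "scheme": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "domain": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query_string": parts.query,
    "anchor": parts.fragment,
}
```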
RDF
Resource Description Framework
Data model specialized in conceptual information modeling
Supported by various serialization formats:
  XML
  Notation3 (N3)
  Turtle
  ...
RDF Schema (RDF/S)
Expressed in RDF
Types subjects and objects with classes
  Class hierarchy (with multiple inheritance)
  Type of properties of a class
Types properties
  Domain: type of the property's subject
  Range: type of the property's object
OWL 2 is more expressive: cardinality, etc.
When to use RDF?
RDF is good at
  Modeling information
  Especially when the schema is unknown or changing
  When there are multiple schemas
RDF is not for
  Representing documents (XHTML, CSS)
  Internal data management when the schema is known and fixed (relational databases)
Schema Matching
Discrepancies between the representational and semantic levels in the schema
Example schemas:
  Boxer: name, boxer id, weight, birthdate, total fights, residence, Trainer, ...
  Taxpayer: first name, last name, age, address (street, city), tax id, Company, Tax Office, ...
Input:
  Schemas to match
  Possibly data instantiating those schemas
Output:
  Mappings between schema elements
  Possibly with confidence values and alternatives
  Possibly with value conversion rules (matchings)
Mappings or Matching?
Schema mapping identifies correspondences between schema elements
Schema matching actually transforms an instance of one schema into an instance of another schema
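To make the distinction concrete, here is a minimal sketch in which a list of correspondences (with confidence values and optional value conversion rules) is applied to transform a Boxer instance into a Taxpayer-shaped instance. The `Correspondence` class, the `transform` function, the confidence numbers and the 0.5 cut-off are all assumptions made for illustration; they are not part of the lecture.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


def identity(value: Any) -> Any:
    return value


@dataclass(frozen=True)
class Correspondence:
    source: str                                  # element of the source schema
    target: str                                  # element of the target schema
    confidence: float                            # made-up scores in this example
    convert: Callable[[Any], Any] = identity     # value conversion rule


def transform(instance: Dict[str, Any],
              correspondences: List[Correspondence],
              min_confidence: float = 0.5) -> Dict[str, Any]:
    """Build a target-schema instance from sufficiently confident correspondences."""
    result = {}
    for c in correspondences:
        if c.confidence >= min_confidence and c.source in instance:
            result[c.target] = c.convert(instance[c.source])
    return result


boxer = {"name": "Muhammad Ali", "weight": "200 lb",
         "residence": "17, Nicestreet Louisville, KY"}

mapping = [
    Correspondence("name", "first name", 0.8, lambda v: v.split()[0]),
    Correspondence("name", "last name", 0.8, lambda v: v.split()[-1]),
    Correspondence("residence", "address", 0.6),
    Correspondence("weight", "age", 0.1),   # low-confidence alternative, ignored
]

taxpayer = transform(boxer, mapping)
```

Keeping the low-confidence alternative in the list, rather than discarding it, leaves room for a later step (or a human) to revise the decision.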
How to Use Mappings? General architectures
Mediated Schemas
[Figure: two diagrams, each showing a Query posed against a Mediated Schema placed above Schema1, Schema2 and Schema3; the second diagram adds a Query on Schema x]
Peer Data Management
[Figure labels: Local Source, Local Mapping, Local Schema, Peer Schema, Peer Mapping]
Why not by hand?
Size and complexity of source schemas
Number of schemas and sources
Leveraging data instance values
Schemas not known in advance
Schema Matching Features
Schema-only vs. schema & instances
Representational
  Lexical vs. structural
Internal vs. external
More in:
  Rahm E, Bernstein PA. A survey of approaches to automatic schema matching. The VLDB Journal. 2001;10(4):
  Shvaiko P, Euzenat J. A Survey of Schema-Based Matching Approaches. Journal on Data Semantics IV. 2005;3730:
Schema Matching Techniques
String-based
Language-based
Linguistic resources
Constraint-based
Alignment reuse
Upper-level formal ontologies
Graph-based
Taxonomy-based
Repository of structures
Model-based
A String-Based Technique
Leveraging lexical features
Edit Distance
String distance: measures the distance between two strings
Edit distance: number of operations needed to transform one string into the other
Common basic operations:
  Insert, delete or substitute one character
  Possibly with different weights depending on the operation and characters involved
Java libraries: SecondString, SimMetrics
Levenshtein Distance
Edit operations: insert, delete, substitute
Each has a weight of 1
Example: transforming "Sundays" into "Saturday"
[Figure: grid aligning "Sundays" with "Saturday"; cells are marked as delete from Sundays, insert to Sundays, or substitute in Sundays]
  Sundays
  Satundays (insert "a" and "t")
  Saturdays (substitute "n" with "r")
  Saturday (delete "s")
Four operations in total, so the distance is 4.
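The grid on the slide corresponds to the usual dynamic-programming computation of the Levenshtein distance. Below is a compact sketch with unit weights for all three operations, keeping only one row of the grid in memory at a time; the function name is ours, and the Java libraries named earlier provide their own implementations.

```python
def levenshtein(source: str, target: str) -> int:
    """Number of unit-cost inserts, deletes and substitutions turning source into target."""
    # previous[j] = distance between the processed prefix of source and target[:j]
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,                       # delete s_char
                current[j - 1] + 1,                    # insert t_char
                previous[j - 1] + (s_char != t_char),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


distance = levenshtein("Sundays", "Saturday")
```

Operation-dependent weights, as mentioned for general edit distances, would replace the three constant costs with lookups keyed on the characters involved.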
A Linguistic Resource: WordNet
WordNet
Fundamental components: Synonym Sets (Synsets)
  {car, auto, automobile, machine, motorcar}: a motor vehicle with four wheels; usually propelled by an internal combustion engine
  {car, railcar, railway car, railroad car}: a wheeled vehicle adapted to the rails of railroad
Hypernyms / Hyponyms
  Hypernyms: superordinates, is-a relationships. A synset may have more than one hypernym.
  Hyponyms: subordinates
  Example: {motor vehicle, automotive vehicle} is a hypernym of {car, auto, automobile, machine, motorcar}, whose hyponyms include {cab, hack, taxi, taxicab} and {ambulance}
Holonym / Meronym
  Meronym: name of a constituent part of, the substance of, or a member of something. X is a meronym of Y if X is a part of Y.
  Holonym: name of the whole of which the meronym names a part. Y is a holonym of X if X is a part of Y.
  Example: {car, auto, automobile, machine, motorcar} is a holonym of its meronym {accelerator, accelerator pedal, gas pedal, gas, throttle, gun}
Other relationships in WN
  Antonym
  Entailment (for verbs): a verb X entails Y if X cannot be done unless Y is, or has been, done.
  Attribute (for adjectives): a noun for which adjectives express values. The noun "weight" is an attribute, for which the adjectives "light" and "heavy" express values.
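To show how synsets and hypernym links can support matching, here is a toy sketch with the slide's example synsets hard-coded in dictionaries. It does not use the real WordNet database or any WordNet API; the synset identifiers such as `car.vehicle` are invented labels.

```python
# Hand-copied excerpt of the slide examples; identifiers are invented.
SYNSETS = {
    "car.vehicle": {"car", "auto", "automobile", "machine", "motorcar"},
    "car.rail": {"car", "railcar", "railway car", "railroad car"},
    "motor_vehicle": {"motor vehicle", "automotive vehicle"},
    "cab": {"cab", "hack", "taxi", "taxicab"},
    "ambulance": {"ambulance"},
}

# A synset may have more than one hypernym, hence sets.
HYPERNYMS = {
    "car.vehicle": {"motor_vehicle"},
    "cab": {"car.vehicle"},
    "ambulance": {"car.vehicle"},
}


def synsets_of(word):
    """All synsets (word senses) that contain the word."""
    return {sid for sid, words in SYNSETS.items() if word in words}


def are_synonyms(a, b):
    """True if the two words share at least one synset."""
    return bool(synsets_of(a) & synsets_of(b))


def is_a(word, ancestor):
    """True if some sense of `ancestor` is reachable from `word` via hypernym links."""
    targets = synsets_of(ancestor)
    frontier = list(synsets_of(word))
    seen = set()
    while frontier:
        sid = frontier.pop()
        if sid in seen:
            continue
        seen.add(sid)
        for parent in HYPERNYMS.get(sid, ()):
            if parent in targets:
                return True
            frontier.append(parent)
    return False
```

For schema matching, two element names that are synonyms, or that stand in an is-a relation, are candidates for a correspondence even when their strings look nothing alike.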
A Graph-Matching Technique
Leveraging structure
Similarity Flooding
Uses the structure of the data to help match schemas
Similarity Flooding in Melnik et al. (2002)
  First maps schema elements using lexical similarity
  Then improves the matching, assuming that if two elements are similar, the elements adjacent to them are more likely to be similar
Selected paper 1: Melnik S, Garcia-Molina H, Rahm E. Similarity flooding: a versatile graph matching algorithm and its application to schema matching. IEEE Comput. Soc; 2002:
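The propagation idea can be sketched in a few lines. This is a heavily simplified illustration in the spirit of the approach, not the algorithm of Melnik et al.: it ignores edge labels and propagation coefficients, and the function name, the toy graphs and the initial similarity values are all assumptions.

```python
def propagate(initial, neighbors_a, neighbors_b, iterations=1):
    """Let each pair (a, b) absorb the similarity of its neighbouring pairs.

    initial: dict mapping every pair (a, b) to a starting (e.g. lexical) similarity.
    neighbors_a / neighbors_b: adjacency lists of the two schema graphs.
    """
    sims = dict(initial)
    for _ in range(iterations):
        updated = {}
        for (a, b) in initial:
            flow = sum(sims[(na, nb)]
                       for na in neighbors_a[a]
                       for nb in neighbors_b[b])
            updated[(a, b)] = sims[(a, b)] + flow
        top = max(updated.values())
        # Normalize so that the best pair has similarity 1.
        sims = {pair: value / top for pair, value in updated.items()}
    return sims


# Toy schema graphs (undirected, unlabeled edges).
neighbors_a = {"Boxer": ["name", "residence"], "name": ["Boxer"], "residence": ["Boxer"]}
neighbors_b = {"Taxpayer": ["last name", "address"],
               "last name": ["Taxpayer"], "address": ["Taxpayer"]}

# Made-up lexical similarities: only name / last name look alike.
initial = {(a, b): 0.1 for a in neighbors_a for b in neighbors_b}
initial[("name", "last name")] = 0.6

sims = propagate(initial, neighbors_a, neighbors_b, iterations=1)
```

After one round, the pair (Boxer, Taxpayer) scores higher than (Boxer, address) although both started at 0.1, because Boxer and Taxpayer have the similar neighbours name and last name.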
Deduplication
Detecting duplicate entries
Why are there Duplicates?
An administration-wide database is fed by several authorities:
  Sport Authorities record:
    name: Muhammad Ali
    boxer id:
    weight: 200 lb
    total fights: 61
    residence: 17, Nicestreet Louisville, KY
  Taxes Authorities record:
    first name: Mohamed
    last name: Ali
    age: 68
    address:
      street: Nicestreet 17
      city: Wondercity
    tax id: #
[Figure: the two records above and a third record (name: Muhammad Ali; address: city: Cairo, country: Egypt; tax id: #), with record pairs labelled M, R and U]
Input: 2 entities with matched attributes
Output: M for matched or U for unmatched. Possibly R for reject, between M and U, for cases where a supervised decision is necessary.
Deduplication Features
Field Distance Metrics
Value metrics
  Character-based: string-based metrics seen for schema matching
  Token-based: similar to Information Retrieval techniques (Topic 2, next week)
  Phonetic
  Numeric: few techniques other than treating the values as strings or taking the direct difference
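As one possible token-based metric, the sketch below computes the Jaccard similarity of the token sets of two field values. The choice of Jaccard and the simple tokenizer are assumptions for illustration; the slides do not prescribe a specific metric. The address values come from the Muhammad Ali example.

```python
import re


def tokens(value: str) -> set:
    """Lower-cased alphanumeric tokens of a field value."""
    return set(re.findall(r"\w+", value.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared tokens divided by all tokens; word order and punctuation are ignored."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def numeric_distance(a: float, b: float) -> float:
    """The 'direct difference' option for numeric fields."""
    return abs(a - b)


street_similarity = jaccard("17, Nicestreet", "Nicestreet 17")
residence_similarity = jaccard("17, Nicestreet Louisville, KY", "Nicestreet 17")
```

A character-based edit distance would penalize the swapped order of house number and street name heavily, whereas the token-based score treats the two street values as identical.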
Phonex
1. First letter as prefix
2. Encode non-prefix consonants
3. Remove duplicate adjacent codes not separated by a vowel
4. Drop vowels and truncate to the prefix and at most 3 codes, or pad with zeros if necessary

consonant                 code
b, f, p, v                1
c, g, j, k, q, s, x, z    2
d, t                      3
l                         4
m, n                      5
r                         6
h, w                      dropped

Examples:
  Ashcraftson: 1. Ashcraftson  2. A2 26a132o5  3. A26a132o5  4. A261
  Rupert: 1. Rupert  2. Ru1e63  3. Ru1e63  4. R163
  Robert: 1. Robert  2. Ro1e63  3. Ro1e63  4. R163
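The four steps and the coding table above can be sketched as follows. This follows the steps exactly as listed on the slide; the function name is ours, and commonly used Soundex-style implementations may differ in details, for example in how the code of the first letter is treated when collapsing duplicates.

```python
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for letter in letters:
        CODES[letter] = digit


def phonetic_code(name: str) -> str:
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    prefix = letters[0].upper()          # step 1: first letter as prefix
    digits = []
    previous = None                      # last code not yet separated by a vowel
    for letter in letters[1:]:
        if letter in "hw":
            continue                     # dropped, does not separate duplicates
        code = CODES.get(letter)         # step 2: encode consonants
        if code is None:                 # a vowel (or y): separates duplicates
            previous = None
            continue                     # step 4: vowels are dropped
        if code != previous:             # step 3: skip adjacent duplicate codes
            digits.append(code)
        previous = code
    # step 4: prefix plus at most 3 codes, padded with zeros
    return (prefix + "".join(digits) + "000")[:4]
```

Rupert and Robert receive the same code, which is the point of a phonetic metric: values that sound alike compare as equal even when their spelling differs.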
Other Phonetic Codes
NYSIIS
  Developed and still in use at the New York State Division of Criminal Justice Services
  Encodes vowels (mostly to A)
  Codes are letters instead of digits
  Longer codes (6 instead of 4)
Metaphone
  Codes are letters instead of digits
  No maximum code length
  More elaborate coding rules
Double Metaphone
  Returns a secondary code to help disambiguate
Detecting Duplicates
Bayes Decision Rule
M: match, U: unmatch
Decide M if $p(M \mid x) \geq p(U \mid x)$, U otherwise
Using Bayes rule:
$$p(M \mid x) \geq p(U \mid x) \iff p(M \mid x)\,p(x) \geq p(U \mid x)\,p(x) \iff p(M)\,p(x \mid M) \geq p(U)\,p(x \mid U)$$
Decision rule: likelihood ratio
$$l(x) = \frac{p(x \mid M)}{p(x \mid U)} \geq \frac{p(U)}{p(M)}$$
Using independence assumption:
$$p(x \mid M) = \prod_i p(x_i \mid M), \qquad p(x \mid U) = \prod_i p(x_i \mid U)$$
Bayes Decision Rule
The conditional probabilities $p(x_i \mid M)$ and $p(x_i \mid U)$ can be learned on a training set
Other methods based on the Expectation-Maximisation (EM) algorithm can estimate them without a training set
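A minimal sketch of the likelihood-ratio decision under the independence assumption. Here the comparison vector x holds one boolean per field (do the two records agree on that field?), and all probability values are invented for illustration; in practice they would come from a training set or from EM.

```python
from math import prod


def likelihood_ratio(x, p_given_m, p_given_u):
    """l(x) = prod_i p(x_i | M) / p(x_i | U), using field independence."""
    return prod(p_given_m[i][value] / p_given_u[i][value]
                for i, value in enumerate(x))


def decide(x, p_given_m, p_given_u, prior_m):
    """Return 'M' if l(x) >= p(U) / p(M), otherwise 'U'."""
    threshold = (1.0 - prior_m) / prior_m
    return "M" if likelihood_ratio(x, p_given_m, p_given_u) >= threshold else "U"


# Invented probabilities for two fields: (name agrees?, address agrees?)
p_given_m = [{True: 0.9, False: 0.1}, {True: 0.8, False: 0.2}]
p_given_u = [{True: 0.1, False: 0.9}, {True: 0.2, False: 0.8}]
prior_m = 0.1   # assumed share of matching pairs, so the threshold is 9

both_agree = decide((True, True), p_given_m, p_given_u, prior_m)
name_only = decide((True, False), p_given_m, p_given_u, prior_m)
```

With these numbers, agreement on both fields gives a ratio of about 36 and a match, while agreement on the name alone gives about 2.25 and no match. A reject region R could be added by using two thresholds instead of one.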
Clustering-Based Decision
X-Means
  Variant of K-Means without a fixed K
Using clustering techniques with appropriate parameters
Chaudhuri et al. observed that duplicates tend
  1. to have small distances from each other (compact set property), and
  2. to have only a small number of other neighbors within a small distance (sparse neighborhood property).
Selected paper 2: Chaudhuri S, Ganti V, Motwani R. Robust Identification of Fuzzy Duplicates. ICDE :
Dealing with $O(n^2)$
[Figure: number of comparisons plotted against the number of entities in the repository, up to 1'000'000 entities]
Canopies
Create canopies using a cheap similarity metric
  Overlapping clusters
Compare entities pairwise, only within each canopy, using a more expensive similarity metric
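A rough sketch of the canopy idea, assuming a token-overlap score as the cheap metric and two thresholds (a loose one for canopy membership, a tight one for removing records from the list of possible canopy centers). The function names, thresholds and sample names are illustrative assumptions.

```python
from itertools import combinations


def cheap_similarity(a: str, b: str) -> float:
    """Cheap metric: Jaccard overlap of whitespace-separated, lower-cased tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)


def build_canopies(items, loose=0.3, tight=0.9):
    """Return overlapping groups of item indices (canopies)."""
    remaining = list(range(len(items)))
    canopies = []
    while remaining:
        center = remaining[0]
        # Everything loosely similar to the center joins its canopy.
        canopies.append([i for i in range(len(items))
                         if cheap_similarity(items[center], items[i]) >= loose])
        # The center and anything tightly similar stop being center candidates.
        remaining = [i for i in remaining
                     if i != center
                     and cheap_similarity(items[center], items[i]) < tight]
    return canopies


names = ["Muhammad Ali", "Mohamed Ali", "Albert Einstein", "A. Einstein", "Klaus Wowereit"]
canopies = build_canopies(names)

# Only pairs sharing a canopy go on to the expensive comparison.
candidate_pairs = {pair for canopy in canopies for pair in combinations(canopy, 2)}
all_pairs = len(names) * (len(names) - 1) // 2
```

In this toy run, only 2 of the 10 possible pairs remain for the expensive metric, for example an edit distance or the likelihood-ratio decision.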
Dataspaces
Pay-as-you-go Information Integration
Not a data integration approach per se
Data co-existence approach
Pay-as-you-go data integration
  Leveraging human contributions for data integration in a non-invasive manner
Selected paper 3: Halevy AY, Franklin M, Maier D. Principles of dataspace systems. In: PODS '06. New York, NY, USA; 2006:
210 Relationship between Schema Matching and Deduplication
Are they duplicates?
Record 1: first name: Mohamed; last name: Ali; age: 68; address: (street: Nicestreet 17; city: Wondercity); tax id: #
Record 2: name: Muhammad Ali; boxer id: ; weight: 200 lb; total fights: 61; residence: 17, Nicestreet Louisville, KY
To compare field values we need schema matches. To find schema matches we need duplicates. Etc.
Selected paper 4: Zhou X, Gaugaz J, Balke W-T, Nejdl W. Query relaxation using malleable schemas. SIGMOD. Beijing, China; 2007.
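The circular dependency on this slide can be illustrated with a small sketch. It is not the method of the selected paper: `difflib.SequenceMatcher` is a stand-in string similarity, the records are flattened copies of the slide's example (fields whose values are missing on the slide are omitted), and the function names and the 0.4 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Flattened versions of the two example records from the slide.
record_a = {"first name": "Mohamed", "last name": "Ali", "age": "68",
            "street": "Nicestreet 17", "city": "Wondercity"}
record_b = {"name": "Muhammad Ali", "weight": "200 lb", "total fights": "61",
            "residence": "17, Nicestreet Louisville, KY"}


def value_sim(x: str, y: str) -> float:
    """Stand-in similarity between two field values."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def record_similarity(a, b, schema_matches):
    """Deduplication direction: needs schema matches to know which fields to compare."""
    if not schema_matches:
        return 0.0  # without matches there is nothing to compare
    return sum(value_sim(a[ka], b[kb]) for ka, kb in schema_matches) / len(schema_matches)


def propose_schema_matches(a, b, threshold=0.4):
    """Schema matching direction: assumes a and b are duplicates and pairs
    up attributes whose values look alike."""
    return [(ka, kb) for ka, va in a.items() for kb, vb in b.items()
            if value_sim(va, vb) >= threshold]


matches = propose_schema_matches(record_a, record_b)
score = record_similarity(record_a, record_b, matches)
print(matches, round(score, 2))
```

Each function needs the other's output as its input: `record_similarity` is only meaningful once attribute matches are known, while `propose_schema_matches` only yields sensible matches if the two records really describe the same entity.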
211 Selected Topic Papers
1. Schema Matching: Melnik S, Garcia-Molina H, Rahm E. Similarity flooding: a versatile graph matching algorithm and its application to schema matching. IEEE Comput. Soc; 2002.
2. Deduplication: Chaudhuri S, Ganti V, Motwani R. Robust Identification of Fuzzy Duplicates. ICDE.
3. Dataspaces: Halevy AY, Franklin M, Maier D. Principles of dataspace systems. In: PODS '06. New York, NY, USA; 2006.
4. Interdependence between schema matching and deduplication: Zhou X, Gaugaz J, Balke W-T, Nejdl W. Query relaxation using malleable schemas. SIGMOD. Beijing, China; 2007.