Relational Clustering for Multi-type Entity Resolution


1 Relational Clustering for Multi-type Entity Resolution. Indrajit Bhattacharya and Lise Getoor, Department of Computer Science, University of Maryland. Presented by Martin Leginus, 13th of March, 2013.

2 Agenda: Motivation; Related work; Use case scenarios; Problem formulation; Relational clustering; Similarity measures; Results; Discussion.

3 Why is there a need for entity resolution? The correspondence problem: two pictures refer to the same entity. Natural language processing: recognizing which noun phrases refer to the same entity. Data preprocessing: detection of duplicates.

4 Why is there a need for relational entity resolution? Traditional approaches utilize textual similarity measures. Relational evidence might improve the accuracy of the resolution.
[Figure, from "Collective Entity Resolution in Relational Data": Fig. 1. Example of (a) a reference graph for the simple example given in the text and (b) the resolved ... Node labels in the figure include Jim Doe, J Doe, James Doe, Jonathan Doe, Jason Doe, Jackie Doe, Jon Doe, Jeanette Doe and Jean Doe.]

5 Related work Textual similarity is calculated for the descriptions of two entities. Supervised algorithms learn string similarity measures from labelled data. Performance is improved with a blocking approach. Relational features have been considered for data integration problems.

6 Use case example Two citations of the same paper:
Fast algorithms for mining association rules in large databases. Agrawal, Rakesh and Srikant, Ramakrishnan. In Proc. 20th Int. Conf. Very Large Data Bases, VLDB, 1994
Fast algorithms for mining association rules. Agrawal, R., Srikant, R. in VLDB-94, 1994
String edit distance does not work here. This is a multiple entity resolution problem, i.e., author, paper and venue entities all have to be resolved.
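To make the remark about string edit distance concrete, here is a minimal sketch in Python. It uses the standard Levenshtein distance; the helper names (`levenshtein`, `edit_similarity`) and the normalization by the longer string are illustrative assumptions, not something given on the slide. The two venue strings are excerpts of the citations above.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Venue strings excerpted from the two citations of the same paper.
short_venue = "VLDB-94"
long_venue = "Proc. 20th Int. Conf. Very Large Data Bases"
venue_sim = edit_similarity(short_venue, long_venue)
print(f"venue similarity: {venue_sim:.2f}")
```

Both strings name the same venue, yet the normalized similarity stays low because the abbreviation shares few characters in sequence with the full name. This is the kind of case where relational evidence (shared authors and paper title) can help.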

7 Joint resolution using entity relations
[Figure: references r1 to r8, entities e1 to e4 and hyper-edges h1 and h2. Labels: R Agrawal, R Srikant, Fast Algorithms for Mining Association Rules, VLDB 94; Rakesh Agrawal, Ramakrishnan Srikant, Fast Algorithms for Mining Association Rules in Large Databases, Proc of the 20th Intl. Conference...]
Local and global resolution. Positive and negative relational evidence.

8 Problem formulation Entities and references are denoted by e and r. Their attributes are denoted by e.a and r.a. References are typed, and the type r.t is observed. Each reference r corresponds to a hidden entity, so each r is assigned an entity label r.e. The problem is to discover the hidden set of entities $E = \{e_i\}$ and the entity label r.e of each reference. References are observed as members of hyper-edges. The membership of a reference is stored in its hyper-edge label: r.h = h if $r \in h$.
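The notation above can be sketched as a small data model. This is an illustrative Python sketch: the class name `Reference`, its field names and the helper `hyperedges` are assumptions, and the assignment of the labels to r1..r8 is read off the running citation example rather than stated explicitly on the slide.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reference:
    rid: str                      # reference identifier, e.g. "r1"
    rtype: str                    # observed type r.t
    attr: str                     # observed attribute r.a
    hyperedge: str                # observed hyper-edge label r.h
    entity: Optional[str] = None  # hidden entity label r.e, to be inferred


# Two citations of the same paper, one hyper-edge per citation (assumed mapping).
refs = [
    Reference("r1", "author", "R Agrawal", "h1"),
    Reference("r2", "author", "R Srikant", "h1"),
    Reference("r3", "paper", "Fast Algorithms for Mining Association Rules", "h1"),
    Reference("r4", "venue", "VLDB 94", "h1"),
    Reference("r5", "author", "Rakesh Agrawal", "h2"),
    Reference("r6", "author", "Ramakrishnan Srikant", "h2"),
    Reference("r7", "paper",
              "Fast Algorithms for Mining Association Rules in Large Databases", "h2"),
    Reference("r8", "venue", "Proc of the 20th Intl. Conference...", "h2"),
]


def hyperedges(references):
    """Group reference ids by their hyper-edge label r.h."""
    edges = defaultdict(list)
    for r in references:
        edges[r.hyperedge].append(r.rid)
    return dict(edges)


print(hyperedges(refs))
```

Entity resolution then amounts to filling in the `entity` field of every reference so that co-referent references share a label.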

9 Problem formulation
[Figure: the same reference graph as on slide 7, with references r1 to r8, entities e1 to e4 and hyper-edges h1 and h2.]
The set of hidden entities is $E = \{e_1, e_2, e_3, e_4\}$, where $r_1.e = r_5.e = e_1$, $r_2.e = r_6.e = e_2$, $r_3.e = r_7.e = e_3$ and $r_4.e = r_8.e = e_4$.

10 Resolution by clustering The goal is to group all references that correspond to the same entity into one cluster. The cluster membership of a reference is represented by r.c. All references in a cluster are of the same type.
1 At the beginning, each reference belongs to its own cluster.
2 At each step, the pair of clusters with the highest similarity, i.e. the pair most likely to refer to the same entity, is merged.
The general similarity is defined as: $\mathrm{sim}(c_i, c_j) = (1 - \alpha)\,\mathrm{sim}_{attr}(c_i, c_j) + \alpha\,\mathrm{sim}_{rel}(c_i, c_j)$, where $0 \le \alpha \le 1$.
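A worked instance of the combined similarity may help. The sketch below is a direct transcription of the weighted sum in Python; the function name `combined_similarity` and the sample values are illustrative assumptions.

```python
def combined_similarity(sim_attr: float, sim_rel: float, alpha: float) -> float:
    """Weighted combination of attribute and relational similarity."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * sim_attr + alpha * sim_rel


# Hypothetical values: weak attribute evidence, strong relational evidence.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, combined_similarity(sim_attr=0.4, sim_rel=0.9, alpha=alpha))
```

With alpha = 0 only the attributes count, with alpha = 1 only the relations count, and alpha = 0.5 weights both equally (0.65 for the values above).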

11 Attribute and relational similarity
Attribute similarity: any basic similarity measure for two reference attributes. The similarity of two clusters is calculated between the most representative attributes of those clusters.
Relational similarity: a measure between two clusters that considers the clusters they link to via observed edges. Two variants: edge detail similarity and neighborhood similarity.

12 Edge detail similarity Each cluster is associated with a set of hyper-edges: $c.h = \{h \mid r.h = h \wedge r.c = c\}$.
The similarity between two edges is defined as: $\mathrm{sim}(h_i, h_j) = \sum_t \mathrm{sim}_t(h_i, h_j)$, where $\mathrm{sim}_t(h_i, h_j) = \mathrm{Jaccard}(\pi_t(h_i), \pi_t(h_j))$ and $\pi_t(h) = \{c \mid r.c = c \wedge c.t = t \wedge r.h = h\}$.
The final similarity is defined as: $\mathrm{sim}_{rel}(c_i, c_j) = \max_{(h_i, h_j)} \mathrm{sim}(h_i, h_j)$, where $h_i \in c_i.h$ and $h_j \in c_j.h$.
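The definitions above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the aggregation over types is taken to be a plain sum, the Jaccard coefficient of two empty sets is taken to be 0, and all names (`project`, `edge_sim`, `edge_detail_sim`, the cluster ids) are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def project(h, t, ref_edge, ref_cluster, cluster_type):
    """pi_t(h): clusters of type t that have a reference in hyper-edge h."""
    return {ref_cluster[r] for r, edge in ref_edge.items()
            if edge == h and cluster_type[ref_cluster[r]] == t}


def edge_sim(hi, hj, ref_edge, ref_cluster, cluster_type):
    """sim(h_i, h_j): per-type Jaccard, aggregated by a sum (assumption)."""
    types = set(cluster_type.values())
    return sum(jaccard(project(hi, t, ref_edge, ref_cluster, cluster_type),
                       project(hj, t, ref_edge, ref_cluster, cluster_type))
               for t in types)


def edge_detail_sim(ci, cj, ref_edge, ref_cluster, cluster_type):
    """sim_rel(c_i, c_j): best-matching pair of hyper-edges of the two clusters."""
    edges_i = {ref_edge[r] for r, c in ref_cluster.items() if c == ci}
    edges_j = {ref_edge[r] for r, c in ref_cluster.items() if c == cj}
    return max((edge_sim(hi, hj, ref_edge, ref_cluster, cluster_type)
                for hi in edges_i for hj in edges_j), default=0.0)


# Hypothetical state: Srikant and paper references already merged,
# the Agrawal and venue references still in separate clusters.
ref_edge = {"r1": "h1", "r2": "h1", "r3": "h1", "r4": "h1",
            "r5": "h2", "r6": "h2", "r7": "h2", "r8": "h2"}
ref_cluster = {"r1": "cA1", "r5": "cA2", "r2": "cS", "r6": "cS",
               "r3": "cP", "r7": "cP", "r4": "cV1", "r8": "cV2"}
cluster_type = {"cA1": "author", "cA2": "author", "cS": "author",
                "cP": "paper", "cV1": "venue", "cV2": "venue"}

score = edge_detail_sim("cA1", "cA2", ref_edge, ref_cluster, cluster_type)
print(score)
```

In this toy state the two Agrawal clusters share a co-author cluster and a paper cluster through their hyper-edges, giving 1/3 (authors) + 1 (paper) + 0 (venue). Because the sum is unnormalized it can exceed 1; dividing by the number of types would be one way to rescale it before combining it with the attribute similarity.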

13 Neighborhood similarity The similarity between two clusters is defined as: $\mathrm{sim}_{rel}(c_i, c_j) = \mathrm{Jaccard}(N_t(c_i), N_t(c_j))$, where $N_t(c) = \biguplus_{h \in c.h} \pi_t(h)$. The obtained neighborhoods are multisets.
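Since the neighborhoods are multisets, the Jaccard coefficient has to be computed on counts. The Python sketch below uses `collections.Counter` for this; it assumes that the multiset union adds up the counts from each hyper-edge and that the multiset Jaccard is the sum of minimum counts over the sum of maximum counts. All names and the sample data are hypothetical.

```python
from collections import Counter


def multiset_jaccard(a: Counter, b: Counter) -> float:
    """Sum of element-wise minima over sum of element-wise maxima."""
    inter = sum((a & b).values())   # Counter '&' keeps the minimum counts
    union = sum((a | b).values())   # Counter '|' keeps the maximum counts
    return inter / union if union else 0.0


def neighborhood(c, t, cluster_edges, edge_members, cluster_type) -> Counter:
    """N_t(c): multiset of type-t clusters over all hyper-edges of c."""
    n = Counter()
    for h in cluster_edges[c]:
        n.update({m for m in edge_members[h] if cluster_type[m] == t})
    return n


# Hypothetical state: cluster cA occurs in hyper-edges h1 and h2, cX only in h3.
cluster_edges = {"cA": ["h1", "h2"], "cX": ["h3"]}
edge_members = {"h1": {"cA", "cS"}, "h2": {"cA", "cS"}, "h3": {"cX", "cS"}}
cluster_type = {"cA": "author", "cS": "author", "cX": "author"}

n_a = neighborhood("cA", "author", cluster_edges, edge_members, cluster_type)
n_x = neighborhood("cX", "author", cluster_edges, edge_members, cluster_type)
nbr_sim = multiset_jaccard(n_a, n_x)
print(n_a, n_x, nbr_sim)
```

Compared with edge detail similarity, this measure no longer looks at individual hyper-edges, only at how often each neighboring cluster occurs, which makes it cheaper to maintain when clusters are merged.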

14 Implementation Greedy agglomerative clustering that merges the closest cluster pair at each step. Candidate pairs, selected with a blocking approach, are kept in a priority queue sorted by similarity. During the initial phase, references with identical attributes ($v_1 = v_2$), or where one attribute is an initialed form of the other, are merged.
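The greedy merge loop with a priority queue can be sketched as below. This is a simplified Python illustration, not the presented implementation: it omits blocking and the initial merging phase, takes an arbitrary cluster similarity function `sim`, stops at an assumed `threshold`, and uses a toy similarity on numbers in place of the combined attribute and relational measure.

```python
import heapq
from itertools import combinations


def greedy_cluster(items, sim, threshold):
    """Repeatedly merge the most similar cluster pair until sim < threshold."""
    clusters = {i: [x] for i, x in enumerate(items)}
    # heapq is a min-heap, so similarities are stored negated.
    heap = [(-sim(clusters[a], clusters[b]), a, b)
            for a, b in combinations(clusters, 2)]
    heapq.heapify(heap)
    next_id = len(items)
    while heap:
        neg_sim, a, b = heapq.heappop(heap)
        if a not in clusters or b not in clusters:
            continue  # stale entry: one side was merged away earlier
        if -neg_sim < threshold:
            break     # best remaining pair is not similar enough
        merged = clusters.pop(a) + clusters.pop(b)
        # Queue the new cluster against every surviving cluster.
        for other, members in clusters.items():
            heapq.heappush(heap, (-sim(merged, members), other, next_id))
        clusters[next_id] = merged
        next_id += 1
    return list(clusters.values())


def toy_sim(c1, c2):
    """Toy similarity: closeness of the cluster means (illustration only)."""
    mean1, mean2 = sum(c1) / len(c1), sum(c2) / len(c2)
    return 1.0 / (1.0 + abs(mean1 - mean2))


result = greedy_cluster([1.0, 1.1, 5.0, 5.2, 9.0], toy_sim, threshold=0.5)
print(sorted(result))
```

Cluster ids are never reused, so a queue entry is current exactly when both of its ids are still present; this lets stale entries be skipped lazily instead of being removed from the heap on every merge.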

15 Datasets and baseline methods The CiteSeer dataset contains 2892 references to 1165 authors, contained in 1504 documents. The arXiv dataset contains references with 9200 authors, contained in papers. The baseline method ATTR is based on SoftTF-IDF, where the secondary distance measure can be Jaro-Winkler, Jaro or Scaled Levenshtein distance.

16 Accuracy results with different similarity measures

17 Precision, recall and F1 results for both datasets

18 Performance
[Figure: execution time. x-axis: Number of References (in Thousands); y-axis: CPU time (secs). Series: ATTR*, RC (Nbr) with Bootstrap, RC (Nbr) w/o Bootstrap, RC (Edge) with Bootstrap, RC (Edge) w/o Bootstrap.]

19 Attribute vs relational similarity effects on accuracy
[Figure: six panels plotting best F against alpha for RC-ER (Nbr), RC-ER (Edge), ATTR and ATTR*. Panels: Varying alpha with Jaro, Jaro-Winkler and Scaled Levenshtein, each for CiteSeer and for HEP.]

20 Conclusions Two relational similarity measures were introduced. Relational similarity in combination with attribute similarity outperforms the non-relational approaches. Bootstrapping and a blocking approach were used successfully to improve performance.


More information
