Learning Statistical Models From Relational Data


Slides taken from the presentation (subset only)
Learning Statistical Models From Relational Data
Lise Getoor, University of Maryland, College Park
Includes work done by: Nir Friedman (Hebrew U.), Daphne Koller (Stanford), Avi Pfeffer (Harvard), Ben Taskar (Stanford)

Outline
- Motivation and Background
- PRMs w/ Attribute Uncertainty
- PRMs w/ Link Uncertainty
- PRMs w/ Class Hierarchies

Discovering Patterns in Structured Data
[Diagram: related Strain, Patient, and Treatment records]

Learning Statistical Models
Traditional approaches work well with flat representations:
- fixed-length attribute-value vectors
- an assumed independent, identically distributed (IID) sample
Problems with flattening relational data (e.g., a Patient table and its related tables) into this form:
- introduces statistical skew
- loses relational structure
- cannot detect link-based patterns
- attributes must be fixed in advance
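To make "statistical skew" concrete: if a Patient table is joined with a one-to-many table of contacts and the joined rows are treated as an IID sample, patients with many contacts are counted many times. A minimal sketch with made-up toy rows (hypothetical data, not from the slides):

```python
# Hypothetical toy data (not from the slides): one entry per patient,
# and one (patient, contact) pair per recorded contact.
homeless = {"p1": True, "p2": False, "p3": False}
contact_pairs = [("p1", "c1"), ("p1", "c2"), ("p1", "c3"), ("p1", "c4"),
                 ("p2", "c5"), ("p3", "c6")]

def rate_per_patient(homeless):
    """Fraction of patients who are homeless, each patient counted once."""
    return sum(homeless.values()) / len(homeless)

def rate_after_flattening(homeless, contact_pairs):
    """The same statistic on the flattened Patient x Contact join,
    where each patient is repeated once per contact."""
    rows = [homeless[pid] for pid, _ in contact_pairs]
    return sum(rows) / len(rows)

print(rate_per_patient(homeless))                      # 1 of 3 patients
print(rate_after_flattening(homeless, contact_pairs))  # 4 of 6 joined rows
```

The flattened estimate doubles the apparent homeless rate only because p1 has four contacts.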

Probabilistic Relational Models
Combine the advantages of relational logic and Bayesian networks:
- natural domain modeling: objects, properties, relations
- generalization over a variety of situations
- compact, natural probability models
Integrate uncertainty with the relational model:
- properties of domain entities can depend on properties of related entities
- uncertainty over the relational structure of the domain

Relational Schema
Describes the types of objects and relations in the database.
[Diagram: classes Patient, Contact, and Strain, with attributes including Unique, Infectivity, Homeless, Age, HIV-Result, Ethnicity, Disease-Site, Contact-Type, Close-Contact, and Skin-Test; relations "Infected with" and "Interacted with"]

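Probabilistic Relational Model
[Diagram: classes Patient (Homeless, POB, HIV-Result, Disease Site, Age), Strain (Unique, Infectivity), and Contact (Age, Contact-Type, Close-Contact, Transmitted). Contact.Transmitted (T) depends on Contact.Close-Contact (C) and on H, the HIV-Result of the related patient (Contact.Contactor.HIV); its CPD P(T | H, C) has one row for each parent configuration (H, C) = (f, f), (f, t), (t, f), (t, t).]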

Relational Skeleton
[Diagram: Strain objects s1, s2; Patient objects p1, p2, p3; Contact objects c1, c2, c3; and the relations between them]
A fixed relational skeleton σ specifies:
- the set of objects in each class
- the relations between them
Uncertainty is over the assignment of values to attributes: the PRM defines a distribution over instantiations of the attributes.

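A Portion of the BN
[Diagram: part of the ground Bayesian network over P1.POB, P1.Homeless, P1.HIV-Result (= true), P1.Disease Site, C1.Age, C1.Contact-Type, C1.Close-Contact (= false), C1.Transmitted, C2.Age, C2.Contact-Type, C2.Close-Contact (= true), and C2.Transmitted; C1.Transmitted and C2.Transmitted share the same CPD P(T | H, C)]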
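The ground network can be produced mechanically by copying each class-level dependency once per object in the skeleton. A minimal sketch, assuming a hypothetical encoding in which each contact id maps to the patient it interacted with, and in which Contact.Transmitted has parents Contact.Close-Contact and the related patient's HIV-Result (the CPD P(T | H, C) above):

```python
# Hypothetical skeleton: contact id -> id of the patient it interacted with.
skeleton = {"c1": "p1", "c2": "p1", "c3": "p2"}

def ground_parents(skeleton):
    """Return {ground variable: [ground parent variables]} for Transmitted.

    Each Contact object gets its own copy of the class-level dependency
    Contact.Transmitted <- (related Patient.HIV-Result, Contact.Close-Contact);
    all copies share one CPD P(T | H, C).
    """
    parents = {}
    for contact, patient in skeleton.items():
        parents[f"{contact}.Transmitted"] = [
            f"{patient}.HIV-Result",     # H: reached through the relation
            f"{contact}.Close-Contact",  # C: attribute of the contact itself
        ]
    return parents

for var, pa in ground_parents(skeleton).items():
    print(var, "<-", pa)
```

Because c1 and c2 both point at p1, the variable p1.HIV-Result becomes a shared parent, which is how evidence about one contact can influence beliefs about another.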

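PRM: Aggregate Dependencies
[Diagram: Patient (Homeless, POB, HIV-Result, Disease Site, Age) and Contact (Contact-Type, Close-Contact, Age, Transmitted); Patient.Age depends on an aggregate A of the ages of the patient's contacts]
Example: patient Jane Doe (POB: US, Homeless: no, HIV-Result: negative, Disease Site: pulmonary, Age: unknown) has three contacts:
- #5077: Contact-Type coworker, Close-Contact no, Age middle-aged, Transmitted false
- #5076: Contact-Type spouse, Close-Contact yes, Age middle-aged, Transmitted true
- #5075: Contact-Type friend, Close-Contact no, Age middle-aged, Transmitted false
Here the aggregate is the mode of the contacts' ages. Other aggregates: sum, min, max, avg, count, ...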
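A minimal sketch of how an aggregate turns the multiset of Jane Doe's contacts into a single parent value. The contact records are copied from the slide; the dictionary encoding and the `aggregate` helper are assumptions for illustration:

```python
from statistics import mean, mode

# Jane Doe's contacts, copied from the slide (the dict encoding is an assumption).
contacts = [
    {"id": 5077, "type": "coworker", "close": False, "age": "middle-aged", "transmitted": False},
    {"id": 5076, "type": "spouse", "close": True, "age": "middle-aged", "transmitted": True},
    {"id": 5075, "type": "friend", "close": False, "age": "middle-aged", "transmitted": False},
]

# Aggregates named on the slide, mapped to Python functions.
AGGREGATES = {"mode": mode, "count": len, "sum": sum, "min": min, "max": max, "avg": mean}

def aggregate(related, attribute, how):
    """Collapse the values of `attribute` over all related objects into one value."""
    return AGGREGATES[how]([obj[attribute] for obj in related])

print(aggregate(contacts, "age", "mode"))         # middle-aged
print(aggregate(contacts, "transmitted", "sum"))  # 1 (booleans sum as 0/1)
print(aggregate(contacts, "close", "count"))      # 3
```

The point of the aggregate is that a single CPD, such as one for Patient.Age given the mode of the contacts' ages, applies to patients with any number of contacts.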

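PRM with AU Semantics
[Diagram: the PRM over Patient, Contact, and Strain, combined with a skeleton of Strain objects s1, s2, Patient objects p1, p2, p3, and Contact objects c1, c2, c3]
PRM + relational skeleton σ = a probability distribution over completions I, which assign values to the attributes of every object.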
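Written out, the distribution that a PRM and a skeleton σ define is a Bayesian-network-style product over every object x in σ and every attribute A of x's class X, with all objects of a class sharing one CPD. A sketch in the usual PRM notation (assumed here, not copied from the slide), where A(X) is the set of attributes of class X, σ(X) the objects of class X in the skeleton, and Pa(x.A) the ground parents of x.A:

```latex
P(\mathcal{I} \mid \sigma, \mathcal{S}, \theta)
  = \prod_{X} \; \prod_{A \in \mathcal{A}(X)} \; \prod_{x \in \sigma(X)}
    P\bigl(\mathcal{I}_{x.A} \mid \mathcal{I}_{\mathrm{Pa}(x.A)}\bigr)
```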

Learning PRMs w/ AU
Input: a database instance (e.g., Strain and Patient tables) and its relational schema.
Two learning tasks: parameter estimation and structure selection.

Parameter Estimation in PRMs
Assume the dependency structure S is known.
Goal: estimate the PRM parameters θ, i.e., the entries in the local probability models.
θ is good if it is likely to generate the observed data, instance I.
MLE principle: choose θ so as to maximize the likelihood l of the observed instance I.
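Since the PRM distribution is a product of CPD terms over all objects and attributes, the log-likelihood is a sum of such terms and each CPD can be maximized separately; for table CPDs the maximizer is a ratio of counts N[...] taken over all objects of the class in I. A sketch in standard notation (assumed, not copied from the slide):

```latex
\ell(\theta : \mathcal{I}, \sigma, \mathcal{S})
  = \sum_{X} \sum_{A \in \mathcal{A}(X)} \sum_{x \in \sigma(X)}
    \log P\bigl(\mathcal{I}_{x.A} \mid \mathcal{I}_{\mathrm{Pa}(x.A)}\bigr),
\qquad
\hat{\theta}_{X.A = a \,\mid\, \mathrm{Pa}(X.A) = u}
  = \frac{N[X.A = a,\ \mathrm{Pa}(X.A) = u]}{N[\mathrm{Pa}(X.A) = u]}
```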

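ML Parameter Estimation
For Contact.Transmitted, θ is the CPD P(T | H, C), with T = Contact.Transmitted, C = Contact.Close-Contact, and H = Contact.Contactor.HIV. The entries for (H, C) = (f, f), (f, t), (t, f), (t, t) are unknown and are estimated from the data.
Query for counts: count value combinations over the join of the Patient table and the Contact table.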
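A minimal sketch of the count query, assuming hypothetical in-memory Patient and Contact tables: join each contact to its patient, count every (H, C, T) combination, and normalize per parent configuration to get the maximum-likelihood estimate of P(T | H, C). Table layouts, field names, and the toy rows are made up:

```python
from collections import Counter

# Hypothetical tables; a contact row references its patient by id.
patient_table = {"p1": {"hiv": True}, "p2": {"hiv": False}}
contact_table = [
    {"patient": "p1", "close": True, "transmitted": True},
    {"patient": "p1", "close": False, "transmitted": False},
    {"patient": "p1", "close": True, "transmitted": False},
    {"patient": "p2", "close": True, "transmitted": False},
]

def mle_cpd(patient_table, contact_table):
    """Return {(h, c): P(T=True | H=h, C=c)} estimated by counting."""
    joint = Counter()   # N[H=h, C=c, T=t]
    parent = Counter()  # N[H=h, C=c]
    for row in contact_table:
        h = patient_table[row["patient"]]["hiv"]  # join Contact -> Patient
        c = row["close"]
        joint[(h, c, row["transmitted"])] += 1
        parent[(h, c)] += 1
    return {(h, c): joint[(h, c, True)] / n for (h, c), n in parent.items()}

print(mle_cpd(patient_table, contact_table))
```

Parent configurations that never occur in the data, such as (H, C) = (f, f) here, get no estimate at all; adding Dirichlet pseudo-counts is the usual remedy.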

Idea: Structure Selection
- define a scoring function
- do local search over legal structures
Key components:
- legal models
- scoring models
- searching model space


Legal Models
A PRM defines a coherent probability model over a skeleton σ if the dependencies between object attributes are acyclic.
[Example: researcher Prof. Gump (Reputation: high) is the author of papers P1 and P2 (both Accepted: yes); the diagram links Reputation to a sum over the papers' Accepted values]
How do we guarantee that a PRM is acyclic for every skeleton?

Attribute Stratification
From the PRM dependency structure S, build a dependency graph over class attributes: it contains the edge Paper.Accepted → Researcher.Reputation if Researcher.Reputation depends directly on Paper.Accepted.
Attribute stratification: if the dependency graph is acyclic, the PRM is acyclic for any σ.
A more flexible algorithm allows certain cycles along guaranteed-acyclic relations.
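A minimal sketch of the basic stratification test: build the class-level dependency graph with an edge from each parent attribute to the attribute that depends on it, then check acyclicity with Kahn's topological sort. The attribute names echo the slide; the edge-list encoding is an assumption, and the sketch does not implement the refinement that tolerates cycles along guaranteed-acyclic relations:

```python
from collections import defaultdict, deque

def is_stratified(edges):
    """edges: iterable of (parent_attr, child_attr) class-level dependencies.
    Returns True if the dependency graph is acyclic (Kahn's algorithm)."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    queue = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    # Every node is removed exactly when the graph has no cycle.
    return visited == len(nodes)

one_way = [("Paper.Accepted", "Researcher.Reputation")]
both_ways = one_way + [("Researcher.Reputation", "Paper.Accepted")]
print(is_stratified(one_way))    # True
print(is_stratified(both_ways))  # False
```

Because the test runs on the class-level graph, a positive answer holds for every skeleton σ, which is exactly the guarantee the previous slide asks for.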


Scoring Models
Bayesian approach: the standard approach to scoring models, as used in Bayesian network learning.
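As in Bayesian network learning, a Bayesian score of this kind decomposes into one term per attribute and parent configuration, each a Dirichlet-multinomial marginal likelihood of the observed counts. A minimal sketch, assuming a symmetric Dirichlet prior with pseudo-count `alpha` per value (a choice made here for illustration; the slide does not fix the prior):

```python
from math import lgamma

def family_log_marginal(counts_by_parent_config, alpha=1.0):
    """Log marginal likelihood of one attribute's counts under Dirichlet priors.

    counts_by_parent_config: list of lists; entry [u][k] is the number of
    objects whose parents take configuration u and whose attribute takes value k.
    """
    total = 0.0
    for counts in counts_by_parent_config:
        a0 = alpha * len(counts)  # total prior pseudo-count for this row
        n0 = sum(counts)          # total observed count for this row
        total += lgamma(a0) - lgamma(a0 + n0)
        for n in counts:
            total += lgamma(alpha + n) - lgamma(alpha)
    return total

# A binary attribute with no parents, observed once with its first value.
print(family_log_marginal([[1, 0]]))  # about -0.693, i.e. log(1/2)
```

Because the score is a sum of such family terms, a local search step that changes one attribute's parents only needs to rescore that attribute.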


Searching Model Space
Phase 0: consider only dependencies within a class.
[Diagram: successive candidate structures over the Strain and Patient classes]

Phased Structure Search
Phase 1: consider dependencies from neighboring classes, via schema relations.
[Diagram: successive candidate structures over the Strain and Patient classes]

Phased Structure Search
Phase 2: consider dependencies from further classes, via relation chains.
[Diagram: successive candidate structures over the Strain and Patient classes]
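One way to read the phases is as a bound on the length of the relation chain (slot chain) through which a candidate parent may be reached: phase 0 allows length 0 (the same object), phase 1 one relation, phase 2 chains of two. A minimal sketch of enumerating candidates under that reading; the schema encoding and the relation names (`contacts`, `contactor`, `strain`, `patients`) are hypothetical:

```python
# Hypothetical schema: class -> attributes, and class -> [(relation, target class)].
ATTRS = {
    "Patient": ["Homeless", "HIV-Result", "Age"],
    "Contact": ["Close-Contact", "Transmitted", "Age"],
    "Strain": ["Unique", "Infectivity"],
}
RELATIONS = {
    "Patient": [("contacts", "Contact"), ("strain", "Strain")],
    "Contact": [("contactor", "Patient")],
    "Strain": [("patients", "Patient")],
}

def candidate_parents(cls, attr, phase):
    """Return (slot chain, attribute) pairs reachable via chains of length <= phase."""
    results = []
    frontier = [((), cls)]  # (chain of relation names, class reached)
    for depth in range(phase + 1):
        next_frontier = []
        for chain, c in frontier:
            for a in ATTRS[c]:
                if not (depth == 0 and a == attr):  # an attribute is not its own parent
                    results.append((chain, a))
            for rel, target in RELATIONS[c]:
                next_frontier.append((chain + (rel,), target))
        frontier = next_frontier
    return results

for chain, a in candidate_parents("Contact", "Transmitted", 1):
    print(".".join(("Contact",) + chain + (a,)))
```

Chains that reach many objects, such as `contactor.contacts`, would need an aggregate like those on the aggregate-dependency slide before they can serve as parents.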

Issue
PRMs w/ AU are applicable only in domains where we have full knowledge of the relational structure.
Next we introduce PRMs that allow uncertainty over the relational structure.

PRMs w/ Link Uncertainty
Advantages:
- applicable in cases where we do not have full knowledge of the relational structure
- incorporating uncertainty over relational structure into the probabilistic model can improve predictive accuracy
Two approaches:
- reference uncertainty
- existence uncertainty
These are different probabilistic models, and each requires a different amount of background knowledge.

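Citation Relational Schema
[Diagram: Author (Institution, Research Area) is linked by Wrote to Paper (Topic, Word1, Word2, ..., WordN); Cites (Count) links a Citing paper to a Cited paper, each with Topic and Word1, Word2, ..., WordN]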

Attribute Uncertainty
[Diagram: Author (Research Area, Institution) with P(Institution | Research Area); Wrote links Author to Paper; P(Topic | Paper.Author.Research Area); P(WordN | Topic) for each of Word1, ..., WordN]

Reference Uncertainty
[Diagram: entries of a bibliography, each with an unknown (?) target in a scientific document collection]

PRM w/ Reference Uncertainty
[Diagram: Paper (Topic, Words) and Cites (Citing, Cited), with a dependency model for the foreign keys Citing and Cited]
Naïve approach: a multinomial over primary keys. This is noncompact and limits the ability to generalize.

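Reference Uncertainty Example
[Diagram: papers P1 to P5, each with Topic AI or Theory, are grouped into two partitions: P1, the papers with Paper.Topic = AI, and P2, the papers with Paper.Topic = Theory. A CPD indexed by Citing.Topic (Theory or AI) gives the probability that Cites.Cited is drawn from partition P1 or P2.]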
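Under a partition-based model, the probability that a citation points at a particular paper is the probability of choosing that paper's partition, given Citing.Topic, times a uniform choice within the partition. A minimal sketch; the uniform within-partition choice is an assumption of this sketch, and the paper topics and CPD numbers are made up for illustration:

```python
# Hypothetical papers and their topics.
topic = {"P1": "AI", "P2": "Theory", "P3": "AI", "P4": "Theory", "P5": "AI"}

# Hypothetical CPD: P(partition of Cited | Citing.Topic), partitions keyed by Topic.
partition_cpd = {
    "AI": {"AI": 0.9, "Theory": 0.1},
    "Theory": {"AI": 0.2, "Theory": 0.8},
}

def p_cited(paper, citing_topic):
    """P(Cites.Cited = paper | Citing.Topic = citing_topic)."""
    part = topic[paper]  # the paper's partition is determined by its Topic
    size = sum(1 for t in topic.values() if t == part)
    # Choose the partition, then a paper uniformly within it.
    return partition_cpd[citing_topic][part] / size

print(p_cited("P1", "AI"))  # 0.9 spread over three AI papers
```

The CPD needs one number per (citing topic, partition) pair rather than one per paper, which is what makes it more compact than a multinomial over primary keys and lets it generalize to papers not seen during training.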

PRMs w/ RU Semantics
PRM-RU + entity skeleton σ = probability distribution over full instantiations I.
[Diagram: the PRM-RU over Paper (Topic, Words) and Cites (Citing, Cited), combined with an entity skeleton of papers P1 to P5 whose topics (AI or Theory) are partly unknown, plus citation objects whose Cited references are unknown]

Learning PRMs w/ RU
Idea: just as in PRMs w/ AU,
- define a scoring function
- do greedy local structure search
Issues:
- expanded search space
- constructing partitions
- new operators

Learning
Idea:
- define a scoring function
- do phased local search over legal structures
Key components:
- legal models: PRMs w/ RU model new dependencies
- scoring models: unchanged
- searching model space: new operators

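Structure Search: New Operators
[Diagram: new operators applied to the dependency model for Cites.Cited, over Paper (Topic, Words) and Author (Institution); example partition conditions include Topic = AI and Institution = MIT]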

PRMs w/ RU Summary
- Defines semantics for uncertainty over foreign-key values
- Search now includes the operators Refine and Abstract for constructing the foreign-key dependency model
- Provides one simple mechanism for link uncertainty

Existence Uncertainty
[Diagram: two document collections; which citation links exist between their documents is unknown]

PRM w/ Exists Uncertainty
[Diagram: Paper (Topic, Words) linked through Cites (Exists) to another Paper (Topic, Words)]
A dependency model for the existence of the relationship.

Exists Uncertainty Example
[CPD table: P(Cites.Exists | Citer.Topic, Cited.Topic), with columns False and True and one row for each parent combination (Theory, Theory), (Theory, AI), (AI, Theory), (AI, AI)]

PRMs w/ EU Semantics
PRM-EU + object skeleton σ = probability distribution over full instantiations I.
[Diagram: the PRM-EU over Paper (Topic, Words) and Cites (Exists), combined with an object skeleton of papers P1 to P5 whose topics (AI or Theory) are partly unknown, and in which it is unknown which citations exist]

Learning PRMs w/ EU
Idea: just as in PRMs w/ AU,
- define a scoring function
- do greedy local structure search
Issues:
- efficiency
- computation of sufficient statistics for the Exists attribute
- relations that do not exist should not be considered explicitly
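The efficiency point can be made concrete: the counts for Exists = false need not be gathered by enumerating every non-citing pair, because they equal the number of potential pairs for each (Citer.Topic, Cited.Topic) combination minus the observed citations. A minimal sketch, assuming every ordered pair of distinct papers is a potential citation (an assumption for illustration) and using toy data:

```python
from collections import Counter

def exists_counts(topic, citations):
    """Return {(citer_topic, cited_topic, exists): count}.

    topic: paper id -> topic; citations: list of (citing, cited) pairs that exist.
    Only the existing links are scanned; the Exists=False counts are derived.
    """
    n = Counter(topic.values())  # number of papers per topic
    true_counts = Counter((topic[a], topic[b]) for a, b in citations)
    result = {}
    for t1 in n:
        for t2 in n:
            # Ordered pairs of distinct papers with these topics.
            pairs = n[t1] * n[t2] - (n[t1] if t1 == t2 else 0)
            result[(t1, t2, True)] = true_counts[(t1, t2)]
            result[(t1, t2, False)] = pairs - true_counts[(t1, t2)]
    return result

# Hypothetical toy data.
topic = {"P1": "AI", "P2": "Theory", "P3": "AI", "P4": "Theory", "P5": "AI"}
citations = [("P1", "P3"), ("P5", "P3"), ("P2", "P4"), ("P1", "P2")]
print(exists_counts(topic, citations))
```

The cost is linear in the number of existing citations plus the number of topic pairs, instead of quadratic in the number of papers.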

Structure Selection: PRMs w/ EU
Idea:
- define a scoring function
- do phased local search over legal structures
Key components:
- legal models: model new dependencies
- scoring models: unchanged
- searching model space: unchanged

PRMs w/ Class Hierarchies
Allow us to:
- refine a heterogeneous class into more coherent subclasses
- refine the probabilistic model along the class hierarchy; CPDs can be specialized or inherited
- construct new dependencies that were originally cyclic
- provide a bridge from a class-based model to an instance-based model

PRM-CH
Relational schema: TV-Program (Genre, Budget, Time-slot, Network), Vote (Program, Voter, Ranking), Person (Age, Gender, Education, Income).
Class hierarchy: TV-Program has subclasses SitCom, Drama, and Documentary; Drama has subclasses Legal-Drama, Medical-Drama, and SoapOpera.
Dependency model: each class in the hierarchy can have its own Budget CPD (TV-Program, SitCom, Drama, Documentary, Legal-Drama, Medical-Drama, SoapOpera).
(Koller & Pfeffer 1998; Pfeffer 2000)

Learning PRM-CHs
Input: a database instance I (TVProgram, Person, and Vote tables) and the relational schema.
Two settings: the class hierarchy is provided, or the class hierarchy must be learned.

Bayesian Model Selection for PRM-CHs
Idea:
- define a scoring function
- do phased local search over legal structures
Key components:
- scoring models: unchanged
- searching model space: new operators

Guaranteeing Acyclicity with Subclasses
[Diagram: Vote (Program, Voter, Ranking) is refined into the subclasses Soap-Vote and Doc-Vote, each with Program, Voter, and Ranking; the dependency graph has nodes Vote.Ranking, Soap-Vote.Ranking, Doc-Vote.Ranking, and Vote.Class]

Scenario 1: Class hierarchy is provided
New operators: Specialize/Inherit.
[Diagram: learning a PRM-CH in which Budget has a CPD at TV-Program and specialized Budget CPDs at SitCom, Drama, Documentary, Legal-Drama, Medical-Drama, and SoapOpera]
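One way to picture Specialize/Inherit: each class in the hierarchy either has its own (specialized) Budget CPD or inherits the CPD of its nearest ancestor that has one. A minimal sketch of that lookup; the hierarchy is read from the PRM-CH slide (with Legal-Drama, Medical-Drama, and SoapOpera assumed to be subclasses of Drama), while which classes are specialized and the distributions themselves are hypothetical:

```python
# Class hierarchy: child -> parent (subclass placement as assumed above).
PARENT = {
    "SitCom": "TV-Program", "Drama": "TV-Program", "Documentary": "TV-Program",
    "Legal-Drama": "Drama", "Medical-Drama": "Drama", "SoapOpera": "Drama",
}

# Hypothetical specialized CPDs for Budget (low/high).
BUDGET_CPD = {
    "TV-Program": {"low": 0.5, "high": 0.5},
    "Drama": {"low": 0.3, "high": 0.7},
    "SoapOpera": {"low": 0.8, "high": 0.2},
}

def budget_cpd(cls):
    """Walk up the hierarchy until a class with its own CPD is found."""
    while cls not in BUDGET_CPD:
        cls = PARENT[cls]
    return BUDGET_CPD[cls]

print(budget_cpd("Legal-Drama"))  # inherited from Drama
print(budget_cpd("SoapOpera"))    # its own specialized CPD
print(budget_cpd("SitCom"))       # inherited from TV-Program
```

In this picture, specializing a class adds an entry to the table and inheriting removes it, so the subclass falls back to its parent's CPD.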

Learning Class Hierarchy
Issue: the data set is only partially observable.
- Construct a decision tree for the class, defined over attributes observed in the training set.
- New operator: split on a class attribute or a related-class attribute.
[Diagram: a decision tree that splits on TV-Program.Genre (documentary, sitcom, drama) and on TV-Program.Network.Nationality (English, French, American), with leaves class1 to class6]

PRM-CH Summary
PRMs with class hierarchies are a natural extension of PRMs:
- specialization/inheritance of CPDs
- allow new dependency structures
- provide a bridge from class-based to instance-based models
Learning techniques have been proposed; still needed: efficient heuristics and empirical validation on real-world domains.

Conclusions
- PRMs can represent a distribution over attributes from multiple tables.
- PRMs can capture link uncertainty.
- PRMs allow inferences about individuals while taking relational structure into account (they do not make inappropriate independence assumptions).

Selected Publications
- Learning Probabilistic Models of Link Structure, L. Getoor, N. Friedman, D. Koller and B. Taskar, JMLR.
- Probabilistic Models of Text and Link Structure for Hypertext Classification, L. Getoor, E. Segal, B. Taskar and D. Koller, IJCAI WS Text Learning: Beyond Classification.
- Selectivity Estimation using Probabilistic Models, L. Getoor, B. Taskar and D. Koller, SIGMOD-01.
- Learning Probabilistic Relational Models, L. Getoor, N. Friedman, D. Koller, and A. Pfeffer, chapter in Relational Data Mining, eds. S. Dzeroski and N. Lavrac; see also N. Friedman, L. Getoor, D. Koller, and A. Pfeffer, IJCAI-99.
- Learning Probabilistic Models of Relational Structure, L. Getoor, N. Friedman, D. Koller, and B. Taskar, ICML-01.
- From Instances to Classes in Probabilistic Relational Models, L. Getoor, D. Koller and N. Friedman, ICML Workshop on Attribute-Value and Relational Learning: Crossing the Boundaries.
- Notes from AAAI Workshop on Learning Statistical Models from Relational Data, eds. L. Getoor and D. Jensen.
- Notes from IJCAI Workshop on Learning Statistical Models from Relational Data, eds. L. Getoor and D. Jensen.
See


More information

6 Relational Markov Networks

6 Relational Markov Networks 6 Relational Markov Networks Ben Taskar, Pieter Abbeel, Ming-Fai Wong, and Daphne Koller One of the key challenges for statistical relational learning is the design of a representation language that allows

More information

Relational Dependency Networks

Relational Dependency Networks University of Massachusetts Amherst ScholarWorks@UMass Amherst Computer Science Department Faculty Publication Series Computer Science 2007 Relational Dependency Networks Jennifer Neville Purdue University

More information

An Approach to Inference in Probabilistic Relational Models using Block Sampling

An Approach to Inference in Probabilistic Relational Models using Block Sampling JMLR: Workshop and Conference Proceedings 13: 315-330 2nd Asian Conference on Machine Learning (ACML2010), Tokyo, Japan, Nov. 8 10, 2010. An Approach to Inference in Probabilistic Relational Models using

More information

Contents. Preface to the Second Edition

Contents. Preface to the Second Edition Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................

More information

Causal Models for Scientific Discovery

Causal Models for Scientific Discovery Causal Models for Scientific Discovery Research Challenges and Opportunities David Jensen College of Information and Computer Sciences Computational Social Science Institute Center for Data Science University

More information

Automatically Synthesizing SQL Queries from Input-Output Examples

Automatically Synthesizing SQL Queries from Input-Output Examples Automatically Synthesizing SQL Queries from Input-Output Examples Sai Zhang University of Washington Joint work with: Yuyin Sun Goal: making it easier for non-expert users to write correct SQL queries

More information

Learning Bayesian Networks (part 3) Goals for the lecture

Learning Bayesian Networks (part 3) Goals for the lecture Learning Bayesian Networks (part 3) Mark Craven and David Page Computer Sciences 760 Spring 2018 www.biostat.wisc.edu/~craven/cs760/ Some of the slides in these lectures have been adapted/borrowed from

More information

Machine Learning Classifiers and Boosting

Machine Learning Classifiers and Boosting Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve

More information

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization

More information

Department of Computer Science and Engineering B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013 PG Specialisation : _

Department of Computer Science and Engineering B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013 PG Specialisation : _ COURSE DELIVERY PLAN - THEORY Page 1 of 6 Department of Computer Science and Engineering B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013 PG Specialisation : _ LP: CS6007 Rev. No: 01 Date: 27/06/2017 Sub.

More information

Computational Databases: Inspirations from Statistical Software. Linnea Passing, Technical University of Munich

Computational Databases: Inspirations from Statistical Software. Linnea Passing, Technical University of Munich Computational Databases: Inspirations from Statistical Software Linnea Passing, linnea.passing@tum.de Technical University of Munich Data Science Meets Databases Data Cleansing Pipelines Fuzzy joins Data

More information

Semantics and Inference for Recursive Probability Models

Semantics and Inference for Recursive Probability Models From: AAAI-00 Proceedings. Copyright 2000, AAAI (www.aaai.org). All rights reserved. Semantics and Inference for Recursive Probability Models Avi Pfeffer Division of Engineering and Applied Sciences Harvard

More information

Prognosis of Lung Cancer Using Data Mining Techniques

Prognosis of Lung Cancer Using Data Mining Techniques Prognosis of Lung Cancer Using Data Mining Techniques 1 C. Saranya, M.Phil, Research Scholar, Dr.M.G.R.Chockalingam Arts College, Arni 2 K. R. Dillirani, Associate Professor, Department of Computer Science,

More information

Consensus Answers for Queries over Probabilistic Databases. Jian Li and Amol Deshpande University of Maryland, College Park, USA

Consensus Answers for Queries over Probabilistic Databases. Jian Li and Amol Deshpande University of Maryland, College Park, USA Consensus Answers for Queries over Probabilistic Databases Jian Li and Amol Deshpande University of Maryland, College Park, USA Probabilistic Databases Motivation: Increasing amounts of uncertain data

More information

Automated Information Retrieval System Using Correlation Based Multi- Document Summarization Method

Automated Information Retrieval System Using Correlation Based Multi- Document Summarization Method Automated Information Retrieval System Using Correlation Based Multi- Document Summarization Method Dr.K.P.Kaliyamurthie HOD, Department of CSE, Bharath University, Tamilnadu, India ABSTRACT: Automated

More information

Discriminative Probabilistic Models for Relational Data

Discriminative Probabilistic Models for Relational Data UAI2002 T ASKAR ET AL. 485 Discriminative Probabilistic Models for Relational Data Ben Taskar Computer Science Dept. Stanford University Stanford, CA 94305 btaskar@cs.stanford.edu Pieter Abbeel Computer

More information

A Closest Fit Approach to Missing Attribute Values in Preterm Birth Data

A Closest Fit Approach to Missing Attribute Values in Preterm Birth Data A Closest Fit Approach to Missing Attribute Values in Preterm Birth Data Jerzy W. Grzymala-Busse 1, Witold J. Grzymala-Busse 2, and Linda K. Goodwin 3 1 Department of Electrical Engineering and Computer

More information

Generating Social Network Features for Link-Based Classification

Generating Social Network Features for Link-Based Classification Generating Social Network Features for Link-Based Classification Jun Karamon 1, Yutaka Matsuo 2, Hikaru Yamamoto 3, and Mitsuru Ishizuka 1 1 The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan

More information

A Machine Learning Approach for Information Retrieval Applications. Luo Si. Department of Computer Science Purdue University

A Machine Learning Approach for Information Retrieval Applications. Luo Si. Department of Computer Science Purdue University A Machine Learning Approach for Information Retrieval Applications Luo Si Department of Computer Science Purdue University Why Information Retrieval: Information Overload: Since the introduction of digital

More information

Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy. Xiaokui Xiao Nanyang Technological University

Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy. Xiaokui Xiao Nanyang Technological University Privacy Preserving Data Publishing: From k-anonymity to Differential Privacy Xiaokui Xiao Nanyang Technological University Outline Privacy preserving data publishing: What and Why Examples of privacy attacks

More information

CSEP 573: Artificial Intelligence

CSEP 573: Artificial Intelligence CSEP 573: Artificial Intelligence Machine Learning: Perceptron Ali Farhadi Many slides over the course adapted from Luke Zettlemoyer and Dan Klein. 1 Generative vs. Discriminative Generative classifiers:

More information

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering Erik Velldal University of Oslo Sept. 18, 2012 Topics for today 2 Classification Recap Evaluating classifiers Accuracy, precision,

More information

Preprocessing Short Lecture Notes cse352. Professor Anita Wasilewska

Preprocessing Short Lecture Notes cse352. Professor Anita Wasilewska Preprocessing Short Lecture Notes cse352 Professor Anita Wasilewska Data Preprocessing Why preprocess the data? Data cleaning Data integration and transformation Data reduction Discretization and concept

More information

Summary: A Tutorial on Learning With Bayesian Networks

Summary: A Tutorial on Learning With Bayesian Networks Summary: A Tutorial on Learning With Bayesian Networks Markus Kalisch May 5, 2006 We primarily summarize [4]. When we think that it is appropriate, we comment on additional facts and more recent developments.

More information

Link Mining Applications: Progress and Challenges

Link Mining Applications: Progress and Challenges Link Mining Applications: Progress and Challenges Ted E. Senator* DARPA/IPTO 3701 N. Fairfax Drive Arlington, VA 22203 ted.senator@darpa.mil ABSTRACT This article reviews a decade of progress in the area

More information

Constraint-Based Entity Matching

Constraint-Based Entity Matching Constraint-Based Entity Matching Warren Shen Xin Li AnHai Doan University of Illinois, Urbana, USA {whshen, xli1, anhai}@cs.uiuc.edu Abstract Entity matching is the problem of deciding if two given mentions

More information

Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes

Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes CS630 Representing and Accessing Digital Information Information Retrieval: Retrieval Models Information Retrieval Basics Data Structures and Access Indexing and Preprocessing Retrieval Models Thorsten

More information

Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers

Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers James P. Biagioni Piotr M. Szczurek Peter C. Nelson, Ph.D. Abolfazl Mohammadian, Ph.D. Agenda Background

More information

A probabilistic logic incorporating posteriors of hierarchic graphical models

A probabilistic logic incorporating posteriors of hierarchic graphical models A probabilistic logic incorporating posteriors of hierarchic graphical models András s Millinghoffer, Gábor G Hullám and Péter P Antal Department of Measurement and Information Systems Budapest University

More information

Mining Trusted Information in Medical Science: An Information Network Approach

Mining Trusted Information in Medical Science: An Information Network Approach Mining Trusted Information in Medical Science: An Information Network Approach Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign Collaborated with many, especially Yizhou

More information

Structure Learning for Markov Logic Networks with Many Descriptive Attributes

Structure Learning for Markov Logic Networks with Many Descriptive Attributes Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10) Structure Learning for Markov Logic Networks with Many Descriptive Attributes Hassan Khosravi and Oliver Schulte and

More information

Bayes Net Learning. EECS 474 Fall 2016

Bayes Net Learning. EECS 474 Fall 2016 Bayes Net Learning EECS 474 Fall 2016 Homework Remaining Homework #3 assigned Homework #4 will be about semi-supervised learning and expectation-maximization Homeworks #3-#4: the how of Graphical Models

More information

ECLT 5810 Data Preprocessing. Prof. Wai Lam

ECLT 5810 Data Preprocessing. Prof. Wai Lam ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate

More information

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 15, 2015

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 15, 2015 Sum-Product Networks STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 15, 2015 Introduction Outline What is a Sum-Product Network? Inference Applications In more depth

More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information

Elysium Technologies Private Limited::IEEE Final year Project

Elysium Technologies Private Limited::IEEE Final year Project Elysium Technologies Private Limited::IEEE Final year Project - o n t e n t s Data mining Transactions Rule Representation, Interchange, and Reasoning in Distributed, Heterogeneous Environments Defeasible

More information

Efficient Case Based Feature Construction

Efficient Case Based Feature Construction Efficient Case Based Feature Construction Ingo Mierswa and Michael Wurst Artificial Intelligence Unit,Department of Computer Science, University of Dortmund, Germany {mierswa, wurst}@ls8.cs.uni-dortmund.de

More information