Slides taken from the presentation (subset only) Learning Statistical Models From Relational Data Lise Getoor University of Maryland, College Park Includes work done by: Nir Friedman, Hebrew U. Daphne Koller, Stanford Avi Pfeffer, Harvard Ben Taskar, Stanford
Outline Motivation and Background PRMs w/ Attribute Uncertainty PRMs w/ Link Uncertainty PRMs w/ Class Hierarchies
Discovering Patterns in Structured Data Strain Patient Treatment
Learning Statistical Models. Traditional approaches work well with flat representations: fixed-length attribute-value vectors; assume an independent (IID) sample. Problems when relational data is flattened: introduces statistical skew; loses relational structure; incapable of detecting link-based patterns; must fix attributes in advance.
Probabilistic Relational Models Combine advantages of relational logic & Bayesian networks: natural domain modeling: objects, properties, relations; generalization over a variety of situations; compact, natural probability models. Integrate uncertainty with relational model: properties of domain entities can depend on properties of related entities; uncertainty over relational structure of domain.
Relational Schema. Describes the types of objects and relations in the database. Classes: Strain, Patient, Contact. Relations: Infected with, Interacted with. Attributes shown: Unique, Infectivity, Contact-Type, Close-Contact, Skin-Test, Homeless, Age, HIV-Result, Ethnicity, Disease-Site.
Probabilistic Relational Model. Classes and attributes: Patient (Homeless, POB, HIV-Result, Disease Site); Strain (Unique, Infectivity); Contact (Age, Contact-Type, Close-Contact, Transmitted).
CPD P(T | H, C), where T = Cont.Transmitted, H = Cont.Contactor.HIV, C = Cont.Close-Contact:
H, C    P(T | H, C)
f, f    0.9   0.1
f, t    0.8   0.2
t, f    0.7   0.3
t, t    0.6   0.4
Relational Skeleton. Fixed relational skeleton σ: the set of objects in each class and the relations between them (here strains s1, s2; patients p1, p2, p3; contacts c1, c2, c3). Uncertainty is over the assignment of values to attributes; the PRM defines a distribution over instantiations of the attributes.
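A Portion of the BN. Ground nodes for patient P1: P1.POB, P1.Homeless, P1.HIV-Result, P1.Disease Site. Ground nodes for contact C1: C1.Age, C1.Contact-Type, C1.Close-Contact, C1.Transmitted. Ground nodes for contact C2: C2.Age, C2.Contact-Type, C2.Close-Contact, C2.Transmitted. Values shown in the figure: P1.HIV-Result = true, C1.Close-Contact = false, C2.Close-Contact = true.
The same CPD is attached to C1.Transmitted and C2.Transmitted:
H, C    P(T | H, C)
f, f    0.9   0.1
f, t    0.8   0.2
t, f    0.7   0.3
t, t    0.6   0.4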
PRM: Aggregate Dependencies. An attribute can depend on an aggregate of an attribute of several related objects. Aggregates: mode, sum, min, max, avg, count.
Patient Jane Doe: POB US; Homeless no; HIV-Result negative; Age ???; Disease Site pulmonary.
Contact #5077: Contact-Type coworker; Close-Contact no; Age middle-aged; Transmitted false.
Contact #5076: Contact-Type spouse; Close-Contact yes; Age middle-aged; Transmitted true.
Contact #5075: Contact-Type friend; Close-Contact no; Age middle-aged; Transmitted false.
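A minimal sketch of how an aggregate reduces the values of several related objects to a single parent value. The record layout, the `aggregate` helper, and the choice of which attributes to aggregate are illustrative assumptions; only the three contact records are taken from the slide.

```python
from collections import Counter
from statistics import mean

# Contact records of the example patient, copied from the slide.
contacts = [
    {"id": 5077, "type": "coworker", "close": False, "age": "middle-aged", "transmitted": False},
    {"id": 5076, "type": "spouse", "close": True, "age": "middle-aged", "transmitted": True},
    {"id": 5075, "type": "friend", "close": False, "age": "middle-aged", "transmitted": False},
]

def mode(values):
    """Most frequent value; ties go to the value seen first."""
    return Counter(values).most_common(1)[0][0]

# Hypothetical registry of the aggregates named on the slide.
AGGREGATES = {"mode": mode, "sum": sum, "min": min, "max": max, "avg": mean, "count": len}

def aggregate(objects, attribute, how):
    """Apply one aggregate to the given attribute of all related objects."""
    return AGGREGATES[how]([obj[attribute] for obj in objects])

age_mode = aggregate(contacts, "age", "mode")            # a single categorical value
n_transmitted = aggregate(contacts, "transmitted", "sum")  # booleans sum as 0/1
```

The aggregated value can then be used as an ordinary parent in a CPD, so the size of the CPD does not grow with the number of contacts.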
PRM with AU Semantics. PRM + relational skeleton σ = probability distribution over completions I (the objects together with values for their attributes). Skeleton shown: strains s1, s2; patients p1, p2, p3; contacts c1, c2, c3.
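Written out, the distribution is a product of local models, one factor per attribute of each object. This is the standard PRM form; the notation σ(X) for the objects of class X in the skeleton, A(X) for the attributes of X, and Pa(x.A) for the ground parents of x.A is assumed here rather than taken from the slide.

```latex
P(I \mid \sigma, S, \theta)
  = \prod_{X} \; \prod_{x \in \sigma(X)} \; \prod_{A \in \mathcal{A}(X)}
    P\bigl(I_{x.A} \mid I_{\mathrm{Pa}(x.A)}\bigr)
```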
Learning PRMs w/ AU. Input: a database and its relational schema. Tasks: parameter estimation; structure selection.
Parameter Estimation in PRMs. Assume a known dependency structure S. Goal: estimate the PRM parameters θ, the entries in the local probability models. θ is good if it is likely to generate the observed data, instance I. MLE principle: choose θ so as to maximize the likelihood l of the observed instance.
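One common way to write the objective is as a log-likelihood that decomposes over classes, attributes, and objects. The symbols σ(X), A(X), and Pa(x.A) are assumed notation for the objects of class X, the attributes of X, and the ground parents of x.A.

```latex
l(\theta : I, \sigma, S)
  = \log P(I \mid \sigma, S, \theta)
  = \sum_{X} \sum_{A \in \mathcal{A}(X)} \sum_{x \in \sigma(X)}
    \log P\bigl(I_{x.A} \mid I_{\mathrm{Pa}(x.A)}\bigr),
\qquad
\hat{\theta} = \arg\max_{\theta} \; l(\theta : I, \sigma, S)
```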
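ML Parameter Estimation. θ = the entries of P(T | H, C), with T = Cont.Transmitted, H = Cont.Contactor.HIV, C = Cont.Close-Contact. Each row (f, f), (f, t), (t, f), (t, t) has two unknown entries (? ?). The entries are estimated from counts, obtained by a Count query over the joined Patient and Contact tables.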
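A minimal sketch of count-based maximum-likelihood estimation for one CPD, assuming the tables have already been joined into flat rows. The `mle_cpt` helper and the toy rows are hypothetical and are not data from the slides.

```python
from collections import Counter

def mle_cpt(rows, child, parents):
    """ML estimate: P(child | parents) = N(parents, child) / N(parents)."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    marginal = Counter(tuple(r[p] for p in parents) for r in rows)
    return {(pa, value): n / marginal[pa] for (pa, value), n in joint.items()}

# Hypothetical joined rows: H = HIV result of the contactor,
# C = close contact, T = transmitted (toy values, not from the slides).
rows = [
    {"H": False, "C": True, "T": True},
    {"H": False, "C": True, "T": False},
    {"H": False, "C": True, "T": False},
    {"H": True, "C": True, "T": True},
]

cpt = mle_cpt(rows, child="T", parents=("H", "C"))
```

In a database setting the two counters correspond to GROUP BY count queries, which is why parameter estimation needs only counts rather than the rows themselves.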
Idea: Structure Selection define scoring function do local search over legal structures Key Components: legal models scoring models searching model space
Idea: Structure Selection define scoring function do local search over legal structures Key Components: » legal models scoring models searching model space
Legal Models. A PRM defines a coherent probability model over a skeleton σ if the dependencies between object attributes are acyclic. Example: Researcher Prof. Gump (Reputation: high), linked by author-of to P1 (Accepted: yes) and P2 (Accepted: yes), with a sum aggregate over the author-of relation. How do we guarantee that a PRM is acyclic for every skeleton?
Attribute Stratification. PRM dependency structure S ⇒ class-level dependency graph, with an edge Paper.Accepted → Researcher.Reputation if Researcher.Reputation depends directly on Paper.Accepted. Attribute stratification: dependency graph acyclic ⇒ ground network acyclic for any σ. The algorithm is more flexible: it allows certain cycles along guaranteed-acyclic relations.
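The basic stratification test amounts to a cycle check on the class-level dependency graph. A minimal sketch using depth-first search follows; the `is_acyclic` helper is hypothetical, the class name Paper is assumed, and the more flexible treatment of guaranteed-acyclic relations is not modeled.

```python
def is_acyclic(edges):
    """True if the directed graph given as (parent, child) pairs has no cycle."""
    graph = {}
    for parent, child in edges:
        graph.setdefault(parent, []).append(child)
        graph.setdefault(child, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:  # back edge: a cycle exists
                return False
            if color[nxt] == WHITE and not visit(nxt):
                return False
        color[node] = BLACK
        return True

    return all(visit(node) for node in graph if color[node] == WHITE)

# Class-level edges (attribute names assumed for illustration).
legal = [("Paper.Accepted", "Researcher.Reputation")]
illegal = legal + [("Researcher.Reputation", "Paper.Accepted")]
```

Because the check runs on class-level attributes rather than ground objects, a structure that passes it is acyclic for every skeleton, at the cost of rejecting some structures that would be acyclic on particular data.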
Idea: Structure Selection define scoring function do local search over legal structures Key Components: legal models» scoring models searching model space
Scoring Models Bayesian approach: Standard approach to scoring models; used in Bayesian network learning
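In the usual Bayesian formulation, the score of a structure is its log posterior given the data, with the parameters integrated out. The sketch below uses θ_S for the parameters of structure S and C for a term that does not depend on S; both symbols are assumed notation.

```latex
\mathrm{Score}(S : I, \sigma)
  = \log P(S \mid I, \sigma)
  = \log P(I \mid S, \sigma) + \log P(S \mid \sigma) + C,
\qquad
P(I \mid S, \sigma)
  = \int P(I \mid S, \theta_S, \sigma) \, P(\theta_S \mid S) \, d\theta_S
```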
Idea: Structure Selection define scoring function do local search over legal structures Key Components: legal models scoring models» searching model space
Searching Model Space. Phase 0: consider only dependencies within a class.
Phased Structure Search. Phase 1: consider dependencies from neighboring classes, via schema relations.
Phased Structure Search. Phase 2: consider dependencies from further classes, via relation chains.
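The phases can be read as growing the set of classes from which candidate parents may come. A minimal sketch, assuming a hypothetical schema encoding in which each class lists its outgoing relations; the relation names and the `relation_chains` helper are invented for illustration.

```python
# Hypothetical schema: class -> list of (relation name, target class).
SCHEMA = {
    "Contact": [("contactor", "Patient")],
    "Patient": [("infected_with", "Strain")],
    "Strain": [],
}

def relation_chains(start, max_length):
    """All relation chains from `start` that use at most `max_length` links.

    Each result is (chain of relation names, class reached). Phase k of the
    search would draw candidate parents from the classes reached here.
    """
    chains = [((), start)]
    frontier = list(chains)
    for _ in range(max_length):
        extended = []
        for chain, cls in frontier:
            for relation, target in SCHEMA[cls]:
                extended.append((chain + (relation,), target))
        chains.extend(extended)
        frontier = extended
    return chains

phase0 = relation_chains("Contact", 0)  # the class itself
phase1 = relation_chains("Contact", 1)  # plus neighboring classes
phase2 = relation_chains("Contact", 2)  # plus classes two links away
```

Searching short chains first keeps the early phases cheap and only pays for longer chains once the local structure has been fixed.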
Issue PRM w/ AU applicable only in domains where we have full knowledge of the relational structure Next we introduce PRMs which allow uncertainty over relational structure
PRMs w/ Link Uncertainty Advantages: Applicable in cases where we do not have full knowledge of relational structure Incorporating uncertainty over relational structure into probabilistic model can improve predictive accuracy Two approaches: Reference uncertainty Existence uncertainty Different probabilistic models; varying amount of background knowledge required for each
Citation Relational Schema Author Institution Research Area Wrote Word1 Word2 WordN Citing Cites Count Cited Word1 Word2 WordN
Attribute Uncertainty Author Research Area Institution P( Institution Research Area) Wrote P(.Author.Research Area P( WordN ) Word1... WordN
Reference Uncertainty. [Figure: a bibliography whose entries 1, 2, 3 each refer to an unknown (?) document in a scientific document collection.]
PRM w/ Reference Uncertainty. Schema: Cites, with foreign keys Citing and Cited, links two documents, each with Words. Dependency model for foreign keys. Naïve approach: multinomial over primary keys; non-compact; limits ability to generalize.
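Reference Uncertainty Example. [Figure: papers P1 to P5, each labeled AI or Theory, are grouped into two partitions, one for AI and one for Theory; the value of Cites.Cited is modeled through the choice of partition. Probabilities shown in the figure: 0.1, 0.9; 0.3, 0.7; 0.99, 0.01, with rows labeled Theory and AI for the citing paper.]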
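A minimal sketch of a partition-based model for a foreign key: first choose a partition of the candidate papers, then choose a paper uniformly within it. The paper labels, the selector probabilities, and the `p_cites` helper are illustrative assumptions and are not a reconstruction of the figure.

```python
# Hypothetical papers and their labels.
papers = {"P1": "AI", "P2": "Theory", "P3": "AI", "P4": "Theory", "P5": "AI"}

# Partition the candidate cited papers by label.
partitions = {}
for paper_id, label in papers.items():
    partitions.setdefault(label, []).append(paper_id)

# Assumed selector CPD: P(partition of cited paper | label of citing paper).
selector = {
    "AI": {"AI": 0.99, "Theory": 0.01},
    "Theory": {"AI": 0.1, "Theory": 0.9},
}

def p_cites(citing, cited):
    """P(Cited = cited | citing): pick a partition, then a paper uniformly in it.

    Self-citation is not excluded in this simplified version.
    """
    partition = papers[cited]
    return selector[papers[citing]][partition] / len(partitions[partition])

total = sum(p_cites("P1", cited) for cited in papers)
```

The selector has one row per citing label and one column per partition, so its size is independent of the number of papers, unlike a multinomial over primary keys.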
PRMs w/ RU Semantics. PRM-RU + entity skeleton σ define a probability distribution over full instantiations I. [Figure: an entity skeleton with papers P1 to P5, labels AI and Theory, and Cites objects whose cited papers are unknown (???).]
Learning PRMs w/ RU. Idea: just like in PRMs w/ AU: define scoring function; do greedy local structure search. Issues: expanded search space; construct partitions; new operators.
Structure Selection PRMs w/ RU. Idea: define scoring function; do phased local search over legal structures. Key components: legal models (model new dependencies); scoring models (unchanged); searching model space (new operators).
Structure Search: New Operators Words Cites Cited Citing Words Author Institution Citing s 1.0 Institution = MIT = AI
PRMs w/ RU Summary Define semantics for uncertainty over foreign-key values Search now includes operators Refine and Abstract for constructing foreign-key dependency model Provides one simple mechanism for link uncertainty
Existence Uncertainty. [Figure: two document collections with unknown (???) links between their documents.]
PRM w/ Exists Uncertainty Words Cites Exists Words Dependency model for existence of relationship
Exists Uncertainty Example. CPD for Cites.Exists given the citer and cited attribute values:
Citer.   Cited.   False   True
Theory   Theory   0.995   0.005
Theory   AI       0.999   0.001
AI       Theory   0.997   0.003
AI       AI       0.992   0.008
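A minimal sketch of how the existence CPD can be used, here to compute the expected number of citation links in a small collection. The probabilities are the True column of the slide's table; the document labels and the `expected_links` helper are hypothetical.

```python
# P(Exists = True | citer value, cited value), from the slide's table.
P_EXISTS = {
    ("Theory", "Theory"): 0.005,
    ("Theory", "AI"): 0.001,
    ("AI", "Theory"): 0.003,
    ("AI", "AI"): 0.008,
}

def expected_links(labels):
    """Expected number of links over all ordered pairs of distinct documents."""
    total = 0.0
    for i, citer in enumerate(labels):
        for j, cited in enumerate(labels):
            if i != j:  # a document is not paired with itself
                total += P_EXISTS[(citer, cited)]
    return total

# Hypothetical collection of three documents.
expected = expected_links(["AI", "AI", "Theory"])
```

The small probabilities reflect that most candidate pairs are not linked, which is also why the learning procedure avoids enumerating non-existent links explicitly.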
PRMs w/ EU Semantics. PRM-EU + object skeleton σ define a probability distribution over full instantiations I. [Figure: an object skeleton with papers P1 to P5, labels AI and Theory, and unknown (???) citation links between them.]
Learning PRMs w/ EU. Idea: just like in PRMs w/ AU: define scoring function; do greedy local structure search. Issues: efficiency; computation of sufficient statistics for the exists attribute; do not explicitly consider relations that do not exist.
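One way to obtain the sufficient statistics without enumerating non-existent links is to count the observed links and subtract them from the number of candidate pairs. The sketch below assumes a single label per document, no self-links, and no duplicate links; the `exists_counts` helper and the toy data are hypothetical.

```python
from collections import Counter

def exists_counts(labels, links):
    """Counts of Exists = True / False per (citer label, cited label).

    labels: dict of document id -> label.
    links: iterable of (citer id, cited id) pairs that exist.
    The False count is derived as candidate pairs minus observed links.
    """
    per_label = Counter(labels.values())
    observed = Counter((labels[a], labels[b]) for a, b in links)
    counts = {}
    for citer_label in per_label:
        for cited_label in per_label:
            # Ordered pairs of distinct documents with these two labels.
            candidates = per_label[citer_label] * per_label[cited_label]
            if citer_label == cited_label:
                candidates -= per_label[citer_label]
            true_count = observed[(citer_label, cited_label)]
            counts[(citer_label, cited_label)] = {
                True: true_count,
                False: candidates - true_count,
            }
    return counts

# Toy data, not from the slides.
labels = {"d1": "AI", "d2": "AI", "d3": "Theory"}
links = [("d1", "d2"), ("d3", "d1")]
counts = exists_counts(labels, links)
```

The work is proportional to the number of existing links plus the number of label combinations, not to the number of document pairs.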
Structure Selection PRMs w/ EU. Idea: define scoring function; do phased local search over legal structures. Key components: legal models (model new dependencies); scoring models (unchanged); searching model space (unchanged).
PRMs w/ Class Hierarchies. Allow us to: refine a heterogeneous class into more coherent subclasses; refine the probabilistic model along the class hierarchy (can specialize/inherit CPDs); construct new dependencies that were originally cyclic; provide a bridge from class-based models to instance-based models.
PRM-CH. Relational schema: TV-Program (Genre, Budget, Time-slot, Network); Vote (Program, Voter, Ranking); Person (Age, Gender, Education, Income). Class hierarchy: TV-Program has subclasses SitCom, Drama, Documentary; Drama has subclasses Legal-Drama, Medical-Drama, SoapOpera. Dependency model: Budget can have its own CPD at each class in the hierarchy (TV-Program, SitCom, Drama, Documentary, Legal-Drama, Medical-Drama, SoapOpera). Koller & Pfeffer 1998; Pfeffer 2000.
Learning PRM-CHs. Input: relational schema (TVProgram, Person, Vote) and database instance I. Two scenarios: class hierarchy provided; learn class hierarchy.
Bayesian Model Selection for PRM-CHs. Idea: define scoring function; do phased local search over legal structures. Key components: scoring models (unchanged); searching model space (new operators).
Guaranteeing Acyclicity with Subclasses. Vote (Program, Voter, Ranking) has subclasses Soap-Vote and Doc-Vote with the same attributes. Nodes of the dependency graph: Vote.Ranking, Soap-Vote.Ranking, Doc-Vote.Ranking, Vote.Class.
Learning PRM-CH, Scenario 1: class hierarchy is provided. New operators: Specialize/Inherit, applied to the CPD of an attribute such as Budget at each class (TV-Program; SitCom, Drama, Documentary; Legal-Drama, Medical-Drama, SoapOpera).
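A minimal sketch of CPD inheritance and specialization along a class hierarchy: a class without its own CPD uses the CPD of its nearest ancestor that has one. The nesting of the subclasses under Drama, the Budget probabilities, and the `cpd_for` and `specialize` helpers are assumptions made for illustration.

```python
# Assumed hierarchy: subclass -> superclass (None at the root).
PARENT = {
    "TV-Program": None,
    "SitCom": "TV-Program",
    "Drama": "TV-Program",
    "Documentary": "TV-Program",
    "Legal-Drama": "Drama",
    "Medical-Drama": "Drama",
    "SoapOpera": "Drama",
}

# CPDs for Budget; only specialized classes have an entry (made-up numbers).
budget_cpds = {
    "TV-Program": {"low": 0.5, "high": 0.5},
    "Drama": {"low": 0.3, "high": 0.7},
}

def cpd_for(cls, cpds):
    """Inherit: walk up the hierarchy to the nearest class with its own CPD."""
    while cls is not None:
        if cls in cpds:
            return cpds[cls]
        cls = PARENT[cls]
    raise KeyError("no CPD defined along the hierarchy")

def specialize(cls, cpd, cpds):
    """Specialize: return a copy of the CPD table with a CPD of its own for cls."""
    updated = dict(cpds)
    updated[cls] = cpd
    return updated

inherited = cpd_for("Legal-Drama", budget_cpds)  # inherited from Drama
specialized = specialize("SoapOpera", {"low": 0.8, "high": 0.2}, budget_cpds)
```

In a structure search, each specialize or inherit move would be scored like any other operator, so a subclass receives its own CPD only when the data supports the extra parameters.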
Learning Class Hierarchy. Issue: partially observable data set. Construct a decision tree for the class, defined over attributes observed in the training set. New operators: split on a class attribute; split on a related class attribute. [Figure: a decision tree that splits on TV-Program.Genre (documentary, sitcom, drama) and on TV-Program.Network.Nationality (English, French, American), with leaves class1 to class6.]
PRM-CH Summary. PRMs with class hierarchies are a natural extension of PRMs: specialization/inheritance of CPDs; allow new dependency structures; provide a bridge from class-based to instance-based models. Learning techniques proposed. Still needed: efficient heuristics; empirical validation on real-world domains.
Conclusions. PRMs can represent distributions over attributes from multiple tables. PRMs can capture link uncertainty. PRMs allow inferences about individuals while taking into account relational structure (they do not make inappropriate independence assumptions).
Selected Publications
Learning Probabilistic Models of Link Structure, L. Getoor, N. Friedman, D. Koller and B. Taskar, JMLR 2002.
Probabilistic Models of Text and Link Structure for Hypertext Classification, L. Getoor, E. Segal, B. Taskar and D. Koller, IJCAI WS Text Learning: Beyond Classification, 2001.
Selectivity Estimation using Probabilistic Models, L. Getoor, B. Taskar and D. Koller, SIGMOD-01.
Learning Probabilistic Relational Models, L. Getoor, N. Friedman, D. Koller and A. Pfeffer, chapter in Relational Data Mining, eds. S. Dzeroski and N. Lavrac, 2001. See also N. Friedman, L. Getoor, D. Koller and A. Pfeffer, IJCAI-99.
Learning Probabilistic Models of Relational Structure, L. Getoor, N. Friedman, D. Koller and B. Taskar, ICML-01.
From Instances to Classes in Probabilistic Relational Models, L. Getoor, D. Koller and N. Friedman, ICML Workshop on Attribute-Value and Relational Learning: Crossing the Boundaries, 2000.
Notes from AAAI Workshop on Learning Statistical Models from Relational Data, eds. L. Getoor and D. Jensen, 2000.
Notes from IJCAI Workshop on Learning Statistical Models from Relational Data, eds. L. Getoor and D. Jensen, 2003.
See http://www.cs.umd.edu/~getoor