Causal Models for Scientific Discovery

1 Causal Models for Scientific Discovery: Research Challenges and Opportunities
David Jensen
College of Information and Computer Sciences; Computational Social Science Institute; Center for Data Science
University of Massachusetts Amherst
Symposium on Accelerating Science, 18 November 2016

2 Sources: The Guardian, July 2005; Wallace Kirkland, for Time

3 Sources: Wikipedia (pile); Argonne National Laboratory (Fermi)

4 Main points
Representing and reasoning about causality is central to science and scientific discovery.
Understanding of causal inference has advanced tremendously in the past 25 years through the work of several disparate research communities.
Several emerging opportunities and challenges exist:
- Expressiveness: combining data and knowledge from multiple sources to understand complex phenomena
- Critique: inferring errors in modeling assumptions or problem construction
- Empirical evaluation: providing realistic empirical tests of methods for causal modeling

5 Causality is central to science

6 Explanation and causality
Explanation is a central activity in science. Effective theories explain previously unexplained phenomena.
Effective explanations generally take the form of a counterfactual ("What would have happened if conditions had been different?").
"Explanatory relationships are relationships that are potentially exploitable for purposes of manipulation and control."

7 Control, design, and causality
Sources: Wikipedia (pile)

8 Models
Because of this, models in most scientific fields have causal implications: they support inferences about how a system would behave under intervention.
In contrast, most models in machine learning and statistics have been defined as having only associational semantics. This leads to substantial confusion among researchers from other fields when they first encounter machine learning methods.
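To make the associational/causal distinction concrete, here is a minimal Python simulation. It assumes a hypothetical three-variable system in which Z confounds X and Y; all variable names and probabilities are invented for illustration. Conditioning on X, which is what an associational model estimates, gives a different answer than intervening on X, which is what a causal model is asked to predict.

```python
import random

def simulate(n, do_x=None, seed=0):
    """Sample (z, x, y) from a toy model with Z -> X, Z -> Y, and X -> Y.

    If do_x is given, X is set by intervention, which cuts the Z -> X edge.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)                      # confounder
        if do_x is None:
            x = int(rng.random() < (0.8 if z else 0.2))  # Z influences X
        else:
            x = do_x                                     # intervention replaces X's mechanism
        p_y = 0.1 + 0.1 * x + 0.6 * z                    # true effect of X on P(Y=1) is +0.1
        y = int(rng.random() < p_y)
        rows.append((z, x, y))
    return rows

def mean_y(rows, x_value):
    """Average of y over the rows where x equals x_value."""
    ys = [y for (_, x, y) in rows if x == x_value]
    return sum(ys) / len(ys)

observed = simulate(100_000)
# Associational contrast: E[Y | X=1] - E[Y | X=0] from passive observation.
assoc = mean_y(observed, 1) - mean_y(observed, 0)
# Interventional contrast: E[Y | do(X=1)] - E[Y | do(X=0)].
causal = (mean_y(simulate(100_000, do_x=1, seed=1), 1)
          - mean_y(simulate(100_000, do_x=0, seed=2), 0))
print(f"associational difference:  {assoc:.2f}")
print(f"interventional difference: {causal:.2f}")
```

Analytically, the observational contrast is 0.68 - 0.22 = 0.46, while the interventional contrast is 0.5 - 0.4 = 0.1. The simulated values land close to these. The confounder Z accounts for the entire gap between them.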

9 Progress in causal modeling
An explicit theory of causal inference has been worked out over the past 20 years by a small group of computer scientists, philosophers, and statisticians.
The theory uses directed graphical models to represent causal dependence among variables, and it provides a formal correspondence between causal models and their observable statistical implications.
This correspondence has been exploited to produce a number of algorithms for reasoning with causal graphical models (CGMs).
(Pearl 2000, 2009; Spirtes, Glymour, and Scheines 1993, 2001)
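The correspondence between graph structure and observable conditional independence can be checked mechanically. The Python sketch below tests d-separation using the ancestral moral graph criterion, which is equivalent to checking every path directly. The dictionary-of-parents encoding and the function names are illustrative assumptions, not something taken from the talk.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(parents, xs, ys, zs):
    """Test whether node sets xs and ys are d-separated given zs.

    `parents` maps each node to the set of its parents.
    """
    keep = ancestors(parents, set(xs) | set(ys) | set(zs))
    # Moralize the ancestral subgraph: undirected parent-child edges plus
    # an edge between every pair of parents that share a child.
    adj = {v: set() for v in keep}
    for child in keep:
        ps = [p for p in parents.get(child, ()) if p in keep]
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(ps, 2):
            adj[a].add(b)
            adj[b].add(a)
    # Delete the conditioning set, then search for any path from xs to ys.
    blocked = set(zs)
    seen = set(xs) - blocked
    stack = list(seen)
    while stack:
        v = stack.pop()
        if v in ys:
            return False
        for w in adj[v] - blocked:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return True

# Chain A -> B -> C plus collider A -> D <- C.
g = {"B": {"A"}, "C": {"B"}, "D": {"A", "C"}}
print(d_separated(g, {"A"}, {"C"}, set()))       # False: the chain is open
print(d_separated(g, {"A"}, {"C"}, {"B"}))       # True: B blocks the chain
print(d_separated(g, {"A"}, {"C"}, {"B", "D"}))  # False: conditioning on collider D opens a path
```

Each `True` answer is a testable prediction: the corresponding conditional independence should hold in any data generated by the graph, assuming the causal Markov condition.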

10 Key concepts
- Only statistical dependence is directly observable in data; causal dependence is not.
- Statistical dependence underdetermines causal dependence ("correlation is not causation").
- The observable statistical consequences of a given causal model can be inferred from its structure (d-separation).
- Multiple causal structures can produce the same observed statistical dependencies (Markov equivalence).
- However, some combinations of conditional independence and known causal dependence constrain the space of possible causal structures, and some uniquely identify a causal structure.
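Markov equivalence has a simple standard characterization for DAGs: two graphs are Markov equivalent exactly when they share the same skeleton (adjacencies, ignoring direction) and the same v-structures. A v-structure is an unshielded collider a -> c <- b in which a and b are not adjacent. The Python sketch below applies that test. The graph encoding is the same hypothetical parent-dictionary format used for illustration elsewhere.

```python
from itertools import combinations

def skeleton(parents):
    """Undirected adjacencies of a DAG given as {child: set_of_parents}."""
    return {frozenset((p, c)) for c, ps in parents.items() for p in ps}

def v_structures(parents):
    """Unshielded colliders a -> c <- b with a and b non-adjacent."""
    skel = skeleton(parents)
    found = set()
    for c, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, c, b))
    return found

def markov_equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

chain = {"B": {"A"}, "C": {"B"}}      # A -> B -> C
reverse = {"B": {"C"}, "A": {"B"}}    # A <- B <- C
fork = {"A": {"B"}, "C": {"B"}}       # A <- B -> C
collider = {"B": {"A", "C"}}          # A -> B <- C
print(markov_equivalent(chain, reverse))   # True
print(markov_equivalent(chain, fork))      # True
print(markov_equivalent(chain, collider))  # False
```

The chain, its reverse, and the fork all imply only that A and C are independent given B, so data alone cannot distinguish them. The collider implies a different pattern (A and C marginally independent), which is why some structures are uniquely identifiable.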

11 Main points (repeat of slide 4)

12 Expressiveness

13 Source: Honavar, Hill, & Yelick (2016), Accelerating Science: A Computing Research Agenda

14 Source: Honavar, Hill, & Yelick (2016), Accelerating Science: A Computing Research Agenda

15 [Diagram labels: Relational, Temporal and Spatial Models; Causal Analysis; Automated Discovery]
- Manual scientific practice: rarely searches large spaces of formally represented models.
- Machine learning: rarely analyzes causal dependence.
- Causal discovery: rarely discovers relational, temporal, or spatial models.

16 Causal models of independent outcomes
[Figure: a causal process generating outcome variables A, B, ..., Z]

17 Causal models of independent outcomes
[Figure: causal graph over variables A through J]

18 Key assumption of simple CGMs
[Figure: a causal process generating outcome variables A, B, ..., Z]

19 Key assumption of simple CGMs?
[Figure: a causal process producing multiple dependent outcomes]

20 Causal models of independent outcomes
[Figure: causal graph over variables A through J]

21 Causal models of dependent outcomes
[Figure: causal graph over variables A through T]
(Friedman, Getoor, Koller, & Pfeffer 1999; Heckerman, Meek, & Koller 2007; Maier, Marazopoulou, and Jensen 2013)
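The following toy Python simulation illustrates why outcomes stop being independent once entities are related. It is not the relational model formalism of the cited work. The entities (employees e1 to e4, products p1 to p3) and the mechanism, in which product success depends on the average competence of its developers, are hypothetical. Products that share a developer end up with correlated outcomes, so treating products as i.i.d. rows would be wrong.

```python
import random

def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

# Hypothetical relational skeleton: which employees develop which product.
DEVELOPS = {"p1": ["e1", "e2"], "p2": ["e2", "e3"], "p3": ["e4"]}

def sample_world(rng):
    """Draw employee competence, then product success from related employees."""
    competence = {e: rng.gauss(0, 1) for e in ["e1", "e2", "e3", "e4"]}
    return {p: sum(competence[e] for e in devs) / len(devs) + rng.gauss(0, 0.5)
            for p, devs in DEVELOPS.items()}

rng = random.Random(0)
worlds = [sample_world(rng) for _ in range(20_000)]
success = {p: [w[p] for w in worlds] for p in DEVELOPS}
r12 = correlation(success["p1"], success["p2"])  # share developer e2
r13 = correlation(success["p1"], success["p3"])  # share no developers
print(f"corr(p1, p2) = {r12:.2f}, corr(p1, p3) = {r13:.2f}")
```

Under these assumptions, the theoretical values are 0.25 / 0.75 = 1/3 for the pair that shares a developer and 0 for the unrelated pair.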

22 (Maier, Marazopoulou, and Jensen 2013)

23 (Maier, Marazopoulou, and Jensen 2013)

24 (Maier, Marazopoulou, and Jensen 2013)

25 Causal models of general processes
Causal process expressed as a probabilistic program:
    bool c1, c2;
    int count = 0;
    c1 = Bernoulli(0.5);
    if (c1 == true) then
        count = count + 1;
    c2 = Bernoulli(0.5);
    if (c2 == true) then
        count = count + 1;
    observe(c1 == true || c2 == true);
    return(count);
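The probabilistic program above can be run directly in Python. This sketch reads the `observe` statement as the disjunction c1 || c2, which is an assumption. It implements `observe` by rejection, discarding executions that violate the observation. It also computes the exact posterior by enumerating the four equally likely (c1, c2) outcomes, as a check on the sampler.

```python
import random
from fractions import Fraction
from itertools import product

def run_once(rng):
    """One execution of the program; returns count, or None if observe fails."""
    count = 0
    c1 = rng.random() < 0.5          # c1 = Bernoulli(0.5)
    if c1:
        count += 1
    c2 = rng.random() < 0.5          # c2 = Bernoulli(0.5)
    if c2:
        count += 1
    if not (c1 or c2):               # observe(c1 || c2): reject this execution
        return None
    return count

def rejection_sample(n, seed=0):
    """Run the program n times and keep only executions consistent with observe."""
    rng = random.Random(seed)
    samples = [run_once(rng) for _ in range(n)]
    return [s for s in samples if s is not None]

def exact_posterior():
    """Enumerate all (c1, c2) outcomes, each with prior probability 1/4."""
    weights = {}
    for c1, c2 in product([False, True], repeat=2):
        if c1 or c2:
            count = int(c1) + int(c2)
            weights[count] = weights.get(count, 0) + Fraction(1, 4)
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

post = exact_posterior()
print(post)                      # {1: Fraction(2, 3), 2: Fraction(1, 3)}
kept = rejection_sample(100_000)
print(sum(kept) / len(kept))     # close to 4/3
```

Rejection sampling is only the simplest inference strategy for such programs. It becomes wasteful when the observed event is rare, which is one reason probabilistic programming systems use more sophisticated inference engines.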

26 Critique

27 "[To support science, we would expect] that two different kinds of inferential process would be required to put it into effect. The first, used in estimating parameters from data conditional on the truth of some tentative model, is appropriately called Estimation. The second, used in checking whether, in the light of the data, any model of the kind proposed is plausible, has been aptly named Criticism."
George Box (emphasis added)

28 Example assumptions
- Faithfulness
- Causal Markov assumption
- Definitions of variables, entities, relationships, etc.
- Measurement process
- Temporal granularity of measurement
- Latent variables, entities, relationships, etc.
- Structural form of causal dependence
- Functional form of probabilistic dependence
- Compositional form
- Closed world (or form of open world)
- ...and many others
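One concrete form of criticism in Box's sense is to test a conditional independence that a candidate structure implies. This Python sketch assumes linear-Gaussian data, an assumption that is itself open to critique. A candidate chain A -> B -> C implies that A is independent of C given B. The sketch tests that implication with a partial correlation and a Fisher z-test. Data generated with an extra A -> C edge should reject the candidate structure, while data from a true chain should not. All generating equations are hypothetical.

```python
import math
import random

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing each on z."""
    def residuals(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        beta = (sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
                / sum((bi - mb) ** 2 for bi in b))
        return [ai - ma - beta * (bi - mb) for ai, bi in zip(a, b)]
    rx, ry = residuals(x, z), residuals(y, z)
    num = sum(a * b for a, b in zip(rx, ry))
    return num / math.sqrt(sum(a * a for a in rx) * sum(b * b for b in ry))

def fisher_z_pvalue(r, n, k=1):
    """Two-sided p-value for H0: partial correlation = 0, with k conditioning variables."""
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(0)
n = 2000
a = [rng.gauss(0, 1) for _ in range(n)]
b = [ai + rng.gauss(0, 1) for ai in a]                          # A -> B
c_chain = [bi + rng.gauss(0, 1) for bi in b]                    # B -> C only
c_extra = [bi + ai + rng.gauss(0, 1) for ai, bi in zip(a, b)]   # B -> C and A -> C

results = {}
for name, c in [("chain data", c_chain), ("extra-edge data", c_extra)]:
    r = partial_corr(a, c, b)
    results[name] = (r, fisher_z_pvalue(r, n))
    print(f"{name}: partial r = {r:.3f}, p = {results[name][1]:.3g}")
```

A tiny p-value for the extra-edge data is evidence against the assumed structure. A large p-value for the chain data does not confirm the structure; it only fails to criticize it.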

29 Empirical evaluation

30 Goals for empirical evaluation approaches
- Empirical: a pre-existing system created by someone other than the researchers.
- Stochastic: produces non-deterministic experimental results.
- Identifiable: amenable to direct experimental investigation to estimate interventional distributions.
- Recoverable: lacks memory or irreversible effects, which enables complete state recovery during experiments.
- Efficient: generates large amounts of data with relatively few resources.
- Reproducible: fairly easy to recreate nearly identical data sets without access to one-of-a-kind hardware or software.

31 Simple example: Database configuration

32 ML for database configuration (setup)
Assume a fixed database and fixed DB server hardware.
Questions:
- For a given query, what is the expected performance under each set of configuration parameters?
- For a given query, which configuration will give the best performance?
Data:
- 11,252 queries actually run against the Stack Exchange Data Explorer
- Each query run using one of many different joint values of the configuration parameters, using Postgres
(Garant & Jensen 2016)
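When the experimenter sets the configuration, the two questions above have a direct empirical answer for any configuration that was actually tried. Averaging runtimes per (query, configuration) estimates the expected runtime under that intervention. This Python sketch uses hypothetical records; the parameter names, settings, and runtimes are invented, not drawn from the Stack Exchange data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical experimental records: (query_id, configuration, runtime in seconds).
# A configuration is a tuple of illustrative settings, e.g. (memory_level, indexing).
RUNS = [
    ("q1", ("low", "on"), 2.1), ("q1", ("low", "on"), 2.3),
    ("q1", ("high", "on"), 1.2), ("q1", ("high", "off"), 3.0),
    ("q2", ("low", "on"), 0.4), ("q2", ("high", "on"), 0.5),
    ("q2", ("high", "off"), 0.9),
]

def expected_runtime(runs):
    """Average runtime per (query, configuration).

    Because the configuration is set by the experimenter, each average
    estimates the runtime under an intervention on the configuration.
    """
    groups = defaultdict(list)
    for query, config, runtime in runs:
        groups[(query, config)].append(runtime)
    return {key: mean(vals) for key, vals in groups.items()}

def best_configuration(runs):
    """For each query, the tried configuration with the lowest estimated runtime."""
    best = {}
    for (query, config), rt in expected_runtime(runs).items():
        if query not in best or rt < best[query][1]:
            best[query] = (config, rt)
    return {q: cfg for q, (cfg, _) in best.items()}

print(best_configuration(RUNS))
# {'q1': ('high', 'on'), 'q2': ('low', 'on')}
```

This lookup table cannot answer either question for a configuration a query was never run under. Generalizing to untried settings is where a learned model, associational or causal, becomes necessary.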

33 CGM for database configuration
[Figure: causal graphical model over the variables Retrieved Row Count, Page Cost, Indexing, Memory Level, Join Count, Table Count, Length, Group-by Count, Total Row Count, Block Hits in Cache, Block Writes to RAM, Block Reads from Disk, Block Reads from RAM, Year Created, Total Queries by User, and Runtime]

34 CGM for database configuration
[Figure: the same variables as slide 33, with node groups labeled Query, Database, User, and Processing]

35 CGM for database configuration (figure text identical to slide 34)

36 CGM for database configuration (figure text identical to slide 34)

37 CGM for database configuration (figure text identical to slide 34)

38 Comparing associational and causal models
- Compare a state-of-the-art associational model (a random forest) to a CGM constructed using greedy equivalence search (GES) (Chickering & Meek 2002).
- Evaluate by comparing to ground truth (experimental results for all queries obtained using a specific joint setting of the configuration parameters).
[Results plot: Cache Hits]
(Garant & Jensen 2016)
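The evaluation logic on this slide can be mimicked on a toy problem. We estimate the effect of a configuration change from observational logs in two ways, then score both against experimental ground truth. This Python sketch simplifies heavily. A naive conditional mean stands in for the associational model (not a random forest), and backdoor adjustment under an assumed, known graph stands in for a GES-learned CGM. The mechanism and numbers are hypothetical: query complexity Z drives both the memory setting X chosen in the logs and the runtime Y.

```python
import random

def avg(values):
    values = list(values)
    return sum(values) / len(values)

def runtime(rng, complex_query, high_memory):
    """Hypothetical mechanism: complex queries are slower, high memory helps."""
    return 1.0 + 3.0 * complex_query - 1.0 * high_memory + rng.gauss(0, 0.1)

def observational_logs(n, rng):
    """Hypothetical logs in which high memory was mostly chosen for complex queries."""
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)                  # query complexity (confounder)
        x = int(rng.random() < (0.9 if z else 0.1))  # configuration choice depends on z
        rows.append((z, x, runtime(rng, z, x)))
    return rows

def naive_estimate(rows, x):
    """Associational stand-in: E[Y | X = x]."""
    return avg(y for _, xi, y in rows if xi == x)

def adjusted_estimate(rows, x):
    """Backdoor adjustment over Z: sum over z of E[Y | X=x, Z=z] * P(Z=z)."""
    total = 0.0
    for z in (0, 1):
        p_z = avg(1.0 if zi == z else 0.0 for zi, _, _ in rows)
        total += p_z * avg(y for zi, xi, y in rows if zi == z and xi == x)
    return total

def ground_truth(n, x, rng):
    """Experimental estimate of E[Y | do(X = x)]: force X = x for every query."""
    return avg(runtime(rng, int(rng.random() < 0.5), x) for _ in range(n))

rng = random.Random(0)
logs = observational_logs(50_000, rng)
estimates = {}
for x in (0, 1):
    estimates[x] = (ground_truth(50_000, x, rng),
                    naive_estimate(logs, x),
                    adjusted_estimate(logs, x))
    truth, naive, adjusted = estimates[x]
    print(f"X={x}: truth {truth:.2f}, naive {naive:.2f}, adjusted {adjusted:.2f}")
```

Under these assumptions the truth is about 2.5 (X=0) and 1.5 (X=1). The naive estimates are about 1.3 and 2.7, which reverses the sign of the effect. The adjusted estimates track the experimental values. Unlike this sketch, the study on the slides learned its causal structure from data rather than assuming it.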

39 Comparing associational and causal models (setup as in slide 38)
[Results plot: Cache Hits]
(Garant & Jensen 2016)

40 Comparing associational and causal models (setup as in slide 38)
[Results plot: Disk Reads]
(Garant & Jensen 2016)

41 Comparing associational and causal models (setup as in slide 38)
[Results plot: Disk Reads]
(Garant & Jensen 2016)

42 Comparing associational and causal models (setup as in slide 38)
[Results plot: Runtime]
(Garant & Jensen 2016)

43 Comparing associational and causal models (setup as in slide 38)
[Results plot: Runtime]
(Garant & Jensen 2016)

44 Main points (repeat of slide 4)

45 Thanks
- David Arbour: recent developments in learning causal dependence from bivariate joint distributions in relational data (UAI & KDD 2016)
- Dan Garant: empirical evaluation of algorithms for learning causal models (UAI 2016)
- Amanda Gentzel: Granger causality methods and empirical evaluation
- Katerina Marazopoulou: extending causal semantics to temporal models (UAI 2015; 2016)
- Kaleigh Clary: additive noise models for learning causal dependence from bivariate joint distributions

46 kdl.cs.umass.edu | cs.umass.edu/~jensen/
All opinions are mine and not those of any company, agency of the US Government, or the University of Massachusetts Amherst.


More information

Structure Estimation in Graphical Models

Structure Estimation in Graphical Models Wald Lecture, World Meeting on Probability and Statistics Istanbul 2012 Structure estimation Some examples General points Advances in computing has set focus on estimation of structure: Model selection

More information

Mobile Wireless Sensor Network enables convergence of ubiquitous sensor services

Mobile Wireless Sensor Network enables convergence of ubiquitous sensor services 1 2005 Nokia V1-Filename.ppt / yyyy-mm-dd / Initials Mobile Wireless Sensor Network enables convergence of ubiquitous sensor services Dr. Jian Ma, Principal Scientist Nokia Research Center, Beijing 2 2005

More information

PSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D.

PSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D. PSU Student Research Symposium 2017 Bayesian Optimization for Refining Object Proposals, with an Application to Pedestrian Detection Anthony D. Rhodes 5/10/17 What is Machine Learning? Machine learning

More information

Graphical Models and Markov Blankets

Graphical Models and Markov Blankets Stephan Stahlschmidt Ladislaus von Bortkiewicz Chair of Statistics C.A.S.E. Center for Applied Statistics and Economics Humboldt-Universität zu Berlin Motivation 1-1 Why Graphical Models? Illustration

More information

INFORMATION DYNAMICS: AN INFORMATION-CENTRIC APPROACH TO SYSTEM DESIGN

INFORMATION DYNAMICS: AN INFORMATION-CENTRIC APPROACH TO SYSTEM DESIGN INFORMATION DYNAMICS: AN INFORMATION-CENTRIC APPROACH TO SYSTEM DESIGN Ashok K. Agrawala Ronald L. Larsen Douglas Szajda Department of Computer Science Maryland Applied Information Institute for Advanced

More information

Learning DAGs from observational data

Learning DAGs from observational data Learning DAGs from observational data General overview Introduction DAGs and conditional independence DAGs and causal effects Learning DAGs from observational data IDA algorithm Further problems 2 What

More information

Object-Oriented Programming and Laboratory of Simulation Development

Object-Oriented Programming and Laboratory of Simulation Development Object-Oriented Programming and Laboratory of Simulation Development Marco Valente LEM, Pisa and University of L Aquila January, 2008 Outline Goal: show major features of LSD and their methodological motivations

More information

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Dong Han and Kilian Stoffel Information Management Institute, University of Neuchâtel Pierre-à-Mazel 7, CH-2000 Neuchâtel,

More information

SYED AMMAL ENGINEERING COLLEGE

SYED AMMAL ENGINEERING COLLEGE CS6302- Database Management Systems QUESTION BANK UNIT-I INTRODUCTION TO DBMS 1. What is database? 2. Define Database Management System. 3. Advantages of DBMS? 4. Disadvantages in File Processing System.

More information

TIM 50 - Business Information Systems

TIM 50 - Business Information Systems TIM 50 - Business Information Systems Lecture 15 UC Santa Cruz Nov 10, 2016 Class Announcements n Database Assignment 2 posted n Due 11/22 The Database Approach to Data Management The Final Database Design

More information

Learning of Bayesian Network Structure from Massive Datasets: The Sparse Candidate Algorithm

Learning of Bayesian Network Structure from Massive Datasets: The Sparse Candidate Algorithm Learning of Bayesian Network Structure from Massive Datasets: The Sparse Candidate Algorithm Nir Friedman Institute of Computer Science Hebrew University Jerusalem, 91904, ISRAEL nir@cs.huji.ac.il Iftach

More information

Introduction to DAGs Directed Acyclic Graphs

Introduction to DAGs Directed Acyclic Graphs Introduction to DAGs Directed Acyclic Graphs Metalund and SIMSAM EarlyLife Seminar, 22 March 2013 Jonas Björk (Fleischer & Diez Roux 2008) E-mail: Jonas.Bjork@skane.se Introduction to DAGs Basic terminology

More information

Link Mining & Entity Resolution. Lise Getoor University of Maryland, College Park

Link Mining & Entity Resolution. Lise Getoor University of Maryland, College Park Link Mining & Entity Resolution Lise Getoor University of Maryland, College Park Learning in Structured Domains Traditional machine learning and data mining approaches assume: A random sample of homogeneous

More information

745: Advanced Database Systems

745: Advanced Database Systems 745: Advanced Database Systems Yanlei Diao University of Massachusetts Amherst Outline Overview of course topics Course requirements Database Management Systems 1. Online Analytical Processing (OLAP) vs.

More information

PROJECT PERIODIC REPORT

PROJECT PERIODIC REPORT PROJECT PERIODIC REPORT Grant Agreement number: 257403 Project acronym: CUBIST Project title: Combining and Uniting Business Intelligence and Semantic Technologies Funding Scheme: STREP Date of latest

More information

Dependency detection with Bayesian Networks

Dependency detection with Bayesian Networks Dependency detection with Bayesian Networks M V Vikhreva Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Leninskie Gory, Moscow, 119991 Supervisor: A G Dyakonov

More information

Language resource management Semantic annotation framework (SemAF) Part 8: Semantic relations in discourse, core annotation schema (DR-core)

Language resource management Semantic annotation framework (SemAF) Part 8: Semantic relations in discourse, core annotation schema (DR-core) INTERNATIONAL STANDARD ISO 24617-8 First edition 2016-12-15 Language resource management Semantic annotation framework (SemAF) Part 8: Semantic relations in discourse, core annotation schema (DR-core)

More information

Holistic and Compact Selectivity Estimation for Hybrid Queries over RDF Graphs

Holistic and Compact Selectivity Estimation for Hybrid Queries over RDF Graphs Holistic and Compact Selectivity Estimation for Hybrid Queries over RDF Graphs Authors: Andreas Wagner, Veli Bicer, Thanh Tran, and Rudi Studer Presenter: Freddy Lecue IBM Research Ireland 2014 International

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

Counting and Exploring Sizes of Markov Equivalence Classes of Directed Acyclic Graphs

Counting and Exploring Sizes of Markov Equivalence Classes of Directed Acyclic Graphs Journal of Machine Learning Research 16 (2015) 2589-2609 Submitted 9/14; Revised 3/15; Published 12/15 Counting and Exploring Sizes of Markov Equivalence Classes of Directed Acyclic Graphs Yangbo He heyb@pku.edu.cn

More information

Opportunities and challenges in personalization of online hotel search

Opportunities and challenges in personalization of online hotel search Opportunities and challenges in personalization of online hotel search David Zibriczky Data Science & Analytics Lead, User Profiling Introduction 2 Introduction About Mission: Helping the travelers to

More information

Systems Ph.D. Qualifying Exam

Systems Ph.D. Qualifying Exam Systems Ph.D. Qualifying Exam Spring 2011 (March 22, 2011) NOTE: PLEASE ATTEMPT 6 OUT OF THE 8 QUESTIONS GIVEN BELOW. Question 1 (Multicore) There are now multiple outstanding proposals and prototype systems

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences

More information

DataZapper: A Tool for Generating Incomplete Datasets

DataZapper: A Tool for Generating Incomplete Datasets DataZapper: A Tool for Generating Incomplete Datasets Yingying Wen, Kevin B. Korb and Ann E. Nicholson Bayesian Intelligence Pty Ltd, 2/21 The Parade, Clarinda, VIC 3169, Australia ying100@yahoo.com, {kevin.korb,ann.nicholson}@bayesian-intelligence.com

More information

VALLIAMMAI ENGINEERING COLLEGE

VALLIAMMAI ENGINEERING COLLEGE VALLIAMMAI ENGINEERING COLLEGE III SEMESTER - B.E COMPUTER SCIENCE AND ENGINEERING QUESTION BANK - CS6302 DATABASE MANAGEMENT SYSTEMS UNIT I 1. What are the disadvantages of file processing system? 2.

More information

COS 513: Foundations of Probabilistic Modeling. Lecture 5

COS 513: Foundations of Probabilistic Modeling. Lecture 5 COS 513: Foundations of Probabilistic Modeling Young-suk Lee 1 Administrative Midterm report is due Oct. 29 th. Recitation is at 4:26pm in Friend 108. Lecture 5 R is a computer language for statistical

More information

GESIA: Uncertainty-Based Reasoning for a Generic Expert System Intelligent User Interface

GESIA: Uncertainty-Based Reasoning for a Generic Expert System Intelligent User Interface GESIA: Uncertainty-Based Reasoning for a Generic Expert System Intelligent User Interface Robert A. Harrington, Sheila Banks, and Eugene Santos Jr. Air Force Institute of Technology Department of Electrical

More information

Current State of ontology in engineering systems

Current State of ontology in engineering systems Current State of ontology in engineering systems Henson Graves, henson.graves@hotmail.com, and Matthew West, matthew.west@informationjunction.co.uk This paper gives an overview of the current state of

More information

An Approach to Inference in Probabilistic Relational Models using Block Sampling

An Approach to Inference in Probabilistic Relational Models using Block Sampling JMLR: Workshop and Conference Proceedings 13: 315-330 2nd Asian Conference on Machine Learning (ACML2010), Tokyo, Japan, Nov. 8 10, 2010. An Approach to Inference in Probabilistic Relational Models using

More information

Local Search Methods for Learning Bayesian Networks Using a Modified Neighborhood in the Space of DAGs

Local Search Methods for Learning Bayesian Networks Using a Modified Neighborhood in the Space of DAGs Local Search Methods for Learning Bayesian Networks Using a Modified Neighborhood in the Space of DAGs L.M. de Campos 1, J.M. Fernández-Luna 2, and J.M. Puerta 3 1 Dpto. de Ciencias de la Computación e

More information

Pre-Requisites: CS2510. NU Core Designations: AD

Pre-Requisites: CS2510. NU Core Designations: AD DS4100: Data Collection, Integration and Analysis Teaches how to collect data from multiple sources and integrate them into consistent data sets. Explains how to use semi-automated and automated classification

More information

Proposal for a scalable class of graphical models for Social Networks

Proposal for a scalable class of graphical models for Social Networks Proposal for a scalable class of graphical models for Social Networks Anna Goldenberg November 24, 2004 Abstract This proposal is about new statistical machine learning approaches to detect evolving relationships

More information

Logik für Informatiker Logic for computer scientists

Logik für Informatiker Logic for computer scientists Logik für Informatiker for computer scientists WiSe 2011/12 Overview Motivation Why is logic needed in computer science? The LPL book and software Scheinkriterien Why is logic needed in computer science?

More information

Structured Models in. Dan Huttenlocher. June 2010

Structured Models in. Dan Huttenlocher. June 2010 Structured Models in Computer Vision i Dan Huttenlocher June 2010 Structured Models Problems where output variables are mutually dependent or constrained E.g., spatial or temporal relations Such dependencies

More information

A Heuristic Approach for Web log mining using Bayesian. Networks

A Heuristic Approach for Web log mining using Bayesian. Networks A Heuristic Approach for Web log mining using Bayesian Networks Abstract Nanasaheb Kadu* Devendra Thakore Bharti Vidyapeet College of Engineering, BV Deemed Univesity,Pune,India * E-mail of the corresponding

More information

Deduplication of Hospital Data using Genetic Programming

Deduplication of Hospital Data using Genetic Programming Deduplication of Hospital Data using Genetic Programming P. Gujar Department of computer engineering Thakur college of engineering and Technology, Kandiwali, Maharashtra, India Priyanka Desai Department

More information

A Discovery Algorithm for Directed Cyclic Graphs

A Discovery Algorithm for Directed Cyclic Graphs A Discovery Algorithm for Directed Cyclic Graphs Thomas Richardson 1 Logic and Computation Programme CMU, Pittsburgh PA 15213 1. Introduction Directed acyclic graphs have been used fruitfully to represent

More information

W3C Provenance Incubator Group: An Overview. Thanks to Contributing Group Members

W3C Provenance Incubator Group: An Overview. Thanks to Contributing Group Members W3C Provenance Incubator Group: An Overview DRAFT March 10, 2010 1 Thanks to Contributing Group Members 2 Outline What is Provenance Need for

More information