Exam Advanced Data Mining
Date: 11-11-2010
Time: 13.30-16.30

General Remarks

1. You are allowed to consult one A4 sheet with notes written on both sides.
2. Always show how you arrived at the result of your calculations.
3. If you are a native Dutch speaker, answers in Dutch are preferred.
4. There are six questions, for which you can score a total of 100 points.

Question 1 Multiple Choice (16 points)

For the following questions, zero or more answers may be true.

a) Which of the following statements are true?

1. An important difference between induction and deduction is that deductive reasoning is truth-preserving, while in an inductive argument the conclusion may be false even though the premises are true.
2. In data mining we primarily deal with observational data rather than experimental data.
3. If the data to be analysed has missing values, it is justified to exclude the incomplete observations from the analysis, as long as there are enough complete observations left.
4. In data mining projects, the analysis phase is generally the most time consuming.

b) Which of the following statements about classification trees are true?

1. The resubstitution error of a tree never goes up when we split one of its leaf nodes.
2. In growing a tree, we can always continue splitting until each leaf node contains examples of a single class, but the resulting tree will be overfitted.
3. If both children produced by a split have the same majority class, then the impurity reduction of the split is zero.
4. For classification problems with more than two classes, in order to determine the optimal split for a categorical attribute with L distinct values, we have to compute 2^(L-1) - 1 possible splits.

c) Which of the following statements about frequent pattern mining are true?

1. If all the proper subsets of an itemset are frequent, then the itemset itself must also be frequent.
2. All maximal frequent itemsets are closed.
3. If we only know the maximal frequent itemsets and their support, we can infer from that all frequent itemsets and their support.
4. For an association rule, if we move one item from the right-hand side to the left-hand side of the rule, then the confidence will never go down.

d) In pattern set mining, which of the following are general criteria to determine the quality of a pattern set?

1. Surprisingness.
2. Complexity.
3. Descriptiveness.
4. Understandability.
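A side note on the count in statement b) 4: for a categorical attribute with L distinct values there are 2^(L-1) - 1 distinct binary splits, since each split is an unordered partition of the values into two non-empty blocks. A minimal Python sketch that enumerates them (the attribute values below are made up for illustration):

from itertools import combinations

def binary_splits(values):
    # Enumerate the distinct binary splits of a categorical attribute.
    # Swapping the two sides gives the same split, so for L distinct
    # values there are 2**(L - 1) - 1 of them.
    values = list(values)
    splits = []
    for size in range(1, len(values)):
        for left in combinations(values, size):
            right = tuple(v for v in values if v not in left)
            if left < right:  # keep only one of the two mirrored orderings
                splits.append((left, right))
    return splits

print(len(binary_splits(["red", "green", "blue", "yellow"])))  # 7 == 2**(4 - 1) - 1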

Question 2 Frequent Itemset Mining (25 points)

Given are the following five transactions on the items {A, B, C, D, E}:

tid  items
1    {A, B}
2    {A, B, D}
3    {B, D, E}
4    {B, C, D, E}
5    {A, B, C}

a) Use the Apriori algorithm to compute all frequent itemsets, and their support, with minimum support 2. It is important that you clearly indicate the steps of the algorithm.

b) Use the A-Close algorithm to compute all closed frequent itemsets, with minimum support 2. It is important that you clearly indicate the steps of the algorithm.

c) Use all closed frequent itemsets computed at (b) to construct a Krimp code table, and compute how often each itemset in the code table is used in covering the database. Also compute the optimal code length for each itemset in the code table. Note: here you should use logarithms with base 2.

d) Suppose that in addition to the transactions, we also have information about the price of each item. Consider constraints of the type

max(I.price) - min(I.price) ≤ c

where max(I.price) denotes the maximum of the prices of the items in itemset I, min(I.price) the minimum of those prices, and c is some positive constant. Suppose that we want to find all frequent itemsets that also satisfy a constraint of this type. Can we use this constraint in an Apriori-style levelwise search to further prune the search space, while still guaranteeing completeness of the results? Explain your answer.
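For reference, a minimal Python sketch of the level-wise Apriori search that part (a) refers to, run on the five transactions above with an absolute minimum support of 2; it is only an illustration of the algorithm, not a substitute for showing the steps by hand:

from itertools import combinations

# The five transactions of Question 2; minimum support is an absolute count of 2.
transactions = [
    {"A", "B"},
    {"A", "B", "D"},
    {"B", "D", "E"},
    {"B", "C", "D", "E"},
    {"A", "B", "C"},
]
min_support = 2

def support(itemset):
    # Number of transactions that contain the itemset.
    return sum(1 for t in transactions if itemset <= t)

items = sorted({item for t in transactions for item in t})
frequent = {1: {frozenset([i]): support(frozenset([i]))
                for i in items if support(frozenset([i])) >= min_support}}

k = 1
while frequent[k]:
    # Join step: combine frequent k-itemsets into (k+1)-candidates.
    # Prune step: keep a candidate only if all its k-subsets are frequent.
    candidates = set()
    for x in frequent[k]:
        for y in frequent[k]:
            union = x | y
            if len(union) == k + 1 and all(frozenset(sub) in frequent[k]
                                           for sub in combinations(union, k)):
                candidates.add(union)
    frequent[k + 1] = {c: support(c) for c in candidates if support(c) >= min_support}
    k += 1

for level in frequent.values():
    for itemset, supp in sorted(level.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), supp)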

Question 3 Undirected Graphical Models (15 points)

Let M be an (undirected) graphical model on the discrete variables (A, B, C, D) with the independence graph over the nodes A, B, C and D shown as a figure in the original exam.

a) Give all pairwise conditional independencies that hold for this model.

b) Give the observed = fitted margin constraints that hold for the maximum likelihood fitted counts of M.

c) Use the constraints of (b), together with the conditional independence properties of M, to find an expression for the maximum likelihood fitted counts n̂(A, B, C, D) in terms of margins of the observed counts n(A, B, C, D).
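For orientation (the actual graph is only available as the figure in the original exam): in a decomposable undirected model the maximum likelihood fitted counts factorize over the cliques of the graph. For example, for the chain A - B - C, where A is independent of C given B, the fitted counts are

n̂(a, b, c) = n(a, b) n(b, c) / n(b),

i.e. a product of observed clique margins divided by the observed separator margin. The chain is only an illustrative example, not the graph of this question.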

Question 4 Bayesian Network Classifiers (20 points)

The Naive Bayes classifier makes the fundamental assumption that the attributes are independent within each class.

a) Explain why the Naive Bayes classifier often performs quite well (in terms of the error rate on a test sample), even when its independence assumption is not satisfied.

To relax the independence assumption, one can allow some (restricted) dependencies between the attributes. A well-studied example is the class of Tree Augmented Naive Bayes (TAN) classifiers.

b) For a problem with k binary attributes, and a class label with m possible values, how many parameters does a TAN classifier have?

c) Explain why a TAN classifier has the same independence properties as the undirected graph obtained by simply dropping the direction of all its edges.

d) In the article of Friedman et al., many different methods to use a Bayesian network for classification are studied. One of them is to learn a network with a standard structure learning algorithm using BIC (MDL) as the score function, and then to use the Markov blanket of the class variable in the resulting network for classification. It is shown that for datasets with many attributes, this method tends to produce poor results. Explain why.

e) One of the advantages of restricting the structure on the attributes to trees is that there is a polynomial time algorithm to compute the optimal structure. Describe the steps of this algorithm.
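As background for part e): the standard polynomial-time procedure (the Chow-Liu construction used by Friedman et al. for TAN) weights every pair of attributes by the conditional mutual information I(Xi; Xj | C) estimated from the data, and then builds a maximum-weight spanning tree over the attributes, directing the edges away from an arbitrarily chosen root. A minimal Python sketch of the edge-weight computation (the toy data below is made up for illustration):

import math
from collections import Counter

def cond_mutual_information(xs, ys, cs):
    # Estimate I(X; Y | C) from three parallel lists of observed values.
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    cmi = 0.0
    for (x, y, c), count in n_xyc.items():
        # p(x, y, c) * log2( p(x, y | c) / (p(x | c) * p(y | c)) )
        cmi += (count / n) * math.log2(count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return cmi

# Toy data: two binary attributes and a binary class label.
X = [0, 0, 1, 1, 0, 1]
Y = [0, 0, 1, 1, 1, 0]
C = [0, 0, 0, 0, 1, 1]
print(cond_mutual_information(X, Y, C))

The tree structure itself would then be a maximum-weight spanning tree over these pairwise weights, with the class variable added as an extra parent of every attribute.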

Question 5 Bayesian Networks (10 points)

Consider a heuristic search for a Bayesian network that maximizes the BIC score

BIC(M) = L(M) - (ln n / 2) dim(M),

where L(M) is the log-likelihood of model M, dim(M) is the number of free parameters of M, n is the number of observations, and ln denotes the natural logarithm. The algorithm performs a hill-climbing search where the neighbours of the current model are obtained by either

1. removing an arrow from the current model,
2. adding an arrow to the current model, or
3. turning an arrow of the current model around.

The current model on (X1, X2, X3) is the one shown as a figure in the original exam.

a) Give all neighbours of the current model, and indicate which neighbours are equivalent to each other, in the sense that they encode the same independence properties.

b) Assume that each variable has 3 possible values, and that we have a data set with 100 observations. For each neighbour model that can be obtained by adding an arc, indicate how much the log-likelihood part of the BIC score would have to increase for it to have a better BIC score than the current model.

Question 6 Classification Trees (14 points)

The tree below, denoted by T_max, has been constructed on the training sample. In each node, the number of observations with class 0 is given first and the number of observations with class 1 second. In the original exam the tree is drawn as a figure with the leaf nodes as rectangles; from the class counts, its structure is:

t1 (60, 40): root, with children t2 and t3
t2 (30, 10): children t4 and t5
t3 (30, 30): children t6 and t7
t4 (30, 5): leaf
t5 (0, 5): leaf
t6 (0, 20): leaf
t7 (30, 10): children t8 and t9
t8 (25, 5): leaf
t9 (5, 5): leaf

a) Compute the impurity of nodes t1, t2 and t3 according to the Gini index. Give the impurity reduction achieved by the first split.

b) Compute the cost-complexity pruning sequence T1 > T2 > ... > {t1}, where T1 is the smallest minimizing subtree of T_max for α = 0. For each tree in the sequence, give the interval of α values for which it is the smallest minimizing subtree of T_max.
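For part a) of Question 6, the Gini index of a node t with class proportions p_0 and p_1 is i(t) = 1 - p_0^2 - p_1^2, and the impurity reduction of a split is i(t) minus the weighted average impurity of its children. A minimal Python sketch, using the first split t1 into t2 and t3 from the counts above as an example:

def gini(counts):
    # Gini index of a node: 1 - sum_j p_j^2, with p_j the class proportions.
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def impurity_reduction(parent, children):
    # i(parent) - sum_k (n_k / n) * i(child_k), with n_k the size of child k.
    n = sum(parent)
    return gini(parent) - sum(sum(child) / n * gini(child) for child in children)

print(gini([60, 40]))                                      # node t1
print(impurity_reduction([60, 40], [[30, 10], [30, 30]]))  # first split: t1 into t2 and t3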