Practice notes 4..9

Data Mining and Knowledge Discovery
Knowledge Discovery and Knowledge Management in e-science
Petra Kralj Novak
Petra.Kralj.Novak@ijs.si
Practice, 9//4

Practice plan
- 9//: Predictive data mining
  - Decision trees
  - Naïve Bayes classifier
  - Evaluating classifiers (separate test set, cross validation, confusion matrix, classification accuracy)
  - Predictive data mining in Weka
- 9//4: Numeric prediction and descriptive data mining
  - Numeric prediction
  - Association rules
  - Regression models and evaluation in Weka
  - Descriptive data mining in Weka
  - Discussion about seminars and exam
- 9//8: Written exam
- //6: Seminar proposal presentations
- 9/3/: Deadline for data mining papers (written seminar)
- 9/3/3: Data mining seminar presentations

Numeric prediction
Numeric prediction is like classification, except that the target variable is continuous rather than categorical:

                    Classification                   Numeric prediction
Data                attribute-value description      attribute-value description
Target variable     categorical (nominal)            continuous
Evaluation          cross validation, separate       cross validation, separate
                    test set, ..., accuracy          test set, ..., MSE, MAE, RMSE
Algorithms          decision trees,                  baseline, linear regression,
                    Naïve Bayes, ...                 regression tree, model tree, KNN
Baseline predictor  majority class                   mean of the target variable

Example
Test set data about 8 people, each described by a numeric attribute and a numeric target variable.
[Scatter plot of the test set data.]
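The two baseline predictors from the comparison table above can be made concrete in a few lines of Python. This is a minimal sketch; the two tiny datasets are invented for illustration.

    # Baseline predictors: majority class for classification,
    # mean of the target variable for numeric prediction.
    from collections import Counter

    # Hypothetical class labels for a classification task.
    labels = ["yes", "yes", "no", "yes", "no"]
    majority_class = Counter(labels).most_common(1)[0][0]   # -> "yes"

    # Hypothetical continuous targets for a numeric prediction task.
    targets = [1.5, 1.7, 1.6, 1.8]
    mean_prediction = sum(targets) / len(targets)           # -> 1.65

    print(majority_class, mean_prediction)

Both baselines ignore the attributes entirely; any useful model has to beat them.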
Baseline numeric predictor
The baseline numeric predictor predicts the average of the target variable for every example.
[Figure: the example data with the average predictor.]

Baseline predictor: prediction
The average of the target variable is .63, so the baseline predicts .63 for every test example.

Linear regression
Model: target = .56 * attribute + .48

Linear regression: prediction
[Figure: the example data with the predictions of the linear regression model.]

Regression tree
[Figure: the regression tree induced from the data.]

Regression tree: prediction
[Figure: the example data with the piecewise-constant predictions of the regression tree.]
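A line like the one in the model above can be fitted with ordinary least squares. A minimal sketch for a single attribute; the (x, y) pairs are invented for illustration.

    # Simple (one-attribute) linear regression by ordinary least squares.
    xs = [2.0, 5.0, 8.0, 11.0, 14.0]   # attribute values (invented)
    ys = [0.9, 1.1, 1.3, 1.5, 1.6]     # target values (invented)

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # slope = covariance(x, y) / variance(x); intercept from the means.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x

    def predict(x):
        return slope * x + intercept

    print(f"y = {slope:.3f} * x + {intercept:.3f}; prediction at x=10: {predict(10):.3f}")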
Model tree
A model tree is like a regression tree, but each leaf holds a linear model instead of a constant.
[Figure: the model tree induced from the data.]

Model tree: prediction
[Figure: the example data with the predictions of the model tree.]

KNN - K nearest neighbors
KNN looks at the K closest examples (by the non-target attributes) and predicts the average of their target variable. In this example, K=3.
[Figure: the example data with the KNN (n=3) predictions.]
[Table: the test set examples with their attribute and target values.]
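KNN numeric prediction as described above fits in a few lines. A minimal sketch for a single numeric attribute; the training pairs are invented for illustration.

    # K-nearest-neighbours numeric prediction (K=3).
    def knn_predict(train, x_new, k=3):
        # Sort training examples by distance to the new example on the
        # (single, numeric) non-target attribute.
        nearest = sorted(train, key=lambda xy: abs(xy[0] - x_new))[:k]
        # Predict the average target value of the K nearest examples.
        return sum(y for _, y in nearest) / k

    train = [(1, 0.9), (3, 1.1), (5, 1.3), (8, 1.4), (12, 1.6), (20, 1.7)]
    print(knn_predict(train, 6))   # averages the targets of x = 5, 8, 3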
Which predictor is the best?
[Table: for several test examples, the true target value and the predictions of the baseline (always .63), linear regression, regression tree, model tree and KNN.]

Evaluating numeric prediction
Numeric predictors are evaluated on a separate test set or by cross validation, with error measures such as:
- MAE (mean absolute error): the average of |predicted - true|
- MSE (mean squared error): the average of (predicted - true)^2
- RMSE (root mean squared error): the square root of MSE

Classification or a numeric prediction problem?
Target variable with five values:
1. non sufficient
2. sufficient
3. good
4. very good
5. excellent
- Classification: the misclassification cost is the same whether "non sufficient" is classified as "sufficient" or as "very good".
- Numeric prediction: the error of predicting 2 when it should be 1 is 1, while the error of predicting 5 instead of 1 is 4.
If we have a variable with ordered values, it should be considered numeric.

Nominal or numeric attribute?
A variable with five values:
1. non sufficient
2. sufficient
3. good
4. very good
5. excellent
can be treated as nominal (an unordered set of values) or as numeric (ordered values).
If we have a variable with ordered values, it should be considered numeric.
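The three error measures can be computed directly from the predictions. A minimal sketch; the true and predicted values are invented for illustration.

    # MAE, MSE and RMSE for a list of predictions.
    from math import sqrt

    true_vals = [1.0, 1.5, 2.0, 2.5]
    predicted = [1.1, 1.4, 2.3, 2.4]

    errors = [p - t for p, t in zip(predicted, true_vals)]
    mae = sum(abs(e) for e in errors) / len(errors)    # mean absolute error
    mse = sum(e * e for e in errors) / len(errors)     # mean squared error
    rmse = sqrt(mse)                                   # root mean squared error
    print(mae, mse, rmse)

RMSE is on the same scale as the target variable, which is why it is often reported instead of MSE.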
Can KNN be used for classification tasks?
Yes. In numeric prediction tasks, the average of the target variable in the neighborhood is computed; in classification tasks, the distribution of the classes in the neighborhood is computed.

Similarities between KNN and Naïve Bayes
- Both are black-box models, which do not give insight into the data.
- Both are "lazy classifiers": they do not build a model in the training phase and then use it for predicting; instead, they need the data when predicting the value for a new example (this is only partially true for Naïve Bayes).

Regression trees
Regression trees are the counterpart of decision trees for continuous targets:

                    Regression trees                 Decision trees
Data                attribute-value description      attribute-value description
Target variable     continuous                       categorical (nominal)
Evaluation          cross validation, separate       cross validation, separate
                    test set, ..., MSE, MAE, RMSE    test set, ..., accuracy
Algorithm           top-down induction,              top-down induction,
                    shortsighted (greedy) method     shortsighted (greedy) method
Heuristic           standard deviation               information gain
Stopping criterion  standard deviation < threshold   pure leaves (entropy = 0)
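Top-down induction with the standard deviation heuristic can be sketched compactly. This is an illustrative sketch under simplifying assumptions (a single numeric attribute, candidate thresholds at midpoints between values, invented data), not the exact procedure used by Weka.

    # Regression tree induction: split to minimise the weighted standard
    # deviation; stop when the standard deviation falls below a threshold.
    from math import sqrt

    def std(ys):
        m = sum(ys) / len(ys)
        return sqrt(sum((y - m) ** 2 for y in ys) / len(ys))

    def build(data, threshold=0.05):
        ys = [y for _, y in data]
        xs = sorted({x for x, _ in data})
        # Stopping criterion: homogeneous target values (or nothing to split).
        if std(ys) < threshold or len(xs) < 2:
            return ("leaf", sum(ys) / len(ys))      # predict the mean
        best = None
        for a, b in zip(xs, xs[1:]):
            t = (a + b) / 2
            left = [d for d in data if d[0] <= t]
            right = [d for d in data if d[0] > t]
            score = (len(left) * std([y for _, y in left])
                     + len(right) * std([y for _, y in right])) / len(data)
            if best is None or score < best[0]:
                best = (score, t, left, right)
        _, t, left, right = best
        return ("split", t, build(left, threshold), build(right, threshold))

    data = [(1, 0.9), (3, 1.0), (5, 1.1), (30, 1.7), (40, 1.75), (60, 1.7)]
    print(build(data))

The heuristic is shortsighted in the same sense as information gain for decision trees: each split is chosen locally, without looking ahead.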
Association rules

Rules: X → Y, where X and Y are conjunctions of items.
Task: find all association rules that satisfy the minimum support and minimum confidence constraints.
- Support: Sup(X → Y) = #XY / #D ≈ p(XY)
- Confidence: Conf(X → Y) = #XY / #X ≈ p(XY) / p(X) = p(Y|X)

Association rules - algorithm
1. Generate frequent itemsets with a minimum support constraint.
2. Generate rules from frequent itemsets with a minimum confidence constraint.
* The data are in a transaction database.

Association rules - transaction database
Items: A = apple, B = banana, C = coca-cola, D = doughnut
- Client 1 bought: A, B, C, D
- Client 2 bought: B, C
- Client 3 bought: B, D
- Client 4 bought: A, C
- Client 5 bought: A, B, D
- Client 6 bought: A, B, C

Frequent itemsets
Generate frequent itemsets with support at least 2/6.

Frequent itemsets - algorithm
Items in an itemset should be sorted alphabetically.
1. Generate all 1-itemsets with the given minimum support.
2. Use 1-itemsets to generate 2-itemsets with the given minimum support.
3. From 2-itemsets generate 3-itemsets with the given minimum support as unions of 2-itemsets with the same item at the beginning.
...
From n-itemsets generate (n+1)-itemsets as unions of n-itemsets with the same (n-1) items at the beginning.
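The levelwise algorithm can be run directly on the transaction database above. A minimal Python sketch of the join step (same-prefix n-itemsets combined into (n+1)-itemsets); the code layout is ours, but the data and the minimum support 2/6 are from the slides.

    # Levelwise (Apriori-style) generation of frequent itemsets.
    from itertools import combinations

    transactions = [
        {"A", "B", "C", "D"},  # Client 1
        {"B", "C"},            # Client 2
        {"B", "D"},            # Client 3
        {"A", "C"},            # Client 4
        {"A", "B", "D"},       # Client 5
        {"A", "B", "C"},       # Client 6
    ]
    min_count = 2  # minimum support 2/6

    def support_count(itemset):
        return sum(itemset <= t for t in transactions)

    # Level 1: frequent 1-itemsets, kept sorted alphabetically.
    items = sorted({i for t in transactions for i in t})
    level = [(i,) for i in items if support_count({i}) >= min_count]
    frequent = list(level)

    # Level n+1: join n-itemsets that share the same first n-1 items.
    while level:
        next_level = []
        for s1, s2 in combinations(level, 2):
            if s1[:-1] == s2[:-1] and s1[-1] < s2[-1]:
                candidate = s1 + (s2[-1],)
                if support_count(set(candidate)) >= min_count:
                    next_level.append(candidate)
        frequent += next_level
        level = next_level

    for itemset in frequent:
        print("&".join(itemset), support_count(set(itemset)), "/ 6")

On this database the sketch reproduces the frequent itemsets on the next slide: the pairs A&B, A&C, A&D, B&C, B&D and the triples A&B&C, A&B&D.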
Frequent itemsets lattice
Frequent itemsets: A&B, A&C, A&D, B&C, B&D, A&B&C, A&B&D
[Figure: lattice of the itemsets with their support counts.]

Rules from itemsets
A&B is a frequent itemset with support 3/6. Two possible rules:
- A → B, confidence = #(A&B) / #A = 3/4
- B → A, confidence = #(A&B) / #B = 3/5
All the counts are in the itemset lattice!

Quality of association rules
- Support(X) = #X / #D ≈ p(X)
- Support(X → Y) = Support(XY) = #XY / #D ≈ p(XY)
- Confidence(X → Y) = #XY / #X ≈ p(Y|X)
- Lift(X → Y) = Support(X → Y) / (Support(X) * Support(Y))
  How many more times the items in X and Y occur together than would be expected if the itemsets were statistically independent.
- Leverage(X → Y) = Support(X → Y) - Support(X) * Support(Y)
  Similar to lift, but a difference instead of a ratio.
- Conviction(X → Y) = (1 - Support(Y)) / (1 - Confidence(X → Y))
  Degree of implication of a rule. Sensitive to rule direction.

Discussion
1. Transformation of an attribute-value dataset to a transaction dataset.
2. What would be the association rules for a dataset with two items A and B, each of them with support 80% and appearing in the same transactions as rarely as possible?
   - minsupport = 50%, min conf = 70%
   - minsupport = 20%, min conf = 70%
   What if we had 4 items: A, A, B, B?
3. Compare decision trees and association rules regarding the handling of an attribute like PersonID. What about attributes that have many values (e.g., month of the year)?
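The quality measures can be checked on the small database from the earlier slides. A minimal sketch for the rule A → B; the function names are ours.

    # Support, confidence, lift, leverage and conviction for A -> B.
    transactions = [
        {"A", "B", "C", "D"}, {"B", "C"}, {"B", "D"},
        {"A", "C"}, {"A", "B", "D"}, {"A", "B", "C"},
    ]
    D = len(transactions)

    def sup(itemset):
        return sum(itemset <= t for t in transactions) / D

    X, Y = {"A"}, {"B"}
    support = sup(X | Y)                           # 3/6 = 0.5
    confidence = sup(X | Y) / sup(X)               # (3/6) / (4/6) = 3/4
    lift = support / (sup(X) * sup(Y))             # 0.5 / (4/6 * 5/6) = 0.9
    leverage = support - sup(X) * sup(Y)           # 0.5 - 5/9 = -0.0556
    conviction = (1 - sup(Y)) / (1 - confidence)   # (1/6) / (1/4) = 0.667
    print(support, confidence, lift, leverage, conviction)

A lift below 1 (here 0.9) means A and B occur together slightly less often than expected under independence; the negative leverage expresses the same thing as a difference.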