Data Warehousing & Data Mining


1 Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

2 Summary Last week: Sequence Patterns: Generalized Sequential Patterns based on the Apriori property Time-Series Trend Analysis: Regression Analysis, Moving Averages, etc. Similarity Search This week: Classification Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 2

3 10. Classification 10.1 Decision Tree based Classification 10.2 Naive Bayesian Classification 10.3 Support Vector Machines (SVM) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 3

4 10.0 Classification What is classification? Given is a collection of records (the training set); each record consists of a set of attributes plus a specific class attribute. Find a model for the class attribute as a function of the values of the other attributes. Goal: new records should be assigned to some class as accurately as possible. A test set is used to determine the accuracy of the model; usually, the given data set is divided into a training set used to build the model and a test set used to validate it. DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 4
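The training/test methodology described above can be illustrated with a short sketch (not part of the original slides; it assumes scikit-learn and uses its bundled iris data merely as a stand-in for any labeled record collection):

```python
# Sketch: split labeled records into a training and a test set,
# build the model on the training part, validate it on the test part.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                  # records + class attribute
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)            # e.g. 70% training, 30% test

model = DecisionTreeClassifier().fit(X_train, y_train)     # learning (induction)
accuracy = accuracy_score(y_test, model.predict(X_test))   # validation (deduction)
print("accuracy on the test set:", round(accuracy, 3))
```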

5 10.0 Classification How is this useful? Financial field Credit approval process Targeted marketing Classifying credit card transactions as legitimate or fraudulent Categorizing news stories as finance, weather, entertainment, sports, etc. DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 5

6 Classification How does it work? A learning algorithm is applied to the training set to learn a model (induction); the model is then applied to the test set to predict the unknown class labels (deduction).
Training set:
Tid Attrib1 Attrib2 Attrib3 Class
1 Yes Large 125K No
2 No Medium 100K No
3 No Small 70K No
4 Yes Medium 120K No
5 No Large 95K Yes
6 No Medium 60K No
7 Yes Large 220K No
8 No Small 85K Yes
9 No Medium 75K No
10 No Small 90K Yes
Test set:
Tid Attrib1 Attrib2 Attrib3 Class
11 No Small 55K ?
12 Yes Medium 80K ?
13 Yes Large 110K ?
14 No Small 95K ?
15 No Large 67K ?
DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 6

7 10.0 Classification Example: credit approval Step 1: learning (induction) Training data is analyzed by some classification algorithm and the learned model is coded into classification rules.
Training data:
Name Age Income Decision
Jones Young Low Risky
Lee Young Low Risky
Fox Middle High Safe
Field Middle Low Risky
Smith Senior Low Safe
Phips Senior Medium Safe
Classification rules:
IF age=young THEN decision=risky
IF income=high THEN decision=safe
IF age=middle AND income=low THEN decision=risky
DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 7

8 10.0 Classification Step 2: classification (deduction) Test data validates the accuracy of the classification rules. If the accuracy is considered acceptable, the rules can be applied to the classification of new records.
Test data:
Name Age Income Decision
Bing Senior Low Safe
Crest Middle Low Risky
Cox Middle High Safe
New data: (Henry, Middle, Low) Loan decision? Risky
DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 8

9 10.0 Classification Supervised learning The training data (observations, measurements, etc.) is accompanied by labels indicating the class of the observations New data is classified based on the training set Unsupervised learning (next lecture) The class labels of the training data are unknown Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 9

10 10.0 Classification Prominent classification techniques Decision Tree based Methods Rule-based Methods Naive Bayes and Bayesian Belief Networks Support Vector Machines (SVM) Neural Networks DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 10

11 10.1 Decision Trees Decision tree A flow-chart-like tree structure Internal node denotes a test on an attribute Branch represents an outcome of the test Leaf nodes represent class labels or class distribution E.g., decision making: who buys a computer?
age? <=30: student? (no: no, yes: yes)
age? 31..40: yes
age? >40: credit rating? (fair: no, excellent: yes)
DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 11

12 10.1 Decision Trees Decision tree induction Many algorithms: Hunt's Algorithm One of the earliest methods ID3 and its successor C4.5 Represent a benchmark for supervised learning algorithms Based on Hunt's algorithm Classification and Regression Trees (CART) Similar to ID3 SLIQ, SPRINT DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 12

13 10.1 Decision Tree Induction Hunt's algorithm, general structure Let D_t be the set of training records that reach a node t General procedure: If D_t contains records that all belong to the same class y_t, then t is a leaf node labeled as y_t If D_t is an empty set, then t is a leaf node labeled by the default class y_d If D_t contains records that belong to more than one class, use an attribute test to split the data into smaller subsets and recursively apply the procedure to each subset DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 13
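A minimal sketch of this recursive procedure (illustrative only, not the original formulation; it assumes records are given as (attributes, label) pairs and that a helper choose_split picks the attribute test and partitions the records):

```python
# Sketch of Hunt's recursive tree-growing procedure.
def hunt(records, default_class, choose_split):
    if not records:                               # D_t is empty -> leaf with default class
        return {"leaf": default_class}
    classes = {label for _, label in records}
    if len(classes) == 1:                         # all records in D_t share one class -> leaf
        return {"leaf": classes.pop()}
    # otherwise: split D_t by an attribute test and recurse into each subset
    test, partitions = choose_split(records)      # e.g. {"Yes": [...], "No": [...]}
    majority = max(classes, key=lambda c: sum(1 for _, l in records if l == c))
    return {"test": test,
            "children": {value: hunt(subset, majority, choose_split)
                         for value, subset in partitions.items()}}
```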

14 10.1 Hunt's Algorithm Example: VAT refund (does a person cheat on the tax declaration?)
Tid Refund Marital Status Taxable Income Cheat
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes
The tree is grown step by step: start with a single leaf (Don't Cheat), then split on Refund (Yes: Don't Cheat; No: split further), then split the Refund = No branch on Marital Status (Married: Don't Cheat; Single, Divorced: split further), and finally split that branch on Taxable Income (< 80K: Don't Cheat; >= 80K: Cheat).
DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 14

15 10.1 Hunt's Algorithm Greedy strategy Split the records based on an attribute test that optimizes a certain criterion Issues Determine how to split the records How to specify the attribute test condition? How to determine the best split? Determine when to stop splitting DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 15

16 10.1 Attribute test condition Splitting: how to specify the attribute test condition? Depends on attribute types Nominal, e.g., car type: sports, luxury, family Ordinal, e.g., small, medium, large Continuous, e.g., age Depends on the number of ways to split Binary split, e.g., car type? {sports, luxury} vs. {family} Multi-way split, e.g., car type? sports / luxury / family DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 16

17 10.1 Attribute test condition What about splitting continuous attributes? Discretization to form an ordinal categorical attribute Static: discretize once at the beginning Dynamic: ranges can be found by equal interval bucketing, equal frequency bucketing (percentiles), clustering, or supervised clustering Binary decision: consider all possible splits and find the best cut Can be quite computationally expensive DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 17
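As an illustration of the exhaustive binary-cut search (a sketch under the assumption that candidate thresholds are midpoints between consecutive sorted values, scored by weighted entropy; the ages and class labels are taken from the table on the next slide):

```python
# Sketch: find the best binary cut "age <= t" by trying all midpoints
# between consecutive sorted values and scoring each cut by weighted entropy.
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def best_cut(values, labels):
    pairs = sorted(zip(values, labels))
    best_t, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                                   # no cut between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2        # candidate threshold
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

ages = [27, 28, 31, 45, 43, 56, 37, 20, 20, 60, 24, 36, 31, 41]
buys = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]
print(best_cut(ages, buys))    # threshold with the lowest weighted entropy
```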

18 10.1 Attribute test condition Splitting continuous attributes (e.g., age) Binary split: age > 40? (yes / no) Multi-way split: age? (<=30 / 31..40 / >40)
age income student credit rating buys computer
27 high no fair no
28 high no excellent no
31 high no fair yes
45 medium no excellent yes
43 low yes excellent yes
56 low yes fair no
37 low yes excellent yes
20 medium no fair no
20 low yes fair yes
60 medium yes excellent yes
24 medium yes excellent yes
36 medium no excellent yes
31 high yes fair yes
41 medium no fair no
DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 18

19 10.1 Determine the best split How do we determine the best split? We can split on any of the 4 attributes! E.g., income Low yes:3, no:1 Medium yes:4, no:2 High yes:2, no:2 E.g., student No yes:3, no:4 Yes yes:6, no:1 Which split is better? age income student low income? Credit rating medium Buys computer 27 high no fair no 28 high no excellent no 31 high no fair yes 45 medium no excellent yes 43 low yes excellent yes 56 low yes fair no 37 low yes excellent yes 20 medium no fair no 20 low yes fair yes 60 medium yes excellent yes 24 medium yes excellent yes 36 medium no excellent yes 31 high yes fair yes 41 medium no fair no high DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 19

20 10.1 Determine the best split What does better mean? Nodes with homogeneous class distribution (pure nodes) are preferred E.g., homogeneous nodes Student attribute, Yes yes:6, no:1 E.g., heterogeneous nodes Income attribute, High yes:2, no:2 How do we measure node impurity? DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 20

21 10.1 Determine the best split Methods to measure impurity Information gain (used in C4.5) All attributes are assumed to be categorical Can be modified for continuous-valued attributes Also Kullback Leibler divergence Gini index All attributes are assumed continuous-valued Assume there exist several possible split values for each attribute May need other tools, such as clustering, to get the possible split values DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 21

22 10.1 Decision Tree Induction Information gain method Assume there are two classes, P and N Let the set of examples S contain p elements of class P and n elements of class N The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as I(p, n) = - p/(p+n) * log2(p/(p+n)) - n/(p+n) * log2(n/(p+n)), where p/(p+n) estimates the probability that the label is P and n/(p+n) estimates the probability that the label is N Select the attribute with the highest information gain DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 22

23 10.1 Information Gain Information gain in decision tree induction Assume that using attribute A a set S will be partitioned into sets {S_1, S_2, ..., S_v} If S_i contains p_i examples of P and n_i examples of N, the entropy, or the expected information needed to classify objects in all subtrees S_i, is E(A) = sum over i = 1..v of (p_i + n_i)/(p + n) * I(p_i, n_i) The encoding information that would be gained by branching on A is Gain(A) = I(p, n) - E(A) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 23
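The two formulas translate directly into code; a small sketch (assuming two classes whose subsets are described by their (p_i, n_i) counts):

```python
# Sketch: I(p, n), E(A), and Gain(A) for a two-class problem.
from math import log2

def info(p, n):
    """I(p, n): information needed to classify an example in a set with p P's and n N's."""
    total = p + n
    return sum(-(c / total) * log2(c / total) for c in (p, n) if c > 0)

def expected_info(partitions, p, n):
    """E(A): weighted information of the subsets S_i, given as (p_i, n_i) pairs."""
    return sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in partitions)

def gain(partitions, p, n):
    """Gain(A) = I(p, n) - E(A)."""
    return info(p, n) - expected_info(partitions, p, n)
```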

24 10.1 Information Gain Attribute selection by gain computation, example: Class P: buys_computer = yes Class N: buys_computer = no I(p, n) = - p/(p+n) * log2(p/(p+n)) - n/(p+n) * log2(n/(p+n)), so I(p, n) = I(9, 5) = 0.94
age income student credit rating buys computer
<=30 high no fair no
<=30 high no excellent no
31..40 high no fair yes
>40 medium no excellent yes
>40 low yes excellent yes
>40 low yes fair no
31..40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes excellent yes
<=30 medium yes excellent yes
31..40 medium no excellent yes
31..40 high yes fair yes
>40 medium no fair no
DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 24

25 10.1 Information Gain Compute the entropy for the following three age partitions: age <= 30, 30 < age <= 40, and 40 < age
age <= 30: p_i = 2, n_i = 3, I(p_i, n_i) = 0.971
age 31..40: p_i = 4, n_i = 0, I(p_i, n_i) = 0
age > 40: p_i = 3, n_i = 2, I(p_i, n_i) = 0.971
E(age) = 5/14 * I(2, 3) + 4/14 * I(4, 0) + 5/14 * I(3, 2) = 0.694
Gain(age) = I(p, n) - E(age) = 0.94 - 0.694 = 0.246
In the same way we can also calculate Gain(income), Gain(student), and Gain(credit_rating)
DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 25
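A quick check of these numbers (a self-contained sketch; the counts are exactly those from the table above):

```python
# Worked check: 9 yes / 5 no overall; age partitions (2,3), (4,0), (3,2).
from math import log2

def I(p, n):
    return sum(-(c / (p + n)) * log2(c / (p + n)) for c in (p, n) if c)

E_age = 5/14 * I(2, 3) + 4/14 * I(4, 0) + 5/14 * I(3, 2)
print(round(I(9, 5), 3), round(E_age, 3), round(I(9, 5) - E_age, 3))
# prints 0.94 0.694 0.247  (0.246 when computed from the rounded intermediate values)
```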

26 10.1 Information Gain Since age brings the highest information gain, it becomes the splitting node Continue recursively to grow the tree until stop conditions are met
age = youth (<=30):
income student credit class
high no fair no
high no excellent no
medium no fair no
low yes fair yes
medium yes excellent yes
age = middle_aged (31..40):
income student credit class
high no fair yes
low yes excellent yes
medium no excellent yes
high yes fair yes
age = senior (>40):
income student credit class
medium no excellent yes
low yes excellent yes
low yes fair no
medium yes excellent yes
medium no fair no
DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 26

27 10.1 Decision Tree Induction Stop conditions All the records belong to the same class E.g., income credit class high fair no high excellent no medium fair no In this case a leaf node is created with the corresponding class label (here no ) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 27

28 10.1 Decision Tree Induction Stop conditions All the records have similar attribute values E.g., perform split by student but all records are students student credit class yes fair no yes fair no yes fair no yes fair yes yes excellent yes In this case instead of performing the split, a leaf node is created with the majority class as label (here no ) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 28

29 10.1 Decision Trees Decision tree deduction Use the decision tree rules to classify new data Exemplified together with induction in the detour section DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 29

30 10.1 Decision Trees Classification based on decision tree example Step 1 Induction Generate the decision tree Input: training data set, attribute list used for classification, attribute selection method Output: the decision tree Step 2 Deduction Predict the classes of the new data Input: the decision tree from step 1 and the new data Output: classified new data DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 30

31 10.1 Decision Trees Step 1 Input Training set data Use all attributes for classification Use information gain as selection method age income student Credit rating Buys computer <=30 high no fair no <=30 high no excellent no high no fair yes >40 medium no excellent yes >40 low yes excellent yes >40 low yes fair no low yes excellent yes <=30 medium no fair no <=30 low yes fair yes >40 medium yes excellent yes <=30 medium yes excellent yes medium no excellent yes high yes fair yes >40 medium no fair no DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 31

32 10.1 Decision Trees First node was already calculated
age <= 30: p_i = 2, n_i = 3, I(p_i, n_i) = 0.971
age 31..40: p_i = 4, n_i = 0, I(p_i, n_i) = 0
age > 40: p_i = 3, n_i = 2, I(p_i, n_i) = 0.971
Gain(age) = 0.246 > Gain(income), Gain(student), Gain(credit_rating)
Split by age into the subsets youth, middle_aged, and senior:
age = youth: income student credit class: high no fair no; high no excellent no; medium no fair no; low yes fair yes; medium yes excellent yes
age = middle_aged: income student credit class: high no fair yes; low yes excellent yes; medium no excellent yes; high yes fair yes
age = senior: income student credit class: medium no excellent yes; low yes excellent yes; low yes fair no; medium yes excellent yes; medium no fair no
DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 32

33 10.1 Decision Trees Subnode age = youth For the income attribute I(2, 3) = 0.97 E(income) = 1/5 * I(1, 0) + 2/5 * I(1, 1) + 2/5 * I(0, 2) = 0.4 Thus Gain(youth, income) = 0.97 - 0.4 = 0.57 For the student attribute I is the same E(student) = 2/5 * I(2, 0) + 3/5 * I(0, 3) = 0 Thus Gain(youth, student) = 0.97 (youth subset, income student credit class: high no fair no; high no excellent no; medium no fair no; low yes fair yes; medium yes excellent yes; class P: buys_computer = yes, class N: buys_computer = no, with I and E as defined before) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 33

34 10.1 Decision Trees Subnode age = youth For the credit attribute I(2, 3) = 0.97 E(credit) = 3/5 * I(1, 2) + 2/5 * I(1, 1) = 0.95 Thus Gain(youth, credit) = 0.97 - 0.95 = 0.02 The largest gain, 0.97, was obtained for the student attribute DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 34

35 10.1 Decision Trees Subnode age = youth Split by student attribute Student = no subset (income credit class): high fair no; high excellent no; medium fair no Student = yes subset (income credit class): low fair yes; medium excellent yes Stop condition reached for both subsets The resulting subtree is: age = youth -> student? (no -> no, yes -> yes) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 35

36 10.1 Decision Trees Subnode age = middle_aged Stop condition reached: we have just one class (income student credit class: high no fair yes; low yes excellent yes; medium no excellent yes; high yes fair yes) The subtree is: age = middle_aged -> yes DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 36

37 10.1 Decision Trees Subnode age = senior For the income attribute I(3, 2) = 0.97 E(income) = 2/5 * I(1, 1) + 3/5 * I(2, 1) = 0.95 Thus Gain(senior, income) = 0.97 - 0.95 = 0.02 For the student attribute I is the same E(student) = 3/5 * I(2, 1) + 2/5 * I(1, 1) = 0.95 Thus Gain(senior, student) = 0.02 (senior subset, income student credit class: medium no excellent yes; low yes excellent yes; low yes fair no; medium yes excellent yes; medium no fair no) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 37

38 10.1 Decision Trees Subnode age = senior For the credit attribute I is the same E(credit) = 2/5 * I(0, 2) + 3/5 * I(3, 0) = 0 Thus Gain(senior, credit) = 0.97 Thus split by the credit attribute DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 38

39 10.1 Decision Trees Subnode age = senior Split by credit attribute Credit = fair subset (income student class): low yes no; medium no no Credit = excellent subset (income student class): medium no yes; low yes yes; medium yes yes Stop condition reached for both subsets The resulting subtree is: age = senior -> credit? (fair -> no, excellent -> yes) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 39

40 10.1 Decision Trees Step 1 has finished with the following decision tree as output: age? -> youth: student? (no -> no, yes -> yes); middle_aged: yes; senior: credit rating? (fair -> no, excellent -> yes) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 40

41 10.1 Decision Trees Step 2 New data is classified by applying the tree from the previous slide:
age income student credit rating -> buys computer
31..40 low yes fair -> yes
<=30 low yes fair -> yes
<=30 low yes excellent -> yes
>40 low no fair -> no
DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 41

42 10.1 Decision Trees Extracting classification rules from trees Represent the knowledge in the form of IF-THEN rules One rule is created for each path from the root to a leaf Each attribute-value pair along a path forms a conjunction The leaf node holds the class prediction Rules are easier for humans to understand DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 42

43 10.1 Extracting rules from trees Example: IF age <= 30 AND student = no THEN buys_computer = no IF age <= 30 AND student = yes THEN buys_computer = yes IF age = 31..40 THEN buys_computer = yes IF age > 40 AND credit_rating = excellent THEN buys_computer = yes IF age > 40 AND credit_rating = fair THEN buys_computer = no DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 43
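A small sketch of this rule extraction for the tree built above (the nested-dictionary representation of the tree is an assumption made here for illustration, not part of the lecture):

```python
# Sketch: print one IF-THEN rule per root-to-leaf path of the example tree.
tree = {"test": "age",
        "children": {
            "youth": {"test": "student",
                      "children": {"no": {"leaf": "no"}, "yes": {"leaf": "yes"}}},
            "middle_aged": {"leaf": "yes"},
            "senior": {"test": "credit_rating",
                       "children": {"fair": {"leaf": "no"}, "excellent": {"leaf": "yes"}}}}}

def print_rules(node, conditions=()):
    if "leaf" in node:                              # a leaf holds the class prediction
        print("IF", " AND ".join(conditions) or "TRUE",
              "THEN buys_computer =", node["leaf"])
        return
    for value, child in node["children"].items():   # one conjunct per attribute test
        print_rules(child, conditions + (f"{node['test']} = {value}",))

print_rules(tree)
```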

44 10.1 Decision Trees Advantages: Inexpensive to construct Extremely fast at classifying unknown records Easy to interpret for small-sized trees Accuracy is comparable to other classification techniques for many simple data sets Very good average performance over many datasets DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 44

45 10.1 Decision Trees Avoid over-fitting in classification The generated tree may over-fit the training data Too many branches, some may reflect anomalies due to noise or outliers The result is poor accuracy for unseen samples Two approaches to avoid over-fitting Prepruning Postpruning DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 45

46 10.1 Decision Trees Prepruning Halt tree construction early Do not split a node if this would result in the information gain falling below a threshold Difficult to choose an appropriate threshold Postpruning Remove branches from a fully grown tree Get a sequence of progressively pruned trees Use a set of data different from the training data to decide which is the best pruned tree DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 46
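In scikit-learn, for example, the two strategies correspond roughly to the min_impurity_decrease parameter (prepruning: refuse splits whose impurity reduction falls below a threshold) and the ccp_alpha parameter (cost-complexity postpruning); a minimal sketch, with the threshold values chosen arbitrarily:

```python
# Sketch: prepruning vs. postpruning of a decision tree in scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

trees = {
    "unpruned":   DecisionTreeClassifier(random_state=0),
    "prepruned":  DecisionTreeClassifier(min_impurity_decrease=0.01, random_state=0),
    "postpruned": DecisionTreeClassifier(ccp_alpha=0.01, random_state=0),
}
for name, clf in trees.items():
    clf.fit(X_tr, y_tr)
    print(name, clf.get_n_leaves(), "leaves, test accuracy:",
          round(clf.score(X_te, y_te), 3))
```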

47 10.1 Decision Trees Enhancements Allow for continuous-valued attributes Dynamically define new discrete-valued attributes that partition the continuous attribute value into a discrete set of intervals Handle missing attribute values Assign the most common value of the attribute Assign probability to each of the possible values Attribute construction Create new attributes based on existing ones that are sparsely represented This reduces fragmentation, repetition, and replication DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 47

48 10.2 Naive Bayesian Classification Bayesian Classification Probabilistic learning: calculate explicit probabilities for hypotheses, among the most practical approaches to certain types of learning problems Incremental: each training example can incrementally increase/decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 48

49 10.2 Naive Bayesian Classification A simple Bayesian network with events S (student), M (middle age), E (excellent credit rating), and P (person buys a computer): Pr(P) = #P / #count (what proportion of all records represents persons who buy computers?) Pr(S) = #S / #count, Pr(S | P) = #(S and P) / #P, Pr(S | not P) = #(S and not P) / #(not P), and analogously Pr(M), Pr(M | P), Pr(M | not P), Pr(E), Pr(E | P), Pr(E | not P) All these probabilities can be estimated from the training set (possibly using smoothing)! DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 49

50 10.2 Naive Bayesian Classification For new records to be classified we know whether each of the events S (student), M (middle age), and E (excellent credit rating) occurred We want to find out whether event P (buys a computer) is true This can be done using Bayes' Theorem DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 50

51 10.2 Naive Bayesian Classification Assume that the record to be classified represents a young student with fair credit rating Consequently, we want to find Pr(P | S, not M, not E) Bayes' Theorem yields: Pr(P | S, not M, not E) = Pr(S, not M, not E | P) * Pr(P) / Pr(S, not M, not E) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 51

52 10.2 Naive Bayesian Classification Pr(P | S, not M, not E) = Pr(P) * Pr(S, not M, not E | P) / Pr(S, not M, not E) In naive Bayes (sometimes called idiot Bayes), statistical independence is assumed: Pr(P | S, not M, not E) = Pr(P) * Pr(S | P) * Pr(not M | P) * Pr(not E | P) / (Pr(S) * Pr(not M) * Pr(not E)) How to classify a new record d? Estimate Pr(c | d) for every class c in C Assign d to the class having the highest probability DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 52
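A naive Bayes classifier over such categorical records can be sketched in a few lines (illustrative code, not from the slides; it estimates every probability by simple counting and ignores smoothing):

```python
# Sketch: naive Bayes by counting, for records given as (attribute_dict, class_label).
from collections import Counter, defaultdict

def train(records):
    priors = Counter(label for _, label in records)           # counts per class
    cond = defaultdict(Counter)                                # cond[(attr, c)][value]
    for attrs, label in records:
        for attr, value in attrs.items():
            cond[(attr, label)][value] += 1
    return priors, cond, len(records)

def classify(x, priors, cond, total):
    scores = {}
    for c, class_count in priors.items():
        score = class_count / total                            # P(c)
        for attr, value in x.items():
            score *= cond[(attr, c)][value] / class_count      # P(attr = value | c)
        scores[c] = score
    return max(scores, key=scores.get), scores
```

Applied to the 14-record buys_computer training set, counting like this reproduces the probability estimates listed on the next slides.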

53 10.2 Naive Bayes Example: Positive (p): buys_computer = yes Negative (n): buys_computer = no P(p) = 9/14 P(n) = 5/14 Calculate the probabilities for each attribute, e.g., the age attribute: P(youth | p) = 2/9 P(youth | n) = 3/5 P(middle | p) = 4/9 P(middle | n) = 0/5 P(senior | p) = 3/9 P(senior | n) = 2/5
age income student credit rating buys computer
<=30 high no fair no
<=30 high no excellent no
31..40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31..40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31..40 medium no excellent yes
31..40 high yes fair yes
>40 medium no excellent no
DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 53

54 10.2 Naive Bayes Continue with the other attributes (same training set as before): Income attribute: P(low | p) = 3/9 P(low | n) = 1/5 P(medium | p) = 4/9 P(medium | n) = 2/5 P(high | p) = 2/9 P(high | n) = 2/5 Student attribute: P(yes | p) = 6/9 P(yes | n) = 1/5 P(no | p) = 3/9 P(no | n) = 4/5 Credit attribute: P(fair | p) = 6/9 P(fair | n) = 2/5 P(excellent | p) = 3/9 P(excellent | n) = 3/5 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 54

55 10.2 Naive Bayes Classify an unseen record X = <Age: youth, Income: low, Student: yes, Credit: fair> Compare P(p | X) and P(n | X): P(X | p) * P(p) = P(youth | p) * P(low | p) * P(yes | p) * P(fair | p) * P(p) = 2/9 * 3/9 * 6/9 * 6/9 * 9/14 ≈ 0.021 P(X | n) * P(n) = P(youth | n) * P(low | n) * P(yes | n) * P(fair | n) * P(n) = 3/5 * 1/5 * 1/5 * 2/5 * 5/14 ≈ 0.003 Since P(X | p) * P(p) > P(X | n) * P(n), X can be classified as buys a computer (the common factor 1/P(X) can be ignored for the comparison; the individual probabilities are those estimated on the previous slides) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 55
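The two products can be checked directly (a simple arithmetic check of the values above):

```python
# P(X | p) * P(p) vs. P(X | n) * P(n) for X = <youth, low, student = yes, fair>
p_score = 2/9 * 3/9 * 6/9 * 6/9 * 9/14    # about 0.0212
n_score = 3/5 * 1/5 * 1/5 * 2/5 * 5/14    # about 0.0034
print(p_score > n_score)                   # True -> X is classified as "buys a computer"
```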

56 10.2 Naive Bayesian Classification Summary Robust to isolated noise points Handle missing values by ignoring the instance during probability estimate calculations Robust to irrelevant attributes Independence assumption may not hold for some attributes DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 56

57 10.3 Support Vector Machines SVM: Problem definition Assumptions: Binary classification: Let's assume there are only two classes (e.g. spam/non-spam or relevant/non-relevant) Vector representation: Any item to be classified can be represented as a d-dimensional real vector Task: Find a linear classifier (i.e. a hyperplane) that divides R^d into two parts DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 57

58 10.3 Support Vector Machines Example: A two-dimensional example training set Task: Separate it by a line! Any of these linear classifiers would be fine Which one is best? DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 58

59 10.3 SVM Margin Which line is better? Idea: measure the quality of a linear classifier by its margin! Margin = The width that the boundary could be increased without hitting a data point DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 59

60 10.3 SVM Margin Margins (2) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 60

61 10.3 SVM Margin Margins (3) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 61

62 10.3 Support Vector Machines A maximum margin classifier is the linear classifier with a maximum margin DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 62

63 10.3 Maximum Margin Classifier The maximum margin classifier is the simplest kind of support vector machine, called a linear SVM Let's assume for now that there always is such a classifier, i.e. the training set is linearly separable! The data points that the margin pushes against are called support vectors DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 63

64 10.3 Maximum Margin Classifier Why maximum margin? It's intuitive to divide the two classes by a large margin The largest margin guards best against small errors in choosing the right separator This approach is robust since usually only a small fraction of all data points are support vectors There are some theoretical arguments why this is a good thing Empirically, it works very well DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 64

65 10.3 Maximum Margin Classifier How to formalize this approach? Training data: Let there be n training examples The i-th training example is a pair (y_i, z_i), where y_i is a d-dimensional real vector and z_i ∈ {-1, +1} -1 stands for the first class and +1 stands for the second class Example training set: ((1, 2), -1), ((-1, 1), -1), ((1, 0), -1), ((4, 1), +1), ((5, 1), +1) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 65

66 10.3 Finding MM Classifier What's a valid linear separator? Any hyperplane can be defined by a real row vector w and a scalar b The set of points located on the hyperplane is given by {x : w x + b = 0} w is a normal vector of the hyperplane, i.e. w is perpendicular to it b represents a shift from the origin of the coordinate system Example: the line x_1 - x_2 - 2 = 0 separates the points (-1, 1), (1, 2), (1, 0), where x_1 - x_2 - 2 < 0, from (4, 1), (5, 1), where x_1 - x_2 - 2 > 0 66

67 10.3 Finding MM Classifier Therefore, any valid separating hyperplane (w, b) must satisfy the following constraints, for any i = 1, ..., n: If z_i = -1, then w y_i + b < 0 If z_i = +1, then w y_i + b > 0 (cf. the example with separator x_1 - x_2 - 2 = 0) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 67

68 10.3 Finding MM Classifier Furthermore, if (w, b) is a valid separating hyperplane, then there are scalars r+ > 0 and r- > 0 such that w x + b + r- = 0 and w x + b - r+ = 0 are the hyperplanes that define the boundaries to the -1 class and the +1 class, respectively The support vectors are located on these hyperplanes! DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 68

69 10.3 Finding MM Classifier Let (w, b) be a valid separating hyperplane with scalars r+ and r- as defined above Observation 1: Define b' = b + (r- - r+) / 2. Then, the hyperplane w x + b' = 0 is a valid separating hyperplane with equal shift constants r' = (r- + r+) / 2 to its bounding hyperplanes (the margin width is the same) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 69

70 10.3 Finding MM Classifier Now, divide w, b', and r' by r' This does not change any of the three hyperplanes Observation 2: Define w'' = w / r' and b'' = b' / r'. Then, the hyperplane w'' x + b'' = 0 is a valid separating hyperplane with shift constant 1 to each of its bounding hyperplanes w'' x + b'' + 1 = 0 and w'' x + b'' - 1 = 0 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 70

71 10.3 Finding MM Classifier Corollary (normalization): If there exists a valid separating hyperplane (w, b), then there always is a hyperplane (w', b') such that (w', b') is a valid separating hyperplane (w, b) and (w', b') have equal margin widths the bounding hyperplanes of (w', b') are shifted away by 1 Therefore, to find a maximum margin classifier, we can limit the search to all hyperplanes of this special type Further advantage: It seems to be a good idea to use a linear classifier that lies equally spaced between its bounding hyperplanes DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 71

72 10.3 Finding MM Classifier Our search space then consists of all pairs (w, b) such that w ∈ R^d and b ∈ R For any i = 1, ..., n: If z_i = -1, then w y_i + b <= -1 If z_i = +1, then w y_i + b >= +1 There is an i such that z_i = -1 and w y_i + b = -1 There is an i such that z_i = +1 and w y_i + b = +1 Now, what is the margin width of such a hyperplane? DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 72

73 10.3 Finding MM Classifier Linear algebra: The distance of a hyperplane w x + b = 0 to the origin of coordinate space is |b| / ||w|| The hyperplanes w x + b + 1 = 0, w x + b = 0, and w x + b - 1 = 0 are each 1 / ||w|| apart, therefore the margin width is 2 / ||w|| Consequently, our goal is to maximize the margin width subject to the constraints from the previous slide DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 73

74 10.3 Finding MM Classifier We arrive at the following optimization problem over all w ∈ R^d and b ∈ R: Maximize 2 / ||w|| subject to the following constraints: For any i = 1, ..., n: If z_i = -1, then w y_i + b <= -1 If z_i = +1, then w y_i + b >= +1 There is an i such that z_i = -1 and w y_i + b = -1 There is an i such that z_i = +1 and w y_i + b = +1 Note that due to the maximize the margin goal, the last two constraints are not needed anymore since any optimal solution satisfies them anyway DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 74

75 10.3 Finding MM Classifier The problem then becomes: Maximize 2 / ||w|| over all w ∈ R^d and b ∈ R subject to the following constraints: For any i = 1, ..., n: If z_i = -1, then w y_i + b <= -1 If z_i = +1, then w y_i + b >= +1 Instead of maximizing 2 / ||w||, we could also minimize ||w||, or even minimize 0.5 ||w||^2 Squaring avoids the square root within ||w|| The factor 0.5 brings the problem into a standard form DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 75

76 10.3 Finding MM Classifier The problem then becomes: Minimize 0.5 ||w||^2 over all w ∈ R^d and b ∈ R subject to the following constraints: For any i = 1, ..., n: If z_i = -1, then w y_i + b <= -1 If z_i = +1, then w y_i + b >= +1 The two constraints can be combined into a single one: For any i = 1, ..., n: z_i (w y_i + b) - 1 >= 0 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 76

77 10.3 Finding MM Classifier Finally: Minimize 0.5 ||w||^2 over all w ∈ R^d and b ∈ R subject to the following constraints: For any i = 1, ..., n: z_i (w y_i + b) - 1 >= 0 This is a so-called quadratic programming (QP) problem There are many standard methods to find the solution QPs that emerge from an SVM have a special structure, which can be exploited to speed up computation DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 77
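In practice the QP is rarely solved by hand; a sketch using scikit-learn's SVC with a linear kernel (an assumption of this example, not part of the lecture) on the five toy points from the earlier slide recovers w, b, the margin width, and the support vectors:

```python
# Sketch: train a (nearly) hard-margin linear SVM on the toy training set.
import numpy as np
from sklearn.svm import SVC

Y = np.array([[1, 2], [-1, 1], [1, 0], [4, 1], [5, 1]])   # training vectors y_i
z = np.array([-1, -1, -1, 1, 1])                           # class labels z_i

svm = SVC(kernel="linear", C=1e6).fit(Y, z)   # very large C approximates a hard margin
w, b = svm.coef_[0], svm.intercept_[0]
print("w =", w, " b =", b)
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
print("support vectors:", svm.support_vectors_)
```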

78 10.3 SVM At the beginning we assumed that our training data set is linearly separable What if it looks like this? DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 78

79 10.3 Soft Margins So-called soft margins can be used to handle such cases We allow the classifier to make some mistakes on the training data Each misclassification gets assigned an error, and the total classification error is to be minimized DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 79

80 10.3 SVM At the beginning, we also assumed that there are only two classes in the training set How to handle more than that? Some ideas: One-versus-all classifiers: Build an SVM for each class that occurs in the training set; to classify new items, choose the class whose SVM yields the greatest margin One-versus-one classifiers: Build an SVM for each pair of classes in the training set; to classify new items, choose the class selected by most SVMs Multiclass SVMs DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 80
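The first two schemes are available out of the box in scikit-learn, for example (a sketch; the iris data merely stands in for any multi-class training set):

```python
# Sketch: one-versus-all and one-versus-one SVM ensembles for a 3-class problem.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                               # three classes
ova = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)       # one SVM per class
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)        # one SVM per class pair
print(len(ova.estimators_), "one-versus-all SVMs,",
      len(ovo.estimators_), "one-versus-one SVMs")
```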

81 10.3 Overfitting One problem in using SVMs remains: If we use a mapping to a high-dimensional space that is complicated enough, we could find a perfect linear separation in the transformed space, for any training set So, what type of SVM is the right one? Example: How to separate this data set into two parts? DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 81 +

82 10.3 Overfitting A perfect classification for the training set could generalize badly on new data Fitting a classifier too strongly to the specific properties of the training set is called overfitting What can we do to avoid it? Cross-validation: Randomly split the available data into two parts (training set + test set) Use the first part for learning the classifier and the second part for checking the classifier's performance Choose a classifier that maximizes performance on the test set DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 82
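The slide describes a single hold-out split; k-fold cross-validation repeats the same idea and averages the held-out performance. A minimal sketch (assuming scikit-learn; the soft-margin parameter C plays the role of the classifier choice being validated):

```python
# Sketch: pick the SVM variant that performs best on held-out data (5-fold CV).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
for C in (0.01, 1, 100):
    scores = cross_val_score(SVC(kernel="linear", C=C), X, y, cv=5)
    print("C =", C, " mean held-out accuracy =", round(scores.mean(), 3))
# choose the setting with the best held-out performance
```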

83 10.3 Overfitting Regularization: If you roughly know how a good classifier should look (e.g. a polynomial of low degree), you could introduce a penalty value into the optimization problem Assign a large penalty if the type of classifier is far away from what you expect, and a small penalty otherwise Choose the classifier that minimizes the overall optimization goal (original goal + penalty) An example of regularization is the soft margin technique, since classifiers with large margins and few errors are preferred DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 83

84 Summary Classification Decision Trees: Hunt's algorithm Based on information gain and entropy Naive Bayesian Classification Based on Bayes' Theorem and the statistical independence assumption Support Vector Machines Binary classification Finding the maximum margin classifier Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 84

85 Next lecture Cluster Analysis Flat clustering Hierarchical clustering DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 85


The Curse of Dimensionality The Curse of Dimensionality ACAS 2002 p1/66 Curse of Dimensionality The basic idea of the curse of dimensionality is that high dimensional data is difficult to work with for several reasons: Adding more

More information

Machine Learning Techniques for Data Mining

Machine Learning Techniques for Data Mining Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-

More information

Classification: Decision Trees

Classification: Decision Trees Classification: Decision Trees IST557 Data Mining: Techniques and Applications Jessie Li, Penn State University 1 Decision Tree Example Will a pa)ent have high-risk based on the ini)al 24-hour observa)on?

More information

SOCIAL MEDIA MINING. Data Mining Essentials

SOCIAL MEDIA MINING. Data Mining Essentials SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate

More information

Exam Advanced Data Mining Date: Time:

Exam Advanced Data Mining Date: Time: Exam Advanced Data Mining Date: 11-11-2010 Time: 13.30-16.30 General Remarks 1. You are allowed to consult 1 A4 sheet with notes written on both sides. 2. Always show how you arrived at the result of your

More information

(Classification and Prediction)

(Classification and Prediction) Tamkang University Big Data Mining Tamkang University (Classification and Prediction) 1062DM04 MI4 (M2244) (2995) Wed, 9, 10 (16:10-18:00) (B206) Min-Yuh Day Assistant Professor Dept. of Information Management,

More information

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany Syllabus Fri. 27.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 3.11. (2) A.1 Linear Regression Fri. 10.11. (3) A.2 Linear Classification Fri. 17.11. (4) A.3 Regularization

More information

Contents. Preface to the Second Edition

Contents. Preface to the Second Edition Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................

More information

9 Classification: KNN and SVM

9 Classification: KNN and SVM CSE4334/5334 Data Mining 9 Classification: KNN and SVM Chengkai Li Department of Computer Science and Engineering University of Texas at Arlington Fall 2017 (Slides courtesy of Pang-Ning Tan, Michael Steinbach

More information

Machine Learning: Algorithms and Applications Mockup Examination

Machine Learning: Algorithms and Applications Mockup Examination Machine Learning: Algorithms and Applications Mockup Examination 14 May 2012 FIRST NAME STUDENT NUMBER LAST NAME SIGNATURE Instructions for students Write First Name, Last Name, Student Number and Signature

More information

Classification/Regression Trees and Random Forests

Classification/Regression Trees and Random Forests Classification/Regression Trees and Random Forests Fabio G. Cozman - fgcozman@usp.br November 6, 2018 Classification tree Consider binary class variable Y and features X 1,..., X n. Decide Ŷ after a series

More information

Summary. 4. Indexes. 4.0 Indexes. 4.1 Tree Based Indexes. 4.0 Indexes. 19-Nov-10. Last week: This week:

Summary. 4. Indexes. 4.0 Indexes. 4.1 Tree Based Indexes. 4.0 Indexes. 19-Nov-10. Last week: This week: Summary Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Last week: Logical Model: Cubes,

More information

Machine Learning in Real World: C4.5

Machine Learning in Real World: C4.5 Machine Learning in Real World: C4.5 Industrial-strength algorithms For an algorithm to be useful in a wide range of realworld applications it must: Permit numeric attributes with adaptive discretization

More information

CSE 158. Web Mining and Recommender Systems. Midterm recap

CSE 158. Web Mining and Recommender Systems. Midterm recap CSE 158 Web Mining and Recommender Systems Midterm recap Midterm on Wednesday! 5:10 pm 6:10 pm Closed book but I ll provide a similar level of basic info as in the last page of previous midterms CSE 158

More information

Lecture 9: Support Vector Machines

Lecture 9: Support Vector Machines Lecture 9: Support Vector Machines William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 8 What we ll learn in this lecture Support Vector Machines (SVMs) a highly robust and

More information

Information Retrieval and Web Search Engines

Information Retrieval and Web Search Engines Information Retrieval and Web Search Engines Lecture 7: Document Clustering May 25, 2011 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig Homework

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

Supervised Learning (contd) Linear Separation. Mausam (based on slides by UW-AI faculty)

Supervised Learning (contd) Linear Separation. Mausam (based on slides by UW-AI faculty) Supervised Learning (contd) Linear Separation Mausam (based on slides by UW-AI faculty) Images as Vectors Binary handwritten characters Treat an image as a highdimensional vector (e.g., by reading pixel

More information

Naïve Bayes for text classification

Naïve Bayes for text classification Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support

More information

Decision Trees. Petr Pošík. Czech Technical University in Prague Faculty of Electrical Engineering Dept. of Cybernetics

Decision Trees. Petr Pošík. Czech Technical University in Prague Faculty of Electrical Engineering Dept. of Cybernetics Decision Trees Petr Pošík Czech Technical University in Prague Faculty of Electrical Engineering Dept. of Cybernetics This lecture is largely based on the book Artificial Intelligence: A Modern Approach,

More information

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple

More information

Chapter 3: Supervised Learning

Chapter 3: Supervised Learning Chapter 3: Supervised Learning Road Map Basic concepts Evaluation of classifiers Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Summary 2 An example

More information

Ensemble Methods, Decision Trees

Ensemble Methods, Decision Trees CS 1675: Intro to Machine Learning Ensemble Methods, Decision Trees Prof. Adriana Kovashka University of Pittsburgh November 13, 2018 Plan for This Lecture Ensemble methods: introduction Boosting Algorithm

More information

Classification and Regression Trees

Classification and Regression Trees Classification and Regression Trees David S. Rosenberg New York University April 3, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 April 3, 2018 1 / 51 Contents 1 Trees 2 Regression

More information

Building Intelligent Learning Database Systems

Building Intelligent Learning Database Systems Building Intelligent Learning Database Systems 1. Intelligent Learning Database Systems: A Definition (Wu 1995, Wu 2000) 2. Induction: Mining Knowledge from Data Decision tree construction (ID3 and C4.5)

More information

1) Give decision trees to represent the following Boolean functions:

1) Give decision trees to represent the following Boolean functions: 1) Give decision trees to represent the following Boolean functions: 1) A B 2) A [B C] 3) A XOR B 4) [A B] [C Dl Answer: 1) A B 2) A [B C] 1 3) A XOR B = (A B) ( A B) 4) [A B] [C D] 2 2) Consider the following

More information

Data Warehousing & Data Mining

Data Warehousing & Data Mining Data Warehousing & Data Mining Wolf-Tilo Balke Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Summary Last week: Logical Model: Cubes,

More information

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset. Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied

More information

What is Learning? CS 343: Artificial Intelligence Machine Learning. Raymond J. Mooney. Problem Solving / Planning / Control.

What is Learning? CS 343: Artificial Intelligence Machine Learning. Raymond J. Mooney. Problem Solving / Planning / Control. What is Learning? CS 343: Artificial Intelligence Machine Learning Herbert Simon: Learning is any process by which a system improves performance from experience. What is the task? Classification Problem

More information

Supervised Learning. CS 586 Machine Learning. Prepared by Jugal Kalita. With help from Alpaydin s Introduction to Machine Learning, Chapter 2.

Supervised Learning. CS 586 Machine Learning. Prepared by Jugal Kalita. With help from Alpaydin s Introduction to Machine Learning, Chapter 2. Supervised Learning CS 586 Machine Learning Prepared by Jugal Kalita With help from Alpaydin s Introduction to Machine Learning, Chapter 2. Topics What is classification? Hypothesis classes and learning

More information

PARALLEL CLASSIFICATION ALGORITHMS

PARALLEL CLASSIFICATION ALGORITHMS PARALLEL CLASSIFICATION ALGORITHMS By: Faiz Quraishi Riti Sharma 9 th May, 2013 OVERVIEW Introduction Types of Classification Linear Classification Support Vector Machines Parallel SVM Approach Decision

More information

Tree-based methods for classification and regression

Tree-based methods for classification and regression Tree-based methods for classification and regression Ryan Tibshirani Data Mining: 36-462/36-662 April 11 2013 Optional reading: ISL 8.1, ESL 9.2 1 Tree-based methods Tree-based based methods for predicting

More information