Cellular Tree Classifiers. Gérard Biau & Luc Devroye

1 Cellular Tree Classifiers Gérard Biau & Luc Devroye Paris, December 2013

3 Outline 1 Context 2 Cellular tree classifiers 3 A mathematical model 4 Are there consistent cellular tree classifiers? 5 A non-randomized solution 6 Random forests

7 The Big Data era Big Data is a collection of data sets so large and complex that it becomes impossible to process using classical tools. It includes data sets with sizes beyond the ability of commonly used software tools to process the data within a tolerable elapsed time. As of 2012, every day 2.5 quintillion ($2.5 \times 10^{18}$) bytes of data were created. Megabytes and gigabytes are old-fashioned.

13 The Big Data era The challenges involved in Big Data problems are interdisciplinary. Data analysis in the Big Data regime requires consideration of: Systems issues: How to store, index and transport data at massive scales? Statistical issues: How to cope with errors and biases of all kinds? How to develop models and procedures that work when both n and p are astronomical? Algorithmic issues: How to perform computations? Big Data requires massively parallel software running on tens, hundreds, or even thousands of servers.

16 Greedy algorithms Greedy algorithms build solutions incrementally, usually with little effort. Such procedures form a result piece by piece, always choosing the next item that offers the most obvious and immediate benefit. Greedy methods have an autonomy that makes them ideally suited for distributed or parallel computation.

18 Greedy algorithms Our goal is to formalize the setting and to provide a foundational discussion of various properties of tree classifiers that are designed following these principles. They may find use in a world with new computational models in which parallel or distributed computation is feasible and even the norm.

19 Outline 1 Context 2 Cellular tree classifiers 3 A mathematical model 4 Are there consistent cellular tree classifiers? 5 A non-randomized solution 6 Random forests

20-22 Classification (figure slides)

26 Basics of classification We have a random pair $(X,Y) \in \mathbb{R}^d \times \{0,1\}$ with an unknown distribution. Goal: Design a classifier $g : \mathbb{R}^d \to \{0,1\}$. The probability of error is $L(g) = \mathbb{P}\{g(X) \neq Y\}$. The Bayes classifier
$$g^*(x) = \begin{cases} 1 & \text{if } \mathbb{P}\{Y = 1 \mid X = x\} > 1/2 \\ 0 & \text{otherwise} \end{cases}$$
has the smallest probability of error, that is,
$$L^* = L(g^*) = \inf_{g:\mathbb{R}^d \to \{0,1\}} \mathbb{P}\{g(X) \neq Y\}.$$

31 Basics of classification The data: $D_n = \{(X_1,Y_1),\dots,(X_n,Y_n)\}$, i.i.d. copies of $(X,Y)$. A classifier $g_n(x)$ is a function of $x$ and $D_n$. The probability of error is $L(g_n) = \mathbb{P}\{g_n(X) \neq Y \mid D_n\}$. It is consistent if $\lim_{n\to\infty} \mathbb{E} L(g_n) = L^*$. It is universally consistent if it is consistent for all possible distributions of $(X,Y)$.
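
A quick numerical illustration may help here. The sketch below is my own toy example (a logistic posterior and Gaussian X, none of it from the talk): it checks by simulation that the Bayes classifier attains the Bayes risk $L^* = \mathbb{E}[\min(\eta(X), 1-\eta(X))]$, where $\eta(x) = \mathbb{P}\{Y=1\mid X=x\}$.

```python
# Minimal sketch (my toy example, not from the slides): the Bayes classifier
# and a Monte Carlo estimate of the Bayes risk in a model where the posterior
# eta(x) = P(Y = 1 | X = x) is known exactly.
import numpy as np

def eta(x):
    """Assumed toy posterior probability P(Y = 1 | X = x), x real-valued."""
    return 1.0 / (1.0 + np.exp(-3.0 * x))

def bayes_classifier(x):
    """g*(x) = 1 if P(Y = 1 | X = x) > 1/2, else 0."""
    return (eta(x) > 0.5).astype(int)

rng = np.random.default_rng(0)
X = rng.normal(size=100_000)                            # X ~ N(0, 1)
Y = (rng.uniform(size=X.shape) < eta(X)).astype(int)    # labels drawn from eta
L_star = np.mean(np.minimum(eta(X), 1.0 - eta(X)))      # Monte Carlo L*
empirical = np.mean(bayes_classifier(X) != Y)           # error of g* on a sample
print(f"L* is roughly {L_star:.3f}; empirical error of g* is {empirical:.3f}")
```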

37 Tree classifiers Many popular classifiers are universally consistent. These include several brands of histogram rules, k-nearest neighbor rules, kernel rules, neural networks, and tree classifiers. Tree methods loom large for several reasons: All procedures that partition space can be viewed as special cases of partitions generated by trees. Tree classifiers are conceptually simple, and explain the data very well.

38-59 Trees? / Trees (sequence of figure slides)

62 Trees The tree structure is usually data dependent, and it is in the construction itself that trees differ. Thus, there are virtually infinitely many possible strategies to build classification trees. Despite this great diversity, all tree species end up with two fundamental questions at each node: 1 Should the node be split? 2 In the affirmative, what are its children?

64 The cellular spirit Cellular trees proceed from a different philosophy. A cellular tree should be able to answer questions 1 and 2 using local information only. (Diagram: at each node a cellular decision, split or no split; final leaves classify by majority vote.)

65 Outline 1 Context 2 Cellular tree classifiers 3 A mathematical model 4 Are there consistent cellular tree classifiers? 5 A non-randomized solution 6 Random forests

69 A model Let $\mathcal{C}$ be a class of possible subsets of $\mathbb{R}^d$ that can be used for splits. Example: $\mathcal{C} = \{\text{hyperplanes of } \mathbb{R}^d\}$. The class is parametrized by a vector $\sigma \in \mathbb{R}^p$. There is a splitting function $f(x,\sigma)$ such that $\mathbb{R}^d$ is partitioned into $A = \{x \in \mathbb{R}^d : f(x,\sigma) \geq 0\}$ and $B = \{x \in \mathbb{R}^d : f(x,\sigma) < 0\}$.
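
As a concrete instance of this model, here is a small sketch (my own illustration, not the authors' code) of the hyperplane split class, under the assumption $p = d+1$ with $\sigma = (w,b)$ and $f(x,\sigma) = \langle w, x\rangle + b$.

```python
# Minimal sketch (my illustration of the model above): the hyperplane split
# class, parametrized by sigma = (w, b) in R^(d+1), with splitting function
# f(x, sigma) = <w, x> + b.  A cell is cut into A = {f >= 0} and B = {f < 0}.
import numpy as np

def f(x, sigma):
    """Splitting function for the hyperplane class."""
    w, b = sigma[:-1], sigma[-1]
    return float(np.dot(w, x) + b)

def split_cell(points, sigma):
    """Partition an array of points of shape (n, d) into the two child cells."""
    values = points @ sigma[:-1] + sigma[-1]
    return points[values >= 0], points[values < 0]

# Example: the axis-aligned cut x_1 >= 0.5 in dimension d = 2.
sigma = np.array([1.0, 0.0, -0.5])
A, B = split_cell(np.random.default_rng(1).random((10, 2)), sigma)
```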

73 A model A cellular split may be seen as a family of mappings $\sigma : (\mathbb{R}^d \times \{0,1\})^n \to \mathbb{R}^p$. We see that $\sigma$ is a model for the cellular decision 2. In addition, there is a second family of mappings $\theta$, but this time with a boolean output. It is a stopping rule and models the cellular decision 1.

79 Cellular procedure A cellular machine partitions the data recursively. If $\theta(D_n) = 0$, the root cell is final, and the space is not split. Otherwise, $\mathbb{R}^d$ is split into $A = \{x : f(x,\sigma(D_n)) \geq 0\}$ and $B = \{x : f(x,\sigma(D_n)) < 0\}$. The data $D_n$ are partitioned into two groups. The groups are sent to child cells, and the process is repeated. Final classification proceeds by a majority vote.
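
Put together, the procedure can be sketched in a few lines. The code below is my own illustration of the recursion, under the assumption that the user supplies the three ingredients $\theta$, $\sigma$ and $f$; it is not the authors' implementation. Each recursive call sees only the data points falling in its own cell, which is exactly the cellular constraint.

```python
# Minimal sketch (my illustration) of a cellular machine built from three
# user-supplied ingredients, each of which sees only the data in the cell:
#   theta(X, Y) -> 0 or 1, the stopping rule (cellular decision 1),
#   sigma(X, Y) -> split parameter (cellular decision 2),
#   f(x, s)     -> real-valued splitting function (f >= 0 goes to child A).
import numpy as np

def build_cellular_tree(X, Y, theta, sigma, f):
    """X: (n, d) points in the cell, Y: (n,) labels in {0, 1}."""
    if len(Y) == 0 or theta(X, Y) == 0:
        # Leaf: classify by a majority vote over the labels in the cell.
        return {"leaf": True, "label": int(Y.mean() > 0.5) if len(Y) else 0}
    s = sigma(X, Y)
    mask = np.array([f(x, s) >= 0 for x in X])
    if mask.all() or not mask.any():
        # Degenerate split: stop here rather than recurse forever.
        return {"leaf": True, "label": int(Y.mean() > 0.5)}
    return {"leaf": False, "s": s, "f": f,
            "A": build_cellular_tree(X[mask], Y[mask], theta, sigma, f),
            "B": build_cellular_tree(X[~mask], Y[~mask], theta, sigma, f)}

def classify(tree, x):
    while not tree["leaf"]:
        tree = tree["A"] if tree["f"](x, tree["s"]) >= 0 else tree["B"]
    return tree["label"]
```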

82 An important remark Is any classifier a cellular tree? 1 Set $\theta = 1$ if we are at the root, and $\theta = 0$ elsewhere. 2 The root node is split by the classifier into a set $A = \{x \in \mathbb{R}^d : g_n(x) = 1\}$ and its complement, and both child nodes are leaves. This is not allowed.

83 Outline 1 Context 2 Cellular tree classifiers 3 A mathematical model 4 Are there consistent cellular tree classifiers? 5 A non-randomized solution 6 Random forests

90 An example At first sight, there are no consistent cellular tree classifiers. Example: the k-median tree. When $d = 1$, split by finding the median element among the $X_i$'s. Keep doing this for $k$ rounds. In $d$ dimensions, rotate through the coordinates. This rule is consistent, provided $k \to \infty$ and $k 2^k/n \to 0$. This is not cellular.
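
For concreteness, a sketch of the k-median tree follows (my own code, illustrative only). The global number of rounds k, chosen in advance as a function of n, is precisely the ingredient a cellular machine is not allowed to use.

```python
# Minimal sketch (my illustration) of the k-median tree: cut at the median of
# the X_i's along one coordinate, rotate through the coordinates with depth,
# and stop after k rounds of splits.
import numpy as np

def k_median_tree(X, Y, k, depth=0):
    """X: (n, d) sample, Y: (n,) labels in {0, 1}."""
    if depth == k or len(Y) <= 1:
        return {"leaf": True, "label": int(Y.mean() > 0.5) if len(Y) else 0}
    j = depth % X.shape[1]              # rotate through the coordinates
    cut = np.median(X[:, j])
    left = X[:, j] <= cut
    if left.all() or not left.any():
        return {"leaf": True, "label": int(Y.mean() > 0.5)}
    return {"leaf": False, "dim": j, "cut": cut,
            "A": k_median_tree(X[left], Y[left], k, depth + 1),
            "B": k_median_tree(X[~left], Y[~left], k, depth + 1)}
```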

95 A randomized solution The cellular splitting method σ mimics the median tree classifier. The dimension to cut is chosen uniformly at random. The selected dimension is then split at the median. The novelty is in the choice of the decision function θ. This function ignores the data altogether and uses a randomized decision that is based on the size of the input.

98 Consistency Consider a nonincreasing function $\varphi : \mathbb{N} \to (0, 1]$. Then, if $U$ is the uniform $[0,1]$ random variable associated with node $A$, $\theta = \mathbf{1}_{[U > \varphi(N(A))]}$. Theorem Let $\beta$ be a real number in $(0,1)$. Define
$$\varphi(n) = \begin{cases} 1 & \text{if } n < 3 \\ 1/\log^{\beta} n & \text{if } n \geq 3. \end{cases}$$
Then $\lim_{n\to\infty} \mathbb{E} L(g_n) = L^*$.
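
These choices of σ and θ plug directly into the generic cellular construction sketched earlier. The code below is my own illustration (β = 0.5 is an arbitrary value, and the natural logarithm is assumed).

```python
# Minimal sketch (my illustration) of the randomized cellular rule: sigma cuts
# a uniformly random coordinate at its median; theta ignores the data values
# and splits only when an independent uniform U exceeds phi(N(A)), with
# phi(n) = 1 for n < 3 (such small cells are never split).
import numpy as np

rng = np.random.default_rng(0)

def phi(n, beta=0.5):
    return 1.0 if n < 3 else 1.0 / (np.log(n) ** beta)   # natural log assumed

def theta(X, Y, beta=0.5):
    """Stopping rule: depends only on the number of points in the cell."""
    return int(rng.uniform() > phi(len(Y), beta))

def sigma(X, Y):
    """Split parameter: (random coordinate, median value along it)."""
    j = int(rng.integers(X.shape[1]))
    return j, float(np.median(X[:, j]))

def f(x, s):
    """Splitting function: nonnegative side goes to child A."""
    j, cut = s
    return x[j] - cut
```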

99 Outline 1 Context 2 Cellular tree classifiers 3 A mathematical model 4 Are there consistent cellular tree classifiers? 5 A non-randomized solution 6 Random forests

103 A full $2^d$-ary tree At the root, we find the median in direction 1. Then on each of the two subsets, we find the median in direction 2. Then on each of the four subsets, we find the median in direction 3, and so forth. Repeating this for $k$ levels of nodes leads to $2^{dk}$ leaf regions.

108 The stopping rule θ The quality of the classifier at node $A$ is assessed by
$$\hat{L}_n(A) = \frac{1}{N(A)} \min\left( \sum_{i=1}^n \mathbf{1}_{[X_i \in A,\, Y_i=1]},\ \sum_{i=1}^n \mathbf{1}_{[X_i \in A,\, Y_i=0]} \right).$$
Define the nonnegative integer $k^+$ by $k^+ = \lfloor \alpha \log_2(N(A)+1) \rfloor$. Set
$$\hat{L}_n(A,k^+) = \sum_{A_j \in \mathcal{P}_{k^+}(A)} \hat{L}_n(A_j)\, \frac{N(A_j)}{N(A)}.$$
Both $\hat{L}_n(A)$ and $\hat{L}_n(A,k^+)$ may be evaluated on the basis of the data points falling in $A$ only. This is cellular.

110 The stopping rule θ Put $\theta = 0$ if
$$\hat{L}_n(A) - \hat{L}_n(A,k^+) \leq \left(\frac{1}{N(A)+1}\right)^{\beta}.$$
Theorem Take $1 - d\alpha - 2\beta > 0$. Then $\lim_{n\to\infty} \mathbb{E} L(g_n) = L^*$.
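
Here is my reading of this stopping rule in code. The lookahead bookkeeping (d rotating median cuts per level of the full 2^d-ary tree), the tie handling, and the default values of α and β are my own illustrative choices, not the authors' implementation.

```python
# Minimal sketch of the non-randomized stopping rule.  L_hat(A) is the
# within-cell error of a majority vote; L_hat(A, k+) re-evaluates it after k+
# further levels of the full 2^d-ary median tree.  Only points in A are used.
import numpy as np

def L_hat(Y):
    """Majority-vote error rate inside a cell (empty cell -> 0)."""
    n = len(Y)
    return 0.0 if n == 0 else min(int(Y.sum()), n - int(Y.sum())) / n

def weighted_error_after_cuts(X, Y, cuts_left, depth=0):
    """Sum over the refined cells of N(cell) * L_hat(cell)."""
    n = len(Y)
    if cuts_left == 0 or n <= 1:
        return n * L_hat(Y)
    j = depth % X.shape[1]
    cut = np.median(X[:, j])
    left = X[:, j] <= cut
    if left.all() or not left.any():
        return n * L_hat(Y)
    return (weighted_error_after_cuts(X[left], Y[left], cuts_left - 1, depth + 1)
            + weighted_error_after_cuts(X[~left], Y[~left], cuts_left - 1, depth + 1))

def theta(X, Y, alpha=0.1, beta=0.1):
    """Split (theta = 1) only if the lookahead gain beats (1/(N(A)+1))^beta."""
    n_A, d = len(Y), X.shape[1]
    k_plus = int(np.floor(alpha * np.log2(n_A + 1)))
    gain = L_hat(Y) - weighted_error_after_cuts(X, Y, d * k_plus) / max(n_A, 1)
    return int(gain > (1.0 / (n_A + 1)) ** beta)
```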

111 Outline 1 Context 2 Cellular tree classifiers 3 A mathematical model 4 Are there consistent cellular tree classifiers? 5 A non-randomized solution 6 Random forests

112-113 Leo Breiman (1928-2005) (photograph slides)

118 From trees to forests Leo Breiman promoted random forests. Idea: use tree averaging as a means of obtaining good rules. The base trees are simple and randomized. Breiman's ideas were decisively influenced by Amit and Geman (1997, geometric feature selection), Ho (1998, random subspace method), and Dietterich (2000, random split selection approach). stat.berkeley.edu/users/breiman/randomforests

123 Random forests They are serious competitors to state-of-the-art methods. They are fast and easy to implement. They can handle a very large number of input variables. The algorithm is difficult to analyze and its mathematical properties remain to date largely unknown. Most theoretical studies have concentrated on isolated parts or stylized versions of the procedure.

126 Three basic ingredients 1. Randomization and no-pruning For each tree, select at random, at each node, a small group of input coordinates to split. Calculate the best split based on these features and cut. The tree is grown to maximum size.
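
At a single node this ingredient looks roughly as follows (my own sketch; "mtry" is the usual name for the size of the random feature subset, and the split is scored here by the misclassification count, which is only one of several common criteria).

```python
# Minimal sketch (my illustration) of ingredient 1: at a node, draw a small
# random subset of the input coordinates and keep the best axis-aligned cut
# among them, scored by the number of misclassified points after the split.
import numpy as np

rng = np.random.default_rng(0)

def misclassified(Y):
    """Number of errors of a majority vote on the labels Y."""
    return min(int(Y.sum()), len(Y) - int(Y.sum()))

def best_random_split(X, Y, mtry):
    d = X.shape[1]
    features = rng.choice(d, size=min(mtry, d), replace=False)
    best = None
    for j in features:
        for cut in np.unique(X[:, j]):
            left = X[:, j] <= cut
            score = misclassified(Y[left]) + misclassified(Y[~left])
            if best is None or score < best[0]:
                best = (score, int(j), float(cut))
    return best   # (errors after the split, coordinate, threshold)
```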

128 Three basic ingredients 2. Aggregation Final predictions are obtained by aggregating over the ensemble. It is fast and easily parallelizable.

130 Three basic ingredients 3. Bagging The subspace randomization scheme is blended with bagging. Breiman (1996). Bühlmann and Yu (2002). Biau, Cérou and Guyader (2010).

134 Mathematical framework A training sample: $D_n = \{(X_1,Y_1),\dots,(X_n,Y_n)\}$. A generic pair: $(X,Y)$ satisfying $\mathbb{E}Y^2 < \infty$. Our goal: Estimate the regression function $r(x) = \mathbb{E}[Y \mid X = x]$. Quality criterion: $\mathbb{E}[r_n(X) - r(X)]^2$.

137 The model A random forest is a collection of randomized base regression trees $\{r_n(x,\Theta_m,D_n),\, m \geq 1\}$. These random trees are combined to form the aggregate $\bar{r}_n(X,D_n) = \mathbb{E}_{\Theta}[r_n(X,\Theta,D_n)]$. $\Theta$ is independent of $X$ and the training sample $D_n$.
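
In practice the expectation $\mathbb{E}_\Theta$ is approximated by an average over a finite number M of independently randomized trees. The sketch below is my own illustration; it also draws a bootstrap resample for each tree, reflecting the bagging ingredient mentioned above.

```python
# Minimal sketch (my illustration): the aggregate E_Theta[r_n(X, Theta, D_n)]
# approximated by averaging M independently randomized base trees, each grown
# on a bootstrap resample of the data (bagging).
import numpy as np

def forest_predict(x, X, Y, build_tree, tree_predict, M=100, seed=0):
    """build_tree(X, Y, rng) -> tree;  tree_predict(tree, x) -> prediction."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    preds = []
    for _ in range(M):
        idx = rng.integers(n, size=n)            # bootstrap sample (bagging)
        tree = build_tree(X[idx], Y[idx], rng)   # randomized base tree
        preds.append(tree_predict(tree, x))
    return float(np.mean(preds))                 # Monte Carlo version of E_Theta
```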

138-142 The procedure (figure slides)

145 The forest A local averaging estimate:
$$r_n(x) = \frac{\sum_{i=1}^n K(x,X_i)\, Y_i}{\sum_{j=1}^n K(x,X_j)}.$$
Centered cuts:
$$K(x,z) = \sum_{\substack{k_{n1},\dots,k_{nd} \\ \sum_{j=1}^d k_{nj}=k_n}} \frac{k_n!}{k_{n1}! \cdots k_{nd}!}\ \prod_{j=1}^d p_{nj}^{k_{nj}}\, \mathbf{1}_{[k_{nj} < \alpha_j]}.$$
Uniform cuts:
$$K(x,z) = \sum_{\substack{k_{n1},\dots,k_{nd} \\ \sum_{j=1}^d k_{nj}=k_n}} \frac{k_n!}{k_{n1}! \cdots k_{nd}!}\ \prod_{m=1}^d p_{nm}^{k_{nm}}\, \frac{1}{k_{nm}!} \int_0^{\left|\log\frac{x_m}{z_m}\right|} t^{k_{nm}} e^{-t}\, dt.$$
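
The local averaging form itself is easy to code. In the sketch below (my own illustration) the kernel K is passed in as an argument; a Gaussian kernel is used as a stand-in, not the centered or uniform forest kernels written above.

```python
# Minimal sketch (my illustration) of a local averaging estimate
#   r_n(x) = sum_i K(x, X_i) Y_i / sum_j K(x, X_j).
# Any nonnegative kernel can be plugged in; the Gaussian below is only a
# stand-in for the forest kernels of this slide.
import numpy as np

def local_averaging_estimate(x, X, Y, K):
    weights = np.array([K(x, xi) for xi in X])
    total = weights.sum()
    return float("nan") if total == 0 else float(weights @ Y) / total

def gaussian_kernel(x, z, h=0.2):
    return float(np.exp(-np.sum((x - z) ** 2) / (2.0 * h * h)))

rng = np.random.default_rng(0)
X = rng.random((200, 2))
Y = np.sin(2.0 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=200)
print(local_averaging_estimate(np.array([0.3, 0.5]), X, Y, gaussian_kernel))
```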

149 Consistency Theorem The random forests estimate $\bar{r}_n$ is consistent whenever $p_{nj}\log k_n \to \infty$ for all $j = 1,\dots,d$ and $k_n/n \to 0$ as $n \to \infty$. In the purely random model, $p_{nj} = 1/d$, independently of $n$ and $j$. A more in-depth analysis is needed.

154 Sparsity There is empirical evidence that many signals in high-dimensional spaces admit a sparse representation. Images: wavelet coefficients. High-throughput technologies. Sparse estimation is playing an increasingly important role in the statistics and machine learning communities. Several methods have recently been developed in both fields, which rely upon the notion of sparsity.

158 Our vision The regression function $r(x) = \mathbb{E}[Y \mid X]$ depends in fact only on a nonempty subset $\mathcal{S}$ of the $d$ features. In other words: $r(x) = \mathbb{E}[Y \mid X_{\mathcal{S}}]$. In this dimension reduction scenario, the ambient dimension $d$ can be very large, much larger than $n$. As such, the value $S = |\mathcal{S}|$ characterizes the sparsity: the smaller $S$, the sparser $r$.

161 Sparsity and random forests Ideally, $p_{nj} = 1/S$ for $j \in \mathcal{S}$. To stick to reality, we will rather require $p_{nj} = \frac{1}{S}(1+\xi_{nj})$. Today, an assumption. Tomorrow, a lemma.

163 Main result Theorem If $p_{nj} = (1/S)(1+\xi_{nj})$ for $j \in \mathcal{S}$, with $\xi_{nj}\log n \to 0$ as $n \to \infty$, then for the choice $k_n \propto n^{1/(1+\frac{0.75}{S\log 2})}$, we have
$$\limsup_{n\to\infty}\ \sup_{(X,Y)\in\mathcal{F}_S} \frac{\mathbb{E}\left[\bar{r}_n(X) - r(X)\right]^2}{\left(\frac{S^2\log n}{n}\right)^{\frac{0.75}{S\log 2 + 0.75}}} \leq \Lambda.$$
Take-home message The rate $n^{-\frac{0.75}{S\log 2 + 0.75}}$ is strictly faster than the usual minimax rate $n^{-2/(d+2)}$ as soon as $S \leq 0.54\, d$.

164 Dimension reduction

165 Proof (Diagram: the tree partition $\mathcal{P}_k$ down to level $k$, a cell $A \in \mathcal{P}_k$, and its refinement $\mathcal{P}_{k^+}(A)$ obtained $k^+$ levels further down.)

166 Proof Let $\psi(n,k) = L_k - L^*$. Set
$$k_n = \min\left\{ l \geq 0 : \psi(n,l) < \left(\frac{2^{dl}}{n}\right)^{1-d\alpha} \right\}.$$
Then $2^{dk_n}/n \to 0$ as $n \to \infty$.
