Cellular Tree Classifiers. Gérard Biau & Luc Devroye
Cellular Tree Classifiers. Gérard Biau & Luc Devroye. Paris, December 2013
Outline
1. Context
2. Cellular tree classifiers
3. A mathematical model
4. Are there consistent cellular tree classifiers?
5. A non-randomized solution
6. Random forests
The Big Data era

Big Data is a collection of data sets so large and complex that it becomes impossible to process them using classical tools. It includes data sets with sizes beyond the ability of commonly used software tools to process within a tolerable elapsed time. As of 2012, 2.5 quintillion (2.5 × 10^18) bytes of data were created every day. Megabytes and gigabytes are old-fashioned.
The Big Data era

The challenges involved in Big Data problems are interdisciplinary. Data analysis in the Big Data regime requires consideration of:
- Systems issues: how to store, index, and transport data at massive scales?
- Statistical issues: how to cope with errors and biases of all kinds? How to develop models and procedures that work when both n and p are astronomical?
- Algorithmic issues: how to perform the computations? Big Data requires massively parallel software running on tens, hundreds, or even thousands of servers.
Greedy algorithms

Greedy algorithms build solutions incrementally, usually with little effort. Such procedures form a result piece by piece, always choosing the next item that offers the most obvious and immediate benefit. Greedy methods have an autonomy that makes them ideally suited for distributed or parallel computation.
Greedy algorithms

Our goal is to formalize the setting and to provide a foundational discussion of various properties of tree classifiers designed following these principles. They may find use in a world with new computational models, in which parallel or distributed computation is feasible and even the norm.
2. Cellular tree classifiers
Classification (illustrative figure slides)
Basics of classification

We have a random pair $(X,Y) \in \mathbb{R}^d \times \{0,1\}$ with an unknown distribution.
Goal: design a classifier $g : \mathbb{R}^d \to \{0,1\}$.
The probability of error is $L(g) = \mathbb{P}\{g(X) \neq Y\}$.
The Bayes classifier
$$g^\star(x) = \begin{cases} 1 & \text{if } \mathbb{P}\{Y = 1 \mid X = x\} > 1/2 \\ 0 & \text{otherwise} \end{cases}$$
has the smallest probability of error, that is,
$$L^\star = L(g^\star) = \inf_{g : \mathbb{R}^d \to \{0,1\}} \mathbb{P}\{g(X) \neq Y\}.$$
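To make the definitions concrete, here is a small sketch (not from the slides) computing the Bayes classifier and the Bayes error $L^\star$ for an assumed toy distribution with three atoms; the distribution and all names are illustrative.

```python
# Toy sketch: Bayes classifier and Bayes error for an assumed discrete
# distribution (X uniform on {0, 1, 2}; eta(x) = P{Y = 1 | X = x}).
eta = {0: 0.9, 1: 0.2, 2: 0.5}

def bayes_classifier(x):
    # g*(x) = 1 iff P{Y = 1 | X = x} > 1/2, else 0.
    return 1 if eta[x] > 0.5 else 0

# L* = E[min(eta(X), 1 - eta(X))]: the unavoidable probability of error.
L_star = sum(min(p, 1 - p) / 3 for p in eta.values())
print(bayes_classifier(0), bayes_classifier(2), round(L_star, 4))  # 1 0 0.2667
```

No classifier built from data can beat $L^\star$; consistency, defined next, asks that a data-driven rule approach it.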
Basics of classification

The data: $D_n = \{(X_1,Y_1),\dots,(X_n,Y_n)\}$, i.i.d. copies of $(X,Y)$.
A classifier $g_n(x)$ is a function of $x$ and $D_n$.
The probability of error is $L(g_n) = \mathbb{P}\{g_n(X) \neq Y \mid D_n\}$.
It is consistent if $\lim_{n \to \infty} \mathbb{E}\,L(g_n) = L^\star$.
It is universally consistent if it is consistent for all possible distributions of $(X,Y)$.
Tree classifiers

Many popular classifiers are universally consistent. These include several brands of histogram rules, k-nearest neighbor rules, kernel rules, neural networks, and tree classifiers. Tree methods loom large for several reasons:
- All procedures that partition space can be viewed as special cases of partitions generated by trees.
- Tree classifiers are conceptually simple, and explain the data very well.
Trees (sequence of illustrative figure slides)
Trees

The tree structure is usually data dependent, and it is in the construction itself that trees differ. Thus, there are virtually infinitely many possible strategies to build classification trees. Despite this great diversity, all tree species end up with two fundamental questions at each node:
1. Should the node be split?
2. In the affirmative, what are its children?
The cellular spirit

Cellular trees proceed from a different philosophy. A cellular tree should be able to answer questions 1 and 2 using local information only: each cell makes its own cellular decision (no split or split) and, when it does not split, classifies by majority vote.
3. A mathematical model
A model

Let $\mathcal{C}$ be a class of possible subsets of $\mathbb{R}^d$ that can be used for splits. Example: $\mathcal{C} = \{\text{hyperplanes of } \mathbb{R}^d\}$. The class is parametrized by a vector $\sigma \in \mathbb{R}^p$. There is a splitting function $f(x,\sigma)$ such that $\mathbb{R}^d$ is partitioned into
$$A = \{x \in \mathbb{R}^d : f(x,\sigma) \ge 0\} \quad \text{and} \quad B = \{x \in \mathbb{R}^d : f(x,\sigma) < 0\}.$$
A model

A cellular split may be seen as a family of mappings $\sigma : (\mathbb{R}^d \times \{0,1\})^n \to \mathbb{R}^p$. We see that $\sigma$ is a model for cellular decision 2. In addition, there is a second family of mappings $\theta$, but this time with a boolean output. It is a stopping rule and models cellular decision 1.
Cellular procedure

A cellular machine partitions the data recursively:
- If $\theta(D_n) = 0$, the root cell is final, and the space is not split.
- Otherwise, $\mathbb{R}^d$ is split into $A = \{x : f(x, \sigma(D_n)) \ge 0\}$ and $B = \{x : f(x, \sigma(D_n)) < 0\}$.
- The data $D_n$ are partitioned into two groups.
- The groups are sent to child cells, and the process is repeated.
- Final classification proceeds by a majority vote.
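A minimal sketch of this recursive procedure, under illustrative choices that the slides do not fix: here σ cuts the current coordinate at its median, θ splits while the cell holds at least `min_size` points, and final cells classify by majority vote. All names and thresholds are assumptions for the sketch.

```python
import statistics

def theta(data, min_size=5):
    # Stopping rule (illustrative): split only while the cell is large enough.
    return 1 if len(data) >= min_size else 0

def sigma(data, dim):
    # Splitting rule (illustrative): cut the chosen coordinate at its median.
    return statistics.median(x[dim] for x, _ in data)

def build_cell(data, dim=0, d=2):
    # data: list of ((x_1, ..., x_d), y) pairs with y in {0, 1}.
    if theta(data) == 0:
        return {"leaf": 2 * sum(y for _, y in data) >= len(data)}
    cut = sigma(data, dim)
    left = [(x, y) for x, y in data if x[dim] < cut]
    right = [(x, y) for x, y in data if x[dim] >= cut]
    if not left or not right:  # degenerate cut: stop and take a majority vote
        return {"leaf": 2 * sum(y for _, y in data) >= len(data)}
    nxt = (dim + 1) % d
    return {"dim": dim, "cut": cut,
            "left": build_cell(left, nxt, d),
            "right": build_cell(right, nxt, d)}

def classify(tree, x):
    # Route x to its cell and return the cell's majority label.
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["dim"]] < tree["cut"] else tree["right"]
    return int(tree["leaf"])
```

Note that both `theta` and `sigma` look only at the data reaching the cell: that is the cellular constraint.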
An important remark

Is any classifier a cellular tree?
1. Set $\theta = 1$ at the root, and $\theta = 0$ everywhere else.
2. The root node is split by the classifier into a set $A = \{x \in \mathbb{R}^d : g_n(x) = 1\}$ and its complement, and both child nodes are leaves.
This is not allowed: the decisions $\theta$ and $\sigma$ may depend only on the data reaching the cell, not on the cell's position in the tree.
4. Are there consistent cellular tree classifiers?
An example

At first sight, there are no consistent cellular tree classifiers. Example: the k-median tree.
- When d = 1, split by finding the median element among the $X_i$'s.
- Keep doing this for k rounds.
- In d dimensions, rotate through the coordinates.
This rule is consistent, provided $k \to \infty$ and $k 2^k / n \to 0$. But it is not cellular: the number of rounds k is a global parameter, so the stopping decision at a node uses more than the local data.
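For d = 1, the k-median construction can be sketched as follows (illustrative only; it records the cut points and the final cells, and assumes n is large enough that cells do not run out of points):

```python
# Sketch of the 1-D k-median tree: each round cuts every current cell at the
# median element of the points it contains, for k rounds in total.
def k_median_cuts(xs, k):
    cells = [sorted(xs)]
    cuts = []
    for _ in range(k):
        next_cells = []
        for cell in cells:
            if len(cell) < 2:      # too few points to cut further
                next_cells.append(cell)
                continue
            m = len(cell) // 2     # index of the (upper) median element
            cuts.append(cell[m])
            next_cells.append(cell[:m])
            next_cells.append(cell[m:])
        cells = next_cells
    return sorted(cuts), cells

cuts, cells = k_median_cuts(list(range(8)), 2)
print(cuts)  # [2, 4, 6]
```

After k rounds there are (at most) 2^k cells, which is where the condition $k 2^k / n \to 0$ comes from: each cell must keep enough points.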
A randomized solution

The cellular splitting method σ mimics the median tree classifier:
- The dimension to cut is chosen uniformly at random.
- The selected dimension is then split at the median.
The novelty is in the choice of the decision function θ. This function ignores the data altogether and uses a randomized decision based on the size of the input.
Consistency

Consider a nonincreasing function $\varphi : \mathbb{N} \to (0,1]$. Then, if $U$ is the uniform $[0,1]$ random variable associated with node $A$,
$$\theta = \mathbf{1}_{[U > \varphi(N(A))]}.$$

Theorem. Let $\beta$ be a real number in $(0,1)$. Define
$$\varphi(n) = \begin{cases} 1 & \text{if } n < 3 \\ 1/\log^\beta n & \text{if } n \ge 3. \end{cases}$$
Then $\lim_{n \to \infty} \mathbb{E}\,L(g_n) = L^\star$.
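The randomized stopping rule of the theorem can be sketched directly; β is the theorem's parameter, and the particular value and the seeding below are illustrative:

```python
import math, random

def phi(n, beta=0.5):
    # phi(n) = 1 for n < 3, 1 / log^beta(n) for n >= 3; values stay in (0, 1].
    return 1.0 if n < 3 else 1.0 / math.log(n) ** beta

def theta(n_points, rng):
    # theta = 1 (split) iff U > phi(N(A)), with U uniform on [0, 1].
    return 1 if rng.random() > phi(n_points) else 0
```

Cells with fewer than 3 points are never split (phi = 1, and U > 1 is impossible), while large cells split with probability 1 - phi(n), which tends to 1 as the cell grows.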
5. A non-randomized solution
A full $2^d$-ary tree

At the root, we find the median in direction 1. Then on each of the two subsets, we find the median in direction 2. Then on each of the four subsets, we find the median in direction 3, and so forth. Repeating this for k levels of nodes leads to $2^{dk}$ leaf regions.
The stopping rule θ

The quality of the classifier at node A is assessed by
$$\hat{L}_n(A) = \frac{1}{N(A)} \min\left( \sum_{i=1}^n \mathbf{1}_{[X_i \in A,\, Y_i = 1]},\ \sum_{i=1}^n \mathbf{1}_{[X_i \in A,\, Y_i = 0]} \right).$$
Define the nonnegative integer $k^+$ by
$$k^+ = \lfloor \alpha \log_2(N(A) + 1) \rfloor.$$
Set
$$\hat{L}_n(A, k^+) = \sum_{A_j \in \mathcal{P}_{k^+}(A)} \hat{L}_n(A_j)\, \frac{N(A_j)}{N(A)}.$$
Both $\hat{L}_n(A)$ and $\hat{L}_n(A, k^+)$ may be evaluated on the basis of the data points falling in A only. This is cellular.
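The local quality measure $\hat{L}_n(A)$ is simply the empirical error of a majority vote over the cell, computable from the labels of the points falling in A alone; a sketch (names illustrative):

```python
def local_error(labels_in_A):
    # L_hat_n(A) = min(#{Y_i = 0 in A}, #{Y_i = 1 in A}) / N(A):
    # the fraction of points in A misclassified by the cell's majority vote.
    if not labels_in_A:          # empty cell: use the convention 0
        return 0.0
    n1 = sum(labels_in_A)
    return min(n1, len(labels_in_A) - n1) / len(labels_in_A)

print(local_error([1, 1, 1, 0]))  # 0.25
```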
The stopping rule θ

Put $\theta = 0$ if
$$\hat{L}_n(A) - \hat{L}_n(A, k^+) \le \left( \frac{1}{N(A) + 1} \right)^{\beta}.$$

Theorem. Take $1 - d\alpha - 2\beta > 0$. Then $\lim_{n \to \infty} \mathbb{E}\,L(g_n) = L^\star$.
6. Random forests
Leo Breiman (1928-2005)
From trees to forests

Leo Breiman promoted random forests. Idea: use tree averaging as a means of obtaining good rules. The base trees are simple and randomized. Breiman's ideas were decisively influenced by:
- Amit and Geman (1997, geometric feature selection).
- Ho (1998, random subspace method).
- Dietterich (2000, random split selection approach).
stat.berkeley.edu/users/breiman/randomforests
Random forests

- They are serious competitors to state-of-the-art methods.
- They are fast and easy to implement.
- They can handle a very large number of input variables.
- The algorithm is difficult to analyze, and its mathematical properties remain to date largely unknown.
- Most theoretical studies have concentrated on isolated parts or stylized versions of the procedure.
Three basic ingredients

1. Randomization and no pruning. For each tree, select at random, at each node, a small group of input coordinates to split on. Calculate the best split based on these features and cut. The tree is grown to maximum size.
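Ingredient 1 can be sketched as follows; the Gini impurity criterion and all names here are assumptions for the sketch (the slides do not fix the split criterion), with `mtry` playing the role of the small random group of coordinates:

```python
import random

def gini(labels):
    # Gini impurity of a list of 0/1 labels (an assumed split criterion).
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_random_split(data, mtry, rng):
    # Examine only `mtry` randomly chosen coordinates, as in ingredient 1,
    # and return the best (impurity, coordinate, threshold) among them.
    d = len(data[0][0])
    best = None
    for j in rng.sample(range(d), mtry):
        for x, _ in data:
            cut = x[j]
            left = [y for z, y in data if z[j] < cut]
            right = [y for z, y in data if z[j] >= cut]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(data)
            if best is None or score < best[0]:
                best = (score, j, cut)
    return best
```

Growing to maximum size means this search is repeated recursively on each side of the cut until the cells are pure or too small.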
2. Aggregation. Final predictions are obtained by aggregating over the ensemble. This is fast and easily parallelizable.
3. Bagging. The subspace randomization scheme is blended with bagging: Breiman (1996); Bühlmann and Yu (2002); Biau, Cérou and Guyader (2010).
Mathematical framework

- A training sample: $D_n = \{(X_1,Y_1),\dots,(X_n,Y_n)\}$.
- A generic pair $(X,Y)$ satisfying $\mathbb{E}Y^2 < \infty$.
- Our goal: estimate the regression function $r(x) = \mathbb{E}[Y \mid X = x]$.
- Quality criterion: $\mathbb{E}[r_n(X) - r(X)]^2$.
The model

A random forest is a collection of randomized base regression trees $\{r_n(x, \Theta_m, D_n),\, m \ge 1\}$. These random trees are combined to form the aggregate
$$\bar{r}_n(X, D_n) = \mathbb{E}_\Theta\left[ r_n(X, \Theta, D_n) \right].$$
$\Theta$ is independent of $X$ and the training sample $D_n$.
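In practice the Θ-expectation is approximated by averaging M independently randomized base estimates. A sketch, with an illustrative stand-in for the base tree (a 1-nearest-neighbor rule on a random half of the data, which is not the paper's construction; all names are assumptions):

```python
import random

def base_predict(x, data, seed):
    # Stand-in for r_n(x, Theta_m, D_n): the randomization Theta_m (here a
    # seed) picks a random half of the sample; predict with the nearest point.
    rng = random.Random(seed)
    sub = rng.sample(data, max(1, len(data) // 2))
    return min(sub, key=lambda p: abs(p[0] - x))[1]

def forest_predict(x, data, M=25):
    # Monte Carlo version of the aggregate E_Theta[r_n(x, Theta, D_n)].
    return sum(base_predict(x, data, m) for m in range(M)) / M
```

The base predictors are noisy on their own; averaging over M draws of Θ stabilizes the estimate, which is the point of ingredient 2.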
The procedure (illustrative figure slides)
The forest

A local averaging estimate:
$$\bar{r}_n(x) = \frac{\sum_{i=1}^n K(x, X_i)\, Y_i}{\sum_{j=1}^n K(x, X_j)}.$$
Centered cuts:
$$K(x,z) = \sum_{\substack{k_{n1},\dots,k_{nd} \\ \sum_{j=1}^d k_{nj} = k_n}} \frac{k_n!}{k_{n1}! \cdots k_{nd}!} \prod_{j=1}^d p_{nj}^{k_{nj}}\, \mathbf{1}_{[k_{nj} < \alpha_j]}.$$
Uniform cuts:
$$K(x,z) = \sum_{\substack{k_{n1},\dots,k_{nd} \\ \sum_{j=1}^d k_{nj} = k_n}} \frac{k_n!}{k_{n1}! \cdots k_{nd}!} \prod_{m=1}^d p_{nm}^{k_{nm}} \frac{1}{k_{nm}!} \int_0^{\log \frac{x_m}{z_m}} t^{k_{nm}} e^{-t}\, dt.$$
Consistency

Theorem. The random forests estimate $\bar{r}_n$ is consistent whenever $p_{nj} \log k_n \to \infty$ for all $j = 1,\dots,d$ and $k_n / n \to 0$ as $n \to \infty$.

In the purely random model, $p_{nj} = 1/d$, independently of n and j. A more in-depth analysis is needed.
Sparsity

There is empirical evidence that many signals in high-dimensional spaces admit a sparse representation:
- Images and their wavelet coefficients.
- High-throughput technologies.
Sparse estimation is playing an increasingly important role in the statistics and machine learning communities, and several methods relying on the notion of sparsity have recently been developed in both fields.
Our vision

The regression function $r(x) = \mathbb{E}[Y \mid X]$ depends in fact only on a nonempty subset $\mathcal{S}$ of the d features. In other words,
$$r(x) = \mathbb{E}[Y \mid X_{\mathcal{S}}].$$
In this dimension reduction scenario, the ambient dimension d can be very large, much larger than n. As such, the value $S = |\mathcal{S}|$ characterizes the sparsity: the smaller S, the sparser r.
Sparsity and random forests

Ideally, $p_{nj} = 1/S$ for $j \in \mathcal{S}$. To stick to reality, we will rather require
$$p_{nj} = \frac{1}{S}\left(1 + \xi_{nj}\right).$$
Today, an assumption. Tomorrow, a lemma.
Main result

Theorem. If $p_{nj} = (1/S)(1 + \xi_{nj})$ for $j \in \mathcal{S}$, with $\xi_{nj} \log n \to 0$ as $n \to \infty$, then for the choice $k_n \propto n^{1/(1 + \frac{S \log 2}{0.75})}$ we have
$$\limsup_{n \to \infty}\ \sup_{(X,Y) \in \mathcal{F}_S}\ \frac{\mathbb{E}\left[\bar{r}_n(X) - r(X)\right]^2}{n^{-0.75/(S \log 2 + 0.75)}} \le \Lambda.$$

Take-home message: the rate $n^{-0.75/(S \log 2 + 0.75)}$ is strictly faster than the usual minimax rate $n^{-2/(d+2)}$ as soon as $S \le 0.54\, d$.
Dimension reduction (illustrative figure)
Proof (tree diagram: root, groups $G_k$, $G_{k^+}$, partitions $\mathcal{P}_k(A)$, $\mathcal{P}_{k^+}(A)$)
Proof

Let $\psi(n,k) = L_k - L^\star$. Set
$$k_n = \min\left\{ l \ge 0 : \psi(n,l) < \frac{2^{dl}}{n^{1 - d\alpha}} \right\}.$$
Then $2^{d k_n}/n \to 0$ as $n \to \infty$.
More informationLars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
Syllabus Fri. 27.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 3.11. (2) A.1 Linear Regression Fri. 10.11. (3) A.2 Linear Classification Fri. 17.11. (4) A.3 Regularization
More informationGeometric data structures:
Geometric data structures: Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade Sham Kakade 2017 1 Announcements: HW3 posted Today: Review: LSH for Euclidean distance Other
More informationData Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation
Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization
More informationMODEL SELECTION AND REGULARIZATION PARAMETER CHOICE
MODEL SELECTION AND REGULARIZATION PARAMETER CHOICE REGULARIZATION METHODS FOR HIGH DIMENSIONAL LEARNING Francesca Odone and Lorenzo Rosasco odone@disi.unige.it - lrosasco@mit.edu June 6, 2011 ABOUT THIS
More informationV Advanced Data Structures
V Advanced Data Structures B-Trees Fibonacci Heaps 18 B-Trees B-trees are similar to RBTs, but they are better at minimizing disk I/O operations Many database systems use B-trees, or variants of them,
More informationMondrian Forests: Efficient Online Random Forests
Mondrian Forests: Efficient Online Random Forests Balaji Lakshminarayanan (Gatsby Unit, UCL) Daniel M. Roy (Cambridge Toronto) Yee Whye Teh (Oxford) September 4, 2014 1 Outline Background and Motivation
More informationBias-Variance Analysis of Ensemble Learning
Bias-Variance Analysis of Ensemble Learning Thomas G. Dietterich Department of Computer Science Oregon State University Corvallis, Oregon 97331 http://www.cs.orst.edu/~tgd Outline Bias-Variance Decomposition
More informationMachine Learning Techniques for Data Mining
Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already
More informationNonparametric Approaches to Regression
Nonparametric Approaches to Regression In traditional nonparametric regression, we assume very little about the functional form of the mean response function. In particular, we assume the model where m(xi)
More informationUniversity of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 11: Non-Parametric Techniques
University of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 11: Non-Parametric Techniques Mark Gales mjfg@eng.cam.ac.uk Michaelmas 2015 11. Non-Parameteric Techniques
More informationAn Empirical Comparison of Ensemble Methods Based on Classification Trees. Mounir Hamza and Denis Larocque. Department of Quantitative Methods
An Empirical Comparison of Ensemble Methods Based on Classification Trees Mounir Hamza and Denis Larocque Department of Quantitative Methods HEC Montreal Canada Mounir Hamza and Denis Larocque 1 June 2005
More information8. Tree-based approaches
Foundations of Machine Learning École Centrale Paris Fall 2015 8. Tree-based approaches Chloé-Agathe Azencott Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr
More informationUniversity of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 11: Non-Parametric Techniques
University of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 11: Non-Parametric Techniques Mark Gales mjfg@eng.cam.ac.uk Michaelmas 2011 11. Non-Parameteric Techniques
More informationOutline Introduction Problem Formulation Proposed Solution Applications Conclusion. Compressed Sensing. David L Donoho Presented by: Nitesh Shroff
Compressed Sensing David L Donoho Presented by: Nitesh Shroff University of Maryland Outline 1 Introduction Compressed Sensing 2 Problem Formulation Sparse Signal Problem Statement 3 Proposed Solution
More informationLocality- Sensitive Hashing Random Projections for NN Search
Case Study 2: Document Retrieval Locality- Sensitive Hashing Random Projections for NN Search Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 18, 2017 Sham Kakade
More informationProblem 1: Complexity of Update Rules for Logistic Regression
Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 16 th, 2014 1
More informationA Geometric Analysis of Subspace Clustering with Outliers
A Geometric Analysis of Subspace Clustering with Outliers Mahdi Soltanolkotabi and Emmanuel Candés Stanford University Fundamental Tool in Data Mining : PCA Fundamental Tool in Data Mining : PCA Subspace
More informationUniversity of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 11: Non-Parametric Techniques.
. Non-Parameteric Techniques University of Cambridge Engineering Part IIB Paper 4F: Statistical Pattern Processing Handout : Non-Parametric Techniques Mark Gales mjfg@eng.cam.ac.uk Michaelmas 23 Introduction
More informationVertex Cover is Fixed-Parameter Tractable
Vertex Cover is Fixed-Parameter Tractable CS 511 Iowa State University November 28, 2010 CS 511 (Iowa State University) Vertex Cover is Fixed-Parameter Tractable November 28, 2010 1 / 18 The Vertex Cover
More informationAn introduction to random forests
An introduction to random forests Eric Debreuve / Team Morpheme Institutions: University Nice Sophia Antipolis / CNRS / Inria Labs: I3S / Inria CRI SA-M / ibv Outline Machine learning Decision tree Random
More informationNonparametric Methods Recap
Nonparametric Methods Recap Aarti Singh Machine Learning 10-701/15-781 Oct 4, 2010 Nonparametric Methods Kernel Density estimate (also Histogram) Weighted frequency Classification - K-NN Classifier Majority
More informationA Random Forest Guided Tour
Noname manuscript No. (will be inserted by the editor) A Random Forest Guided Tour Gérard Biau Erwan Scornet Received: date / Accepted: date Abstract The random forest algorithm, proposed by L. Breiman
More informationLecture 2 :: Decision Trees Learning
Lecture 2 :: Decision Trees Learning 1 / 62 Designing a learning system What to learn? Learning setting. Learning mechanism. Evaluation. 2 / 62 Prediction task Figure 1: Prediction task :: Supervised learning
More informationOverview. Non-Parametrics Models Definitions KNN. Ensemble Methods Definitions, Examples Random Forests. Clustering. k-means Clustering 2 / 8
Tutorial 3 1 / 8 Overview Non-Parametrics Models Definitions KNN Ensemble Methods Definitions, Examples Random Forests Clustering Definitions, Examples k-means Clustering 2 / 8 Non-Parametrics Models Definitions
More informationPredictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA
Predictive Analytics: Demystifying Current and Emerging Methodologies Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA May 18, 2017 About the Presenters Tom Kolde, FCAS, MAAA Consulting Actuary Chicago,
More informationSlides for Data Mining by I. H. Witten and E. Frank
Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-
More informationIntroduction to Machine Learning. Xiaojin Zhu
Introduction to Machine Learning Xiaojin Zhu jerryzhu@cs.wisc.edu Read Chapter 1 of this book: Xiaojin Zhu and Andrew B. Goldberg. Introduction to Semi- Supervised Learning. http://www.morganclaypool.com/doi/abs/10.2200/s00196ed1v01y200906aim006
More informationSupervised Learning for Image Segmentation
Supervised Learning for Image Segmentation Raphael Meier 06.10.2016 Raphael Meier MIA 2016 06.10.2016 1 / 52 References A. Ng, Machine Learning lecture, Stanford University. A. Criminisi, J. Shotton, E.
More informationNearest Neighbors Classifiers
Nearest Neighbors Classifiers Raúl Rojas Freie Universität Berlin July 2014 In pattern recognition we want to analyze data sets of many different types (pictures, vectors of health symptoms, audio streams,
More informationCS 340 Lec. 4: K-Nearest Neighbors
CS 340 Lec. 4: K-Nearest Neighbors AD January 2011 AD () CS 340 Lec. 4: K-Nearest Neighbors January 2011 1 / 23 K-Nearest Neighbors Introduction Choice of Metric Overfitting and Underfitting Selection
More informationNetwork. Department of Statistics. University of California, Berkeley. January, Abstract
Parallelizing CART Using a Workstation Network Phil Spector Leo Breiman Department of Statistics University of California, Berkeley January, 1995 Abstract The CART (Classication and Regression Trees) program,
More informationSupervised Learning Classification Algorithms Comparison
Supervised Learning Classification Algorithms Comparison Aditya Singh Rathore B.Tech, J.K. Lakshmipat University -------------------------------------------------------------***---------------------------------------------------------
More informationText Categorization. Foundations of Statistic Natural Language Processing The MIT Press1999
Text Categorization Foundations of Statistic Natural Language Processing The MIT Press1999 Outline Introduction Decision Trees Maximum Entropy Modeling (optional) Perceptrons K Nearest Neighbor Classification
More informationConvex and Distributed Optimization. Thomas Ropars
>>> Presentation of this master2 course Convex and Distributed Optimization Franck Iutzeler Jérôme Malick Thomas Ropars Dmitry Grishchenko from LJK, the applied maths and computer science laboratory and
More informationTreewidth and graph minors
Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under
More informationCART. Classification and Regression Trees. Rebecka Jörnsten. Mathematical Sciences University of Gothenburg and Chalmers University of Technology
CART Classification and Regression Trees Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology CART CART stands for Classification And Regression Trees.
More informationRisk bounds for some classification and regression models that interpolate
Risk bounds for some classification and regression models that interpolate Daniel Hsu Columbia University Joint work with: Misha Belkin (The Ohio State University) Partha Mitra (Cold Spring Harbor Laboratory)
More informationA Random Forest Guided Tour
A Random Forest Guided Tour Gérard Biau Sorbonne Universités, UPMC Univ Paris 06, F-75005, Paris, France & Institut universitaire de France gerard.biau@upmc.fr Erwan Scornet Sorbonne Universités, UPMC
More informationMachine Learning. A. Supervised Learning A.7. Decision Trees. Lars Schmidt-Thieme
Machine Learning A. Supervised Learning A.7. Decision Trees Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany 1 /
More informationContents. Preface to the Second Edition
Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................
More informationData Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners
Data Mining 3.5 (Instance-Based Learners) Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction k-nearest-neighbor Classifiers References Introduction Introduction Lazy vs. eager learning Eager
More informationNearest neighbor classification DSE 220
Nearest neighbor classification DSE 220 Decision Trees Target variable Label Dependent variable Output space Person ID Age Gender Income Balance Mortgag e payment 123213 32 F 25000 32000 Y 17824 49 M 12000-3000
More informationEnsemble Learning. Another approach is to leverage the algorithms we have via ensemble methods
Ensemble Learning Ensemble Learning So far we have seen learning algorithms that take a training set and output a classifier What if we want more accuracy than current algorithms afford? Develop new learning
More informationTree-based methods for classification and regression
Tree-based methods for classification and regression Ryan Tibshirani Data Mining: 36-462/36-662 April 11 2013 Optional reading: ISL 8.1, ESL 9.2 1 Tree-based methods Tree-based based methods for predicting
More informationMachine Learning (CSE 446): Concepts & the i.i.d. Supervised Learning Paradigm
Machine Learning (CSE 446): Concepts & the i.i.d. Supervised Learning Paradigm Sham M Kakade c 2018 University of Washington cse446-staff@cs.washington.edu 1 / 17 Review 1 / 17 Decision Tree: Making a
More informationClassification Algorithms in Data Mining
August 9th, 2016 Suhas Mallesh Yash Thakkar Ashok Choudhary CIS660 Data Mining and Big Data Processing -Dr. Sunnie S. Chung Classification Algorithms in Data Mining Deciding on the classification algorithms
More informationCompact Sets. James K. Peterson. September 15, Department of Biological Sciences and Department of Mathematical Sciences Clemson University
Compact Sets James K. Peterson Department of Biological Sciences and Department of Mathematical Sciences Clemson University September 15, 2017 Outline 1 Closed Sets 2 Compactness 3 Homework Closed Sets
More informationCSCI567 Machine Learning (Fall 2014)
CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 9, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 9, 2014 1 / 47
More informationPaths, Flowers and Vertex Cover
Paths, Flowers and Vertex Cover Venkatesh Raman M. S. Ramanujan Saket Saurabh Abstract It is well known that in a bipartite (and more generally in a König) graph, the size of the minimum vertex cover is
More informationBusiness Club. Decision Trees
Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building
More informationData Mining in Bioinformatics Day 1: Classification
Data Mining in Bioinformatics Day 1: Classification Karsten Borgwardt February 18 to March 1, 2013 Machine Learning & Computational Biology Research Group Max Planck Institute Tübingen and Eberhard Karls
More informationNearest Neighbor Classification
Nearest Neighbor Classification Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 11, 2017 1 / 48 Outline 1 Administration 2 First learning algorithm: Nearest
More informationFaster Rates in Regression via Active Learning
Faster Rates in Regression via Active Learning Rui Castro Rice University Houston, TX 77005 rcastro@rice.edu Rebecca Willett University of Wisconsin Madison, WI 53706 willett@cae.wisc.edu Robert Nowak
More informationBilevel Sparse Coding
Adobe Research 345 Park Ave, San Jose, CA Mar 15, 2013 Outline 1 2 The learning model The learning algorithm 3 4 Sparse Modeling Many types of sensory data, e.g., images and audio, are in high-dimensional
More informationNONPARAMETRIC REGRESSION TECHNIQUES
NONPARAMETRIC REGRESSION TECHNIQUES C&PE 940, 28 November 2005 Geoff Bohling Assistant Scientist Kansas Geological Survey geoff@kgs.ku.edu 864-2093 Overheads and other resources available at: http://people.ku.edu/~gbohling/cpe940
More informationMath 5593 Linear Programming Lecture Notes
Math 5593 Linear Programming Lecture Notes Unit II: Theory & Foundations (Convex Analysis) University of Colorado Denver, Fall 2013 Topics 1 Convex Sets 1 1.1 Basic Properties (Luenberger-Ye Appendix B.1).........................
More informationUSING CONVEX PSEUDO-DATA TO INCREASE PREDICTION ACCURACY
1 USING CONVEX PSEUDO-DATA TO INCREASE PREDICTION ACCURACY Leo Breiman Statistics Department University of California Berkeley, CA 94720 leo@stat.berkeley.edu ABSTRACT A prediction algorithm is consistent
More informationMODEL SELECTION AND REGULARIZATION PARAMETER CHOICE
MODEL SELECTION AND REGULARIZATION PARAMETER CHOICE REGULARIZATION METHODS FOR HIGH DIMENSIONAL LEARNING Francesca Odone and Lorenzo Rosasco odone@disi.unige.it - lrosasco@mit.edu June 3, 2013 ABOUT THIS
More information10. EXTENDING TRACTABILITY
0. EXTENDING TRACTABILITY finding small vertex covers solving NP-hard problems on trees circular arc coverings vertex cover in bipartite graphs Lecture slides by Kevin Wayne Copyright 005 Pearson-Addison
More informationA Random Forest Guided Tour
A Random Forest Guided Tour Gérard Biau Sorbonne Universités, UPMC Univ Paris 06, F-75005, Paris, France & Institut Universitaire de France gerard.biau@upmc.fr Erwan Scornet Sorbonne Universités, UPMC
More informationHigh Dimensional Indexing by Clustering
Yufei Tao ITEE University of Queensland Recall that, our discussion so far has assumed that the dimensionality d is moderately high, such that it can be regarded as a constant. This means that d should
More informationFoundations of Computer Science Spring Mathematical Preliminaries
Foundations of Computer Science Spring 2017 Equivalence Relation, Recursive Definition, and Mathematical Induction Mathematical Preliminaries Mohammad Ashiqur Rahman Department of Computer Science College
More informationEvaluation of different biological data and computational classification methods for use in protein interaction prediction.
Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Yanjun Qi, Ziv Bar-Joseph, Judith Klein-Seetharaman Protein 2006 Motivation Correctly
More informationLOGISTIC REGRESSION FOR MULTIPLE CLASSES
Peter Orbanz Applied Data Mining Not examinable. 111 LOGISTIC REGRESSION FOR MULTIPLE CLASSES Bernoulli and multinomial distributions The mulitnomial distribution of N draws from K categories with parameter
More informationLecture 7: Decision Trees
Lecture 7: Decision Trees Instructor: Outline 1 Geometric Perspective of Classification 2 Decision Trees Geometric Perspective of Classification Perspective of Classification Algorithmic Geometric Probabilistic...
More informationCOSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor
COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality
More informationChapter 3: Supervised Learning
Chapter 3: Supervised Learning Road Map Basic concepts Evaluation of classifiers Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Summary 2 An example
More informationDiscrete Optimization. Lecture Notes 2
Discrete Optimization. Lecture Notes 2 Disjunctive Constraints Defining variables and formulating linear constraints can be straightforward or more sophisticated, depending on the problem structure. The
More informationLet the dynamic table support the operations TABLE-INSERT and TABLE-DELETE It is convenient to use the load factor ( )
17.4 Dynamic tables Let us now study the problem of dynamically expanding and contracting a table We show that the amortized cost of insertion/ deletion is only (1) Though the actual cost of an operation
More informationCSE4334/5334 DATA MINING
CSE4334/5334 DATA MINING Lecture 4: Classification (1) CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides courtesy
More informationDiversity Coloring for Distributed Storage in Mobile Networks
Diversity Coloring for Distributed Storage in Mobile Networks Anxiao (Andrew) Jiang and Jehoshua Bruck California Institute of Technology Abstract: Storing multiple copies of files is crucial for ensuring
More informationVoxel selection algorithms for fmri
Voxel selection algorithms for fmri Henryk Blasinski December 14, 2012 1 Introduction Functional Magnetic Resonance Imaging (fmri) is a technique to measure and image the Blood- Oxygen Level Dependent
More information1) Give decision trees to represent the following Boolean functions:
1) Give decision trees to represent the following Boolean functions: 1) A B 2) A [B C] 3) A XOR B 4) [A B] [C Dl Answer: 1) A B 2) A [B C] 1 3) A XOR B = (A B) ( A B) 4) [A B] [C D] 2 2) Consider the following
More informationChapter 5 Graph Algorithms Algorithm Theory WS 2012/13 Fabian Kuhn
Chapter 5 Graph Algorithms Algorithm Theory WS 2012/13 Fabian Kuhn Graphs Extremely important concept in computer science Graph, : node (or vertex) set : edge set Simple graph: no self loops, no multiple
More informationNUMERICAL METHODS PERFORMANCE OPTIMIZATION IN ELECTROLYTES PROPERTIES MODELING
NUMERICAL METHODS PERFORMANCE OPTIMIZATION IN ELECTROLYTES PROPERTIES MODELING Dmitry Potapov National Research Nuclear University MEPHI, Russia, Moscow, Kashirskoe Highway, The European Laboratory for
More information