Adaptive Pattern Discovery for Interactive Multimedia Retrieval


Adaptive Pattern Discovery for Interactive Multimedia Retrieval

Yimin Wu and Aidong Zhang
Department of Computer Science and Engineering, SUNY at Buffalo

Abstract

Relevance feedback has become an indispensable component of multimedia retrieval systems. In this paper, we present an adaptive pattern discovery method, which addresses relevance feedback by interactively discovering meaningful patterns of relevant objects. To facilitate pattern discovery, we first present a dynamic feature extraction method, which aims to alleviate the curse of dimensionality by extracting a feature subspace using balanced information gain. In the feature subspace, we train an online pattern classification method called adaptive random forests to classify multimedia objects as relevant or irrelevant. Our adaptive random forests adapts the traditional classification method known as random forests for relevance feedback. It improves the efficiency of pattern discovery by choosing the most-informative samples for online learning. Extensive experiments are carried out on a Corel image set (with 31,438 images) to evaluate the performance of our method against state-of-the-art approaches.

1 Introduction

In multimedia retrieval, the necessity of integrating relevance feedback stems from the following reasons:

- The subjective nature of information retrieval. Different users have different opinions about whether a returned object is relevant or irrelevant.
- Content-based multimedia descriptors, which are normally high-dimensional feature vectors, cannot capture complex query concepts well.

To address these issues, relevance feedback has been introduced to integrate users into multimedia retrieval: users are allowed to show their preference by labeling retrieval results as relevant or irrelevant. Using the user-labeled objects as training samples, the system iteratively adapts the successive retrieval results to the user's query concept. To capture the user's subjective query concept, online learning is unavoidable in multimedia retrieval. Hence, most existing methods address relevance feedback as an online learning problem, although many research efforts [16] have also been devoted to tackling it in an offline manner.

To perform online learning, many researchers address relevance feedback with machine learning and/or pattern recognition methods. For example, some employ biased discriminant analysis [19], while others apply the AdaBoost algorithm [14] or support vector machines [15]. However, to apply machine learning and pattern recognition methods to relevance feedback effectively, the following critical issues must be addressed:

- The curse of dimensionality. Multimedia feature vectors are often high-dimensional, and in high-dimensional feature spaces most machine learning methods achieve good performance only with large amounts of training samples.
- Multimodal distribution of relevant objects. For complex queries, the relevant objects are distributed multimodally (i.e., nonlinearly) in the feature space.

To simplify the above problems, some approaches [12, 17] assume a unimodal-Gaussian distribution for relevant objects, so they only perform optimally in unimodal/linear cases. Other methods can handle multimodality by using discriminative classifiers (such as support vector machines [15]).
But with the limited number of training samples available from relevance feedback, traditional classifiers such as support vector machines and decision trees [11] cannot yield a strong classifier unless the user provides more training samples (as in [15]). To learn from the small training sets produced by relevance feedback, Tieu et al. [14] proposed to boost image retrieval with AdaBoost [8]. Unfortunately, it is computationally intractable to boost multiple discriminative classifiers (such as decision trees) online. Hence, they boost multiple Gaussian-model-based weak classifiers, each of which fits a Gaussian model to the relevant and irrelevant images on one feature (see footnote 1). When all the features are taken into account, the boosted classifier still assumes a unimodal-Gaussian distribution for either class of images.

In this paper, we present an adaptive pattern discovery method (for relevance feedback), which aims to improve the performance of multimedia retrieval by discovering meaningful patterns of relevant objects.

Footnote 1: A feature here means an element of the feature vector.

To facilitate pattern discovery, we first present a dynamic feature extraction method to alleviate the curse of dimensionality. During online learning, our method selects a feature subspace using balanced information gain [6]. The advantages of our dynamic feature extraction are that it not only removes noise from the feature space, but also improves the efficiency of the learning machine. In the feature subspace, we train a pattern classification method known as random forests [3] to classify multimedia objects as relevant or irrelevant. As a nonparametric and nonlinear classification algorithm, random forests can handle the multimodality of multimedia objects. However, even in the feature subspace, it is still computationally intractable to train a regular random forest online. To improve the efficiency of random forests, we present an active sample selection method, which selects the most-informative samples for online learning. By combining random forests with active sample selection, we propose an online pattern classification method termed adaptive random forests, which runs 2 to 3 times faster than regular random forests while achieving comparable precision/recall. Extensive experiments on a Corel image set (with 31,438 images) demonstrate that our method outperforms the state-of-the-art approaches [10, 14] by at least 22% on average precision and recall.

The rest of this paper is organized as follows. We first introduce some useful notation and briefly describe the random forests algorithm in Section 2. Our adaptive pattern discovery method is then presented in Section 3. Empirical results are given in Section 4. Finally, we conclude in Section 5.

2 Random Forests

In this section, we introduce some useful notation and the random forests algorithm.

2.1 Useful Notation

First, we represent the feature space as F = {f_1, ..., f_M}, where M is the dimension of F. We then denote the multimedia database by db = {o_1, ..., o_I}, where I is the size of db. We represent each multimedia object o_i in db by a real-valued vector (i.e., point) o_i = [o_{i,1}, ..., o_{i,M}]^T, which is an instance in the feature space F. Similarly, the query q is represented by the vector q = [q_1, ..., q_M]^T. The distance between object o_i and query q is the Euclidean distance between their feature vectors.

From the original feature space F, our dynamic feature extraction aims to learn a projection ψ: F → F', where F' ⊂ F is an M'-dimensional subspace of F. With the projection ψ, we can project any object o_i in db into F', with o_i' = ψ(o_i) and ψ(o_i) ∈ F'.

During interactive multimedia retrieval, we obtain the training set S = {(s_1, v_1), ..., (s_N, v_N)} from relevance feedback, where N is the size of S. Each training sample (s_n, v_n) ∈ S is a labeled object represented by s_n = [s_{n,1}, ..., s_{n,M}]^T, where v_n ∈ {0, 1} is its class value (0/1 means irrelevant/relevant). We denote the sets of relevant and irrelevant training samples by R and U, respectively, with S = R ∪ U. Using the training set S, we train an adaptive random forest h to classify database objects into one of two classes: relevant or irrelevant.
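To make the notation concrete, the projection ψ can be realized as plain coordinate selection. Below is a minimal sketch (our illustration, not the authors' code); it assumes the index sequence Z produced by the feature extraction of Section 3.1, written here with 0-based indices.

```python
import numpy as np

def make_projection(Z):
    """Build psi: F -> F' from the index sequence Z (0-based here).

    Section 3.1 defines Z as the coordinates kept by feature extraction;
    psi simply selects those coordinates of a feature vector.
    """
    idx = np.asarray(Z, dtype=int)
    return lambda o: np.asarray(o)[..., idx]

# Example: project a 5-dimensional object onto features {0, 2, 4}.
psi = make_projection([0, 2, 4])
print(psi(np.array([0.1, -0.3, 0.7, 0.2, 0.5])))  # -> [0.1 0.7 0.5]
```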
2.2 Random Forests Algorithm

In this section, we briefly introduce random forests [3], a method for growing a composite classifier from multiple tree classifiers. To obtain the composite classifier, it combines bagging [2] with random feature selection [3], and it achieves favorable performance over state-of-the-art approaches (such as bagging and AdaBoost [8]).

The tree classifier trained in random forests is the classification and regression tree (CART) [4]. At each node, CART searches through all features to find the best split for allocating the training data to two child nodes. If the current node contains training data of only one class, CART makes it a leaf node, and this leaf assigns any test instance falling into it to that class. All test instances are run down the tree to obtain their predicted classes.

As a collection of tree classifiers, the random forest h = {h_j(o, θ_j), j = 1, ..., J} trains its jth tree classifier h_j with a bootstrap [2, 7] sample S_j ⊂ S and a random vector θ_j, where J is the total number of tree classifiers in h. The bootstrap sample S_j is obtained by randomly resampling the original training set S with replacement, while the random vector θ_j is generated independently of the past random vectors θ_1, ..., θ_{j-1} but with the same distribution. When growing h_j, the split at each node is chosen as the best one over M̃ randomly selected features. As pointed out by Breiman [3], a random forest is insensitive to M̃ and performs optimally (in general) when M̃ = √M, where M is the dimension of the feature space F. Hence, random forests runs roughly √M times faster than bagging and AdaBoost for combining tree classifiers. To classify an input object o, the random forest h lets its member classifiers vote for the most popular class, where each classifier casts a unit vote. Breiman [3] proved that, as more trees are added, a random forest does not overfit, and its generalization error converges to a limited value.
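As a rough illustration of this training scheme (bagging plus random feature selection with majority voting), consider the following sketch. It leans on scikit-learn's DecisionTreeClassifier as a stand-in for CART, which is our assumption rather than the paper's implementation; max_features="sqrt" plays the role of the sqrt(M) randomly selected features per split.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_random_forest(X, y, n_trees=50, rng=None):
    """Grow a forest: each tree sees a bootstrap sample and, at every
    node, searches the best split among sqrt(M) random features."""
    rng = rng or np.random.default_rng(0)
    forest, n = [], len(X)
    for _ in range(n_trees):
        boot = rng.integers(0, n, size=n)            # bootstrap sample S_j
        tree = DecisionTreeClassifier(max_features="sqrt")
        tree.fit(X[boot], y[boot])
        forest.append(tree)
    return forest

def forest_vote(forest, X):
    """Each tree casts a unit vote; return the most popular class."""
    votes = np.stack([t.predict(X) for t in forest])  # shape (J, n)
    return (votes.mean(axis=0) >= 0.5).astype(int)

# Demo on toy data: two Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(2, 1, (50, 8))])
y = np.r_[np.zeros(50), np.ones(50)].astype(int)
print(forest_vote(train_random_forest(X, y), X[:5]))
```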

3 Adaptive Pattern Discovery

In this section, we present our adaptive pattern discovery method, which consists of dynamic feature extraction, active sample selection and online pattern classification.

3.1 Dynamic Feature Extraction

The high dimensionality of multimedia feature spaces is often a hindrance for relevance feedback: it not only degrades the efficiency of the learning machine, but also impedes the application of many pattern recognition methods due to the curse of dimensionality [9]. To alleviate the curse of dimensionality, some researchers extract a low-dimensional subspace using PCA (principal component analysis) [13], while others [14] employ feature selection techniques to select an optimal subspace of the original feature space. In this section, we present a dynamic feature extraction method using balanced information gain [6], which originates from information theory and machine learning. To present our method, we first define the entropy [11] of the training set S as follows:

$$E(S) = -p_{+}\log_2 p_{+} - p_{-}\log_2 p_{-} \quad (1)$$

where p_+ / p_- is the percentage of relevant/irrelevant training samples in S. According to information theory, the larger E(S) is, the more bits are required to encode S. Let {S_i} be a partition of S; the information gain [11] of the partition is then given by:

$$G\Big(\bigcup_i S_i\Big) = E(S) - \frac{1}{|S|}\sum_i |S_i|\, E(S_i) \quad (2)$$

In the light of Occam's razor [11], the partition that maximizes the above information gain should be chosen, since it leads to the most concise representation of S. Despite its successful applications in text classification [18], information gain has the drawback of placing no penalty on the arity of partitions, so it favors partitions with excessively large arity. To balance this bias, we use the following balanced information gain [6]:

$$B_g\Big(\bigcup_i S_i\Big) = \frac{G\big(\bigcup_i S_i\big)}{\log_2 \kappa}, \quad (3)$$

where κ is the arity of the partition {S_i}.

Figure 1: Several useful concepts for computing balanced information gain.

So far, we have reduced the feature extraction problem to selecting features with maximum balanced information gain. For each feature, we compute its balanced information gain as follows [11, 6]. We first sort all samples in ascending order, and set the mean values of adjacent samples with different classifications as potential cut points [11, 6] of the partition. On the feature, t-1 cut points create t continuous intervals (see Figure 1), which comprise a t-ary partition of S, since these intervals contain t non-overlapping subsets of S. In the beginning, all training samples belong to the single interval of a 1-ary partition. To obtain the t-ary partition, we select one interval of the (t-1)-ary partition to be split into two subintervals; this is done by greedily choosing the cut point that maximizes the balanced information gain at each step. If κ is the maximum arity desired, we set the balanced information gain of the current feature to the maximum value achieved by the partitions from 2-ary to κ-ary.

After the above computation, our method outputs the features with the top M' largest balanced information gains, where M' is an empirical threshold. The output of feature extraction is represented by an M'-element sequence Z = {1 ≤ z_1 < ... < z_m < ... < z_{M'} ≤ M}, where each element z_m ∈ Z specifies an individual coordinate of the original point o_i that will appear in the projection o_i' = ψ(o_i). The computational cost T(e) of our feature extraction method is dominated by the sort operation, so we have T(e) = O(MN log N).
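The following sketch (our reading of the procedure, not the authors' code) scores a single numeric feature: it sorts the samples, takes class-boundary midpoints as candidate cut points, and greedily adds cuts up to a maximum arity κ, recording the best balanced information gain. In the full method, each feature is scored this way and the top M' features are kept.

```python
import numpy as np

def entropy(labels):
    """E(S) = -p+ log2 p+ - p- log2 p- (Eq. 1)."""
    if len(labels) == 0:
        return 0.0
    p = np.mean(labels)
    return sum(-q * np.log2(q) for q in (p, 1 - p) if q > 0)

def balanced_gain_of_feature(values, labels, max_arity=4):
    """Greedy multisplitting of one feature; returns its best balanced gain."""
    order = np.argsort(values)
    v, y = np.asarray(values)[order], np.asarray(labels)[order]
    # Candidate cut points: midpoints between adjacent samples of
    # different classes.
    cuts = [(v[i] + v[i + 1]) / 2 for i in range(len(v) - 1) if y[i] != y[i + 1]]
    base, n = entropy(y), len(y)

    def gain(chosen):
        bounds = [-np.inf] + sorted(chosen) + [np.inf]
        parts = [y[(v > lo) & (v <= hi)] for lo, hi in zip(bounds, bounds[1:])]
        return base - sum(len(p) * entropy(p) for p in parts) / n  # Eq. 2

    best, chosen = 0.0, []
    for t in range(2, max_arity + 1):        # grow from 2-ary to kappa-ary
        candidates = [c for c in cuts if c not in chosen]
        if not candidates:
            break
        # At fixed arity t, maximizing the gain also maximizes the
        # balanced gain, since log2(t) is constant within this step.
        chosen.append(max(candidates, key=lambda c: gain(chosen + [c])))
        best = max(best, gain(chosen) / np.log2(t))  # Eq. 3
    return best

# A feature that separates the classes scores higher than a noisy one.
print(balanced_gain_of_feature([0.1, 0.2, 0.35, 0.4, 0.8, 0.9],
                               [1, 1, 0, 0, 1, 1]))
```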
3.2 Active Sample Selection

In the optimal feature subspace, we train a random forest to classify multimedia objects. However, during multimedia retrieval, it is computationally intractable to train a regular random forest online. To address this issue, we present an active sample selection method to improve the efficiency of random forests. Before introducing our approach, we first analyze the computational cost of random forests, which is dominated by training its member classifiers (CART). If each CART is grown to a uniform depth D (i.e., to 2^D leaf nodes), the computational cost of the random forest h is [4, 3]:

$$T(h) \approx J\, N' (D + 1) \log N' \quad (4)$$

where J is the number of tree classifiers and N' is the size of the bootstrap sample for each tree classifier. Since T(h) increases super-linearly with N', we can reduce T(h) by ensuring N' ≤ N_0, where N_0 is a threshold. However, in regular random forests, N' always (approximately) equals 0.63N, because the size of a bootstrap sample is about 63% of that of the original sample set [7]. For online learning, bootstrap lacks the flexibility to reduce the size of each bootstrap sample as necessary. To address this issue, we present an active sample selection method, which actively selects no more than N_0 training samples for each tree classifier. To achieve this, we have to discard some training samples from the bootstrap sample. Obviously, it is preferable to keep all the most-informative training samples and to discard some less-informative ones. In the training set S, the most-informative samples are the following:

- Relevant samples. In relevance feedback, the number of relevant samples is often much smaller than the number of irrelevant ones, so every relevant sample is precious and informative.
- Centroids of irrelevant samples. In the light of pattern recognition [9], the centroids of irrelevant clusters are dependable representatives of the irrelevant samples/patterns.

According to the above discussion, we need to cluster the irrelevant training samples to find the most-informative ones among them. To cluster irrelevant samples, we employ an incremental clustering method termed the doubling algorithm [5], which takes the maximum number of clusters as input. As a centroid-preserving approach, the doubling algorithm guarantees that the centroids of the resulting clusters are members of the irrelevant sample set. Moreover, it provides a proven performance guarantee on the quality of the resulting clusters.

Figure 2: An illustration of our active sample selection.

Figure 2 demonstrates the principles of our active sample selection method, which can be summarized as follows:

- Incrementally cluster the new irrelevant samples in each feedback iteration and add the resulting centroids into the most-informative irrelevant set U_c.
- Get the most-informative sample set S_c = R ∪ U_c, where R is the relevant training set.
- Get the less-informative sample set U_d = U - U_c.
- To obtain the training set S_j (for the jth classifier h_j), we create a bootstrap sample S_{j,c} from S_c, and randomly choose N_0 - |S_{j,c}| samples from U_d to form the random sample set S_{j,d}. Finally, we use S_j = S_{j,c} ∪ S_{j,d} as the training set for h_j.

In comparison with bootstrap, our active sample selection method distinguishes most-informative training samples from less-informative ones. To reduce the number of training samples, our method selects as many most-informative training samples as bootstrap does, but intelligently discards some less-informative ones. The computational cost T(s) of our active sample selection is dominated by the clustering operation. Hence, we have T(s) = O(N ζ log ζ) [5], where N is the number of new irrelevant samples in each iteration and ζ is the maximum number of clusters.

By combining active sample selection with random forests, we develop an online pattern classification method called adaptive random forests. Given the optimal feature subspace F' and the projection ψ: F → F' (cf. Section 3.1), our adaptive random forest h = {h_j(ψ(o), θ_j), j = 1, ..., J} classifies database objects as follows:

$$h(o) = \begin{cases} 1 & \text{if } \sum_{j=1}^{J} h_j(\psi(o)) \ge \frac{J}{2}, \\ 0 & \text{otherwise.} \end{cases} \quad (5)$$

In Formula 5, each tree h_j outputs 0 or 1 as the class value of the input object, so it casts a unit vote in h; h then classifies the input object into the most popular class.

3.3 Adaptive Pattern Discovery

During multimedia retrieval, we train an adaptive random forest h to classify multimedia objects, and use classified-relevant/irrelevant to denote objects (or sets of objects) that are classified as relevant/irrelevant. From time to time, the adaptive random forest may output fewer than k classified-relevant objects, where k is the number of objects returned to the user. To address this issue, we define the relevance probability as follows:

$$P(1 \mid o) = \frac{\sum_{j=1}^{J} h_j(\psi(o))}{J}. \quad (6)$$

P(1|o) is the number of tree classifiers that output o as relevant over the total number of classifiers. The larger P(1|o) is, the more confident we are in outputting object o as relevant. So, if fewer than k objects are classified as relevant, our method returns some classified-irrelevant objects with the largest P(1|o) values.
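To make Sections 3.2 and 3.3 concrete, here is a compact sketch of assembling the per-tree training set S_j and computing the relevance probability of Formula 6. It is our illustration under simplifying assumptions: U_c is taken as given (the paper obtains it with the doubling algorithm [5]; any centroid-preserving incremental clustering could stand in), and scikit-learn decision trees replace CART.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_training_set(R, U_c, U_d, N0, rng):
    """Assemble S_j (Section 3.2): a bootstrap sample of the
    most-informative set S_c = R u U_c, topped up with at most
    N0 - |S_j,c| randomly chosen less-informative samples from U_d."""
    X_c = np.vstack([R, U_c])
    y_c = np.r_[np.ones(len(R)), np.zeros(len(U_c))]
    idx = rng.integers(0, len(X_c), size=len(X_c))   # bootstrap S_j,c
    n_extra = min(max(N0 - len(idx), 0), len(U_d))
    extra = rng.choice(len(U_d), size=n_extra, replace=False)
    X = np.vstack([X_c[idx], U_d[extra]])            # S_j = S_j,c u S_j,d
    y = np.r_[y_c[idx], np.zeros(n_extra)]
    return X, y

def relevance_probability(forest, X):
    """P(1|o) = fraction of trees voting 'relevant' (Eq. 6)."""
    return np.stack([t.predict(X) for t in forest]).mean(axis=0)

# Training: one tree per assembled sample set; Eq. 5 thresholds the
# relevance probability at 1/2 to classify.
rng = np.random.default_rng(0)
R = rng.normal(0, 1, (8, 4))
U_c, U_d = rng.normal(3, 1, (5, 4)), rng.normal(3, 1, (40, 4))
forest = []
for _ in range(25):
    X, y = tree_training_set(R, U_c, U_d, N0=30, rng=rng)
    forest.append(DecisionTreeClassifier(max_features="sqrt").fit(X, y))
print(relevance_probability(forest, rng.normal(0, 1, (3, 4))))
```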
When applied to relevance feedback, our adaptive pattern discovery method is summarized as follows. For each query q, our method runs an initial nearest-neighbor search and returns the k nearest neighbors of q to the user. It then asks the user to provide the initial training sample set S. With S, our method extracts the subspace F' and the projection ψ: F → F'. In F', it trains an adaptive random forest h. After that, our approach projects every object into F' and classifies it using h. Each object classified as relevant is added to the classified-relevant set Γ. In case |Γ| < k, we find k - |Γ| classified-irrelevant objects with the largest relevance probability and add them into Γ. From Γ, our method returns the k nearest neighbors of the relevant centroid to the user. Based on the latest retrieval result, the user can provide more training samples, and the system starts a new learning iteration accordingly.

An alternative scheme for the above adaptive pattern discovery is to neglect the query entirely and rank database objects by the relevance probability (defined in Formula 6). This scheme is suitable for classification applications with sufficient training samples. However, it is unlikely to succeed in relevance feedback, where the classifier must be trained with a small number of samples; especially in the initial iterations, we may have only a few relevant training samples.
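The retrieval loop just described might be sketched as follows. The helpers `extract` and `train` are hypothetical placeholders for the components of Sections 3.1 and 3.2, not functions from the paper.

```python
import numpy as np

def one_feedback_iteration(db, X_lab, y_lab, k, extract, train):
    """One iteration of the retrieval loop in Section 3.3 (our sketch).

    db: (n, M) database matrix; (X_lab, y_lab): user-labeled samples;
    extract/train: hypothetical stand-ins for Sections 3.1 and 3.2.
    """
    Z = extract(X_lab, y_lab)                 # subspace coordinates
    forest = train(X_lab[:, Z], y_lab)        # adaptive random forest
    p = np.mean([t.predict(db[:, Z]) for t in forest], axis=0)  # Eq. 6
    gamma = np.flatnonzero(p >= 0.5)          # classified-relevant set
    if len(gamma) < k:                        # top up Gamma with the most
        gset = set(gamma.tolist())            # confident other objects
        extra = [i for i in np.argsort(-p) if i not in gset]
        gamma = np.r_[gamma, extra[:k - len(gamma)]].astype(int)
    centroid = db[gamma][:, Z].mean(axis=0)   # relevant centroid in F'
    d = np.linalg.norm(db[gamma][:, Z] - centroid, axis=1)
    return gamma[np.argsort(d)][:k]           # k NN of the centroid in Gamma
```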

4 Empirical Results

4.1 Experimental Setup

The data set used in our experiments is a Corel image set with 31,438 images. To evaluate retrieval performance, we choose 44 semantic categories (such as rose, butterfly, tiger, eagle, penguin and falls) with refined ground truth. We employ all 5,172 images from these categories as queries. For each query, the retrieval is executed on the whole database (with 31,438 images). Precision and recall are used to evaluate the retrieval performance. Precision is the number of retrieved relevant images over the total number of retrieved images, and recall is the number of retrieved relevant images over the total number of relevant images in the database. To calculate precision and recall, only those retrieved images from the same semantic category as the query are counted as relevant. The average precision and recall over all queries are used as the overall performance measures.

In our experiments, each image is represented by the following five image features: the first is a 64-bin color coherence vector in the HSV color space; the second is a 9-bin color moments feature extracted from the L*a*b color space; the third is a 10-bin wavelet-based texture feature [12]; the fourth is a 64-bin edge coherence histogram [1]; and the fifth is a 32-bin Fourier shape descriptor [1]. We normalize feature values into the range [-1, 1] and concatenate all image features into a 179-bin feature vector.

The number of nearest neighbors returned (that is, k) is often called the scope. Since retrieval performance varies with the scope, we conducted experiments on scopes of 20 and 80, respectively. For comparison, we provide the performance of the regular random forests (RRF), the support vector machine (SVM) [10] and the AdaBoost-based method (AdaBoost) [14] under the same experimental conditions. All our experiments are run on a SUN Ultra 80 workstation with 1GB memory. Based on the presented method, we have implemented a content-based image retrieval system called PicQuest. Figure 3 demonstrates the user interface of our system.

Figure 3: User interface of our system.

4.2 Parameter Selection

We demonstrate how to decide appropriate values for the tree number J and the sample set size N_0 in this section. Since the computational cost of the random forest increases linearly with the number of trees J, we empirically minimize J while maintaining performance close to optimal. Table 1 demonstrates the precision achieved by random forests on the scope of 20 under three settings of J. A moderately sized random forest achieves nearly-optimal performance: it performs almost as well as a random forest with substantially more trees, and dramatically outperforms a much smaller one. We therefore use this moderate value of J in all the following experiments.

Table 1: Precisions (%) achieved by random forests with different numbers of trees.

To test the performance of our dynamic feature extraction method, we perform experiments with the subspace dimension M' set to 179 and two smaller values, corresponding to employing all of the features and two reduced fractions of them. We denote these three cases by ARF-179D, ARF-1D and ARF-D, respectively. For a specific value of M', we adjust N_0 to guarantee that our adaptive random forests (ARF) runs as efficiently as the state-of-the-art approaches (such as [14]). Hence, N_0 is set to the values given in Table 2.

Table 2: Values of N_0 in the different cases.
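Before turning to the results, a small helper (ours, not from the paper) makes the precision/recall definitions of Section 4.1 concrete and shows why recall is capped at a small scope.

```python
def precision_recall(retrieved, relevant_set, n_relevant_in_db):
    """Precision/recall per Section 4.1.

    retrieved: ids returned at the current scope (len == k);
    relevant_set: ids sharing the query's semantic category;
    n_relevant_in_db: total relevant images in the database.
    """
    hits = sum(1 for r in retrieved if r in relevant_set)
    return hits / len(retrieved), hits / n_relevant_in_db

# With scope k = 20 and a category of ~120 images, recall is capped at
# 20/120 even for a perfect result.
p, r = precision_recall(range(20), set(range(120)), 120)
print(p, r)  # 1.0 0.1666...
```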
4.3 Performance Evaluation

In the following discussion, we neglect recall on the scope of 20, because, on smaller scopes, recall is not as effective as precision for comparing different approaches. For instance, in our experiments, the maximum achievable recall on the scope of 20 is about 16.7% (= 20/120), since the average size of all semantic categories is about 120. In this case, the difference in recall among the compared methods does not precisely reflect their disparity in performance.

Figure 4: A comparison of different methods using precision, recall and echo time. The performance on scopes 20 and 80 is listed from the top down; in each row, precision, recall and echo time are shown from left to right.

Figure 5: Precision-recall curves. (a)/(b)/(c) show the curves of ARF-D, SVM and AdaBoost, respectively.

Table 3: Performance of ARF-1D and ARF-D, where P/R denotes precision/recall.

In our experiments, we run several feedback iterations for each query. Figure 4 presents the performance of RRF, ARF-179D, ARF-D, SVM and AdaBoost at each iteration. Table 3 compares the performance of ARF-D and ARF-1D. From Figure 4 and Table 3, we can draw the following conclusions:

- RRF is too computationally intensive for relevance feedback. On the scope of 80, the echo time of RRF can go as high as 18 seconds. In comparison with the ARF methods, RRF runs 2 to 3 times slower, but only improves precision and recall slightly, by 1-3%.
- Among the three compared ARF methods, ARF-D is the most cost-effective. On all scopes, it either achieves precision and recall as good as ARF-1D/ARF-179D, or improves precision and recall over the latter by 1-2%. As to efficiency, it runs about 15-20% faster than ARF-1D and ARF-179D in most cases.
- The ARF methods dramatically outperform both SVM and AdaBoost on retrieval performance. ARF-D improves precision/recall over them by at least 22%. As to efficiency, ARF-D runs as efficiently as AdaBoost on the scope of 80, while improving the efficiency over AdaBoost on the scope of 20.

- SVM cannot achieve good performance with small training samples. On the scope of 20, the precision achieved by SVM is severely inferior to that attained by the ARF methods. Only with the additional training samples available at the scope of 80 does SVM achieve comparable precision and recall.
- During online learning, our dynamic feature extraction is very successful in extracting the optimal features. Cooperating with our active sample selection approach, it can remove up to 70% of the features with no more than 3% degradation in precision and recall.

The experimental results presented so far demonstrate that our method achieves remarkable improvements in precision and recall over SVM and AdaBoost. By combining multiple tree classifiers, our method can find multiple nonlinear clusters of relevant objects, while AdaBoost only performs optimally in unimodal-Gaussian cases. On the other hand, our method can train a strong classifier from small training samples, so it dramatically outperforms traditional classification methods such as SVM.

Figure 5 presents the precision-recall curves of ARF-D, SVM and AdaBoost. It demonstrates the impact of sample size on the retrieval performance of the compared methods. We can see from this figure that both ARF-D and SVM achieve a noticeable gain in performance on the larger scope. With more training samples from scope 80, both ARF-D and SVM increase the probability of learning the multimodal distributions of multimedia objects. On the other hand, AdaBoost, which assumes a unimodal-Gaussian distribution for both relevant and irrelevant objects, does not achieve an obvious improvement in performance on the larger scope.

5 Conclusion

In this paper, we present an adaptive pattern discovery method, which aims to iteratively discover the distribution patterns of relevant objects using relevance feedback. To facilitate pattern discovery, we first present a dynamic feature extraction method to alleviate the curse of dimensionality for multimedia retrieval. During online learning, our dynamic feature extraction selects a feature subspace using balanced information gain. In the feature subspace, we train an online pattern classification approach termed adaptive random forests to classify multimedia objects. Our adaptive random forests adapts a composite classification method known as random forests for relevance feedback. To improve the efficiency of random forests, it employs an active sample selection method to select the most-informative samples for online learning. Extensive experimental results on a Corel image set (with 31,438 images) demonstrate that our method runs 2 to 3 times faster than regular random forests, while achieving comparable precision and recall against the latter. Moreover, our approach improves precision and recall by at least 22% over SVM and AdaBoost. As to efficiency, it runs as efficiently as, and sometimes faster than, AdaBoost.

References

[1] S. Brandt, J. Laaksonen, and E. Oja. Statistical shape features in content-based image retrieval. In Proc. of ICPR, Sept. 2000.
[2] L. Breiman. Bagging predictors. Machine Learning, 24(2):123-140, 1996.
[3] L. Breiman. Random forests - random features. Technical Report 567, Department of Statistics, University of California, Berkeley, September 1999.
[4] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
[5] M. Charikar, C. Chekuri, T. Feder, and R. Motwani. Incremental clustering and dynamic information retrieval. In Proc. of ACM Symposium on Theory of Computing, 1997.
[6] T. Elomaa and J. Rousu.
General and efficient multisplitting of numerical attributes. Machine Learning, 36(3), 1999.
[7] B. Efron and R. Tibshirani. An Introduction to the Bootstrap. Chapman and Hall, 1993.
[8] Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. of Comp. and Sys. Sci., 55(1):119-139, 1997.
[9] K. Fukunaga. Introduction to Statistical Pattern Recognition. San Diego, California: Academic Press, Inc., 1990.
[10] T. Joachims. Making large-scale SVM learning practical. In Advances in Kernel Methods - Support Vector Learning, 1999.
[11] T. M. Mitchell. Machine Learning. McGraw Hill, 1997.
[12] Y. Rui and T. Huang. Optimizing learning in image retrieval. In Proc. IEEE Conf. on CVPR, June 2000.
[13] Z. Su, S. Li, and H. Zhang. Extraction of feature subspaces for content-based image retrieval using relevance feedback. In Proc. of ACM Multimedia, 2001.
[14] K. Tieu and P. Viola. Boosting image retrieval. In Proc. IEEE Conf. on CVPR, June 2000.
[15] S. Tong and E. Chang. Support vector machine active learning for image retrieval. In Proc. of ACM Multimedia, 2001.
[16] N. Vasconcelos and A. Lippman. Bayesian relevance feedback for content-based image retrieval. In Proc. IEEE Workshop on CAIVL, 2000.
[17] Y. Wu and A. Zhang. A feature re-weighting approach for relevance feedback in image retrieval. In Proc. IEEE Int. Conf. on Image Processing, 2002.
[18] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proc. of ICML-97, 1997.
[19] X. S. Zhou and T. S. Huang. Comparing discriminating transformations and SVM for learning during multimedia retrieval. In Proc. of ACM Multimedia, 2001.


More information

Generic Face Alignment Using an Improved Active Shape Model

Generic Face Alignment Using an Improved Active Shape Model Generic Face Alignment Using an Improved Active Shape Model Liting Wang, Xiaoqing Ding, Chi Fang Electronic Engineering Department, Tsinghua University, Beijing, China {wanglt, dxq, fangchi} @ocrserv.ee.tsinghua.edu.cn

More information

Equation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation.

Equation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation. Equation to LaTeX Abhinav Rastogi, Sevy Harris {arastogi,sharris5}@stanford.edu I. Introduction Copying equations from a pdf file to a LaTeX document can be time consuming because there is no easy way

More information

Fast or furious? - User analysis of SF Express Inc

Fast or furious? - User analysis of SF Express Inc CS 229 PROJECT, DEC. 2017 1 Fast or furious? - User analysis of SF Express Inc Gege Wen@gegewen, Yiyuan Zhang@yiyuan12, Kezhen Zhao@zkz I. MOTIVATION The motivation of this project is to predict the likelihood

More information

Relevance Feedback for Content-Based Image Retrieval Using Support Vector Machines and Feature Selection

Relevance Feedback for Content-Based Image Retrieval Using Support Vector Machines and Feature Selection Relevance Feedback for Content-Based Image Retrieval Using Support Vector Machines and Feature Selection Apostolos Marakakis 1, Nikolaos Galatsanos 2, Aristidis Likas 3, and Andreas Stafylopatis 1 1 School

More information

A Survey on Postive and Unlabelled Learning

A Survey on Postive and Unlabelled Learning A Survey on Postive and Unlabelled Learning Gang Li Computer & Information Sciences University of Delaware ligang@udel.edu Abstract In this paper we survey the main algorithms used in positive and unlabeled

More information

Artificial Intelligence. Programming Styles

Artificial Intelligence. Programming Styles Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to

More information

Tree-based methods for classification and regression

Tree-based methods for classification and regression Tree-based methods for classification and regression Ryan Tibshirani Data Mining: 36-462/36-662 April 11 2013 Optional reading: ISL 8.1, ESL 9.2 1 Tree-based methods Tree-based based methods for predicting

More information

Mondrian Forests: Efficient Online Random Forests

Mondrian Forests: Efficient Online Random Forests Mondrian Forests: Efficient Online Random Forests Balaji Lakshminarayanan Joint work with Daniel M. Roy and Yee Whye Teh 1 Outline Background and Motivation Mondrian Forests Randomization mechanism Online

More information

Nonparametric Methods Recap

Nonparametric Methods Recap Nonparametric Methods Recap Aarti Singh Machine Learning 10-701/15-781 Oct 4, 2010 Nonparametric Methods Kernel Density estimate (also Histogram) Weighted frequency Classification - K-NN Classifier Majority

More information

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California,

More information

Business Club. Decision Trees

Business Club. Decision Trees Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building

More information

Skin and Face Detection

Skin and Face Detection Skin and Face Detection Linda Shapiro EE/CSE 576 1 What s Coming 1. Review of Bakic flesh detector 2. Fleck and Forsyth flesh detector 3. Details of Rowley face detector 4. Review of the basic AdaBoost

More information

Consistent Line Clusters for Building Recognition in CBIR

Consistent Line Clusters for Building Recognition in CBIR Consistent Line Clusters for Building Recognition in CBIR Yi Li and Linda G. Shapiro Department of Computer Science and Engineering University of Washington Seattle, WA 98195-250 shapiro,yi @cs.washington.edu

More information

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation M. Blauth, E. Kraft, F. Hirschenberger, M. Böhm Fraunhofer Institute for Industrial Mathematics, Fraunhofer-Platz 1,

More information

Aggregated Color Descriptors for Land Use Classification

Aggregated Color Descriptors for Land Use Classification Aggregated Color Descriptors for Land Use Classification Vedran Jovanović and Vladimir Risojević Abstract In this paper we propose and evaluate aggregated color descriptors for land use classification

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information