Adaptive Pattern Discovery for Interactive Multimedia Retrieval


Adaptive Pattern Discovery for Interactive Multimedia Retrieval

Yimin Wu and Aidong Zhang
Department of Computer Science and Engineering, SUNY at Buffalo

Abstract

Relevance feedback has become an indispensable component of multimedia retrieval systems. In this paper, we present an adaptive pattern discovery method, which addresses relevance feedback by interactively discovering meaningful patterns of relevant objects. To facilitate pattern discovery, we first present a dynamic feature extraction method, which aims to alleviate the curse of dimensionality by extracting a feature subspace using balanced information gain. In the feature subspace, we train an online pattern classification method called adaptive random forests to classify multimedia objects as relevant or irrelevant. Our adaptive random forests adapts the traditional classification method known as random forests for relevance feedback. It improves the efficiency of pattern discovery by choosing the most-informative samples for online learning. Extensive experiments are carried out on a Corel image set (with 31,438 images) to evaluate the performance of our method against state-of-the-art approaches.

1 Introduction

In multimedia retrieval, the necessity of integrating relevance feedback stems from the following reasons:

- The subjective nature of information retrieval. Different users have different opinions about whether a returned object is relevant or irrelevant.
- Content-based multimedia descriptors, which are normally high-dimensional feature vectors, cannot capture complex query concepts well.

To address these issues, relevance feedback has been introduced to integrate users into multimedia retrieval: users are allowed to show their preference by labeling retrieval results as relevant or irrelevant. Using the user-labeled objects as training samples, the system iteratively adapts the successive retrieval results to the user's query concept. To capture the user's subjective query concept, online learning is unavoidable in multimedia retrieval. Hence, most existing methods address relevance feedback as an online learning problem, although many research efforts [16] have also been devoted to tackling it in an offline manner.

To perform online learning, many researchers address relevance feedback with machine learning and/or pattern recognition methods. For example, some employ biased discriminant analysis [19], while others apply the AdaBoost algorithm [14] or support vector machines [15]. However, to apply machine learning and pattern recognition methods to relevance feedback effectively, the following critical issues must be addressed:

- The curse of dimensionality. Multimedia feature vectors are often high-dimensional, and in high-dimensional feature spaces most machine learning methods achieve good performance only with large amounts of training samples.
- Multimodal distribution of relevant objects. For complex queries, the relevant objects are distributed multimodally (i.e., nonlinearly) in the feature space.

To simplify the above problems, some approaches [12, 17] assume a unimodal-Gaussian distribution for relevant objects, so they only perform optimally in unimodal/linear cases. Other methods can handle multimodality by using discriminative classifiers (such as support vector machines [15]).
But with the limited number of training samples available from relevance feedback, traditional classifiers such as support vector machines and decision trees [11] cannot yield a strong classifier unless the user provides more training samples (as in [15]). To learn from the small training sets produced by relevance feedback, Tieu et al. [14] proposed to boost image retrieval with AdaBoost [8]. Unfortunately, it is computationally intractable to boost multiple discriminative classifiers (such as decision trees) online. Hence, they boost multiple Gaussian-model-based weak classifiers, each of which fits a Gaussian model to the relevant and irrelevant images on one feature (see footnote 1). When all the features are taken into account, the boosted classifier still assumes a unimodal-Gaussian distribution for either class of images.

In this paper, we present an adaptive pattern discovery method (for relevance feedback), which aims to improve the performance of multimedia retrieval by discovering meaningful patterns of relevant objects.

Footnote 1: A feature here means an element of the feature vector.

To facilitate pattern discovery, we first present a dynamic feature extraction method to alleviate the curse of dimensionality. During online learning, our method selects a feature subspace using balanced information gain [6]. The advantages of our dynamic feature extraction are that it not only removes noise from the feature space, but also improves the efficiency of the learning machine. In the feature subspace, we train a pattern classification method known as random forests [3] to classify multimedia objects as relevant or irrelevant. As a nonparametric and nonlinear classification algorithm, random forests can handle the multimodality of multimedia objects. However, even in the feature subspace, it is still computationally intractable to train a regular random forest online. To improve the efficiency of random forests, we present an active sample selection method, which selects the most-informative samples for online learning. By combining random forests with active sample selection, we propose an online pattern classification method termed adaptive random forests, which runs 2 to 3 times faster than regular random forests while achieving comparable precision/recall. Extensive experiments on a Corel image set (with 31,438 images) demonstrate that our method outperforms the state-of-the-art approaches [10, 14] by at least 22% on average precision and recall.

The rest of this paper is organized as follows. We first introduce some useful notation and briefly describe the random forests algorithm in Section 2. Our adaptive pattern discovery method is then presented in Section 3. Empirical results are given in Section 4. Finally, we conclude in Section 5.

2 Random Forests

In this section, we introduce some useful notation and the random forests algorithm.

2.1 Useful Notation

First, we represent the feature space as F = {f_1, ..., f_M}, where M is the dimension of F. We then denote the multimedia database by db = {o_1, ..., o_I}, where I is the size of db. We represent each multimedia object o_i in db by a real-valued vector (i.e., point) o_i = [o_{i,1}, ..., o_{i,M}]^T, which is an instance in the feature space F. Similarly, the query q is represented by the vector q = [q_1, ..., q_M]^T. The distance between object o_i and query q is the Euclidean distance between their feature vectors.

From the original feature space F, our dynamic feature extraction aims to learn a projection ψ: F → F', where F' ⊂ F is an M'-dimensional subspace of F. With the projection ψ, we can project any object o_i in db into F', with o_i' = ψ(o_i) and ψ(o_i) ∈ F'.

During interactive multimedia retrieval, we obtain the training set S = {(s_1, v_1), ..., (s_N, v_N)} from relevance feedback, where N is the size of S. Each training sample (s_n, v_n) ∈ S is a labeled object represented by s_n = [s_{n,1}, ..., s_{n,M}]^T, where v_n ∈ {0, 1} is its class value (0/1 means irrelevant/relevant). We denote the sets of relevant and irrelevant training samples by R and U, respectively, with S = R ∪ U. Using the training set S, we train an adaptive random forest h to classify database objects into one of two classes: relevant or irrelevant.
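To make the notation concrete, the projection ψ can be realized as plain coordinate selection. Below is a minimal sketch (our illustration, not the authors' code); it assumes the index sequence Z produced by the feature extraction of Section 3.1, written here with 0-based indices.

```python
import numpy as np

def make_projection(Z):
    """Build psi: F -> F' from the index sequence Z (0-based here).

    Section 3.1 defines Z as the coordinates kept by feature extraction;
    psi simply selects those coordinates of a feature vector.
    """
    idx = np.asarray(Z, dtype=int)
    return lambda o: np.asarray(o)[..., idx]

# Example: project a 5-dimensional object onto features {0, 2, 4}.
psi = make_projection([0, 2, 4])
print(psi(np.array([0.1, -0.3, 0.7, 0.2, 0.5])))  # -> [0.1 0.7 0.5]
```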
2.2 Random Forests Algorithm

In this section, we briefly introduce random forests [3], a method for growing a composite classifier from multiple tree classifiers. To obtain the composite classifier, it combines bagging [2] with random feature selection [3], and it achieves favorable performance over state-of-the-art approaches (such as bagging and AdaBoost [8]).

The tree classifier trained in random forests is the classification and regression tree (CART) [4]. At each node, CART searches through all features to find the best split for allocating the training data to two child nodes. If the current node contains training data of only one class, CART makes it a leaf node, and this leaf assigns any test instance falling into it to that class. All test instances are run down the tree to obtain their predicted classes.

As a collection of tree classifiers, the random forest h = {h_j(o, θ_j), j = 1, ..., J} trains its jth tree classifier h_j with a bootstrap [2, 7] sample S_j ⊂ S and a random vector θ_j, where J is the total number of tree classifiers in h. The bootstrap sample S_j is obtained by randomly resampling the original training set S with replacement, while the random vector θ_j is generated independently of the past random vectors θ_1, ..., θ_{j-1} but with the same distribution. When growing h_j, the split at each node is chosen as the best one over M̃ randomly selected features. As pointed out by Breiman [3], a random forest is insensitive to M̃ and performs optimally (in general) when M̃ = √M, where M is the dimension of the feature space F. Hence, random forests runs roughly √M times faster than bagging and AdaBoost for combining tree classifiers. To classify an input object o, the random forest h lets its member classifiers vote for the most popular class, where each classifier casts a unit vote. Breiman [3] proved that, as more trees are added, a random forest does not overfit, and its generalization error converges to a limited value.
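As a rough illustration of this training scheme (bagging plus random feature selection with majority voting), consider the following sketch. It leans on scikit-learn's DecisionTreeClassifier as a stand-in for CART, which is our assumption rather than the paper's implementation; max_features="sqrt" plays the role of the sqrt(M) randomly selected features per split.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_random_forest(X, y, n_trees=50, rng=None):
    """Grow a forest: each tree sees a bootstrap sample and, at every
    node, searches the best split among sqrt(M) random features."""
    rng = rng or np.random.default_rng(0)
    forest, n = [], len(X)
    for _ in range(n_trees):
        boot = rng.integers(0, n, size=n)            # bootstrap sample S_j
        tree = DecisionTreeClassifier(max_features="sqrt")
        tree.fit(X[boot], y[boot])
        forest.append(tree)
    return forest

def forest_vote(forest, X):
    """Each tree casts a unit vote; return the most popular class."""
    votes = np.stack([t.predict(X) for t in forest])  # shape (J, n)
    return (votes.mean(axis=0) >= 0.5).astype(int)

# Demo on toy data: two Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(2, 1, (50, 8))])
y = np.r_[np.zeros(50), np.ones(50)].astype(int)
print(forest_vote(train_random_forest(X, y), X[:5]))
```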

3 Adaptive Pattern Discovery

In this section, we present our adaptive pattern discovery method, which consists of dynamic feature extraction, active sample selection and online pattern classification.

3.1 Dynamic Feature Extraction

The high dimensionality of multimedia feature spaces is often a hindrance for relevance feedback: it not only degrades the efficiency of the learning machine, but also impedes the application of many pattern recognition methods due to the curse of dimensionality [9]. To alleviate the curse of dimensionality, some researchers extract a low-dimensional subspace using PCA (principal component analysis) [13], while others [14] employ feature selection techniques to select an optimal subspace of the original feature space. In this section, we present a dynamic feature extraction method using balanced information gain [6], which originates from information theory and machine learning. To present our method, we first define the entropy [11] of the training set S as follows:

$$E(S) = -p_{+}\log_2 p_{+} - p_{-}\log_2 p_{-} \quad (1)$$

where p_+ / p_- is the percentage of relevant/irrelevant training samples in S. According to information theory, the larger E(S) is, the more bits are required to encode S. Let {S_i} be a partition of S; the information gain [11] of the partition is then given by:

$$G\Big(\bigcup_i S_i\Big) = E(S) - \frac{1}{|S|}\sum_i |S_i|\, E(S_i) \quad (2)$$

In the light of Occam's razor [11], the partition that maximizes the above information gain should be chosen, since it leads to the most concise representation of S. Despite its successful applications in text classification [18], information gain has the drawback of placing no penalty on the arity of partitions, so it favors partitions with excessively large arity. To balance this bias, we use the following balanced information gain [6]:

$$B_g\Big(\bigcup_i S_i\Big) = \frac{G\big(\bigcup_i S_i\big)}{\log_2 \kappa}, \quad (3)$$

where κ is the arity of the partition {S_i}.

Figure 1: Several useful concepts for computing balanced information gain.

So far, we have reduced the feature extraction problem to selecting features with maximum balanced information gain. For each feature, we compute its balanced information gain as follows [11, 6]. We first sort all samples in ascending order, and set the mean values of adjacent samples with different classifications as potential cut points [11, 6] of the partition. On the feature, t-1 cut points create t continuous intervals (see Figure 1), which comprise a t-ary partition of S, since these intervals contain t non-overlapping subsets of S. In the beginning, all training samples belong to the single interval of a 1-ary partition. To obtain the t-ary partition, we select one interval of the (t-1)-ary partition to be split into two subintervals; this is done by greedily choosing the cut point that maximizes the balanced information gain at each step. If κ is the maximum arity desired, we set the balanced information gain of the current feature to the maximum value achieved by the partitions from 2-ary to κ-ary.

After the above computation, our method outputs the features with the top M' largest balanced information gains, where M' is an empirical threshold. The output of feature extraction is represented by an M'-element sequence Z = {1 ≤ z_1 < ... < z_m < ... < z_{M'} ≤ M}, where each element z_m ∈ Z specifies an individual coordinate of the original point o_i that will appear in the projection o_i' = ψ(o_i). The computational cost T(e) of our feature extraction method is dominated by the sort operation, so we have T(e) = O(MN log N).
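The following sketch (our reading of the procedure, not the authors' code) scores a single numeric feature: it sorts the samples, takes class-boundary midpoints as candidate cut points, and greedily adds cuts up to a maximum arity κ, recording the best balanced information gain. In the full method, each feature is scored this way and the top M' features are kept.

```python
import numpy as np

def entropy(labels):
    """E(S) = -p+ log2 p+ - p- log2 p- (Eq. 1)."""
    if len(labels) == 0:
        return 0.0
    p = np.mean(labels)
    return sum(-q * np.log2(q) for q in (p, 1 - p) if q > 0)

def balanced_gain_of_feature(values, labels, max_arity=4):
    """Greedy multisplitting of one feature; returns its best balanced gain."""
    order = np.argsort(values)
    v, y = np.asarray(values)[order], np.asarray(labels)[order]
    # Candidate cut points: midpoints between adjacent samples of
    # different classes.
    cuts = [(v[i] + v[i + 1]) / 2 for i in range(len(v) - 1) if y[i] != y[i + 1]]
    base, n = entropy(y), len(y)

    def gain(chosen):
        bounds = [-np.inf] + sorted(chosen) + [np.inf]
        parts = [y[(v > lo) & (v <= hi)] for lo, hi in zip(bounds, bounds[1:])]
        return base - sum(len(p) * entropy(p) for p in parts) / n  # Eq. 2

    best, chosen = 0.0, []
    for t in range(2, max_arity + 1):        # grow from 2-ary to kappa-ary
        candidates = [c for c in cuts if c not in chosen]
        if not candidates:
            break
        # At fixed arity t, maximizing the gain also maximizes the
        # balanced gain, since log2(t) is constant within this step.
        chosen.append(max(candidates, key=lambda c: gain(chosen + [c])))
        best = max(best, gain(chosen) / np.log2(t))  # Eq. 3
    return best

# A feature that separates the classes scores higher than a noisy one.
print(balanced_gain_of_feature([0.1, 0.2, 0.35, 0.4, 0.8, 0.9],
                               [1, 1, 0, 0, 1, 1]))
```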
3.2 Active Sample Selection

In the optimal feature subspace, we train a random forest to classify multimedia objects. However, during multimedia retrieval, it is computationally intractable to train a regular random forest online. To address this issue, we present an active sample selection method to improve the efficiency of random forests. Before introducing our approach, we first analyze the computational cost of random forests, which is dominated by training its member classifiers (CART). If each CART is grown to a uniform depth D (i.e., to 2^D leaf nodes), the computational cost of the random forest h is [4, 3]:

$$T(h) \approx J\, N' (D + 1) \log N' \quad (4)$$

where J is the number of tree classifiers and N' is the size of the bootstrap sample for each tree classifier. Since T(h) increases super-linearly with N', we can reduce T(h) by ensuring N' ≤ N_0, where N_0 is a threshold. However, in regular random forests, N' always (approximately) equals 0.63N, because the size of a bootstrap sample is about 63% of that of the original sample set [7]. For online learning, bootstrap lacks the flexibility to reduce the size of each bootstrap sample as necessary. To address this issue, we present an active sample selection method, which actively selects no more than N_0 training samples for each tree classifier. To achieve this, we have to discard some training samples from the bootstrap sample. Obviously, it is preferable to keep all the most-informative training samples and to discard some less-informative ones. In the training set S, the most-informative samples are the following:

- Relevant samples. In relevance feedback, the number of relevant samples is often much smaller than the number of irrelevant ones, so every relevant sample is precious and informative.
- Centroids of irrelevant samples. In the light of pattern recognition [9], the centroids of irrelevant clusters are dependable representatives of the irrelevant samples/patterns.

According to the above discussion, we need to cluster the irrelevant training samples to find the most-informative ones among them. To cluster irrelevant samples, we employ an incremental clustering method termed the doubling algorithm [5], which takes the maximum number of clusters as input. As a centroid-preserving approach, the doubling algorithm guarantees that the centroids of the resulting clusters are members of the irrelevant sample set. Moreover, it provides a proven performance guarantee on the quality of the resulting clusters.

Figure 2: An illustration of our active sample selection.

Figure 2 demonstrates the principles of our active sample selection method, which can be summarized as follows:

- Incrementally cluster the new irrelevant samples in each feedback iteration and add the resulting centroids into the most-informative irrelevant set U_c.
- Get the most-informative sample set S_c = R ∪ U_c, where R is the relevant training set.
- Get the less-informative sample set U_d = U - U_c.
- To obtain the training set S_j (for the jth classifier h_j), we create a bootstrap sample S_{j,c} from S_c, and randomly choose N_0 - |S_{j,c}| samples from U_d to form the random sample set S_{j,d}. Finally, we use S_j = S_{j,c} ∪ S_{j,d} as the training set for h_j.

In comparison with bootstrap, our active sample selection method distinguishes most-informative training samples from less-informative ones. To reduce the number of training samples, our method selects as many most-informative training samples as bootstrap does, but intelligently discards some less-informative ones. The computational cost T(s) of our active sample selection is dominated by the clustering operation. Hence, we have T(s) = O(N ζ log ζ) [5], where N is the number of new irrelevant samples in each iteration and ζ is the maximum number of clusters.

By combining active sample selection with random forests, we develop an online pattern classification method called adaptive random forests. Given the optimal feature subspace F' and the projection ψ: F → F' (cf. Section 3.1), our adaptive random forest h = {h_j(ψ(o), θ_j), j = 1, ..., J} classifies database objects as follows:

$$h(o) = \begin{cases} 1 & \text{if } \sum_{j=1}^{J} h_j(\psi(o)) \ge \frac{J}{2}, \\ 0 & \text{otherwise.} \end{cases} \quad (5)$$

In Formula 5, each tree h_j outputs 0 or 1 as the class value of the input object, so it casts a unit vote in h; h then classifies the input object into the most popular class.

3.3 Adaptive Pattern Discovery

During multimedia retrieval, we train an adaptive random forest h to classify multimedia objects, and use classified-relevant/irrelevant to denote objects (or sets of objects) that are classified as relevant/irrelevant. From time to time, the adaptive random forest may output fewer than k classified-relevant objects, where k is the number of objects returned to the user. To address this issue, we define the relevance probability as follows:

$$P(1 \mid o) = \frac{\sum_{j=1}^{J} h_j(\psi(o))}{J}. \quad (6)$$

P(1|o) is the number of tree classifiers that output o as relevant over the total number of classifiers. The larger P(1|o) is, the more confident we are in outputting object o as relevant. So, if fewer than k objects are classified as relevant, our method returns some classified-irrelevant objects with the largest P(1|o) values.
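To make Sections 3.2 and 3.3 concrete, here is a compact sketch of assembling the per-tree training set S_j and computing the relevance probability of Formula 6. It is our illustration under simplifying assumptions: U_c is taken as given (the paper obtains it with the doubling algorithm [5]; any centroid-preserving incremental clustering could stand in), and scikit-learn decision trees replace CART.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_training_set(R, U_c, U_d, N0, rng):
    """Assemble S_j (Section 3.2): a bootstrap sample of the
    most-informative set S_c = R u U_c, topped up with at most
    N0 - |S_j,c| randomly chosen less-informative samples from U_d."""
    X_c = np.vstack([R, U_c])
    y_c = np.r_[np.ones(len(R)), np.zeros(len(U_c))]
    idx = rng.integers(0, len(X_c), size=len(X_c))   # bootstrap S_j,c
    n_extra = min(max(N0 - len(idx), 0), len(U_d))
    extra = rng.choice(len(U_d), size=n_extra, replace=False)
    X = np.vstack([X_c[idx], U_d[extra]])            # S_j = S_j,c u S_j,d
    y = np.r_[y_c[idx], np.zeros(n_extra)]
    return X, y

def relevance_probability(forest, X):
    """P(1|o) = fraction of trees voting 'relevant' (Eq. 6)."""
    return np.stack([t.predict(X) for t in forest]).mean(axis=0)

# Training: one tree per assembled sample set; Eq. 5 thresholds the
# relevance probability at 1/2 to classify.
rng = np.random.default_rng(0)
R = rng.normal(0, 1, (8, 4))
U_c, U_d = rng.normal(3, 1, (5, 4)), rng.normal(3, 1, (40, 4))
forest = []
for _ in range(25):
    X, y = tree_training_set(R, U_c, U_d, N0=30, rng=rng)
    forest.append(DecisionTreeClassifier(max_features="sqrt").fit(X, y))
print(relevance_probability(forest, rng.normal(0, 1, (3, 4))))
```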
When applied to relevance feedback, our adaptive pattern discovery method is summarized as follows. For each query q, our method runs an initial nearest-neighbor search and returns the k nearest neighbors of q to the user. It then asks the user to provide the initial training sample set S. With S, our method extracts the subspace F' and the projection ψ: F → F'. In F', it trains an adaptive random forest h. After that, our approach projects every object into F' and classifies it using h. Each object classified as relevant is added to the classified-relevant set Γ. In case |Γ| < k, we find k - |Γ| classified-irrelevant objects with the largest relevance probability and add them into Γ. From Γ, our method returns the k nearest neighbors of the relevant centroid to the user. Based on the latest retrieval result, the user can provide more training samples, and the system starts a new learning iteration accordingly.

An alternative scheme for the above adaptive pattern discovery is to neglect the query entirely and rank database objects by the relevance probability (defined in Formula 6). This scheme is suitable for classification applications with sufficient training samples. However, it is unlikely to succeed in relevance feedback, where the classifier must be trained with a small number of samples; especially in the initial iterations, we may have only a few relevant training samples.
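The retrieval loop just described might be sketched as follows. The helpers `extract` and `train` are hypothetical placeholders for the components of Sections 3.1 and 3.2, not functions from the paper.

```python
import numpy as np

def one_feedback_iteration(db, X_lab, y_lab, k, extract, train):
    """One iteration of the retrieval loop in Section 3.3 (our sketch).

    db: (n, M) database matrix; (X_lab, y_lab): user-labeled samples;
    extract/train: hypothetical stand-ins for Sections 3.1 and 3.2.
    """
    Z = extract(X_lab, y_lab)                 # subspace coordinates
    forest = train(X_lab[:, Z], y_lab)        # adaptive random forest
    p = np.mean([t.predict(db[:, Z]) for t in forest], axis=0)  # Eq. 6
    gamma = np.flatnonzero(p >= 0.5)          # classified-relevant set
    if len(gamma) < k:                        # top up Gamma with the most
        gset = set(gamma.tolist())            # confident other objects
        extra = [i for i in np.argsort(-p) if i not in gset]
        gamma = np.r_[gamma, extra[:k - len(gamma)]].astype(int)
    centroid = db[gamma][:, Z].mean(axis=0)   # relevant centroid in F'
    d = np.linalg.norm(db[gamma][:, Z] - centroid, axis=1)
    return gamma[np.argsort(d)][:k]           # k NN of the centroid in Gamma
```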

4 Empirical Results

4.1 Experimental Setup

The data set used in our experiments is a Corel image set with 31,438 images. To evaluate retrieval performance, we choose 44 semantic categories (such as rose, butterfly, tiger, eagle, penguin and falls) with refined ground truth. We employ all 5,172 images from these categories as queries. For each query, the retrieval is executed on the whole database (with 31,438 images). Precision and recall are used to evaluate the retrieval performance. Precision is the number of retrieved relevant images over the total number of retrieved images, and recall is the number of retrieved relevant images over the total number of relevant images in the database. To calculate precision and recall, only those retrieved images from the same semantic category as the query are counted as relevant. The average precision and recall over all queries are used as the overall performance measures.

In our experiments, each image is represented by the following five image features: the first is a 64-bin color coherence vector in the HSV color space; the second is a 9-bin color moments feature extracted from the L*a*b color space; the third is a 10-bin wavelet-based texture feature [12]; the fourth is a 64-bin edge coherence histogram [1]; and the fifth is a 32-bin Fourier shape descriptor [1]. We normalize feature values into the range [-1, 1] and concatenate all image features into a 179-bin feature vector.

The number of nearest neighbors returned (that is, k) is often called the scope. Since retrieval performance varies with the scope, we conducted experiments on scopes of 20 and 80, respectively. For comparison, we provide the performance of the regular random forests (RRF), the support vector machine (SVM) [10] and the AdaBoost-based method (AdaBoost) [14] under the same experimental conditions. All our experiments are run on a SUN Ultra 80 workstation with 1GB memory. Based on the presented method, we have implemented a content-based image retrieval system called PicQuest. Figure 3 demonstrates the user interface of our system.

Figure 3: User interface of our system.

4.2 Parameter Selection

We demonstrate how to decide appropriate values for the tree number J and the sample set size N_0 in this section. Since the computational cost of the random forest increases linearly with the number of trees J, we empirically minimize J while maintaining performance close to optimal. Table 1 demonstrates the precision achieved by random forests on the scope of 20 under three settings of J. A moderately sized random forest achieves nearly-optimal performance: it performs almost as well as a random forest with substantially more trees, and dramatically outperforms a much smaller one. We therefore use this moderate value of J in all the following experiments.

Table 1: Precisions (%) achieved by random forests with different numbers of trees.

To test the performance of our dynamic feature extraction method, we perform experiments with the subspace dimension M' set to 179 and two smaller values, corresponding to employing all of the features and two reduced fractions of them. We denote these three cases by ARF-179D, ARF-1D and ARF-D, respectively. For a specific value of M', we adjust N_0 to guarantee that our adaptive random forests (ARF) runs as efficiently as the state-of-the-art approaches (such as [14]). Hence, N_0 is set to the values given in Table 2.

Table 2: Values of N_0 in the different cases.
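Before turning to the results, a small helper (ours, not from the paper) makes the precision/recall definitions of Section 4.1 concrete and shows why recall is capped at a small scope.

```python
def precision_recall(retrieved, relevant_set, n_relevant_in_db):
    """Precision/recall per Section 4.1.

    retrieved: ids returned at the current scope (len == k);
    relevant_set: ids sharing the query's semantic category;
    n_relevant_in_db: total relevant images in the database.
    """
    hits = sum(1 for r in retrieved if r in relevant_set)
    return hits / len(retrieved), hits / n_relevant_in_db

# With scope k = 20 and a category of ~120 images, recall is capped at
# 20/120 even for a perfect result.
p, r = precision_recall(range(20), set(range(120)), 120)
print(p, r)  # 1.0 0.1666...
```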
4.3 Performance Evaluation

In the following discussion, we neglect recall on the scope of 20, because, on smaller scopes, recall is not as effective as precision for comparing different approaches. For instance, in our experiments, the maximum achievable recall on the scope of 20 is about 16.7% (= 20/120), since the average size of all semantic categories is about 120. In this case, the difference in recall among the compared methods does not precisely reflect their disparity in performance.

Figure 4: A comparison of different methods using precision, recall and echo time. The performance on scopes 20 and 80 is listed from the top down; in each row, precision, recall and echo time are shown from left to right.

Figure 5: Precision-recall curves. (a)/(b)/(c) show the curves of ARF-D, SVM and AdaBoost, respectively.

Table 3: Performance of ARF-1D and ARF-D, where P/R denotes precision/recall.

In our experiments, we run several feedback iterations for each query. Figure 4 presents the performance of RRF, ARF-179D, ARF-D, SVM and AdaBoost at each iteration. Table 3 compares the performance of ARF-D and ARF-1D. From Figure 4 and Table 3, we can draw the following conclusions:

- RRF is too computationally intensive for relevance feedback. On the scope of 80, the echo time of RRF can go as high as 18 seconds. In comparison with the ARF methods, RRF runs 2 to 3 times slower, but only improves precision and recall slightly, by 1-3%.
- Among the three compared ARF methods, ARF-D is the most cost-effective. On all scopes, it either achieves precision and recall as good as ARF-1D/ARF-179D, or improves precision and recall over the latter by 1-2%. As to efficiency, it runs about 15-20% faster than ARF-1D and ARF-179D in most cases.
- The ARF methods dramatically outperform both SVM and AdaBoost on retrieval performance. ARF-D improves precision/recall over them by at least 22%. As to efficiency, ARF-D runs as efficiently as AdaBoost on the scope of 80, while improving the efficiency over AdaBoost on the scope of 20.

- SVM cannot achieve good performance with small training samples. On the scope of 20, the precision achieved by SVM is severely inferior to that attained by the ARF methods. Only with the additional training samples available at the scope of 80 does SVM achieve comparable precision and recall.
- During online learning, our dynamic feature extraction is very successful in extracting the optimal features. Cooperating with our active sample selection approach, it can remove up to 70% of the features with no more than 3% degradation in precision and recall.

The experimental results presented so far demonstrate that our method achieves remarkable improvements in precision and recall over SVM and AdaBoost. By combining multiple tree classifiers, our method can find multiple nonlinear clusters of relevant objects, while AdaBoost only performs optimally in unimodal-Gaussian cases. On the other hand, our method can train a strong classifier from small training samples, so it dramatically outperforms traditional classification methods such as SVM.

Figure 5 presents the precision-recall curves of ARF-D, SVM and AdaBoost. It demonstrates the impact of sample size on the retrieval performance of the compared methods. We can see from this figure that both ARF-D and SVM achieve a noticeable gain in performance on the larger scope. With more training samples from scope 80, both ARF-D and SVM increase the probability of learning the multimodal distributions of multimedia objects. On the other hand, AdaBoost, which assumes a unimodal-Gaussian distribution for both relevant and irrelevant objects, does not achieve an obvious improvement in performance on the larger scope.

5 Conclusion

In this paper, we present an adaptive pattern discovery method, which aims to iteratively discover the distribution patterns of relevant objects using relevance feedback. To facilitate pattern discovery, we first present a dynamic feature extraction method to alleviate the curse of dimensionality for multimedia retrieval. During online learning, our dynamic feature extraction selects a feature subspace using balanced information gain. In the feature subspace, we train an online pattern classification approach termed adaptive random forests to classify multimedia objects. Our adaptive random forests adapts a composite classification method known as random forests for relevance feedback. To improve the efficiency of random forests, it employs an active sample selection method to select the most-informative samples for online learning. Extensive experimental results on a Corel image set (with 31,438 images) demonstrate that our method runs 2 to 3 times faster than regular random forests, while achieving comparable precision and recall against the latter. Moreover, our approach improves precision and recall by at least 22% over SVM and AdaBoost. As to efficiency, it runs as efficiently as, and sometimes faster than, AdaBoost.

References

[1] S. Brandt, J. Laaksonen, and E. Oja. Statistical shape features in content-based image retrieval. In Proc. of ICPR, Sept. 2000.
[2] L. Breiman. Bagging predictors. Machine Learning, 24(2):123-140, 1996.
[3] L. Breiman. Random forests - random features. Technical Report 567, Department of Statistics, University of California, Berkeley, September 1999.
[4] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
[5] M. Charikar, C. Chekuri, T. Feder, and R. Motwani. Incremental clustering and dynamic information retrieval. In Proc. of ACM Symposium on Theory of Computing, 1997.
[6] T. Elomaa and J. Rousu.
General and efficient multisplitting of numerical attributes. Machine Learning, 36(3), 1999.
[7] B. Efron and R. Tibshirani. An Introduction to the Bootstrap. Chapman and Hall, 1993.
[8] Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. of Comp. and Sys. Sci., 55(1):119-139, 1997.
[9] K. Fukunaga. Introduction to Statistical Pattern Recognition. San Diego, California: Academic Press, Inc., 1990.
[10] T. Joachims. Making large-scale SVM learning practical. In Advances in Kernel Methods - Support Vector Learning, 1999.
[11] T. M. Mitchell. Machine Learning. McGraw Hill, 1997.
[12] Y. Rui and T. Huang. Optimizing learning in image retrieval. In Proc. IEEE Conf. on CVPR, June 2000.
[13] Z. Su, S. Li, and H. Zhang. Extraction of feature subspaces for content-based image retrieval using relevance feedback. In Proc. of ACM Multimedia, 2001.
[14] K. Tieu and P. Viola. Boosting image retrieval. In Proc. IEEE Conf. on CVPR, June 2000.
[15] S. Tong and E. Chang. Support vector machine active learning for image retrieval. In Proc. of ACM Multimedia, 2001.
[16] N. Vasconcelos and A. Lippman. Bayesian relevance feedback for content-based image retrieval. In Proc. IEEE Workshop on CAIVL, 2000.
[17] Y. Wu and A. Zhang. A feature re-weighting approach for relevance feedback in image retrieval. In Proc. IEEE Int. Conf. on Image Processing, 2002.
[18] Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In Proc. of ICML-97, 1997.
[19] X. S. Zhou and T. S. Huang. Comparing discriminating transformations and SVM for learning during multimedia retrieval. In Proc. of ACM Multimedia, 2001.


More information

Generic Face Alignment Using an Improved Active Shape Model

Generic Face Alignment Using an Improved Active Shape Model Generic Face Alignment Using an Improved Active Shape Model Liting Wang, Xiaoqing Ding, Chi Fang Electronic Engineering Department, Tsinghua University, Beijing, China {wanglt, dxq, fangchi} @ocrserv.ee.tsinghua.edu.cn

More information

Equation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation.

Equation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation. Equation to LaTeX Abhinav Rastogi, Sevy Harris {arastogi,sharris5}@stanford.edu I. Introduction Copying equations from a pdf file to a LaTeX document can be time consuming because there is no easy way

More information

Fast or furious? - User analysis of SF Express Inc

Fast or furious? - User analysis of SF Express Inc CS 229 PROJECT, DEC. 2017 1 Fast or furious? - User analysis of SF Express Inc Gege Wen@gegewen, Yiyuan Zhang@yiyuan12, Kezhen Zhao@zkz I. MOTIVATION The motivation of this project is to predict the likelihood

More information

Relevance Feedback for Content-Based Image Retrieval Using Support Vector Machines and Feature Selection

Relevance Feedback for Content-Based Image Retrieval Using Support Vector Machines and Feature Selection Relevance Feedback for Content-Based Image Retrieval Using Support Vector Machines and Feature Selection Apostolos Marakakis 1, Nikolaos Galatsanos 2, Aristidis Likas 3, and Andreas Stafylopatis 1 1 School

More information

A Survey on Postive and Unlabelled Learning

A Survey on Postive and Unlabelled Learning A Survey on Postive and Unlabelled Learning Gang Li Computer & Information Sciences University of Delaware ligang@udel.edu Abstract In this paper we survey the main algorithms used in positive and unlabeled

More information

Artificial Intelligence. Programming Styles

Artificial Intelligence. Programming Styles Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to

More information

Tree-based methods for classification and regression

Tree-based methods for classification and regression Tree-based methods for classification and regression Ryan Tibshirani Data Mining: 36-462/36-662 April 11 2013 Optional reading: ISL 8.1, ESL 9.2 1 Tree-based methods Tree-based based methods for predicting

More information

Mondrian Forests: Efficient Online Random Forests

Mondrian Forests: Efficient Online Random Forests Mondrian Forests: Efficient Online Random Forests Balaji Lakshminarayanan Joint work with Daniel M. Roy and Yee Whye Teh 1 Outline Background and Motivation Mondrian Forests Randomization mechanism Online

More information

Nonparametric Methods Recap

Nonparametric Methods Recap Nonparametric Methods Recap Aarti Singh Machine Learning 10-701/15-781 Oct 4, 2010 Nonparametric Methods Kernel Density estimate (also Histogram) Weighted frequency Classification - K-NN Classifier Majority

More information

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California,

More information

Business Club. Decision Trees

Business Club. Decision Trees Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building

More information

Skin and Face Detection

Skin and Face Detection Skin and Face Detection Linda Shapiro EE/CSE 576 1 What s Coming 1. Review of Bakic flesh detector 2. Fleck and Forsyth flesh detector 3. Details of Rowley face detector 4. Review of the basic AdaBoost

More information

Consistent Line Clusters for Building Recognition in CBIR

Consistent Line Clusters for Building Recognition in CBIR Consistent Line Clusters for Building Recognition in CBIR Yi Li and Linda G. Shapiro Department of Computer Science and Engineering University of Washington Seattle, WA 98195-250 shapiro,yi @cs.washington.edu

More information

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation M. Blauth, E. Kraft, F. Hirschenberger, M. Böhm Fraunhofer Institute for Industrial Mathematics, Fraunhofer-Platz 1,

More information

Aggregated Color Descriptors for Land Use Classification

Aggregated Color Descriptors for Land Use Classification Aggregated Color Descriptors for Land Use Classification Vedran Jovanović and Vladimir Risojević Abstract In this paper we propose and evaluate aggregated color descriptors for land use classification

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information