Harris-SIFT Descriptor for Video Event Detection based on a Machine Learning Approach


Guillermo Cámara-Chávez
Computer Science Department, Federal University of Ouro Preto
Campus Universitário - Morro do Cruzeiro, Ouro Preto/MG - Brazil
guillermo@iceb.ufop.br

Arnaldo de Albuquerque Araújo
Computer Science Department, Federal University of Minas Gerais
Av. Antônio Carlos 6627, Belo Horizonte/MG - Brazil
arnaldo@dcc.ufmg.br

Abstract

Video data is becoming increasingly important in many commercial and scientific areas with the advent of applications such as digital broadcasting, video-conferencing and multimedia processing tools, and with the development of the hardware and communications infrastructure necessary to support visual applications. The objective of this work is to propose a method for event detection in a video stream. We combine the Harris-SIFT descriptor with motion information in order to detect human actions in video. We tested our method on the KTH database and compared it to the space-time interest points (STIP) descriptor. Our method achieves results comparable to STIP.

1. Introduction

In recent years, digital video has grown rapidly. Nowadays, most digital cameras and many cell phones can record videos, and consumers are recording and uploading their videos online. As a consequence, video sharing sites such as YouTube and Google Video have grown quickly: YouTube had over 6 million videos in August 2006 [6] and over 90 million by December 2008 [5]. Unfortunately, these video search engines often rely on just a filename and text metadata in the form of closed captions. This results in disappointing performance, as quite often the visual content is not mentioned, nor properly described, by the associated text. Due to the limitations of text search, we would like to search based on the automated recognition of the visual events in the video. Many video search engines that analyze visual information only process the key-frames in the video [15, 1].

Events are long-term temporal objects, which usually extend over tens or hundreds of frames. Polana and Nelson [16] classified temporal events into three groups: (i) temporal textures, which are of indefinite spatial and temporal extent (e.g., flowing water); (ii) activities, which are temporally periodic but spatially restricted (e.g., walking, running); and (iii) motion events, which are isolated events that are not periodic, i.e., they do not repeat either in space or in time (e.g., smiling). In this paper, we use the term temporal events to refer to activities. Thus, our objective is to automatically detect events that occur in a video based on local features.

Methods based on local features, or interest points, have shown promising results in action recognition [19, 22]. This approach uses only a few relevant parts of the whole spatiotemporal volume, avoiding less informative regions such as stationary backgrounds. The most straightforward way to detect interesting regions consists in extending 2D interest point detection algorithms (e.g., the methods presented in [18]) to 3D video analysis. Laptev [8] extended the 2D Harris corner detector to a 3D Harris detector. This extended feature detector finds regions with high intensity variations in both the spatial and temporal dimensions. A limitation of this detector is the small number of detections, which is insufficient for most part-based classifiers [22]. Dollar et al.
[4] improved the 3D Harris detector by relaxing its constraints: their detector applies Gabor filtering to the temporal domain only and selects the regions that give high responses. Shechtman and Irani [20] proposed a method that extends the notion of traditional 2D image correlation into a 3D space-time video-template correlation. Many methods explore the temporal dimension in order to obtain better performance. Based on this, we decided to use the Harris-SIFT descriptor [27], a 2D feature, complemented with motion vectors computed through the phase correlation method.

In the literature, most of the existing frameworks in

video event detection are conducted in a two-step procedure. In the first step, representative features are extracted. The second step is the decision-making process. Knowledge-based approaches typically combine the output of different media descriptors into rule-based classifiers. Statistical approaches such as C4.5 decision trees [21] and support vector machines [14] have been used to detect events and improve framework robustness. There are also clustering techniques (k-means [25], the Linde-Buzo-Gray algorithm [11], etc.) used to construct a middle-level visual vocabulary, where each visual word is formed by a group of local descriptors. Each visual word is considered a robust and denoised visual term for describing images. This method is known as bag-of-features (BoF) and has been used for object/event detection [23, 17]. We adopt a clustering method to construct the visual vocabulary; a conventional machine learning algorithm is then applied to classify the word-frequency vectors created from the vocabulary.

2. Machine learning approach

Figure 1. Our proposed model for event detection.

In Fig. 1, we show our proposed model, which consists of two principal steps: training and testing. The training step is divided into three phases: detection and feature extraction of keypoints of interest within motion regions, video codebook generation, and classification of codewords. The testing step is also composed of three phases: detection and feature extraction of keypoints of interest within motion regions, computation of the bag-of-words, and event detection.

In the detection and feature extraction phase, we detect the keypoints of interest that lie within a region with motion. Motion regions of the current frame $f_t$ are detected by a simple difference between the previous frame $f_{t-1}$ and the next frame $f_{t+1}$; pixels with differences greater than a certain threshold are considered part of motion regions. The Scale-Invariant Feature Transform (SIFT) [10] is used for detecting the keypoints. The set of keypoints is refined by selecting those SIFT features with a salient corner in their neighborhood; we use the Harris corner detector to compute the corner response in the original image. This method is known as the Harris-SIFT method [27]. In order to add temporal information, we use the phase correlation method to detect the motion vector of each keypoint: a block of 8×8 pixels around each keypoint in frame $f_{t-1}$ is searched in a corresponding block of pixels in frame $f_{t+1}$. Histograms of velocity vectors are also generated in this phase. Another descriptor we use is the entropy of the phase correlation between the blocks, which measures their similarity.

After calculating all features, we execute the next phase, which consists in clustering the features. The Linde-Buzo-Gray (LBG) clustering algorithm [9] is used to form an appearance codebook; a location codebook is formed by clustering the spatiotemporal locations of the interest points. The interest point features are then vector quantized into codewords according to the codebooks, and each video sample is eventually represented as an occurrence histogram of codewords. Occurrence histograms reflect how many feature points are assigned to each cluster. We use these histograms to train the SVM classifier [2].
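As a concrete illustration of the keypoint detection phase, the sketch below combines frame differencing (threshold of 20, as used in our experiments), SIFT detection restricted to the motion mask, and Harris-based filtering over a 16×16 neighbourhood, using OpenCV. This is a minimal sketch, not the exact implementation; in particular, the corner-response threshold `corner_thresh` is a free parameter not fixed in the text, and grayscale uint8 frames are assumed.

```python
import cv2
import numpy as np

def motion_mask(prev_gray, next_gray, diff_thresh=20):
    """Binary mask of moving pixels: difference between the previous and
    next frames, thresholded, then eroded to drop very small regions."""
    diff = cv2.absdiff(next_gray, prev_gray)
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    return cv2.erode(mask, np.ones((3, 3), np.uint8))

def harris_sift_keypoints(gray, mask, corner_thresh=1e-3, win=16):
    """SIFT keypoints restricted to the motion mask, kept only when a
    salient Harris corner response exists in a win x win neighbourhood.
    corner_thresh is an assumed value, not specified in the paper."""
    sift = cv2.SIFT_create()
    keypoints = sift.detect(gray, mask)
    harris = cv2.cornerHarris(np.float32(gray) / 255.0, 2, 3, 0.04)
    h, w = gray.shape
    kept = []
    for kp in keypoints:
        x, y = int(kp.pt[0]), int(kp.pt[1])
        y0, y1 = max(0, y - win // 2), min(h, y + win // 2)
        x0, x1 = max(0, x - win // 2), min(w, x + win // 2)
        if harris[y0:y1, x0:x1].max() > corner_thresh:
            kept.append(kp)
    return kept
```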
In the testing step, we evaluate the accuracy of our method: given an unknown video sequence, we extract the same features used in training, calculate the occurrence histogram using the centroids found in the clustering process, and finally use the histogram as the input to our SVM classifier.

3. Local descriptors

We use the algorithm presented in [27], which is an improvement of the SIFT descriptor. The Harris-SIFT detector selects those SIFT features with a salient corner in their neighborhood, improving the saliency of the SIFT features and at the same time reducing time complexity.

3.1. SIFT descriptor

This descriptor consists of four major stages: (1) scale-space extrema detection, (2) keypoint localization, (3) orientation assignment and (4) keypoint descriptor [10]. The scale-space extrema detection searches over all scales and image locations. In the keypoint localization stage, at each candidate location, a detailed model is fit to determine location, scale, and contrast.

In the orientation assignment stage, one or more orientations are assigned to each keypoint location based on local image properties, providing invariance to orientation and scale. Finally, the keypoint descriptor stage computes image gradients, measured at the selected scale in the region around each keypoint. However, the SIFT method suffers from three problems [27]: (1) computing the SIFT features is time-consuming; (2) the large number of features further increases detection time; and (3) the SIFT feature set is not salient enough, because it cannot accurately localize corners.

3.2. Harris-SIFT descriptor

This descriptor checks and keeps those SIFT features with a salient corner in their neighborhood, in order to improve the distinctiveness of the feature set. Keypoints found by the SIFT detector that are not near a corner are discarded, speeding up feature detection. Harris-SIFT executes the first stage (scale-space extrema detection) of the SIFT detector; at this point, it is possible to compute the corner response in the original image. SIFT features cannot precisely localize corner points due to scale-space smoothing and pixel error: Mikolajczyk [13] compared different feature detectors and found that the Difference of Gaussians is only an approximation to the Laplacian of Gaussian and introduces pixel errors in feature position. To avoid this problem, we examine the neighborhood of each SIFT feature to find out whether there is a corner next to it. Here, we apply the Harris corner detector [7], defined by formulas (1) and (2):

$$\mu(\mathbf{x}, \sigma_I, \sigma_S) = g(\sigma_I) * \begin{bmatrix} L_x^2(\mathbf{x}, \sigma_S) & L_x L_y(\mathbf{x}, \sigma_S) \\ L_x L_y(\mathbf{x}, \sigma_S) & L_y^2(\mathbf{x}, \sigma_S) \end{bmatrix} \tag{1}$$

$$\mathrm{cornerness} = \det(\mu(\mathbf{x}, \sigma_I, \sigma_S)) - \alpha\, \mathrm{trace}^2(\mu(\mathbf{x}, \sigma_I, \sigma_S)) \tag{2}$$

where $\sigma_I$ is the integration scale, $\sigma_S$ is the scale of the SIFT feature, and $L_\alpha$ is the derivative computed in the $\alpha$ direction. If the maximum corner response within the neighborhood is greater than a threshold, the feature is considered salient, and a SIFT descriptor is then generated.

4. Motion detection

The phase-correlation method (PCM) [26] measures motion directly from the phase correlation map (a shift in the spatial domain is reflected as a phase change in the frequency domain). This method is based on block matching: for each block $r$ in frame $f_{t-1}$, the best match is sought in the neighborhood around the corresponding block in frame $f_{t+1}$. In this work, we chose a block size of 8×8 pixels (the same size as the SIFT block).

Figure 2. Block matching of phase correlation.

In Figure 2, a block of 8×8 pixels in frame $f_{t-1}$ is searched in the corresponding neighborhood of pixels in frame $f_{t+1}$ in order to find the correlation coefficients and the offset. The PCM for one block is defined as:

$$\rho(r_t) = \frac{FT^{-1}\{\hat{r}_{t-1}(\omega)\, \overline{\hat{r}_{t+1}(\omega)}\}}{\sqrt{\int |\hat{r}_{t-1}(\omega)|^2\, d\omega \int |\hat{r}_{t+1}(\omega)|^2\, d\omega}} \tag{3}$$

where $r_t$ is the spatial coordinate vector, $\omega$ is the spatial frequency coordinate vector, $\hat{r}_{t-1}(\omega)$ denotes the Fourier transform of block $r_{t-1}$, $FT^{-1}$ denotes the inverse Fourier transform, $\overline{\{\cdot\}}$ is the complex conjugate, and $\rho(\cdot)$ is the block of correlation coefficients.

We use the entropy $E_r$ of block $r$ as the goodness-of-fit measure for each block; the entropy gives us global information about the block. Thus, for each keypoint detected by Harris-SIFT in frame $f_t$, we use the corresponding blocks in frames $f_{t-1}$ and $f_{t+1}$ to obtain a similarity measure based on entropy and the corresponding histograms of motion vectors, both computed with the phase correlation method.
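A minimal numpy sketch of Eq. (3) and of the entropy measure might look as follows; the peak of the correlation surface gives the block's motion vector. The bin count of the entropy histogram is an assumption, as the text does not specify it.

```python
import numpy as np

def phase_correlation(block_prev, block_next, eps=1e-8):
    """Correlation surface between two blocks (Eq. 3) and the
    displacement of its peak, i.e. the block's motion vector."""
    F1 = np.fft.fft2(block_prev.astype(np.float64))
    F2 = np.fft.fft2(block_next.astype(np.float64))
    cross = F1 * np.conj(F2)                      # cross spectrum
    energy = np.sqrt((np.abs(F1) ** 2).sum() * (np.abs(F2) ** 2).sum())
    rho = np.real(np.fft.ifft2(cross / (energy + eps)))
    dy, dx = np.unravel_index(np.argmax(rho), rho.shape)
    h, w = rho.shape                              # wrap into signed range
    if dy > h // 2: dy -= h
    if dx > w // 2: dx -= w
    return rho, (dx, dy)

def correlation_entropy(rho, bins=32):
    """Shannon entropy of the correlation surface, used as the
    goodness-of-fit (similarity) measure E_r; bins is assumed."""
    hist, _ = np.histogram(rho.ravel(), bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())
```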
5. Codebook generation

After calculating the local features, a clustering process, the bag-of-features (BoF) method [12, 3], is executed. Based on the clusters formed (the set of clusters is referred to as the codebook, and a single cluster as a visual word), occurrence histograms are generated and a classifier is trained on these new features (the histograms). Occurrence histograms reflect how many feature points are assigned to each cluster.

The Linde-Buzo-Gray (LBG) clustering algorithm [9] is commonly used to design a codebook for encoding images in vector quantization. In each iteration of this algorithm, we must search the full codebook in order to assign the training vectors to their corresponding codewords. The steps of the LBG algorithm for the design of an N-vector codebook are straightforward and intuitive. Starting with a large training set (much larger than N), one first selects N initial code vectors, which can be chosen randomly from the training set. There are two basic steps in the algorithm: encoding of the training vectors and computation of the centroids. To begin, we encode all the training vectors using the initial codebook; this process assigns a subset of the training vectors to each cell defined by the initial code vectors. Next, the centroid of each cell is computed, and the centroids are used to form an updated codebook. The process then repeats iteratively, with a re-encoding of the training vectors and a new computation of the centroids to update the codebook. Ideally, at each iteration the average distortion is reduced, until convergence.
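The encode/recompute loop can be sketched as below. This is a simplified version that runs a fixed number of iterations instead of monitoring the average distortion, with random initialization from the training set as described above; the helper shows how one video's descriptors become the occurrence histogram fed to the classifier.

```python
import numpy as np

def lbg_codebook(train_vectors, n_codewords, n_iters=20, seed=0):
    """Encode/recompute loop of LBG codebook design: assign every
    training vector to its nearest codeword, then move each codeword
    to the centroid of its cell, and repeat."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(train_vectors), n_codewords, replace=False)
    codebook = train_vectors[idx].astype(np.float64)  # random initialization
    for _ in range(n_iters):
        # encoding step: nearest codeword for every training vector
        dists = np.linalg.norm(train_vectors[:, None] - codebook[None], axis=2)
        assign = dists.argmin(axis=1)
        # centroid step: update every non-empty cell
        for k in range(n_codewords):
            cell = train_vectors[assign == k]
            if len(cell) > 0:
                codebook[k] = cell.mean(axis=0)
    return codebook

def occurrence_histogram(descriptors, codebook):
    """Bag-of-features vector for one video: how many of its
    descriptors fall into each codeword's cell."""
    dists = np.linalg.norm(descriptors[:, None] - codebook[None], axis=2)
    return np.bincount(dists.argmin(axis=1), minlength=len(codebook))
```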

Once descriptors have been assigned to clusters to form feature vectors, we reduce the problem of generic visual categorization to that of multi-class supervised learning, with as many classes as defined visual categories. The categorizer performs two separate steps in order to predict the classes of unlabeled images: training and testing. During training, labeled data is sent to the classifier and used to adapt a statistical decision procedure for distinguishing categories. We use the SVM classifier, since it often produces state-of-the-art results in high-dimensional problems.

6. Support vector machine (SVM)

Several learning approaches use the SVM as a classifier. Recently, Schuldt et al. [19] and Wong and Cipolla [22] transformed video event detection into multi-class categorization. Schuldt et al. [19] use an SVM to classify features from a 3D Harris detector. Wong and Cipolla [22] tested different classifiers: quantised feature vectors (a codebook generated with k-means clustering), probabilistic latent semantic analysis (an unsupervised method), SVM, and a nearest-neighbour classifier; among these, the SVM outperformed all the others.

The SVM has been developed as a robust tool for classification and regression in noisy and complex domains such as multimedia retrieval [24]. SVMs can be used to extract valuable information from data sets and to construct fast classification algorithms for massive data. Another important characteristic of the SVM classifier is that, thanks to kernel theory, it allows non-linear classification without explicitly requiring a non-linear algorithm. Common kernel functions are the linear, polynomial, Gaussian radial basis and triangular kernels; each kernel function results in a different type of decision boundary.
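To make the classification stage concrete, the snippet below trains a multi-class SVM on occurrence histograms. sklearn's SVC is used here only as a generic SVM implementation (the paper cites [2] but names no library), the kernel and C values are assumptions, and the random arrays are placeholders for real histograms from the codebook stage.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical stand-in data: one occurrence histogram per video clip
# and one of the six KTH action labels per clip.
X_train = np.random.rand(120, 200)        # 120 clips, 200-word codebook
y_train = np.random.randint(0, 6, 120)    # walk/run/jog/box/hclp/hwav

clf = SVC(kernel="rbf", C=10.0)           # Gaussian radial basis kernel
clf.fit(X_train, y_train)

X_test = np.random.rand(30, 200)
predicted = clf.predict(X_test)           # one action label per test clip
```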
7. Results

The human activity data set (KTH) used in the experiments was collected by Schuldt et al. [19]. It contains six human activities: boxing, hand clapping, hand waving, jogging, running and walking (see Figure 3). Twenty-five subjects perform these activities in four different scenarios: indoors, outdoors, with changes in clothing and with variations in scale. Each video sample contains one subject engaged in a single activity in a certain condition.

Figure 3. Examples from the KTH dataset [19].

Even though the KTH data set was recorded over homogeneous backgrounds with a static camera, there is some noise embedded in it: the image sequences appear, in some parts, out of focus (blurred) due to camera movement (zoom, pan), and in one walking video the actor runs during part of the video instead of walking. We compared our method against the STIP descriptor proposed by Laptev [8], using his implementation available at http://…/Laptev/download/stip-1.0-winlinux.zip. The data set was divided into two halves, one used for training and the other for testing.

7.1. Experiments

We tried out different codebook sizes: 50, 100, 200 and 300 code words (clusters). For STIP, we used the descriptor's default parameters. For our proposed method, we first find the mask of pixels that present motion across consecutive frames: the motion of the current frame is computed through the difference between the previous and next frames, and a pixel whose difference is over 20 is considered a motion pixel. We also eliminate very small motion regions using an erosion filter. SIFT keypoints are then detected inside the motion regions. We also compute corners with the Harris detector, and only consider the SIFT keypoints that have a corner response inside a predefined neighbourhood (16×16 pixels around the keypoint).

In Tables 1, 2 and 3, we show the confusion matrices for the Harris-SIFT, Harris-SIFT with motion information (PCM), and STIP descriptors.

Table 1. Harris-SIFT confusion matrix (classes: walk, run, jog, box, hclp, hwav).

Table 2. Harris-SIFT + motion confusion matrix (classes: walk, run, jog, box, hclp, hwav).

Table 3. STIP confusion matrix (classes: walk, run, jog, box, hclp, hwav).

We can see that the Harris-SIFT detector performed slightly worse than Harris-SIFT with motion vectors. Comparing the results of our proposed model with the well-known STIP descriptor, we find that our method is as good as the STIP detector, and in some cases it performed better. For the most part, confusion occurs between jogging and running sequences, as well as between boxing and hand clapping sequences; we observed a similar structure for all methods. The confusion between boxing and hand clapping is due to the small portion of the frame with motion (only hand motion). In our experiments, we used the same motion-detection parameters for all events; adapting the parameters to the percentage of motion would detect more interest points, and this additional information could yield better results.

In Tables 4 and 5, we present the performance of the Harris-SIFT with motion information (PCM) detector and the STIP detector, tested with four codebook sizes. Performance is measured with precision and recall statistics, defined as:

$$\mathrm{recall} = \frac{\mathrm{correct}}{\mathrm{correct} + \mathrm{missed}}, \qquad \mathrm{precision} = \frac{\mathrm{correct}}{\mathrm{correct} + \mathrm{false}}$$

A good detector should have both high precision and high recall. We can see that our detector performed better than STIP when the codebook was small. Since the number of interest points produced by STIP is small, and a small codebook reduces the information even further, STIP's performance was lower than with a bigger codebook. Our method did not show this behavior, since it produces more interest points than STIP; the difference in performance between small and large codebook sizes was small. Thus, our method can work with smaller codebooks.

Table 4. Precision/recall comparison of Harris-SIFT against STIP using 50 clusters in the codebook.

Table 5. Precision/recall comparison of Harris-SIFT against STIP using 100, 200 and 300 clusters in the codebook.

8. Conclusion

This paper considered event detection from a supervised classification perspective. We proposed a method that combines a 2D descriptor (Harris-SIFT) with motion-vector histograms and the correlation between corresponding blocks in the previous and next frames, used to measure similarity at the current frame; we evaluated the entropy of each block to obtain this similarity measure. Our method worked as well with a small number of clusters as with a large one; thus, we can reduce time complexity by computing a small number of visual code words. The next step is to test our method on real-life videos and see how it performs with multiple motions and dynamic backgrounds; the parameters for detecting the portion of motion must then be set according to the type of event.

Acknowledgments

The authors are grateful to CNPq, CAPES and FAPEMIG for the financial support of this work.

References

[1] G. Cámara-Chávez, F. Precioso, M. Cord, S. Phillip-Foliguet, and A. de Albuquerque Araújo. An interactive video content-based retrieval system. In 15th International Conference on Systems, Signals and Image Processing (IWSSIP 2008), Bratislava, June 2008.

[2] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273-297, 1995.
[3] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, pages 1-22, 2004.
[4] P. Dollar, V. Rabaud, G. W. Cottrell, and S. Belongie. Behavior recognition via sparse spatio-temporal features. In Joint IEEE Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS), 2005.
[5] D. Frommer. YouTube hits 100 million U.S. viewers, Hulu surges. Silicon Alley Insider, Digital Business, December 2008.
[6] L. Gomes. Will all of us get our 15 minutes on a YouTube video? The Wall Street Journal, August 2006.
[7] C. Harris and M. Stephens. A combined corner and edge detector. In Alvey Vision Conference, pages 147-151, 1988.
[8] I. Laptev. On space-time interest points. International Journal of Computer Vision (IJCV), 63(2-3):107-123, 2005.
[9] Y. Linde, A. Buzo, and R. Gray. An algorithm for vector quantizer design. IEEE Transactions on Communications, 28:84-94, 1980.
[10] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.
[11] Z. Lu, H. Lou, and J. Pan. 3D model retrieval based on vector quantisation index histograms. Journal of Physics: Conference Series, 48, 2006.
[12] M. Marszałek and C. Schmid. Spatial weighting for bag-of-features. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, 2006.
[13] K. Mikolajczyk and C. Schmid. Indexing based on scale invariant interest points. In International Conference on Computer Vision, 2001.
[14] M. Chen, S.-C. Chen, M.-L. Shyu, and K. Wickramaratna. Semantic event detection via multimodal data mining. IEEE Signal Processing Magazine, 23(2):38-46, March 2006.
[15] M. Pickering, S. Rüger, and D. Sinclair. Video retrieval by feature learning in key frames. In International Conference on Image and Video Retrieval, 2002.
[16] R. Polana and R. Nelson. Recognition of motion from temporal texture. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 1992), Champaign, IL, USA, June 1992.
[17] P. Quelhas, F. Monay, J. Odobez, D. Gatica-Perez, and T. Tuytelaars. A thousand words in a scene. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(9), 2007.
[18] C. Schmid, R. Mohr, and C. Bauckhage. Evaluation of interest point detectors. International Journal of Computer Vision (IJCV), 37(2):151-172, 2000.
[19] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: a local SVM approach. In 17th International Conference on Pattern Recognition (ICPR 2004), volume 3, pages 32-36, August 2004.
[20] E. Shechtman and M. Irani. Space-time behavior-based correlation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(11), 2007.
[21] S.-C. Chen, M.-L. Shyu, M. Chen, and C. Zhang. A decision tree-based multimodal data mining framework for soccer goal detection. In IEEE International Conference on Multimedia and Expo (ICME 2004), volume 1, June 2004.
[22] S.-F. Wong and R. Cipolla. Extracting spatiotemporal interest points using global information. In IEEE 11th International Conference on Computer Vision (ICCV 2007), pages 1-8, Rio de Janeiro, October 2007.
[23] J. Sivic and A. Zisserman. Video Google: a text retrieval approach to object matching in videos. In 9th IEEE International Conference on Computer Vision (ICCV 2003), volume 2, pages 1470-1477, 2003.
[24] S. Tong. Active Learning: Theory and Applications. PhD thesis, Stanford University, 2001.
[25] F. Wang, Y.-G. Jiang, and C.-W. Ngo. Video event detection using motion relativity and visual relatedness. In MM '08: Proceedings of the 16th ACM International Conference on Multimedia, New York, NY, USA, 2008. ACM.
[26] J. Z. Wang. Methodological review - wavelets and imaging informatics: a review of the literature. Journal of Biomedical Informatics, July 2001.
[27] N. Xu and W.-D. Chen. A high real-time and robust object recognition and localization algorithm. China Journal of Image and Graphics, submitted October 2007.
