Object Recognition and Detection

1 CS 2770: Computer Vision Object Recognition and Detection Prof. Adriana Kovashka University of Pittsburgh March 16, 21, 23, 2017

2 Plan for the next few lectures Recognizing the category in the image as a whole Detecting the region in the image that corresponds to a category Using window templates Face detection Pedestrian detection Using parts Implicit Shape Models Deformable Part Models Using Convolutional Neural Networks R-CNN, Fast R-CNN YOLO (You Only Look Once)

3 Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories CVPR 2006 Svetlana Lazebnik Beckman Institute, University of Illinois at Urbana-Champaign Cordelia Schmid INRIA Rhône-Alpes, France Jean Ponce Ecole Normale Supérieure, France

4 Scene category dataset Fei-Fei & Perona (2005), Oliva & Torralba (2001) Slide credit: L. Lazebnik

5 Bag-of-words representation 1. Extract local features 2. Learn visual vocabulary using clustering 3. Quantize local features using visual vocabulary 4. Represent images by frequencies of visual words Slide credit: L. Lazebnik

6 Image categorization with bag of words Training 1. Compute bag-of-words representation for training images 2. Train classifier on labeled examples using histogram values as features 3. Labels are the scene types (e.g. mountain vs field) Testing 1. Extract keypoints/descriptors for test images 2. Quantize into visual words using the clusters computed at training time 3. Compute visual word histogram for test images 4. Compute labels on test images using classifier obtained at training time 5. Measure accuracy of test predictions by comparing them to groundtruth test labels (obtained from humans) Adapted from D. Hoiem
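
The quantization and histogram steps of this pipeline can be sketched with NumPy. This is a minimal illustration. It assumes the vocabulary (the cluster centers) has already been learned at training time, for example with k-means. The function name bow_histogram is hypothetical.

```python
import numpy as np

def bow_histogram(descriptors: np.ndarray, vocabulary: np.ndarray) -> np.ndarray:
    """Assign each local descriptor to its nearest visual word and return the
    normalized visual-word frequency histogram for one image.

    descriptors: (N, D) local features (e.g. SIFT, D = 128)
    vocabulary:  (K, D) cluster centers computed at training time
    """
    # Squared Euclidean distance from every descriptor to every visual word: (N, K)
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                        # index of the nearest word
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)               # frequencies sum to 1

vocab = np.array([[0.0, 0.0], [10.0, 10.0]])
desc = np.array([[1.0, 1.0], [2.0, 2.0], [9.0, 9.0]])
print(bow_histogram(desc, vocab))  # two descriptors fall near word 0, one near word 1
```

The same function serves training step 1 and testing steps 2-3. Its output vectors are what the classifier consumes.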

7 Feature extraction (on which BOW is based). Weak features: edge points at 2 scales and 8 orientations (vocabulary size 16). Strong features: SIFT descriptors of 16x16 patches sampled on a regular grid, quantized to form a visual vocabulary (size 200, 400). Slide credit: L. Lazebnik

8 What about spatial layout? All of these images have the same color histogram Slide credit: D. Hoiem

9 Spatial pyramid Compute histogram in each spatial bin Slide credit: D. Hoiem

10 Spatial pyramid [Lazebnik et al. CVPR 2006] Slide credit: D. Hoiem

11 Pyramid matching. Indyk & Thaper (2003), Grauman & Darrell (2005). Matching using pyramid and histogram intersection for some particular visual word: original images x_i and x_j, their feature histograms at Level 3, Level 2, Level 1, and Level 0, and the total weight (value of pyramid match kernel) K(x_i, x_j). Adapted from L. Lazebnik
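
The slide's formula for the kernel was an image and did not survive transcription. One concrete instance is the spatial pyramid match kernel of Lazebnik et al. (2006). It sums histogram intersections $I^\ell = \sum_k \min(H^\ell_{x_i}(k), H^\ell_{x_j}(k))$ over levels $\ell = 0, \dots, L$, with weight $1/2^L$ for level 0 and $1/2^{L-\ell+1}$ for level $\ell \ge 1$. Matches found in finer grid cells therefore count more. The sketch below makes two assumptions: feature positions are normalized to [0, 1), and visual-word ids are integers. The function names are hypothetical.

```python
import numpy as np

def level_histogram(xy, words, level, vocab_size):
    """Count visual words in each cell of a 2^level x 2^level grid.
    xy: (N, 2) positions normalized to [0, 1); words: (N,) integer word ids."""
    xy, words = np.asarray(xy, dtype=float), np.asarray(words, dtype=int)
    cells = 2 ** level
    cx = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
    cy = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
    bins = (cy * cells + cx) * vocab_size + words      # one bin per (cell, word)
    return np.bincount(bins, minlength=cells * cells * vocab_size)

def spatial_pyramid_kernel(xy1, w1, xy2, w2, vocab_size, L=2):
    """K = I^0 / 2^L + sum_{l=1..L} I^l / 2^(L-l+1), where I^l is the
    histogram intersection at pyramid level l."""
    total = 0.0
    for level in range(L + 1):
        h1 = level_histogram(xy1, w1, level, vocab_size)
        h2 = level_histogram(xy2, w2, level, vocab_size)
        inter = np.minimum(h1, h2).sum()               # histogram intersection I^l
        weight = 1.0 / 2 ** L if level == 0 else 1.0 / 2 ** (L - level + 1)
        total += weight * inter
    return total
```

The level weights sum to 1, so two identical feature sets score exactly their number of features.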

12 Scene category dataset Fei-Fei & Perona (2005), Oliva & Torralba (2001) Multi-class classification results (100 training images per class) Fei-Fei & Perona: 65.2% Slide credit: L. Lazebnik

13 Scene category confusions. Difficult indoor images: kitchen, living room, bedroom. Slide credit: L. Lazebnik

14 Caltech101 dataset Fei-Fei et al. (2004) Multi-class classification results (30 training images per class) Slide credit: L. Lazebnik

15 Plan for the next few lectures Recognizing the category in the image as a whole Detecting the region in the image that corresponds to a category Using window templates Face detection Pedestrian detection Using parts Implicit Shape Models Deformable Part Models Using Convolutional Neural Networks R-CNN, Fast R-CNN YOLO (You Only Look Once)

16 Category detection: basic framework. Build/train object model: choose a representation; learn or fit parameters of model/classifier. Generate candidates in new image. Score the candidates. Kristen Grauman

17 Category detection: representation choice Window-based Part-based Kristen Grauman

18 Window-template-based models Building an object model Consider edges, contours, oriented intensity gradients Summarize local distribution of gradients with histogram Locally orderless: offers invariance to small shifts and rotations Adapted from Kristen Grauman

19 Window-template-based models. Building an object model: given the representation, train a binary classifier. Car/non-car classifier: "Yes, a car." / "No, not a car." Kristen Grauman

20 Window-template-based models Generating and scoring candidates Car/non-car Classifier Kristen Grauman

21 Window-template-based object detection: recap Training: 1. Obtain training data 2. Define features 3. Define classifier Given new image: 1. Slide window 2. Score by classifier Training examples Car/non-car Classifier Feature extraction Kristen Grauman
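
The test-time loop ("slide window, score by classifier") can be sketched as follows. This is a single-scale illustration. It assumes a grayscale NumPy image and takes any scoring function as a stand-in for the trained car/non-car classifier. To detect objects of different sizes, the same loop would be repeated over a rescaled image pyramid. All names are hypothetical.

```python
import numpy as np
from typing import Callable, List, Tuple

def sliding_window_detect(image: np.ndarray,
                          score_fn: Callable[[np.ndarray], float],
                          win: Tuple[int, int] = (24, 24),
                          stride: int = 4,
                          threshold: float = 0.0) -> List[Tuple[int, int, int, int, float]]:
    """Score every win = (height, width) window at one scale and return
    (x, y, w, h, score) for the windows scoring above threshold."""
    h, w = win
    H, W = image.shape
    detections = []
    for y in range(0, H - h + 1, stride):
        for x in range(0, W - w + 1, stride):
            s = score_fn(image[y:y + h, x:x + w])   # classifier score for this window
            if s > threshold:
                detections.append((x, y, w, h, s))
    return detections

img = np.zeros((10, 10))
img[4:6, 4:6] = 1.0                                  # a bright 2x2 "object"
print(sliding_window_detect(img, lambda p: float(p.mean()), win=(2, 2), stride=1, threshold=0.99))
```

Even this 10x10 toy image has 81 window positions at stride 1. This is why the number of windows dominates detection cost.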

22 Special case: Faces. Detection vs. recognition (e.g., recognizing a detected face as "Sally"). Lana Lazebnik

23 Challenges of face detection. A sliding window detector must evaluate tens of thousands of location/scale combinations. Faces are rare: 0 to 10 per image. A megapixel image has ~10^6 pixels and a comparable number of candidate face locations. For computational efficiency, we should try to spend as little time as possible on the non-face windows. To avoid having a false positive in every image, our false positive rate has to be less than 10^-6. Lana Lazebnik

24 Viola-Jones face detector

25 Boosting intuition Weak Classifier 1 Paul Viola

26 Boosting illustration Weights Increased Paul Viola

27 Boosting illustration Weak Classifier 2 Paul Viola

28 Boosting illustration Weights Increased Paul Viola

29 Boosting illustration Weak Classifier 3 Paul Viola

30 Boosting illustration Final classifier is a combination of weak classifiers Paul Viola

31 Boosting: training. Initially, weight each training example equally. In each boosting round: find the weak learner that achieves the lowest weighted training error; raise the weights of training examples misclassified by the current weak learner. Compute the final classifier as a linear combination of all weak learners (the weight of each learner increases with its accuracy). Exact formulas for re-weighting and combining weak learners depend on the particular boosting scheme (e.g., AdaBoost). Lana Lazebnik, Kristen Grauman

32 Main idea: Viola-Jones face detector Represent local texture with efficiently computable rectangular features within window of interest Select discriminative features to be weak classifiers Use boosted combination of them as final classifier Form a cascade of such classifiers, rejecting clear negatives quickly Kristen Grauman

33 Viola-Jones detector: features. Rectangular filters: feature output is the difference between adjacent regions. Value = Σ(pixels in white area) − Σ(pixels in black area). Efficiently computable with integral image: any sum can be computed in constant time. Integral image: value at (x,y) is the sum of pixels above and to the left of (x,y). Adapted from Kristen Grauman and Lana Lazebnik

34 Fast computation with integral images The integral image computes a value at each pixel (x,y) that is the sum of the pixel values above and to the left of (x,y), inclusive This can quickly be computed in one pass through the image (x,y) Lana Lazebnik

35 Computing sum within a rectangle. Let A, B, C, D be the values of the integral image at the corners of a rectangle (labeled D, C, B, A in the figure). Then the sum of original image values within the rectangle can be computed as: sum = A − B − C + D. Only 3 additions are required for any size of rectangle! Lana Lazebnik
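
A minimal NumPy sketch of the integral image and the four-lookup rectangle sum follows. It pads the integral image with a leading row and column of zeros so that rectangles touching the image border need no special cases. In the padded array, ii[y, x] is the sum of img[:y, :x]. The corner naming (A bottom-right, B top-right, C bottom-left, D top-left) is an assumption chosen to match sum = A − B − C + D. The same assumption applies to which half of the two-rectangle feature counts as "white". All names are hypothetical.

```python
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    """Padded integral image: ii[y, x] = sum of img[:y, :x] (one pass of cumsums)."""
    c = img.cumsum(axis=0).cumsum(axis=1)
    return np.pad(c, ((1, 0), (1, 0)))               # leading zero row and column

def rect_sum(ii: np.ndarray, x: int, y: int, w: int, h: int):
    """Sum of img[y:y+h, x:x+w] from four lookups, independent of rectangle size."""
    A = ii[y + h, x + w]   # bottom-right corner
    B = ii[y, x + w]       # top-right corner
    C = ii[y + h, x]       # bottom-left corner
    D = ii[y, x]           # top-left corner
    return A - B - C + D

def two_rect_feature(ii: np.ndarray, x: int, y: int, w: int, h: int):
    """Horizontal two-rectangle filter: sum over the left ("white") half minus
    sum over the right ("black") half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2), img[1:3, 1:3].sum())  # both give the same sum
```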

36 Lana Lazebnik Example Source Result

37 Viola-Jones detector: features Which subset of these features should we use to determine if a window has a face? Considering all possible filter parameters: position, scale, and type: 180,000+ possible features associated with each 24 x 24 window Use AdaBoost both to select the informative features and to form the classifier Kristen Grauman

38 Viola-Jones detector: AdaBoost Want to select the single rectangle feature and threshold that best separates positive (faces) and negative (nonfaces) training examples, in terms of weighted error. Resulting weak classifier: Outputs of a possible rectangle feature on faces and non-faces. For next round, reweight the examples according to errors, choose another filter/threshold combo. Kristen Grauman
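
The search for the best threshold can be sketched as fitting a weighted decision stump to one rectangle feature's values. The stump uses the threshold-and-parity form from the Viola-Jones paper, h(x) = +1 if p·f(x) < p·θ and −1 otherwise. The paper outputs 1/0; ±1 labels are assumed here to match AdaBoost. The exhaustive O(N²) search is for clarity only. Sorting the feature values once and keeping running weight sums gives the same answer in O(N log N). The function name is hypothetical.

```python
import numpy as np

def best_stump(f: np.ndarray, y: np.ndarray, w: np.ndarray):
    """Pick threshold theta and parity p for one feature, minimizing the weighted
    error of h(x) = +1 if p * f(x) < p * theta else -1.

    f: (N,) feature values; y: (N,) labels, +1 = face, -1 = non-face;
    w: (N,) non-negative example weights.
    Returns (weighted error, theta, parity).
    """
    v = np.unique(f)                                  # sorted distinct values
    # Candidate thresholds: below all values, midpoints, above all values.
    candidates = np.concatenate(([v[0] - 1], (v[:-1] + v[1:]) / 2, [v[-1] + 1]))
    best = (np.inf, None, None)
    for theta in candidates:
        for p in (+1, -1):
            pred = np.where(p * f < p * theta, 1, -1)
            err = w[pred != y].sum()                  # weight of misclassified examples
            if err < best[0]:
                best = (err, theta, p)
    return best
```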

39 AdaBoost (figure from C. Bishop, notes from K. Grauman). Start with uniform weights on training examples. For M rounds: evaluate the weighted error for each weak learner and pick the best learner; re-weight the examples, so that incorrectly classified ones get more weight and correctly classified ones get less weight; (d) normalize the weights so they sum to 1. The final classifier is a combination of the weak ones, each weighted according to the error it had. Here y_m(x_n) is the prediction and t_n is the ground truth for x_n.
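
The loop can be sketched using the discrete AdaBoost update from Bishop's Pattern Recognition and Machine Learning, section 14.3. That update computes ε_m as the normalized weighted error, sets α_m = ln((1 − ε_m)/ε_m), multiplies the weights of misclassified examples by exp(α_m), and outputs sign(Σ_m α_m y_m(x)). One assumption simplifies the sketch: the weak learners form a fixed pool whose ±1 predictions on the training set are precomputed, and these predictions stand in for the rectangle-feature stumps. The function names are hypothetical.

```python
import numpy as np

def adaboost(pool_preds: np.ndarray, t: np.ndarray, M: int):
    """Discrete AdaBoost over a fixed pool of weak learners.

    pool_preds: (J, N) array with pool_preds[j, n] = y_j(x_n) in {-1, +1}
    t:          (N,) ground-truth labels in {-1, +1}
    Returns the chosen learner indices and their weights alpha_m.
    """
    J, N = pool_preds.shape
    w = np.full(N, 1.0 / N)                           # uniform initial weights
    chosen, alphas = [], []
    for _ in range(M):
        miss = pool_preds != t                        # (J, N) misclassification mask
        errs = (miss * w).sum(axis=1) / w.sum()       # weighted error of every learner
        j = int(errs.argmin())                        # best weak learner this round
        eps = float(np.clip(errs[j], 1e-12, 1 - 1e-12))  # keep log() finite
        alpha = np.log((1 - eps) / eps)               # lower error -> larger weight
        w = w * np.exp(alpha * miss[j])               # up-weight misclassified examples
        w /= w.sum()                                  # normalize so weights sum to 1
        chosen.append(j)
        alphas.append(alpha)
    return chosen, np.array(alphas)

def predict(pool_preds, chosen, alphas):
    """Final classifier: sign(sum_m alpha_m * y_m(x)); 0 means a tie."""
    return np.sign(alphas @ pool_preds[chosen])

# Three weak learners, each wrong on a different example; the boosted vote is right on all.
pool = np.array([[-1, 1, 1], [1, -1, 1], [1, 1, -1]])
t = np.array([1, 1, 1])
chosen, alphas = adaboost(pool, t, M=3)
print(chosen, predict(pool, chosen, alphas))
```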

40 Boosting for face detection First two features selected by boosting: This feature combination can yield 100% detection rate and 50% false positive rate Lana Lazebnik

41 Boosting: pros and cons Advantages of boosting Integrates classification with feature selection Complexity of training is linear in the number of training examples Flexibility in the choice of weak learners, boosting scheme Testing is fast Easy to implement Disadvantages Needs many training examples Often found not to work as well as an alternative discriminative classifier, support vector machine (SVM) Lana Lazebnik

42 Are we done? Even if the filters are fast to compute, each new image has a lot of possible windows to search. How to make the detection more efficient? Kristen Grauman

43 Cascading classifiers for detection Form a cascade with low false negative rates early on Apply less accurate but faster classifiers first to immediately discard windows that clearly appear to be negative Kristen Grauman
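
The control flow of a cascade is short. Each stage is assumed to be a (score function, threshold) pair, ordered from cheapest to most expensive. A window is rejected at the first stage that scores it below that stage's threshold. The thresholds would be tuned for a low false negative rate, as the slide describes. The names and the stage representation are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# A stage is (score function, threshold); real stages would be boosted classifiers.
Stage = Tuple[Callable[[object], float], float]

def cascade_classify(window, stages: Sequence[Stage]) -> Tuple[bool, int]:
    """Run an attentional cascade on one window.
    Returns (accepted, number of stages evaluated)."""
    for i, (score_fn, threshold) in enumerate(stages):
        if score_fn(window) < threshold:
            return False, i + 1      # early rejection: later stages never run
    return True, len(stages)         # passed every stage: report a detection

stages = [(lambda x: x, 1.0), (lambda x: 2 * x, 5.0)]
print(cascade_classify(0.5, stages), cascade_classify(3.0, stages))
```

Because most windows are negatives rejected in the first stages, the average cost per window stays close to the cost of the first stage.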

44 Viola-Jones detector: summary. Train a cascade of classifiers with AdaBoost on faces and non-faces, yielding selected features, thresholds, and weights that are then applied to a new image. Train with 5K positives, 350M negatives. Real-time detector using a 38-layer cascade (0.067s); 6061 features in all layers. Adapted from Kristen Grauman

45 Viola-Jones detector: summary. A seminal approach to real-time object detection. Training is slow, but detection is very fast. Key ideas: integral images for fast feature evaluation; boosting for feature selection; attentional cascade of classifiers for fast rejection of non-face windows. P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR. P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2). Matlab demo: Adapted from Kristen Grauman

46 Viola-Jones Face Detector: Results. Kristen Grauman (Perceptual and Sensory Augmented Computing, Visual Object Recognition Tutorial)

47 Viola-Jones Face Detector: Results. Kristen Grauman

48 Viola-Jones Face Detector: Results. Kristen Grauman

49 Detecting profile faces? Can we use the same detector? Kristen Grauman

50 Viola-Jones Face Detector: Results. Paul Viola, ICCV tutorial. Kristen Grauman

51 Dalal-Triggs pedestrian detector. 1. Extract fixed-sized (64x128 pixel) window at each position and scale. 2. Compute HOG (histogram of oriented gradients) features within each window. 3. Score the window with a linear SVM classifier. 4. Perform non-maxima suppression to remove overlapping detections with lower scores. Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

52 Histograms of oriented gradients (HOG) Divide image into 8x8 regions Orientation: 9 bins (for unsigned angles) Histograms in 8x8 pixel cells Votes weighted by magnitude Adapted from Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
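
The cell histograms can be sketched as follows. The sketch uses unsigned gradient orientation (0-180°) quantized into 9 bins, with each pixel voting with its gradient magnitude in its 8x8 cell. It is deliberately simplified and assumes a grayscale image. It omits the bilinear interpolation of votes and the block normalization that Dalal and Triggs also use. The function name is hypothetical.

```python
import numpy as np

def hog_cell_histograms(img: np.ndarray, cell: int = 8, bins: int = 9) -> np.ndarray:
    """Simplified HOG cell histograms of shape (rows_of_cells, cols_of_cells, bins)."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]            # centered [-1, 0, 1] derivative
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)                            # vote weight = gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # unsigned angle in [0, 180)
    b = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    ny, nx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((ny, nx, bins))
    for i in range(ny):
        for j in range(nx):
            sl = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            hist[i, j] = np.bincount(b[sl].ravel(), weights=mag[sl].ravel(), minlength=bins)
    return hist

step = np.zeros((16, 16))
step[:, 8:] = 1.0                                     # vertical edge: horizontal gradients
print(hog_cell_histograms(step)[:, :, 0])             # all votes land in the 0-degree bin
```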

53 Histograms of oriented gradients (HOG) 10x10 cells 20x20 cells N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005 Image credit: N. Snavely

54 Histograms of oriented gradients (HOG) N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005 Image credit: N. Snavely

55 Histograms of oriented gradients (HOG) N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005

56 Histograms of oriented gradients (HOG) N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005

57 Train SVM for pedestrian detection using HoG [figure: positive weights (pos w) and negative weights (neg w) of the learned template; a window labeled "pedestrian"]. Adapted from Pete Barnum. Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

58 Remove overlapping detections: non-max suppression [figure: overlapping boxes with scores 0.8, 0.8, and 0.1]. Adapted from Derek Hoiem
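
Greedy non-max suppression can be sketched with intersection-over-union (IoU) as the overlap measure. The procedure keeps the highest-scoring box, discards the remaining boxes that overlap it too much, and repeats. The box format (x1, y1, x2, y2), the 0.5 IoU threshold, and the function names are assumptions.

```python
import numpy as np

def _area(b: np.ndarray) -> np.ndarray:
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])

def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    return inter / (_area(box) + _area(boxes) - inter)

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list:
    """Return indices of kept boxes, highest score first."""
    order = np.argsort(-scores, kind="stable")        # descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))                           # keep the current best box
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]  # drop heavy overlaps
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
print(nms(boxes, np.array([0.8, 0.7, 0.1])))          # second box overlaps the first
```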

59 Plan for the next few lectures Recognizing the category in the image as a whole Detecting the region in the image that corresponds to a category Using window templates Face detection Pedestrian detection Using parts Implicit Shape Models Deformable Part Models Using Convolutional Neural Networks R-CNN, Fast R-CNN YOLO (You Only Look Once)

60 Sliding window detector

61 Are window templates enough? Single rigid window template usually not enough to represent a category Many objects (e.g. humans) are articulated, or have parts that can vary in configuration Many object categories look very different from different viewpoints, or from instance to instance Slide by N. Snavely

62 Deformable objects Images from Caltech-256 Slide Credit: Duan Tran

63 Deformable objects. Images from D. Ramanan's dataset. Slide Credit: Duan Tran

64 Parts-based Models Define object by collection of parts modeled by 1. Appearance 2. Spatial configuration Slide credit: Rob Fergus

65 How to model spatial relations? One extreme: fixed template Derek Hoiem

66 Fixed part-based template. Object model = sum of scores of features at fixed positions. Example: a window scoring −0.5 fails the test "> 7.5", so it is a non-object; a window scoring 10.5 passes "> 7.5", so it is an object. Derek Hoiem

67 How to model spatial relations? Another extreme: bag of words. Derek Hoiem

68 How to model spatial relations? Star-shaped model. Derek Hoiem

69 How to model spatial relations? Star-shaped model: a root connected to each of several parts. Derek Hoiem

70 Parts-based Models Articulated parts model Object is configuration of parts Each part is detectable and can move around Adapted from Derek Hoiem, images from Felzenszwalb

71 Implicit shape models Visual vocabulary is used to index votes for object position [a visual word = part ] training image annotated with object localization info visual codeword with displacement vectors B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004 Lana Lazebnik

72 Implicit shape models: Training 1. Build vocabulary of patches around extracted interest points using clustering Lana Lazebnik

73 Implicit shape models: Training 1. Build vocabulary of patches around extracted interest points using clustering 2. Map the patch around each interest point to closest word Lana Lazebnik

74 Implicit shape models: Training 1. Build vocabulary of patches around extracted interest points using clustering 2. Map the patch around each interest point to closest word 3. For each word, store all positions it was found, relative to object center Lana Lazebnik

75 Recall: Generalized Hough transform Template representation: for each type of landmark point, store all possible displacement vectors towards the center Template Model Svetlana Lazebnik

76 Implicit shape models: Testing 1. Given new test image, extract patches, match to vocabulary words 2. Cast votes for possible positions of object center 3. Search for maxima in voting space Lana Lazebnik
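
The training and testing steps can be sketched as a displacement table plus a discrete voting accumulator. The sketch is simplified. Each test feature's unit vote is spread evenly over its word's stored displacements, and the maximum is found with an argmax. The original method uses probabilistic vote weights and a mean-shift search in a continuous voting space. Visual-word ids are assumed to be already assigned, and all names are hypothetical.

```python
import numpy as np
from collections import defaultdict

def train_ism(word_ids, positions, centers):
    """For each visual word, store every displacement (object center minus
    feature position) observed in the annotated training images."""
    table = defaultdict(list)
    for w, (px, py), (cx, cy) in zip(word_ids, positions, centers):
        table[w].append((cx - px, cy - py))
    return table

def vote_center(table, word_ids, positions, shape):
    """Cast votes for the object center; return the accumulator and the (x, y)
    cell with the most votes. shape = (height, width) of the voting grid."""
    acc = np.zeros(shape)
    for w, (px, py) in zip(word_ids, positions):
        for dx, dy in table.get(w, []):
            x, y = int(round(px + dx)), int(round(py + dy))
            if 0 <= y < shape[0] and 0 <= x < shape[1]:
                acc[y, x] += 1.0 / len(table[w])      # spread one vote per feature
    y, x = np.unravel_index(acc.argmax(), acc.shape)
    return acc, (int(x), int(y))

table = train_ism([0, 1], [(10, 10), (20, 10)], [(15, 15), (15, 15)])
acc, center = vote_center(table, [0, 1], [(30, 30), (40, 30)], (64, 64))
print(center)                                         # both features vote for the same center
```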

77 Detection Results. Qualitative performance: recognizes different kinds of objects; robust to clutter, occlusion, noise, low contrast. K. Grauman, B. Leibe

78 Discriminative part-based models Root filter Part filters Deformation weights P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, PAMI 32(9), 2010 Lana Lazebnik

79 Discriminative part-based models Multiple components P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, PAMI 32(9), 2010 Lana Lazebnik

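80 Scoring an object hypothesis. The score of a hypothesis is the sum of appearance scores minus the sum of deformation costs:
$\mathrm{score}(p_0, \dots, p_n) = \sum_{i=0}^{n} F_i \cdot \phi(H, p_i) - \sum_{i=1}^{n} d_i \cdot (dx_i, dy_i, dx_i^2, dy_i^2)$
where F_i are the appearance weights; φ(H, p_i) are the part features at location p_i; (dx_i, dy_i) = (part loc) − (anchor loc) are the displacements, i.e. how much part p_i moved from its expected anchor location in the x and y directions; and d_i are the deformation weights, i.e. how much we'll penalize part p_i for moving from its expected location. Felzenszwalb et al.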
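
Given response maps F_i · φ(H, p) for the root and each part, the best hypothesis for a fixed root location can be found part by part. For each part, the sketch picks the placement that maximizes appearance score minus deformation cost d_i · (dx, dy, dx², dy²). The brute-force sketch below assumes all maps share one grid. Felzenszwalb et al. instead place parts at twice the root's resolution and use generalized distance transforms to make the search fast. The names and argument layout are hypothetical.

```python
import numpy as np

def dpm_score(root_resp, part_resps, anchors, defs, root_xy):
    """Best hypothesis score with the root filter at root_xy = (x0, y0).

    root_resp:  (H, W) root-filter appearance scores
    part_resps: list of (H, W) part-filter appearance scores
    anchors:    list of (vx, vy) anchor offsets of each part relative to the root
    defs:       list of deformation weights (d1, d2, d3, d4), one tuple per part
    """
    x0, y0 = root_xy
    score = root_resp[y0, x0]
    for resp, (vx, vy), d in zip(part_resps, anchors, defs):
        H, W = resp.shape
        ys, xs = np.mgrid[0:H, 0:W]
        dx, dy = xs - (x0 + vx), ys - (y0 + vy)      # displacement from anchor location
        cost = d[0] * dx + d[1] * dy + d[2] * dx ** 2 + d[3] * dy ** 2
        score += (resp - cost).max()                 # best placement of this part
    return float(score)

root = np.zeros((5, 5)); root[2, 2] = 1.0
part = np.zeros((5, 5)); part[2, 3] = 5.0            # strong part response 1 px right of anchor
print(dpm_score(root, [part], [(0, 0)], [(0.0, 0.0, 1.0, 1.0)], (2, 2)))
```

Here the part moves one pixel from its anchor and pays a deformation cost of 1. The total is therefore root score 1 + part score 5 − 1.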

81 Felzenszwalb et al. Detection

82 Training Training data: images with labeled bounding boxes Parts are not annotated Need to learn the weights and deformation parameters Adapted from Lana Lazebnik

83 Training. Our classifier has the form $f_w(x) = \max_z w \cdot \Phi(x, z)$, where w are model parameters and z are latent hypotheses. Latent SVM training: initialize w and iterate: fix w and find the best z for each training example; fix z and solve for w (standard SVM training). Lana Lazebnik
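
The alternation can be sketched with a plain subgradient-descent linear SVM. The sketch makes three assumptions. First, each example is a matrix whose rows are the feature vectors Φ(x, z), one row per latent choice z. Second, a bias is folded in as a constant feature. Third, every latent choice of a negative example is treated as its own negative, so the hinge loss pushes max_z w·Φ(x, z) below −1 for negatives. The problem is non-convex, so initialization matters. Felzenszwalb et al. initialize from the bounding boxes; here w_init is simply passed in. All names are hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, epochs=200, lr=0.01):
    """Minimize 0.5*||w||^2 + C * sum(max(0, 1 - y * Xw)) by subgradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        viol = y * (X @ w) < 1                        # examples inside the margin
        grad = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        w -= lr * grad
    return w

def latent_svm(pos, neg, w_init=None, iters=5, **svm_kw):
    """pos, neg: lists of (Z, D) arrays; row z holds Phi(x, z)."""
    w = np.zeros(pos[0].shape[1]) if w_init is None else np.asarray(w_init, dtype=float)
    for _ in range(iters):
        P = np.stack([F[np.argmax(F @ w)] for F in pos])   # fix w: best z per positive
        N = np.concatenate(neg)                            # all latent choices of negatives
        X = np.vstack([P, N])
        y = np.concatenate([np.ones(len(P)), -np.ones(len(N))])
        w = train_linear_svm(X, y, **svm_kw)               # fix z: standard SVM training
    return w

pos = [np.array([[2.0, 0.0, 1.0], [0.0, 3.0, 1.0]]),
       np.array([[-2.0, 0.0, 1.0], [2.0, 0.5, 1.0]])]     # good latent choice is row 1 here
neg = [np.array([[-2.0, 0.0, 1.0], [-2.0, 1.0, 1.0]])]
w = latent_svm(pos, neg, w_init=[1.0, 0.0, 0.0])
print([float((F @ w).max()) for F in pos + neg])          # f(x) = max_z w . Phi(x, z)
```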

84 Car model Component 1 Component 2 Lana Lazebnik

85 Car detections Lana Lazebnik

86 Person model Lana Lazebnik

87 Person detections Lana Lazebnik

88 Cat model Lana Lazebnik

89 Cat detections Lana Lazebnik

90 Speeding up detection: Restrict set of windows we pass through SVM to those w/ high objectness Alexe et al., CVPR 2010

91 Alexe et al., CVPR 2010 Objectness cue #1: Where people look

92 Objectness cue #2: color contrast at boundary Alexe et al., CVPR 2010

93 Objectness cue #3: no segments straddling the object box Alexe et al., CVPR 2010

94 Boxes found to have high objectness Cyan = ground truth bounding boxes, yellow = correct and red = incorrect predictions for objectness Only run the sheep / horse / chair etc. classifier on the yellow/red boxes. Alexe et al., CVPR 2010

95 How do detectors fail? Most errors that detectors make are reasonable Localization error and confusion with similar objects Misdetection of occluded or small objects Detectors have different sensitivity to different factors E.g. less sensitive to truncation than to size differences Failure analysis code and annotations available online Adapted from Hoiem et al., ECCV 2012

96 Analysis of object characteristics Additional annotations for seven categories: occlusion level, parts visible, sides visible Hoiem et al., ECCV 2012

97 Top false positives: Airplane (DPM) AP = Background 27% Localization 29% Other Objects 11% Similar Objects 33% Bird, Boat, Car Hoiem et al., ECCV 2012

98 Object characteristics: Aeroplane Occlusion: poor robustness to occlusion, but little impact on overall performance Easier (None) Hoiem et al., ECCV 2012 Harder (Heavy)

99 Object characteristics: Aeroplane Size: strong preference for average to above average sized airplanes Large Medium X-Large Small X-Small Easier Hoiem et al., ECCV 2012 Harder

100 Object characteristics: Aeroplane Aspect Ratio: 2-3x better at detecting wide (side) views than tall views X-Wide Wide Medium X-Tall Tall Easier (Wide) Hoiem et al., ECCV 2012 Harder (Tall)

101 Object characteristics: Aeroplane Sides/Parts: best performance = direct side view with all parts visible Easier (Side) Hoiem et al., ECCV 2012 Harder (Non-Side)

102 Summary Window-template-based approaches Assume object appears in roughly the same configuration in different images Look for alignment with a global template Part-based methods Allow parts to move somewhat from their usual locations Look for good fits in appearance, for both the global template and the individual part templates Speed up by only scoring boxes that look like any object Models prefer that objects appear in certain views

103 Plan for the next few lectures Recognizing the category in the image as a whole Detecting the region in the image that corresponds to a category Using window templates Face detection Pedestrian detection Using parts Implicit Shape Models Deformable Part Models Using Convolutional Neural Networks R-CNN, Fast R-CNN YOLO (You Only Look Once)

104 Complexity and the plateau. [Figure: mean average precision (mAP, %) of top competition results on the PASCAL VOC challenge dataset, VOC 07 through VOC 12. Labeled approaches: DPM; DPM, HOG+BOW; DPM, MKL; DPM++; DPM++, MKL, Selective Search; Selective Search, DPM++, MKL. Values shown: 23%, 28%, 37%, 41%, 41%. Annotation: "plateau & increasing complexity".] [Source: esults/index.html] Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014

105 R-CNN: Regions with CNN features. [Figure: mAP (%) on the PASCAL VOC challenge dataset, VOC 07 through VOC 12, comparing top competition results with post-competition R-CNN results of 58.5%, 53.7%, and 53.3%.] Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014

106 R-CNN: Regions with CNN features. Input image → extract region proposals (~2k / image) → compute CNN features → classify regions (linear SVM): aeroplane? no. person? yes. tvmonitor? no. Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014

107 R-CNN at test time: Step 1. Extract region proposals (~2k / image). Proposal-method agnostic, many choices: Selective Search [van de Sande, Uijlings et al.] (used in this work); Objectness [Alexe et al.]; Category independent object proposals [Endres & Hoiem]; CPMC [Carreira & Sminchisescu]. Active area, at this CVPR: BING [Cheng et al.], fast; MCG [Arbelaez et al.], high-quality segmentation. Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014

108 R-CNN at test time: Step 2. Compute CNN features for each proposal. Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014

109 R-CNN at test time: Step 2. Compute CNN features: dilate proposal. Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014

110 R-CNN at test time: Step 2. Compute CNN features: a. Crop. Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014

111 R-CNN at test time: Step 2. Compute CNN features: a. Crop; b. Scale (anisotropic) to 227 x 227. Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014

112 R-CNN at test time: Step 2. Compute CNN features: a. Crop; b. Scale (anisotropic); c. Forward propagate. Output: fc7 features. Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014
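
The dilate, crop, and anisotropic-scale steps can be sketched with NumPy and nearest-neighbour resampling. The box is grown so that `context` pixels of surrounding image appear on each side after warping. The R-CNN paper uses 16 pixels. Several details are assumptions: exclusive (x1, y1, x2, y2) box coordinates, nearest-neighbour resampling, and clipping the enlarged crop at the image border. The slides do not say how context outside the image is handled. The function name is hypothetical.

```python
import numpy as np

def warp_proposal(image: np.ndarray, box, out_size: int = 227, context: int = 16) -> np.ndarray:
    """Dilate a proposal, crop it, and scale it anisotropically to out_size x out_size."""
    x1, y1, x2, y2 = box
    bw, bh = x2 - x1, y2 - y1
    inner = out_size - 2 * context                   # size of the box itself after warping
    pad_x, pad_y = context * bw / inner, context * bh / inner   # dilation in image pixels
    X1 = max(int(np.floor(x1 - pad_x)), 0)
    Y1 = max(int(np.floor(y1 - pad_y)), 0)
    X2 = min(int(np.ceil(x2 + pad_x)), image.shape[1])
    Y2 = min(int(np.ceil(y2 + pad_y)), image.shape[0])
    crop = image[Y1:Y2, X1:X2]
    # Rows and columns are resampled independently, so the aspect ratio is not kept.
    rows = (np.arange(out_size) * crop.shape[0] / out_size).astype(int)
    cols = (np.arange(out_size) * crop.shape[1] / out_size).astype(int)
    return crop[rows][:, cols]

img = np.zeros((100, 100, 3))
print(warp_proposal(img, (10, 20, 60, 50)).shape)    # a wide box becomes a square input
```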

113 R-CNN at test time: Step 3. Classify regions (person? horse?): each proposal's 4096-dimensional fc7 feature vector is scored by linear classifiers (SVM or softmax). Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014

114 Step 4: Object proposal refinement. Bounding-box regression: linear regression on CNN features maps the original proposal to a predicted object bounding box. Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014
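
The regression targets in the R-CNN paper use a scale-invariant parameterization. For a proposal P and a ground-truth box G, both given as (center x, center y, width, height), the targets are t_x = (G_x − P_x)/P_w, t_y = (G_y − P_y)/P_h, t_w = log(G_w/P_w), and t_h = log(G_h/P_h). The sketch below encodes and decodes these targets and fits the linear map with closed-form ridge regression. The default regularization strength and all function names are hypothetical.

```python
import numpy as np

def bbox_targets(P, G):
    """Regression targets from proposal P to ground truth G, boxes as (cx, cy, w, h)."""
    return np.array([(G[0] - P[0]) / P[2], (G[1] - P[1]) / P[3],
                     np.log(G[2] / P[2]), np.log(G[3] / P[3])])

def apply_deltas(P, t):
    """Invert the parameterization: refined box from proposal P and predicted deltas t."""
    return np.array([P[0] + P[2] * t[0], P[1] + P[3] * t[1],
                     P[2] * np.exp(t[2]), P[3] * np.exp(t[3])])

def fit_ridge(features, targets, lam=1000.0):
    """Closed-form ridge regression: one weight column per target coordinate.
    features: (N, D) CNN features; targets: (N, 4)."""
    D = features.shape[1]
    return np.linalg.solve(features.T @ features + lam * np.eye(D), features.T @ targets)

P, G = (50.0, 50.0, 20.0, 40.0), (55.0, 45.0, 30.0, 20.0)
t = bbox_targets(P, G)
print(t, apply_deltas(P, t))                          # decoding the targets recovers G
```

Normalizing offsets by the proposal size and regressing log scale factors makes the targets comparable across proposals of very different sizes.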

115 R-CNN results on PASCAL (metric: mean average precision, higher is better)
                                          VOC 2007   VOC 2010
Reference systems:
DPM v5 (Girshick et al. 2011)             33.7%      29.6%
UVA sel. search (Uijlings et al. 2013)    -          35.1%
Regionlets (Wang et al. 2013)             41.7%      39.7%
SegDPM (Fidler et al. 2013)               -          40.4%
R-CNN                                     54.2%      50.2%
R-CNN + bbox regression                   58.5%      53.7%
Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014

117 R-CNN on ImageNet detection. ILSVRC2013 detection test set, mean average precision (mAP) in % (legend: post-competition result, competition result): *R-CNN BB 31.4%; *OverFeat (2) 24.3%; UvA-Euvision 22.6%; *NEC-MU 20.9%; *OverFeat (1) 19.4%; Toronto A 11.5%; SYSU_Vision 10.5%; GPU_UCLA 9.8%; Delta 6.1%; UIUC-IFP 1.0%. Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014

118 Training R-CNN. Bounding-box labeled detection data is scarce. Key insight: use supervised pre-training on a data-rich auxiliary task and transfer to detection. Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014

119 R-CNN training: Step 1. Supervised pre-training: train a SuperVision CNN* for the 1000-way ILSVRC image classification task. Auxiliary task: ILSVRC 2012 classification (1.2 million images). *Network from Krizhevsky, Sutskever & Hinton, NIPS 2012, also called AlexNet. Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014

120 R-CNN training: Step 2. Fine-tune the CNN for detection: transfer the representation learned for ILSVRC classification to PASCAL (or ImageNet detection). Target task: PASCAL VOC detection (~25k object labels). Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014

121 R-CNN training: Step 3. Train detection SVMs: CNN features of PASCAL VOC object proposals (~2k windows / image), together with training labels, are used to train one SVM per class. (With the softmax classifier from fine-tuning, mAP decreases from 54% to 51%.) Girshick et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014

122 Slow R-CNN. Input image → Regions of Interest (RoI) from a proposal method (~2k) → warped image regions → forward each region through ConvNet → classify regions with SVMs → apply bounding-box regressors (post hoc component). Girshick et al. CVPR14

123 What's wrong with slow R-CNN? Ad hoc training objectives: fine-tune network with softmax classifier (log loss); train post-hoc linear SVMs (hinge loss); train post-hoc bounding-box regressions (least squares). Training is slow (84h) and takes a lot of disk space. Inference (detection) is slow: 47s / image with VGG16 [Simonyan & Zisserman, ICLR15], because it requires ~2000 ConvNet forward passes per image. Girshick, Fast R-CNN, ICCV 2015

124 Fast R-CNN Fast test time One network, trained in one stage Higher mean average precision Girshick, Fast R-CNN, ICCV 2015

125 Fast R-CNN (test time). Forward the whole input image through the ConvNet to get the conv5 feature map of the image; Regions of Interest (RoIs) come from a proposal method. Girshick, Fast R-CNN, ICCV 2015

126 Fast R-CNN (test time). The RoIs from a proposal method are projected onto the conv5 feature map and passed to an RoI pooling layer. Girshick, Fast R-CNN, ICCV 2015

127 Fast R-CNN (test time). RoI pooling layer → fully-connected layers (FCs) → softmax classifier (linear + softmax). Girshick, Fast R-CNN, ICCV 2015

128 Fast R-CNN (test time). Input image → forward whole image through ConvNet → conv5 feature map; RoIs from a proposal method → RoI pooling layer → fully-connected layers (FCs) → two outputs: softmax classifier (linear + softmax) and linear bounding-box regressors. Girshick, Fast R-CNN, ICCV 2015
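
The RoI pooling layer can be sketched for a single RoI on one feature map. The RoI is projected onto the feature map, divided into an out_h x out_w grid of roughly equal sub-windows, and max-pooled per channel. The result is a fixed-size input for the fully-connected layers regardless of the RoI's size. The 7x7 output and the 1/16 spatial scale (the total stride of VGG16's conv5) are assumed defaults. The integer rounding here approximates, but may not exactly match, the original implementation. The function name is hypothetical.

```python
import numpy as np

def roi_max_pool(feat: np.ndarray, roi, out_h: int = 7, out_w: int = 7,
                 spatial_scale: float = 1 / 16) -> np.ndarray:
    """feat: (C, H, W) conv feature map; roi = (x1, y1, x2, y2) in image pixels."""
    C, H, W = feat.shape
    x1, y1, x2, y2 = [int(round(c * spatial_scale)) for c in roi]   # project onto the map
    x1, y1 = min(max(x1, 0), W - 1), min(max(y1, 0), H - 1)
    x2, y2 = max(min(x2, W - 1), x1), max(min(y2, H - 1), y1)      # inclusive end coords
    rh, rw = y2 - y1 + 1, x2 - x1 + 1
    out = np.empty((C, out_h, out_w), dtype=feat.dtype)
    for i in range(out_h):
        ys = y1 + (i * rh) // out_h                                 # floor of bin start
        ye = max(y1 + ((i + 1) * rh + out_h - 1) // out_h, ys + 1)  # ceil of bin end
        for j in range(out_w):
            xs = x1 + (j * rw) // out_w
            xe = max(x1 + ((j + 1) * rw + out_w - 1) // out_w, xs + 1)
            out[:, i, j] = feat[:, ys:ye, xs:xe].max(axis=(1, 2))   # max per channel
    return out

feat = np.arange(16, dtype=float).reshape(1, 4, 4)
print(roi_max_pool(feat, (0, 0, 3, 3), out_h=2, out_w=2, spatial_scale=1.0))
```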

129 Fast R-CNN (training). ConvNet → FCs → two output layers: linear + softmax, and linear. Girshick, Fast R-CNN, ICCV 2015

130 Fast R-CNN (training). Multi-task loss: log loss on the linear + softmax output plus smooth L1 loss on the linear (bounding-box) output. Girshick, Fast R-CNN, ICCV 2015

131 Fast R-CNN (training). The multi-task loss (log loss + smooth L1 loss) is back-propagated through the FCs into the ConvNet, which is trainable. Girshick, Fast R-CNN, ICCV 2015
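
For one RoI with true class u (0 = background) and regression target v, the Fast R-CNN paper defines the multi-task loss as L = −log p_u + λ [u ≥ 1] Σ_i smoothL1(t^u_i − v_i). The smooth L1 function is 0.5x² for |x| < 1 and |x| − 0.5 otherwise. The sketch below computes this loss for a single RoI. It assumes the softmax probabilities and per-class box offsets are already available as arrays. The function names are hypothetical.

```python
import numpy as np

def smooth_l1(x):
    """0.5 * x^2 if |x| < 1, else |x| - 0.5: quadratic near 0, linear for outliers."""
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x ** 2, ax - 0.5)

def fast_rcnn_loss(class_probs, u, t_pred, v, lam=1.0):
    """class_probs: (K+1,) softmax output, index 0 = background
    u: ground-truth class; t_pred: (K+1, 4) per-class box offsets; v: (4,) target."""
    cls_loss = -np.log(class_probs[u])                             # log loss
    loc_loss = smooth_l1(t_pred[u] - v).sum() if u >= 1 else 0.0   # no box loss for background
    return cls_loss + lam * loc_loss

probs = np.array([0.5, 0.5])
print(fast_rcnn_loss(probs, 1, np.zeros((2, 4)), np.array([0.5, 0.0, 0.0, 2.0])))
```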

132 Main results
                     Fast R-CNN   R-CNN [1]   SPP-net [2]
Train speedup        8.8x         1x          3.4x
Test time / image    0.32s        47.0s       2.3s
Test speedup         146x         1x          20x
mAP                  66.9%        66.0%       63.1%
Timings exclude object proposal time, which is equal for all methods. All methods use VGG16 from Simonyan and Zisserman. [1] Girshick et al. CVPR14 [2] He et al. ECCV14. Girshick, Fast R-CNN, ICCV 2015

133 Accurate object detection is slow! Speed (Pascal 2007 comparison): DPM v5, 14 s/img; R-CNN, 20 s/img. ⅓ mile, 1760 feet (roughly how far a car at 60 mph travels in 20 s). Redmon et al., You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016

134 Accurate object detection is slow! Speed (Pascal 2007 comparison): DPM v5, 14 s/img; R-CNN, 20 s/img; Fast R-CNN, 2 s/img; Faster R-CNN, 140 ms/img; YOLO, 22 ms/img. 2 feet (roughly how far a car at 60 mph travels in 22 ms). Redmon et al., You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016

135 Split the image into a grid Redmon et al., You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016

136 Each cell predicts boxes and confidences: P(Object) Redmon et al., You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016

137 Each cell also predicts a probability P(Class | Object), e.g. Bicycle, Car, Dog, Dining Table. Redmon et al., You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016

138 Combine the box and class predictions Redmon et al., You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
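
Combining the predictions multiplies each box's confidence by its cell's conditional class probabilities. In the YOLO paper, the confidence is trained to equal P(Object) × IOU, so the product is a class-specific score P(Class | Object) × P(Object) × IOU. The sketch assumes the paper's PASCAL settings: an S = 7 grid, B = 2 boxes per cell, and C = 20 classes, giving a 7 x 7 x 30 output. It also assumes a per-cell layout of B × (x, y, w, h, confidence) followed by the C class probabilities. That layout is an illustrative choice and not necessarily how any released implementation orders the tensor. The function name is hypothetical.

```python
import numpy as np

def yolo_class_scores(output: np.ndarray, S: int = 7, B: int = 2, C: int = 20):
    """Turn an S x S x (B*5 + C) output into boxes (S, S, B, 4) and
    class-specific scores (S, S, B, C) = P(class | object) * confidence."""
    out = output.reshape(S, S, B * 5 + C)
    boxes_conf = out[..., :B * 5].reshape(S, S, B, 5)      # B boxes of (x, y, w, h, conf)
    class_probs = out[..., B * 5:]                         # (S, S, C) per-cell class probs
    conf = boxes_conf[..., 4]                              # (S, S, B) box confidences
    scores = conf[..., None] * class_probs[:, :, None, :]  # broadcast to (S, S, B, C)
    return boxes_conf[..., :4], scores

output = np.zeros((7, 7, 30))
output[3, 4, 4] = 0.8        # box 0 confidence in cell (3, 4)
output[3, 4, 9] = 0.1        # box 1 confidence in cell (3, 4)
output[3, 4, 10 + 11] = 0.9  # P(class 11 | object) in cell (3, 4)
boxes, scores = yolo_class_scores(output)
print(scores[3, 4, :, 11])   # class-11 score for each of the cell's two boxes
```

Thresholding these scores and then running non-max suppression gives the final detections.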

139 Finally do NMS and threshold detections Redmon et al., You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016

140 YOLO works across many natural images Redmon et al., You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016

141 It also generalizes well to new domains Redmon et al., You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016


More information

Object detection with CNNs

Object detection with CNNs Object detection with CNNs 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before CNNs After CNNs 0% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 year Region proposals

More information

Part based models for recognition. Kristen Grauman

Part based models for recognition. Kristen Grauman Part based models for recognition Kristen Grauman UT Austin Limitations of window-based models Not all objects are box-shaped Assuming specific 2d view of object Local components themselves do not necessarily

More information

Category-level localization

Category-level localization Category-level localization Cordelia Schmid Recognition Classification Object present/absent in an image Often presence of a significant amount of background clutter Localization / Detection Localize object

More information

Object Category Detection: Sliding Windows

Object Category Detection: Sliding Windows 04/10/12 Object Category Detection: Sliding Windows Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem Today s class: Object Category Detection Overview of object category detection Statistical

More information

Object Recognition II

Object Recognition II Object Recognition II Linda Shapiro EE/CSE 576 with CNN slides from Ross Girshick 1 Outline Object detection the task, evaluation, datasets Convolutional Neural Networks (CNNs) overview and history Region-based

More information

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011 Previously Part-based and local feature models for generic object recognition Wed, April 20 UT-Austin Discriminative classifiers Boosting Nearest neighbors Support vector machines Useful for object recognition

More information

Object Detection Design challenges

Object Detection Design challenges Object Detection Design challenges How to efficiently search for likely objects Even simple models require searching hundreds of thousands of positions and scales Feature design and scoring How should

More information

Previously. Window-based models for generic object detection 4/11/2011

Previously. Window-based models for generic object detection 4/11/2011 Previously for generic object detection Monday, April 11 UT-Austin Instance recognition Local features: detection and description Local feature matching, scalable indexing Spatial verification Intro to

More information

Category vs. instance recognition

Category vs. instance recognition Category vs. instance recognition Category: Find all the people Find all the buildings Often within a single image Often sliding window Instance: Is this face James? Find this specific famous building

More information

Part-based and local feature models for generic object recognition

Part-based and local feature models for generic object recognition Part-based and local feature models for generic object recognition May 28 th, 2015 Yong Jae Lee UC Davis Announcements PS2 grades up on SmartSite PS2 stats: Mean: 80.15 Standard Dev: 22.77 Vote on piazza

More information

Beyond Bags of features Spatial information & Shape models

Beyond Bags of features Spatial information & Shape models Beyond Bags of features Spatial information & Shape models Jana Kosecka Many slides adapted from S. Lazebnik, FeiFei Li, Rob Fergus, and Antonio Torralba Detection, recognition (so far )! Bags of features

More information

Object detection as supervised classification

Object detection as supervised classification Object detection as supervised classification Tues Nov 10 Kristen Grauman UT Austin Today Supervised classification Window-based generic object detection basic pipeline boosting classifiers face detection

More information

Beyond bags of features: Adding spatial information. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba

Beyond bags of features: Adding spatial information. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba Adding spatial information Forming vocabularies from pairs of nearby features doublets

More information

Rich feature hierarchies for accurate object detection and semantic segmentation

Rich feature hierarchies for accurate object detection and semantic segmentation Rich feature hierarchies for accurate object detection and semantic segmentation BY; ROSS GIRSHICK, JEFF DONAHUE, TREVOR DARRELL AND JITENDRA MALIK PRESENTER; MUHAMMAD OSAMA Object detection vs. classification

More information

Discriminative classifiers for image recognition

Discriminative classifiers for image recognition Discriminative classifiers for image recognition May 26 th, 2015 Yong Jae Lee UC Davis Outline Last time: window-based generic object detection basic pipeline face detection with boosting as case study

More information

Object Category Detection: Sliding Windows

Object Category Detection: Sliding Windows 03/18/10 Object Category Detection: Sliding Windows Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem Goal: Detect all instances of objects Influential Works in Detection Sung-Poggio

More information

Deep Learning for Object detection & localization

Deep Learning for Object detection & localization Deep Learning for Object detection & localization RCNN, Fast RCNN, Faster RCNN, YOLO, GAP, CAM, MSROI Aaditya Prakash Sep 25, 2018 Image classification Image classification Whole of image is classified

More information

Regionlet Object Detector with Hand-crafted and CNN Feature

Regionlet Object Detector with Hand-crafted and CNN Feature Regionlet Object Detector with Hand-crafted and CNN Feature Xiaoyu Wang Research Xiaoyu Wang Research Ming Yang Horizon Robotics Shenghuo Zhu Alibaba Group Yuanqing Lin Baidu Overview of this section Regionlet

More information

Bias-Variance Trade-off + Other Models and Problems

Bias-Variance Trade-off + Other Models and Problems CS 1699: Intro to Computer Vision Bias-Variance Trade-off + Other Models and Problems Prof. Adriana Kovashka University of Pittsburgh November 3, 2015 Outline Support Vector Machines (review + other uses)

More information

Modern Object Detection. Most slides from Ali Farhadi

Modern Object Detection. Most slides from Ali Farhadi Modern Object Detection Most slides from Ali Farhadi Comparison of Classifiers assuming x in {0 1} Learning Objective Training Inference Naïve Bayes maximize j i logp + logp ( x y ; θ ) ( y ; θ ) i ij

More information

Object Detection. Computer Vision Yuliang Zou, Virginia Tech. Many slides from D. Hoiem, J. Hays, J. Johnson, R. Girshick

Object Detection. Computer Vision Yuliang Zou, Virginia Tech. Many slides from D. Hoiem, J. Hays, J. Johnson, R. Girshick Object Detection Computer Vision Yuliang Zou, Virginia Tech Many slides from D. Hoiem, J. Hays, J. Johnson, R. Girshick Administrative stuffs HW 4 due 11:59pm on Wed, November 8 HW 3 grades are out Average:

More information

Classifier Case Study: Viola-Jones Face Detector

Classifier Case Study: Viola-Jones Face Detector Classifier Case Study: Viola-Jones Face Detector P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001. P. Viola and M. Jones. Robust real-time face detection.

More information

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation Object detection using Region Proposals (RCNN) Ernest Cheung COMP790-125 Presentation 1 2 Problem to solve Object detection Input: Image Output: Bounding box of the object 3 Object detection using CNN

More information

Object Detection. Sanja Fidler CSC420: Intro to Image Understanding 1/ 1

Object Detection. Sanja Fidler CSC420: Intro to Image Understanding 1/ 1 Object Detection Sanja Fidler CSC420: Intro to Image Understanding 1/ 1 Object Detection The goal of object detection is to localize objects in an image and tell their class Localization: place a tight

More information

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task Kyunghee Kim Stanford University 353 Serra Mall Stanford, CA 94305 kyunghee.kim@stanford.edu Abstract We use a

More information

Beyond Bags of Features

Beyond Bags of Features : for Recognizing Natural Scene Categories Matching and Modeling Seminar Instructed by Prof. Haim J. Wolfson School of Computer Science Tel Aviv University December 9 th, 2015

More information

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:

More information

Object Recognition. Computer Vision. Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce

Object Recognition. Computer Vision. Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce Object Recognition Computer Vision Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce How many visual object categories are there? Biederman 1987 ANIMALS PLANTS OBJECTS

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren Kaiming He Ross Girshick Jian Sun Present by: Yixin Yang Mingdong Wang 1 Object Detection 2 1 Applications Basic

More information

Ensemble Methods, Decision Trees

Ensemble Methods, Decision Trees CS 1675: Intro to Machine Learning Ensemble Methods, Decision Trees Prof. Adriana Kovashka University of Pittsburgh November 13, 2018 Plan for This Lecture Ensemble methods: introduction Boosting Algorithm

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects

More information

OBJECT DETECTION HYUNG IL KOO

OBJECT DETECTION HYUNG IL KOO OBJECT DETECTION HYUNG IL KOO INTRODUCTION Computer Vision Tasks Classification + Localization Classification: C-classes Input: image Output: class label Evaluation metric: accuracy Localization Input:

More information

Object recognition. Methods for classification and image representation

Object recognition. Methods for classification and image representation Object recognition Methods for classification and image representation Credits Slides by Pete Barnum Slides by FeiFei Li Paul Viola, Michael Jones, Robust Realtime Object Detection, IJCV 04 Navneet Dalal

More information

Face detection and recognition. Detection Recognition Sally

Face detection and recognition. Detection Recognition Sally Face detection and recognition Detection Recognition Sally Face detection & recognition Viola & Jones detector Available in open CV Face recognition Eigenfaces for face recognition Metric learning identification

More information

Bag-of-features. Cordelia Schmid

Bag-of-features. Cordelia Schmid Bag-of-features for category classification Cordelia Schmid Visual search Particular objects and scenes, large databases Category recognition Image classification: assigning a class label to the image

More information

Object detection. Announcements. Last time: Mid-level cues 2/23/2016. Wed Feb 24 Kristen Grauman UT Austin

Object detection. Announcements. Last time: Mid-level cues 2/23/2016. Wed Feb 24 Kristen Grauman UT Austin Object detection Wed Feb 24 Kristen Grauman UT Austin Announcements Reminder: Assignment 2 is due Mar 9 and Mar 10 Be ready to run your code again on a new test set on Mar 10 Vision talk next Tuesday 11

More information

Optimizing Object Detection:

Optimizing Object Detection: Lecture 10: Optimizing Object Detection: A Case Study of R-CNN, Fast R-CNN, and Faster R-CNN Visual Computing Systems Today s task: object detection Image classification: what is the object in this image?

More information

Project 3 Q&A. Jonathan Krause

Project 3 Q&A. Jonathan Krause Project 3 Q&A Jonathan Krause 1 Outline R-CNN Review Error metrics Code Overview Project 3 Report Project 3 Presentations 2 Outline R-CNN Review Error metrics Code Overview Project 3 Report Project 3 Presentations

More information

Object Detection by 3D Aspectlets and Occlusion Reasoning

Object Detection by 3D Aspectlets and Occlusion Reasoning Object Detection by 3D Aspectlets and Occlusion Reasoning Yu Xiang University of Michigan Silvio Savarese Stanford University In the 4th International IEEE Workshop on 3D Representation and Recognition

More information

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018 Object Detection TA : Young-geun Kim Biostatistics Lab., Seoul National University March-June, 2018 Seoul National University Deep Learning March-June, 2018 1 / 57 Index 1 Introduction 2 R-CNN 3 YOLO 4

More information

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN CS6501: Deep Learning for Visual Recognition Object Detection I: RCNN, Fast-RCNN, Faster-RCNN Today s Class Object Detection The RCNN Object Detector (2014) The Fast RCNN Object Detector (2015) The Faster

More information

Feature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking

Feature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking Feature descriptors Alain Pagani Prof. Didier Stricker Computer Vision: Object and People Tracking 1 Overview Previous lectures: Feature extraction Today: Gradiant/edge Points (Kanade-Tomasi + Harris)

More information

Diagnosing Error in Object Detectors

Diagnosing Error in Object Detectors Diagnosing Error in Object Detectors Derek Hoiem Yodsawalai Chodpathumwan Qieyun Dai (presented by Yuduo Wu) Most of the slides are from Derek Hoiem's ECCV 2012 presentation Object detecion is a collecion

More information

Find that! Visual Object Detection Primer

Find that! Visual Object Detection Primer Find that! Visual Object Detection Primer SkTech/MIT Innovation Workshop August 16, 2012 Dr. Tomasz Malisiewicz tomasz@csail.mit.edu Find that! Your Goals...imagine one such system that drives information

More information

Lecture 5: Object Detection

Lecture 5: Object Detection Object Detection CSED703R: Deep Learning for Visual Recognition (2017F) Lecture 5: Object Detection Bohyung Han Computer Vision Lab. bhhan@postech.ac.kr 2 Traditional Object Detection Algorithms Region-based

More information

Object recognition (part 1)

Object recognition (part 1) Recognition Object recognition (part 1) CSE P 576 Larry Zitnick (larryz@microsoft.com) The Margaret Thatcher Illusion, by Peter Thompson Readings Szeliski Chapter 14 Recognition What do we mean by object

More information

Yiqi Yan. May 10, 2017

Yiqi Yan. May 10, 2017 Yiqi Yan May 10, 2017 P a r t I F u n d a m e n t a l B a c k g r o u n d s Convolution Single Filter Multiple Filters 3 Convolution: case study, 2 filters 4 Convolution: receptive field receptive field

More information

Image Analysis. Window-based face detection: The Viola-Jones algorithm. iphoto decides that this is a face. It can be trained to recognize pets!

Image Analysis. Window-based face detection: The Viola-Jones algorithm. iphoto decides that this is a face. It can be trained to recognize pets! Image Analysis 2 Face detection and recognition Window-based face detection: The Viola-Jones algorithm Christophoros Nikou cnikou@cs.uoi.gr Images taken from: D. Forsyth and J. Ponce. Computer Vision:

More information

CS6670: Computer Vision

CS6670: Computer Vision CS6670: Computer Vision Noah Snavely Lecture 16: Bag-of-words models Object Bag of words Announcements Project 3: Eigenfaces due Wednesday, November 11 at 11:59pm solo project Final project presentations:

More information

Classification of objects from Video Data (Group 30)

Classification of objects from Video Data (Group 30) Classification of objects from Video Data (Group 30) Sheallika Singh 12665 Vibhuti Mahajan 12792 Aahitagni Mukherjee 12001 M Arvind 12385 1 Motivation Video surveillance has been employed for a long time

More information

Supervised learning. y = f(x) function

Supervised learning. y = f(x) function Supervised learning y = f(x) output prediction function Image feature Training: given a training set of labeled examples {(x 1,y 1 ),, (x N,y N )}, estimate the prediction function f by minimizing the

More information

Category-level Localization

Category-level Localization Category-level Localization Andrew Zisserman Visual Geometry Group University of Oxford http://www.robots.ox.ac.uk/~vgg Includes slides from: Ondra Chum, Alyosha Efros, Mark Everingham, Pedro Felzenszwalb,

More information

Patch Descriptors. CSE 455 Linda Shapiro

Patch Descriptors. CSE 455 Linda Shapiro Patch Descriptors CSE 455 Linda Shapiro How can we find corresponding points? How can we find correspondences? How do we describe an image patch? How do we describe an image patch? Patches with similar

More information

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009 Analysis: TextonBoost and Semantic Texton Forests Daniel Munoz 16-721 Februrary 9, 2009 Papers [shotton-eccv-06] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context

More information

2D Image Processing Feature Descriptors

2D Image Processing Feature Descriptors 2D Image Processing Feature Descriptors Prof. Didier Stricker Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche Intelligenz http://av.dfki.de 1 Overview

More information

Unified, real-time object detection

Unified, real-time object detection Unified, real-time object detection Final Project Report, Group 02, 8 Nov 2016 Akshat Agarwal (13068), Siddharth Tanwar (13699) CS698N: Recent Advances in Computer Vision, Jul Nov 2016 Instructor: Gaurav

More information

An Object Detection Algorithm based on Deformable Part Models with Bing Features Chunwei Li1, a and Youjun Bu1, b

An Object Detection Algorithm based on Deformable Part Models with Bing Features Chunwei Li1, a and Youjun Bu1, b 5th International Conference on Advanced Materials and Computer Science (ICAMCS 2016) An Object Detection Algorithm based on Deformable Part Models with Bing Features Chunwei Li1, a and Youjun Bu1, b 1

More information

Object Detection with Discriminatively Trained Part Based Models

Object Detection with Discriminatively Trained Part Based Models Object Detection with Discriminatively Trained Part Based Models Pedro F. Felzenszwelb, Ross B. Girshick, David McAllester and Deva Ramanan Presented by Fabricio Santolin da Silva Kaustav Basu Some slides

More information

Segmentation as Selective Search for Object Recognition in ILSVRC2011

Segmentation as Selective Search for Object Recognition in ILSVRC2011 Segmentation as Selective Search for Object Recognition in ILSVRC2011 Koen van de Sande Jasper Uijlings Arnold Smeulders Theo Gevers Nicu Sebe Cees Snoek University of Amsterdam, University of Trento ILSVRC2011

More information

Rich feature hierarchies for accurate object detection and semantic segmentation

Rich feature hierarchies for accurate object detection and semantic segmentation Rich feature hierarchies for accurate object detection and semantic segmentation Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik Presented by Pandian Raju and Jialin Wu Last class SGD for Document

More information

Face detection and recognition. Many slides adapted from K. Grauman and D. Lowe

Face detection and recognition. Many slides adapted from K. Grauman and D. Lowe Face detection and recognition Many slides adapted from K. Grauman and D. Lowe Face detection and recognition Detection Recognition Sally History Early face recognition systems: based on features and distances

More information

Development in Object Detection. Junyuan Lin May 4th

Development in Object Detection. Junyuan Lin May 4th Development in Object Detection Junyuan Lin May 4th Line of Research [1] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection, CVPR 2005. HOG Feature template [2] P. Felzenszwalb,

More information

Fitting: The Hough transform

Fitting: The Hough transform Fitting: The Hough transform Voting schemes Let each feature vote for all the models that are compatible with it Hopefully the noise features will not vote consistently for any single model Missing data

More information

Selective Search for Object Recognition

Selective Search for Object Recognition Selective Search for Object Recognition Uijlings et al. Schuyler Smith Overview Introduction Object Recognition Selective Search Similarity Metrics Results Object Recognition Kitten Goal: Problem: Where

More information

Object Detection with Partial Occlusion Based on a Deformable Parts-Based Model

Object Detection with Partial Occlusion Based on a Deformable Parts-Based Model Object Detection with Partial Occlusion Based on a Deformable Parts-Based Model Johnson Hsieh (johnsonhsieh@gmail.com), Alexander Chia (alexchia@stanford.edu) Abstract -- Object occlusion presents a major

More information

Learning Representations for Visual Object Class Recognition

Learning Representations for Visual Object Class Recognition Learning Representations for Visual Object Class Recognition Marcin Marszałek Cordelia Schmid Hedi Harzallah Joost van de Weijer LEAR, INRIA Grenoble, Rhône-Alpes, France October 15th, 2007 Bag-of-Features

More information

YOLO: You Only Look Once Unified Real-Time Object Detection. Presenter: Liyang Zhong Quan Zou

YOLO: You Only Look Once Unified Real-Time Object Detection. Presenter: Liyang Zhong Quan Zou YOLO: You Only Look Once Unified Real-Time Object Detection Presenter: Liyang Zhong Quan Zou Outline 1. Review: R-CNN 2. YOLO: -- Detection Procedure -- Network Design -- Training Part -- Experiments Rich

More information

Model Fitting: The Hough transform II

Model Fitting: The Hough transform II Model Fitting: The Hough transform II Guido Gerig, CS6640 Image Processing, Utah Theory: See handwritten notes GG: HT-notes-GG-II.pdf Credits: S. Narasimhan, CMU, Spring 2006 15-385,-685, Link Svetlana

More information

Skin and Face Detection

Skin and Face Detection Skin and Face Detection Linda Shapiro EE/CSE 576 1 What s Coming 1. Review of Bakic flesh detector 2. Fleck and Forsyth flesh detector 3. Details of Rowley face detector 4. Review of the basic AdaBoost

More information

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs Raymond Ptucha, Rochester Institute of Technology, USA Tutorial-9 May 19, 218 www.nvidia.com/dli R. Ptucha 18 1 Fair Use Agreement

More information

Object Detection on Self-Driving Cars in China. Lingyun Li

Object Detection on Self-Driving Cars in China. Lingyun Li Object Detection on Self-Driving Cars in China Lingyun Li Introduction Motivation: Perception is the key of self-driving cars Data set: 10000 images with annotation 2000 images without annotation (not

More information

Fitting: The Hough transform

Fitting: The Hough transform Fitting: The Hough transform Voting schemes Let each feature vote for all the models that are compatible with it Hopefully the noise features will not vote consistently for any single model Missing data

More information

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016 CS 2750: Machine Learning Neural Networks Prof. Adriana Kovashka University of Pittsburgh April 13, 2016 Plan for today Neural network definition and examples Training neural networks (backprop) Convolutional

More information

Object Detection with YOLO on Artwork Dataset

Object Detection with YOLO on Artwork Dataset Object Detection with YOLO on Artwork Dataset Yihui He Computer Science Department, Xi an Jiaotong University heyihui@stu.xjtu.edu.cn Abstract Person: 0.64 Horse: 0.28 I design a small object detection

More information

CS 1674: Intro to Computer Vision. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh November 16, 2016

CS 1674: Intro to Computer Vision. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh November 16, 2016 CS 1674: Intro to Computer Vision Neural Networks Prof. Adriana Kovashka University of Pittsburgh November 16, 2016 Announcements Please watch the videos I sent you, if you haven t yet (that s your reading)

More information

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR Object Detection CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR Problem Description Arguably the most important part of perception Long term goals for object recognition: Generalization

More information

Human detection using histogram of oriented gradients. Srikumar Ramalingam School of Computing University of Utah

Human detection using histogram of oriented gradients. Srikumar Ramalingam School of Computing University of Utah Human detection using histogram of oriented gradients Srikumar Ramalingam School of Computing University of Utah Reference Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection,

More information

Part-based models. Lecture 10

Part-based models. Lecture 10 Part-based models Lecture 10 Overview Representation Location Appearance Generative interpretation Learning Distance transforms Other approaches using parts Felzenszwalb, Girshick, McAllester, Ramanan

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Announcements Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Seminar registration period starts on Friday We will offer a lab course in the summer semester Deep Robot Learning Topic:

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period starts

More information

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK 1 Po-Jen Lai ( 賴柏任 ), 2 Chiou-Shann Fuh ( 傅楸善 ) 1 Dept. of Electrical Engineering, National Taiwan University, Taiwan 2 Dept.

More information

High Level Computer Vision

High Level Computer Vision High Level Computer Vision Part-Based Models for Object Class Recognition Part 2 Bernt Schiele - schiele@mpi-inf.mpg.de Mario Fritz - mfritz@mpi-inf.mpg.de http://www.d2.mpi-inf.mpg.de/cv Please Note No

More information

Deep Neural Networks:

Deep Neural Networks: Deep Neural Networks: Part II Convolutional Neural Network (CNN) Yuan-Kai Wang, 2016 Web site of this course: http://pattern-recognition.weebly.com source: CNN for ImageClassification, by S. Lazebnik,

More information

Fitting: The Hough transform

Fitting: The Hough transform Fitting: The Hough transform Voting schemes Let each feature vote for all the models that are compatible with it Hopefully the noise features will not vote consistently for any single model Missing data

More information

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION Kingsley Kuan 1, Gaurav Manek 1, Jie Lin 1, Yuan Fang 1, Vijay Chandrasekhar 1,2 Institute for Infocomm Research, A*STAR, Singapore 1 Nanyang Technological

More information

G-CNN: an Iterative Grid Based Object Detector

G-CNN: an Iterative Grid Based Object Detector G-CNN: an Iterative Grid Based Object Detector Mahyar Najibi 1, Mohammad Rastegari 1,2, Larry S. Davis 1 1 University of Maryland, College Park 2 Allen Institute for Artificial Intelligence najibi@cs.umd.edu

More information

Templates and Background Subtraction. Prof. D. Stricker Doz. G. Bleser

Templates and Background Subtraction. Prof. D. Stricker Doz. G. Bleser Templates and Background Subtraction Prof. D. Stricker Doz. G. Bleser 1 Surveillance Video: Example of multiple people tracking http://www.youtube.com/watch?v=inqv34bchem&feature=player_embedded As for

More information