YOLO: You Only Look Once: Unified, Real-Time Object Detection. Presenter: Liyang Zhong Quan Zou



1 YOLO: You Only Look Once: Unified, Real-Time Object Detection. Presenter: Liyang Zhong Quan Zou

2 Outline: 1. Review: R-CNN 2. YOLO: Detection Procedure, Network Design, Training, Experiments

3 Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

4 Proposal + Classification

5 Shortcomings of R-CNN (Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation): 1. Slow; impossible for real-time detection 2. Hard to optimize

6 What's New: Regression

7 YOLO Features: 1. Extremely fast (45 frames per second) 2. Reasons globally about the entire image 3. Learns generalizable representations

8 Detection Procedure

9-10 We split the image into an S × S grid (a 7 × 7 grid in the example)

11-16 Each cell predicts B boxes (x, y, w, h) and a confidence for each box: P(Object), the probability that the box contains an object. In the example, B = 2.

17 Each cell also predicts class probabilities. (Figure labels: Bicycle, Car, Dog, Dining Table)

18 These are conditioned on an object being present: P(Car | Object). E.g., Dog = 0.8, Cat = 0, Bike = 0. (Figure labels: Bicycle, Car, Dog, Dining Table)

19 Then we combine the box and class predictions: P(class | Object) × P(Object) = P(class)
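The product on this slide can be illustrated for a single grid cell. The sketch below is only an illustration: the confidence and class values are made up, and the three class names are placeholders rather than the slide's full label set.

```python
import numpy as np

# Hypothetical predictions for one cell with B = 2 boxes and 3 classes.
box_confidence = np.array([0.9, 0.2])            # P(Object) for each box
class_given_object = np.array([0.8, 0.1, 0.1])   # P(class | Object): dog, cat, bike

# Broadcasting multiplies every box confidence by every class probability,
# giving one class-specific score P(class) per (box, class) pair.
class_scores = box_confidence[:, None] * class_given_object[None, :]

print(class_scores.shape)  # (2, 3)
print(class_scores)
```

The class probabilities are shared by all boxes in the cell, so only the box confidence distinguishes the two rows of the result.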

20 Finally, we threshold the detections and apply non-maximum suppression (NMS)
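A minimal sketch of this last step, assuming boxes are given as corner coordinates (x1, y1, x2, y2) and using greedy NMS. The function names and both threshold values are illustrative choices, not values taken from the slides.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def threshold_and_nms(boxes, scores, score_thresh=0.2, iou_thresh=0.5):
    """Drop low-scoring boxes, then greedily suppress overlapping ones."""
    mask = scores >= score_thresh
    boxes, scores = boxes[mask], scores[mask]
    order = np.argsort(-scores)          # indices sorted by descending score
    kept = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        kept.append(best)
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_thresh]   # keep only weakly overlapping boxes
    kept = np.array(kept, dtype=int)
    return boxes[kept], scores[kept]

# Hypothetical detections for a single class.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [0, 0, 5, 5]], dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.1])
kept_boxes, kept_scores = threshold_and_nms(boxes, scores)
print(kept_boxes, kept_scores)
```

In this example the fourth box falls below the score threshold and the second box is suppressed because it overlaps the highest-scoring box. Detectors typically run this step separately for each class.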

21 The output is an S × S × (B × 5 + C) tensor, where C is the number of classes
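To make the tensor size concrete, the sketch below uses S = 7 and B = 2 from the earlier slides and assumes C = 20, the number of PASCAL VOC classes. The ordering of values within a cell (boxes first, then class probabilities) is also an assumption made for illustration.

```python
import numpy as np

S, B = 7, 2      # grid size and boxes per cell, as on the slides
C = 20           # assumption: the 20 PASCAL VOC classes

depth = B * 5 + C                  # 5 values per box: x, y, w, h, confidence
output = np.zeros((S, S, depth))   # stand-in for the network output

# Split one cell's vector into its parts (assumed layout).
cell = output[3, 4]
cell_boxes = cell[:B * 5].reshape(B, 5)   # one row per box
cell_class_probs = cell[B * 5:]           # one probability per class

print(output.shape, output.size)   # (7, 7, 30) 1470
```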

22 Network

24 pretrain

25 pretrain stride = 2

26 Train

27-28 During training, match each example to the right cell (the grid cell that contains the center of the object)
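A short sketch of this matching step, assuming the "right" cell is the one that contains the center of the ground-truth box. The function name `encode_center` and the 448 × 448 image size in the example are illustrative choices.

```python
def encode_center(cx, cy, img_w, img_h, S=7):
    """Map a box center in pixels to its grid cell and offsets within the cell.

    Returns (row, col, x_offset, y_offset), with offsets measured in cell units.
    """
    gx = cx / img_w * S                 # center position measured in cell units
    gy = cy / img_h * S
    col = min(int(gx), S - 1)           # clamp so a center on the far edge stays in the grid
    row = min(int(gy), S - 1)
    return row, col, gx - col, gy - row

# Hypothetical object centered at pixel (300, 100) in a 448 x 448 image.
row, col, x_off, y_off = encode_center(300, 100, 448, 448)
print(row, col, x_off, y_off)
```

Expressing the center as an offset within its cell keeps the regression targets in a fixed range regardless of where the object sits in the image.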

29 Adjust that cell's class prediction: Dog = 1, Cat = 0, Bike = 0, ...

30 Look at that cell's predicted boxes

31-33 Find the best one (the box that overlaps the ground truth most), adjust it, and increase its confidence

34-35 Decrease the confidence of the other box
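The box selection in this part of the training procedure can be sketched as follows, assuming "best" means highest IoU with the ground-truth box. Using the IoU itself as the confidence target for the best box is one reading (a target of 1 would be another); the function names are hypothetical.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def confidence_targets(predicted_boxes, truth_box):
    """Pick the box with the highest IoU and build per-box confidence targets."""
    ious = [iou_xyxy(p, truth_box) for p in predicted_boxes]
    best = max(range(len(ious)), key=ious.__getitem__)
    targets = [0.0] * len(predicted_boxes)   # the other boxes are pushed toward 0
    targets[best] = ious[best]               # assumption: target equals the IoU
    return best, targets

# Hypothetical cell with two predicted boxes and one ground-truth box.
truth = (0.0, 0.0, 10.0, 10.0)
predictions = [(0.0, 0.0, 5.0, 10.0), (1.0, 1.0, 10.0, 10.0)]
best_index, conf_targets = confidence_targets(predictions, truth)
print(best_index, conf_targets)
```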

36-37 Some cells don't have any ground-truth detections!

38-39 Decrease the confidence of these boxes

40 Don't adjust the class probabilities or coordinates
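One way to read the training rules above is as a per-cell error in which different terms are active depending on whether the cell contains an object. The sketch below assumes a plain, unweighted sum-squared error; any weighting factors or other details of the actual training loss are not given on these slides and are left out. The function name and the dictionary layout are hypothetical.

```python
import numpy as np

def cell_loss(pred_boxes, pred_classes, truth=None, best=None):
    """Unweighted sum-squared error for one grid cell.

    pred_boxes:   (B, 5) array of x, y, w, h, confidence
    pred_classes: (C,) array of class probabilities
    truth:        None for an empty cell, else a dict with "box" (4,),
                  "conf" (float) and "classes" (C,)
    best:         index of the box matched to the ground truth
    """
    conf = pred_boxes[:, 4]
    if truth is None:
        # Empty cell: only push confidences down; classes and coordinates are untouched.
        return float(np.sum(conf ** 2))
    coord = np.sum((pred_boxes[best, :4] - truth["box"]) ** 2)   # adjust the best box
    conf_best = (conf[best] - truth["conf"]) ** 2                # raise its confidence
    conf_others = np.sum(np.delete(conf, best) ** 2)             # lower the other boxes
    cls = np.sum((pred_classes - truth["classes"]) ** 2)         # adjust class prediction
    return float(coord + conf_best + conf_others + cls)

# Hypothetical predictions for one cell with B = 2 and C = 3.
boxes = np.array([[0.5, 0.5, 0.2, 0.2, 0.3],
                  [0.1, 0.9, 0.4, 0.4, 0.4]])
classes = np.array([0.7, 0.2, 0.1])

empty_loss = cell_loss(boxes, classes)
truth = {"box": np.array([0.5, 0.5, 0.2, 0.2]), "conf": 1.0,
         "classes": np.array([1.0, 0.0, 0.0])}
object_loss = cell_loss(boxes, classes, truth=truth, best=0)
print(empty_loss, object_loss)
```

For the empty cell, changing the predicted coordinates or class probabilities leaves the loss unchanged, which matches the rule on slide 40.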

45-46 Experiments: Datasets: PASCAL VOC 2007 and VOC 2012

47-52 Accurate object detection is slow!

Method | Pascal 2007 mAP | Speed | Time per image | Distance shown on slide
DPM v5 | - | - | 14 s/img | -
R-CNN | - | - | 20 s/img | ⅓ mile, 1760 feet
Fast R-CNN | - | 0.5 FPS | 2 s/img | 176 feet
Faster R-CNN | 73.2 | 7 FPS | 140 ms/img | 8 feet, 12 feet
YOLO | 63. | 45 FPS | 22 ms/img | 2 feet
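The per-image times on these slides can be converted into frame rates, and the distances shown next to them appear consistent with a vehicle moving at 88 feet per second (60 mph). That speed is an inference from the numbers, not something stated on the slides. A small sketch of the arithmetic:

```python
# Per-image processing times as listed on the slides, in seconds.
seconds_per_image = {
    "DPM v5": 14.0,
    "R-CNN": 20.0,
    "Fast R-CNN": 2.0,
    "Faster R-CNN": 0.140,
    "YOLO": 0.022,
}

ASSUMED_SPEED_FT_PER_S = 88.0   # assumption: 60 mph, inferred from the slide distances

fps = {name: 1.0 / t for name, t in seconds_per_image.items()}
feet_per_frame = {name: ASSUMED_SPEED_FT_PER_S * t for name, t in seconds_per_image.items()}

for name in seconds_per_image:
    print(f"{name:13s} {fps[name]:7.2f} FPS  {feet_per_frame[name]:8.1f} ft per frame")
```

Under that assumption, 20 s/img corresponds to 1760 feet and 22 ms/img to roughly 2 feet, matching the figures shown for R-CNN and YOLO.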

53 Error Analysis. Loc (localization error): correct class, 0.1 < IOU < 0.5. Background: IOU < 0.1
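The two categories on this slide can be written as a small classification rule. In the sketch below, the "correct" and "other" categories and the handling of the exact boundary values 0.1 and 0.5 are assumptions added so that every detection gets a label; only the localization and background definitions come from the slide.

```python
def error_type(correct_class, iou):
    """Label a detection by its class correctness and IoU with the ground truth."""
    if iou < 0.1:
        return "background"        # slide: IOU < 0.1
    if correct_class and iou > 0.5:
        return "correct"           # assumption: not defined on the slide
    if correct_class:
        return "localization"      # slide: correct class, 0.1 < IOU < 0.5
    return "other"                 # assumption: wrong class with IOU >= 0.1

# Hypothetical detections as (correct_class, iou) pairs.
examples = [(True, 0.3), (False, 0.05), (True, 0.8), (False, 0.4)]
labels = [error_type(c, i) for c, i in examples]
print(labels)
```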

54 YOLO generalizes well to new domains (like art)

55 It outperforms methods like DPM and R-CNN when generalizing to person detection in artwork. References: S. Ginosar, D. Haas, T. Brown, and J. Malik. Detecting people in cubist art. In Computer Vision-ECCV 2014 Workshops. Springer; H. Cai, Q. Wu, T. Corradi, and P. Hall. The cross-depiction problem: Computer vision algorithms for recognising objects in artwork and in photographs.

56 Demo

57 Strengths and Weaknesses. Strengths: 1. Fast: 45 FPS; a smaller version runs at 155 FPS 2. End-to-end training 3. Low background error

58 Strengths and Weaknesses. Weaknesses: 1. Accuracy is lower than the state of the art 2. Makes more localization errors

59 Open Questions: 1. How should the number of cells, the number of bounding boxes, and the box sizes be determined? 2. Why normalize x, y, w, h even when all input images have the same resolution?

60 Extension: YOLOv2 (on arXiv)
