YOLO: You Only Look Once Unified Real-Time Object Detection. Presenter: Liyang Zhong Quan Zou

Similar documents
Lecture 5: Object Detection

YOLO9000: Better, Faster, Stronger

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

CS 1674: Intro to Computer Vision. Object Recognition. Prof. Adriana Kovashka University of Pittsburgh April 3, 5, 2018

Object Detection with YOLO on Artwork Dataset

Object Detection Based on Deep Learning

Spatial Localization and Detection. Lecture 8-1

Object Detection on Self-Driving Cars in China. Lingyun Li

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

Rich feature hierarchies for accurate object detection and semantic segmentation

Yiqi Yan. May 10, 2017

Rich feature hierarchies for accurate object detection and semant

Rich feature hierarchies for accurate object detection and semantic segmentation

G-CNN: an Iterative Grid Based Object Detector

Unified, real-time object detection

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network

You Only Look Once: Unified, Real-Time Object Detection

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors

OBJECT DETECTION HYUNG IL KOO

YOLO 9000 TAEWAN KIM

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. Deepak Pathak, Philipp Krähenbühl and Trevor Darrell

Object Recognition II

Feature-Fused SSD: Fast Detection for Small Objects

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Gradient of the lower bound

Object detection with CNNs

Deformable Part Models

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task

Optimizing Object Detection:

Deep Learning for Object detection & localization

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs

arxiv: v1 [cs.cv] 31 Mar 2016

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

Abandoned Luggage Detection

Traffic Multiple Target Detection on YOLOv2

G-CNN: an Iterative Grid Based Object Detector

Volume 6, Issue 12, December 2018 International Journal of Advance Research in Computer Science and Management Studies

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Object Detection and Its Implementation on Android Devices

Hide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization

Optimizing Intersection-Over-Union in Deep Neural Networks for Image Segmentation

Object Detection in Sports Videos

Object Recognition and Detection

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation

Hand Detection For Grab-and-Go Groceries

Project 3 Q&A. Jonathan Krause

Single-Shot Refinement Neural Network for Object Detection -Supplementary Material-

arxiv: v2 [cs.cv] 13 Jun 2017

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

Regionlet Object Detector with Hand-crafted and CNN Feature

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing

A Lightweight YOLOv2:

Optimizing Object Detection:

AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015)

Lecture 7: Semantic Segmentation

Fully Convolutional Networks for Semantic Segmentation

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

CPSC340. State-of-the-art Neural Networks. Nando de Freitas November, 2012 University of British Columbia

DPATCH: An Adversarial Patch Attack on Object Detectors

DEEP NEURAL NETWORKS FOR OBJECT DETECTION

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

arxiv: v3 [cs.cv] 15 Sep 2018

Learning Spatial Context: Using Stuff to Find Things

Amodal and Panoptic Segmentation. Stephanie Liu, Andrew Zhou

OBJECT TRACKING IN GAMES USING CONVOLUTIONAL NEURAL NETWORKS. A Thesis. presented to. the Faculty of California Polytechnic State University,

Synscapes A photorealistic syntehtic dataset for street scene parsing Jonas Unger Department of Science and Technology Linköpings Universitet.

Self Driving. DNN * * Reinforcement * Unsupervised *

Beyond Sliding Windows: Object Localization by Efficient Subwindow Search

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Supplementary Material

SSD: Single Shot MultiBox Detector

Linear combinations of simple classifiers for the PASCAL challenge

An Object Detection Algorithm based on Deformable Part Models with Bing Features Chunwei Li1, a and Youjun Bu1, b

Perception Deception: Physical Adversarial Attack Challenges and Tactics for DNN-based Object Detection

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015

Category-level localization

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK

Multi-View 3D Object Detection Network for Autonomous Driving

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Final Report: Smart Trash Net: Waste Localization and Classification

arxiv: v2 [cs.cv] 30 Mar 2016

HIERARCHICAL JOINT-GUIDED NETWORKS FOR SEMANTIC IMAGE SEGMENTATION

Egocentric Hand Tracking for Virtual Reality

R-FCN: OBJECT DETECTION VIA REGION-BASED FULLY CONVOLUTIONAL NETWORKS

Crowdsourcing Part Annotations for Visual Verification

Unstructured Data. CS102 Winter 2019

Mask R-CNN. By Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick Presented By Aditya Sanghi

EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS. Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang

CS5670: Computer Vision

Using the Deformable Part Model with Autoencoded Feature Descriptors for Object Detection

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs

arxiv: v1 [cs.cv] 4 Jun 2015

Towards Real-Time Detection and Tracking of Basketball Players using Deep Neural Networks

Transcription:

YOLO: You Only Look Once Unified Real-Time Object Detection Presenter: Liyang Zhong Quan Zou

Outline 1. Review: R-CNN 2. YOLO: -- Detection Procedure -- Network Design -- Training Part -- Experiments

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Proposal + Classification

Shortcoming: 1. Slow, impossible for real-time detection 2. Hard to optimize Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

WHAT S NEW Regression

YOLO Features 1. Extremely fast (45 frames per second) 2. Reason Globally on the Entire Image 3. Learn Generalizable Representations

Detection Procedure https://docs.google.com/presentation/d/1kaa7noambt4calbu9ihgt8a86rrhz 9Yz2oh4-GTdX6M/edit#slide=id.g151008b386_0_44

We split the image into an S*S grid

We split the image into an S*S grid 7*7 grid

Each cell predicts B boxes(x,y,w,h) and confidences of each box: P(Object)

Each cell predicts B boxes(x,y,w,h) and confidences of each box: P(Object)

Each cell predicts B boxes(x,y,w,h) and confidences of each box: P(Object) each box predict: B=2 P(Object): probability that the box contains an object

Each cell predicts B boxes(x,y,w,h) and confidences of each box: P(Object)

Each cell predicts B boxes(x,y,w,h) and confidences of each box: P(Object)

Each cell predicts boxes and confidences: P(Object)

Each cell also predicts a class probability. Bicycle Car Dog Dining Table

Conditioned on object: P(Car Object) Bicycle Car Dog Eg. Dog = 0.8 Cat = 0 Bike = 0 Dining Table

Then we combine the box and class predictions. P(class Object) * P(Object) =P(class)

Finally we do threshold detections and NMS

https://docs.google.com/presentation/d/1kaa7noambt4calbu9ihgt8a86rrhz9yz 2oh4-GTdX6M/edit#slide=id.g151008b386_0_44 S * S * (B * 5 + C) tensor

Network

https://zhuanlan.zhihu.com/p/24916786?refer=xiaoleimlnote

https://zhuanlan.zhihu.com/p/24916786?refer=xiaoleimlnote pretrain

https://zhuanlan.zhihu.com/p/24916786?refer=xiaoleimlnote pretrain stride = 2

Train

https://docs.google.com/presentation/d/1kaa7noambt4calbu9ihgt8a86rrhz9yz2oh 4-GTdX6M/edit#slide=id.g151008b386_0_44 During training, match example to the right cell

During training, match example to the right cell

Adjust that cell s class prediction Dog = 1 Cat = 0 Bike = 0...

Look at that cell s predicted boxes

Find the best one, adjust it, increase the confidence

Find the best one, adjust it, increase the confidence

Find the best one, adjust it, increase the confidence

Decrease the confidence of the other box

Decrease the confidence of the other box

Some cells don t have any ground truth detections!

Some cells don t have any ground truth detections!

Decrease the confidence of boxes boxes

Decrease the confidence of these boxes

Don t adjust the class probabilities or coordinates

https://www.slideshare.net/taegyunjeon1/pr12-you-only-look-once-yolo-unified-realtime-object-detection?from_action=save

https://www.slideshare.net/taegyunjeon1/pr12-you-only-look-once-yolo-unified-realtime-object-detection?from_action=save

https://zhuanlan.zhihu.com/p/24916786?refer=xiaoleimlnote

https://www.slideshare.net/taegyunjeon1/pr12-you-only-look-once-yolo-unified-realtime-object-detection?from_action=save

Experiments Datasets PASCAL VOC 2007 & VOC 2012

Experiments Datasets

Accurate object detection is slow! DPM v5 Pascal 2007 map Speed 33.7.07 FPS 14 s/img Ref: https://pjreddie.com/publications/

Accurate object detection is slow! Pascal 2007 map Speed DPM v5 33.7.07 FPS 14 s/img R-CNN 66.0.05 FPS 20 s/img Ref: https://pjreddie.com/publications/

Accurate object detection is slow! Pascal 2007 map Speed DPM v5 33.7.07 FPS 14 s/img R-CNN 66.0.05 FPS 20 s/img ⅓ Mile, 1760 feet Ref: https://pjreddie.com/publications/

Accurate object detection is slow! Pascal 2007 map Speed DPM v5 33.7.07 FPS 14 s/img R-CNN 66.0.05 FPS 20 s/img Fast R-CNN 70.0.5 FPS 2 s/img 176 feet Ref: https://pjreddie.com/publications/

Accurate object detection is slow! Pascal 2007 map Speed DPM v5 33.7.07 FPS 14 s/img R-CNN 66.0.05 FPS 20 s/img Fast R-CNN 70.0.5 FPS 2 s/img Faster R-CNN 73.2 7 FPS 140 ms/img 8 feet 12 feet Ref: https://pjreddie.com/publications/

Accurate object detection is slow! Pascal 2007 map Speed DPM v5 33.7.07 FPS 14 s/img R-CNN 66.0.05 FPS 20 s/img Fast R-CNN 70.0.5 FPS 2 s/img Faster R-CNN 73.2 7 FPS 140 ms/img YOLO 63.4 45 FPS 22 ms/img 2 feet Ref: https://pjreddie.com/publications/

Error Analysis Loc: Localization Error Correct class,.1<iou<.5 Background: IOU<0.1

YOLO generalizes well to new domains (like art) Ref: https://pjreddie.com/publications/

It outperforms methods like DPM and R-CNN when generalizing to person detection in artwork S. Ginosar, D. Haas, T. Brown, and J. Malik. Detecting people in cubist art. In Computer Vision-ECCV 2014 Workshops, pages 101 116. Springer, 2014. H. Cai, Q. Wu, T. Corradi, and P. Hall. The cross-depiction problem: Computer vision algorithms for recognising objects in artwork and in photographs.

Demo https://youtu.be/voc3huqhrss https://youtu.be/voc3huqhrss

Strengths and Weaknesses Strengths: Fast: 45fps, smaller version 155fps End2end training Background error is low

Strengths and Weaknesses Weaknesses: Performance is lower than state-of-art Makes more localization errors

Open Questions How to determine the number of cell, bounding box and the size of the box Why normalization x,y,w,h even all the input images have the same resolution?

Extension Part YOLOv2! arxiv:1612.08242