Progress in Computer Vision in the Last Decade & Open Problems: People Detection & Human Pose Estimation

1 Progress in Computer Vision in the Last Decade & Open Problems: People Detection & Human Pose Estimation. Bernt Schiele, Max Planck Institute for Informatics & Saarland University, Saarland Informatics Campus, Saarbrücken

2 Overview. People Detection & Tracking: progress since; today's state-of-the-art; open problems. Human Pose Estimation: progress since [Felzenszwalb,Huttenlocher@ijcv05]; today's state-of-the-art; open problems. Some More Open Problems: some recent progress on open problems.

3 People Detection. "Ten Years of Pedestrian Detection, What Have We Learned?", R. Benenson, M. Omran, J. Hosang and B. Schiele, ECCV workshop 14. "How Far are We from Solving Pedestrian Detection?", S. Zhang, R. Benenson, M. Omran, J. Hosang and B. Schiele, CVPR 16. "Learning Non-Maximum Suppression", J. Hosang, R. Benenson, and B. Schiele, CVPR 17. "CityPersons: A Diverse Dataset for Pedestrian Detection", S. Zhang, R. Benenson and B. Schiele, CVPR 17. "Occluded Pedestrian Detection Through Guided Attention in CNNs", S. Zhang, J. Yang and B. Schiele, CVPR 18. Shanshan Zhang, Rodrigo Benenson, Jan Hosang, Mohamed Omran, Bernt Schiele

4 12] Caltech Pedestrian Benchmark. Features of the pedestrian dataset: 11h of normal driving in urban environment (greater LA area). Annotation: - 250,000 frames (~137 min) annotated with 350,000 labeled bounding boxes of 2,300 unique pedestrians - occlusion annotation: 2 bounding boxes for entire pedestrian & visible region - difference between single person and groups of people. Caltech-USA is currently the most active dataset.

5 Great Progress in Pedestrian Detection Within 10+ Years. Plot annotations: current deep networks; lower is better.

6 What is Driving the Detection Quality? [Benenson&al@eccv workshop 14] Multi-scale models? Additional (test time) data (e.g. flow, stereo)? Exploiting context? Training data? Better features?

7 What is Driving the Detection Quality? [Benenson&al@eccv workshop 14] Multi-scale models - not much: helps a bit (1-2%), but not key for quality on Caltech-USA. Additional (test time) data (e.g. flow, stereo) - a bit: using more frames (flow or stereo) helps. Exploiting context - a bit more: expect 2-5% improvement when using context. Training data? Better features?

8 What is Driving the Detection Quality? [Benenson&al@eccv workshop 14] Multi-scale models - not much. Additional (test time) data (e.g. flow, stereo) - a bit. Exploiting context - a bit more. Training data? Better features?

9 Training Data Matters (you knew this already) [Benenson&al@eccv workshop 14]

10 Features Alone Can Explain (Almost) 10 Years of Progress [Benenson&al@eccv workshop 14]

11 What have we learned? [Benenson&al@eccv workshop 14] The key role of data: more training data improves performance; data collection (& annotation) is a key requirement for good performance - question: how to obtain sufficient data? The key role of features: features alone explain a decade of people detection progress; flow, context, and strong features are complementary (currently). What about deep learning???

12 Leading Methods on Caltech. All leading methods are CNN-based. SAF R-CNN: R-CNN (jointly) trained for 2 scales, ~9% - "Scale-Aware Fast R-CNN for Pedestrian Detection" by J. Li, X. Liang, S. Shen, T. Xu, J. Feng & S. TransMultimedia'17. F-DNN: cascade of multiple CNNs, ~9% - "Fused DNN: A Deep Neural Network Fusion Approach to Fast and Robust Pedestrian Detection" by X. Du, M. El-Khamy, J. Lee & L. wacv 17. SA-FastRCNN: FasterRCNN trained better, ~10% - "Is Faster R-CNN Doing Well for Pedestrian Detection?" by L. Zhang, L. Lin, Z. Liang, K. eccv 16. Faster-RCNN: FasterRCNN trained better, ~10% - "CityPersons: A Diverse Dataset for Pedestrian Detection" by S. Zhang, R. Benenson & B. cvpr 17.

13 "How Far are We from Solving Pedestrian Detection?", S. Zhang, R. Benenson, M. Omran, J. Hosang and B. Schiele, CVPR 16

14 How Good are Today's Detectors? [Zhang,Benenson,Omran,Hosang,Schiele@CVPR 16] Human baseline: annotators are pedestrian detection experts; single frame observation; upper bound of performance. State-of-the-art detectors: far behind the human baseline; a lot of room for improvement - false positives - false negatives.

15 False Positive: Background Errors [Zhang,Benenson,Omran,Hosang,Schiele@CVPR 16] Vertical structures, lights, car parts, tree leaves, other background.

16 False Negative Sources [Zhang,Benenson,Omran,Hosang,Schiele@CVPR 16] Small scales, partial occlusion, side view, annotation errors, cyclists, others. Chart: false negative statistics.

17 ConvNets and Pedestrian Detection: Failure Cases. False positives: - ConvNets reduce background errors, but these remain a significant problem - ConvNets still have problems with localization (double detection, etc.); an attempt to overcome this problem: "Learning Non-Maximum Suppression" [Hosang,Benenson,Schiele@cvpr17]. False negatives: - ConvNets should improve for certain cases (side views, cyclists, etc.) provided sufficient training data for such cases - today's ConvNets have issues with small scale, partial occlusion, etc. Annotation quality appears crucial for best performance, at least for today's ConvNets.
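
The double-detection problem mentioned on this slide is conventionally handled by greedy non-maximum suppression (NMS). The following is a minimal sketch of that classic hand-crafted baseline, not of the learned NMS from the cited CVPR 17 paper. The function names, the (x1, y1, x2, y2) box layout, and the 0.5 IoU threshold are assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_thresh: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is dropped when it overlaps an already kept box by more
    than iou_thresh (the threshold value is an assumption).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep
```

With a fixed threshold, a detector has to trade off suppressing double detections against losing nearby people in crowds, which is one motivation for learning the suppression step instead.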

18 Multi-Person Tracking Based on Multicut & Subgraph Partitioning. "Subgraph Decomposition for Multi-Object Tracking", S. Tang, B. Andres, M. Andriluka and B. Schiele, CVPR 15. "Multi-Person Tracking by Multicuts and Deep Matching", S. Tang, B. Andres, M. Andriluka and B. Schiele, ECCV 16 workshop - winner of the Multi Object Tracking (MOT) 2016 challenge. "Multi-Person Tracking by Lifted Multicut and Multi-Person Re-Identification", S. Tang, M. Andriluka, B. Andres and B. Schiele, CVPR 17 - winner of the Multi Object Tracking (MOT) 2017 challenge. Siyu Tang, Björn Andres, Micha Andriluka, Bernt Schiele

19 Multicut Tracking - Motivation [Tang&al@cvpr17] Tracking as global association problem (example frames 10, 30, 50). Typically addressed as disjoint paths problem.

20 Multicut Tracking - Motivation [Tang&al@cvpr17] Subgraph decomposition for multi-object tracking (example frames 10, 30, 50). Desired property of tracking by graph decomposition: joint spatial-temporal association, resulting in robust tracking results.

21 The Underlying Graph in Space-Time Domain: Visualizing Disjoint Paths Solution. Axes: x, y, time. Red dots: detection hypotheses. Black lines: linking hypotheses. Red dots and lines: the disjoint paths of each person. Disjoint paths associations are brittle and fragile.

22 The Underlying Graph in Space-Time Domain: Visualizing MultiCut Subgraph Solution. Axes: x, y, time. Black dots and lines: the decomposition (cluster) for each person. Decompositions (clusters): associations more robust.

23 Multicut Tracking - Results [Tang&al@cvpr15] Panels: detection hypotheses, tracklet hypotheses, multicut decomposition, final tracks. Dotted rectangles are interpolated tracks.

24 Multicut Tracking - Results [Tang&al@cvpr15] Panels: decompositions (clusters), tracks. Dotted rectangles are interpolated tracks.

25 Results on MOT 16 [Tang&al@cvpr17] Table labels: Lifted Multicut, SenseTime.

26 People Detection & Tracking: Take Home Messages / Open Problems. Training data matters: how to obtain sufficient training data? Quality of annotations matters too - at least using today's CNNs and loss functions. Open problems: - how to train with weaker or no supervision? - how to generate relevant training data, e.g. for rare cases? Key role of features: CNNs / deep networks promise to help. Open problems: - classifiers for background (both false negatives & false positives) - small scale & partial occlusion are problematic for today's CNNs. Modeling of structure is often crucial: briefly discussed for multi-person tracking, similar for people detection. Open problem: - what is the right balance between (manual) structure and end-to-end trained models?

27 Overview. People Detection: progress since; today's state-of-the-art; open problems. Human Pose Estimation: progress since [Felzenszwalb,Huttenlocher@ijcv05]; today's state-of-the-art; open problems. Some More Open Problems.

28 Human Pose Estimation. "Discriminative Appearance Models for Pictorial Structures", M. Andriluka, S. Roth, and B. Schiele, IJCV 11. "2D Human Pose Estimation: New Benchmark and State-of-the-Art Analysis", M. Andriluka, L. Pishchulin, P. Gehler, and B. Schiele, CVPR 14. "DeepCut: Joint Subset Partition and Labeling for Multi-Person Pose Estimation", L. Pishchulin, E. Insafutdinov, S. Tang, B. Andres, M. Andriluka, P. Gehler, and B. Schiele, CVPR 16. "DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model", E. Insafutdinov, L. Pishchulin, B. Andres, M. Andriluka, and B. Schiele, ECCV 16. "ArtTrack: Articulated Multi-Person Tracking in the Wild", E. Insafutdinov, L. Pishchulin, M. Andriluka, and B. Schiele, CVPR 17. Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang, Björn Andres, Micha Andriluka, Peter Gehler, Bernt Schiele

29 Human Pose Estimation - what happened so far. Single Person Pose Estimation - two phases. Phase 1: pictorial structures models, e.g. [Felzenszwalb&Huttenlocher@ijcv05], [Andriluka&al@ijcv11], [Yang&Ramanan@pami13], [Pishchulin&al@iccv13], ... Phase 2: using deep learning, e.g. [Toshev,Szegedy@cvpr14], [Tompson&al@nips14], [Chen&Yuille@nips14], [Carreira&al@cvpr16], [Hu&Ramanan@cvpr16], [Wei&al@cvpr16], [Newell&al@cvpr16], ...

30 MPII Human Pose Dataset [Andriluka&al@cvpr14]: Dataset demo. 410 human activities (after merging similar activities), over 40,000 annotated poses, over 1.5M video frames. Figure labels: activity categories, activities, images.

31 Analysis - overall performance. Plot: PCKh total, MPII Single Person; best method as of ICCV 13 vs. best methods now: deep learning takes over. Since CVPR 14, the dataset has become the de-facto standard benchmark; the large training set facilitated development of deep learning methods.
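
The PCKh measure named on this slide counts a predicted joint as correct when it lies within a fraction of the head size from the ground-truth joint. Below is a minimal sketch under stated assumptions: the function name, the argument layout, and the default fraction alpha = 0.5 (the commonly reported PCKh@0.5 setting) are illustrative and are not taken from the slides. How head_size is derived from the head annotation is left to the caller.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def pckh(pred: Sequence[Point], gt: Sequence[Point],
         head_size: float, alpha: float = 0.5) -> float:
    """Fraction of joints whose prediction lies within
    alpha * head_size of the ground truth (alpha is an assumption)."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and equally long")
    threshold = alpha * head_size
    correct = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold)
    return correct / len(gt)
```

Normalizing by head size rather than by bounding-box size keeps the tolerance roughly independent of body articulation, which is the usual argument for PCKh over plain PCK.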

32 Human Pose Estimation. Single Person Pose Estimation - two phases. Phase 1: pictorial structures models, e.g. [Felzenszwalb&Huttenlocher@ijcv05], [Andriluka&al@ijcv11], [Yang&Ramanan@pami13], [Pishchulin&al@iccv13], ... Phase 2: using deep learning, e.g. [Toshev,Szegedy@cvpr14], [Tompson&al@nips14], [Chen&Yuille@nips14], [Carreira&al@cvpr16], [Hu&Ramanan@cvpr16], [Wei&al@cvpr16], [Newell&al@cvpr16], ... Multi Person Pose Estimation - far fewer publications, in particular: [Eichner&Ferrari@eccv10], [Ghiasi&al@cvpr14], [Chen&Yuille@cvpr15], ...

33 Deep(er)Cut: Joint Subset Partition & Labeling for Multi Person Pose Estimation. "DeepCut: Joint Subset Partition and Labeling for Multi-Person Pose Estimation", L. Pishchulin, E. Insafutdinov, S. Tang, B. Andres, M. Andriluka, P. Gehler, and B. Schiele, CVPR 16. "DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model", E. Insafutdinov, L. Pishchulin, B. Andres, M. Andriluka, and B. Schiele, ECCV 16. "ArtTrack: Articulated Multi-Person Tracking in the Wild", E. Insafutdinov, L. Pishchulin, M. Andriluka, and B. Schiele, CVPR 17.

34 Multi Person Pose Estimation. Standard 2-stage approach: stage 1: person detection to estimate bounding box; stage 2: single person pose estimation per bounding box - cannot recover from errors of stage 1 - occlusion reasoning across people

35 Multi Person Pose Estimation. Our approach - DeepCut [cvpr 16] & DeeperCut [eccv 16]: joint formulation for person detection and pose estimation, formulated as subset partitioning and labeling problem; jointly estimates: number of people and their pose, partial occlusion & truncation, ...

36 Deep(er)Cut - Overview. I: detection candidates. II: dense graph. III: joint subset partitioning & body part labeling, giving person clusters and labeled body parts.

37 I: Detection Candidates: Unary probabilities. Diagram: sliding CNN (VGG / ResNet), conv maps, FCs; loss functions: cross-entropy, self-regression, pairwise regression; dense outputs: scoremaps, unary vector fields, pairwise vector fields; result: (unary) body part detections.

38 I: Detection Candidates: Unary probabilities. Diagram: sliding CNN (VGG / ResNet), conv maps, FCs; loss functions: cross-entropy, self-regression, pairwise regression; dense outputs: scoremaps, unary vector fields, pairwise vector fields; sampling top detection candidates, with example scores sho 0.9, elb 0.5, wri 0.1, ank 0.0.

39 II: Graph Construction. Pairwise probabilities between detections (d1 to d4 in the figure; green: high probability, red: low probability). Same person & same body part (clustering of part detections). Same person & different body parts (kinematic relations) - pairwise regression to other body parts - logistic regression from both spatial relation and appearance (image conditioned pairwise probabilities). Different persons (multi-person & occlusion reasoning) - logistic regression from both spatial relation and appearance.
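
To make the role of the pairwise probabilities concrete, the sketch below clusters detections into persons from "same person" probabilities alone. It is a simplified illustration, not the DeepCut formulation: body part labeling is ignored, and the joint optimization is replaced by a greedy merging heuristic. The log-odds weighting and all names (`logit`, `greedy_partition`) are assumptions introduced here.

```python
import math
from typing import Dict, List, Set, Tuple


def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of p: positive favors joining, negative favors cutting."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def greedy_partition(n: int,
                     same_prob: Dict[Tuple[int, int], float]) -> List[Set[int]]:
    """Greedily merge the pair of clusters with the largest positive
    summed log-odds until no positive pair remains."""
    clusters = {i: {i} for i in range(n)}
    weight: Dict[frozenset, float] = {}
    for (i, j), p in same_prob.items():
        if i == j:
            continue  # self-pairs carry no information
        key = frozenset((i, j))
        weight[key] = weight.get(key, 0.0) + logit(p)
    while True:
        best_key, best_w = None, 0.0
        for key, w in weight.items():
            if w > best_w:
                best_key, best_w = key, w
        if best_key is None:
            break
        a, b = sorted(best_key)
        clusters[a] |= clusters.pop(b)
        # Re-route every edge that touched b so that it now touches a.
        merged: Dict[frozenset, float] = {}
        for key, w in weight.items():
            if key == best_key:
                continue
            if b in key:
                (other,) = key - {b}
                key = frozenset((a, other))
            merged[key] = merged.get(key, 0.0) + w
        weight = merged
    return sorted(clusters.values(), key=min)
```

Because the number of clusters is not fixed in advance, the number of people falls out of the partition itself, which is the property the joint formulation relies on.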

40 Deep(er)Cut - Overview. I: detection candidates. II: dense graph. III: joint subset partitioning & body part labeling, giving person clusters and labeled body parts.

41 Qualitative Results, MPII-Multi Person: 2-stage vs. DeepCut

42 ArtTrack [cvpr 17] Key idea: extend DeepCut to temporal domain; joint inference performed in spatio-temporal graph. Figure 2: Top: Body joint detection hypotheses shown for three frames. Middle: Spatio-temporal graph with spatial edges (blue) and temporal edges for head (red) and neck (yellow). We only show a subset of the edges. Bottom: Estimated poses for all persons in the video. Each color corresponds to a unique person identity.

43 ArtTrack: Qualitative Results [cvpr 17]

44 Human Pose Estimation: Take Home Messages / Open Problems. Training data matters: large datasets like MPII Human Pose push performance. Single Person Pose Estimation - impressive performance already: off-the-shelf CNNs like ResNet obtain top performance; iterative refinements (Stacked Hourglass, Convolutional Pose Machines) also obtain top performance. Multi Person Pose Estimation - not working as well: underrepresented in literature & more work needed - training data lacking (can we collect sufficient training data?); structured models & reasoning important - problem complexity higher.

45 Overview. People Detection & Tracking: progress since; today's state-of-the-art; open problems. Human Pose Estimation: progress since [Felzenszwalb,Huttenlocher@ijcv05]; today's state-of-the-art; open problems. Some More Open Problems.

46 Take Home Messages & Some Open Problems. Machine learning has been and will continue to be a driver: lots of data (internet, storage, ...); fast processing (CPU and GPU clusters, ...); powerful machine learning models (deep learning, graphical models, ...). BUT we should not get carried away by current successes of deep learning! Current successes largely depend on fully supervised training using large datasets.

47 Some Open Problems (1/3). It is impossible to get sufficient training data for everything! Then: how to train from insufficient training data? Lack of annotation (costly) - how to train with NO or WEAKER supervision - unsupervised & semi-supervised learning.

48 "Simple Does It: Weakly Supervised (Instance and) Semantic Segmentation", CVPR 2017. Anna Khoreva, Rodrigo Benenson, Jan Hosang, Matthias Hein, Bernt Schiele

49 Weakly Supervised Semantic Segmentation. Question: Is it possible to obtain high-quality segmentations with weak supervision such as bounding box annotations? Full supervision: time-consuming. Weak supervision: only 2 clicks per object (bounding box annotations). Figure labels: Person, Horse.

50 Generation of Annotations (classic methods). Bounding boxes contain information about the object: 1. background, 2. object extent, 3. objectness. Non-consensus regions are set to ignore labels. Diagram labels: image + boxes, GrabCut [Rother et al.], segment proposals [MCG, Pont-Tuset et al.], consensus / non-consensus, input for CNN, ground truth.
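
The consensus rule on this slide can be sketched as a per-pixel merge of two candidate masks, for example one from GrabCut and one from segment proposals. This is a literal reading of the slide text, and the published method may treat some cases differently. The function name, the inclusive box convention, and the ignore value 255 are assumptions made for illustration.

```python
from typing import List, Tuple

IGNORE = 255  # assumed value of the "ignore" label


def consensus_labels(mask_a: List[List[int]], mask_b: List[List[int]],
                     box: Tuple[int, int, int, int],
                     class_id: int) -> List[List[int]]:
    """Merge two binary masks into training labels for one boxed object.

    Outside the box: background (0). Inside the box: class_id where both
    masks say foreground, 0 where both say background, IGNORE otherwise.
    box is (x1, y1, x2, y2) with inclusive pixel coordinates.
    """
    x1, y1, x2, y2 = box
    height, width = len(mask_a), len(mask_a[0])
    labels = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not (x1 <= x <= x2 and y1 <= y <= y2):
                continue  # the box itself guarantees background here
            fg_a, fg_b = bool(mask_a[y][x]), bool(mask_b[y][x])
            if fg_a and fg_b:
                labels[y][x] = class_id
            elif fg_a != fg_b:
                labels[y][x] = IGNORE
    return labels
```

Pixels marked IGNORE would simply be excluded from the segmentation loss, so the network is trained only where the two cues agree.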

51 Quantitative Results: DeepLab ResNet (classic methods; image + boxes as input for CNN). Main result: with only box supervision we achieve 95% of the quality of the fully supervised model. Plot: mIoU, weak vs. full supervision.
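
The mIoU measure on this slide averages the per-class intersection over union between predicted and ground-truth label maps. A minimal sketch follows, assuming label maps stored as nested lists, an ignore value of 255, and averaging only over classes that occur in the prediction or the ground truth. These conventions and the function name are assumptions, and benchmark toolkits may differ in detail.

```python
from typing import List


def mean_iou(pred: List[List[int]], gt: List[List[int]],
             num_classes: int, ignore: int = 255) -> float:
    """Mean intersection over union across classes.

    Pixels whose ground-truth label equals `ignore` are skipped.
    Predicted labels are assumed to lie in [0, num_classes).
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if g == ignore:
                continue
            if p == g:
                inter[g] += 1
                union[g] += 1
            else:
                # A mismatch enlarges the union of both classes involved.
                union[p] += 1
                union[g] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

The "95% quality" statement then corresponds to the ratio of the weakly supervised model's mIoU to that of the fully supervised model.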

52 Panels: image, ground truth, weak supervision, full supervision.

53 "Learning Video Object Segmentation from Static Images", CVPR 2017. Anna Khoreva*, Federico Perazzi*, Rodrigo Benenson, Bernt Schiele, Alexander Sorkine-Hornung

54 Video Object Segmentation. Goal: separating a specific foreground object from background in a video, given its 1st frame mask annotation. Figure: object 1 and object 2, from 1st frame to frame t, DAVIS 2016 [Perazzi et al. 16].

55 MaskTrack - Proposed Approach. We process video per-frame, using guidance from the previous frame. Diagram: frame t-1 output mask and frame t input go into MaskTrack (DeepLab [Chen et al., ICLR 15]), which gives the frame t output mask. We can train using static images only.
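
The per-frame processing described on this slide amounts to a simple recurrence: the mask estimated for frame t-1 guides the estimate for frame t. The sketch below shows only that control flow. The `refine` callable stands in for the trained network and is hypothetical, as is the function name `propagate_masks`.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Mask = TypeVar("Mask")


def propagate_masks(frames: Sequence[Frame], first_mask: Mask,
                    refine: Callable[[Frame, Mask], Mask]) -> List[Mask]:
    """Propagate an annotated first-frame mask through a video.

    refine(frame_t, mask_t_minus_1) -> mask_t is a placeholder for the
    segmentation network; it is not defined by the slides.
    """
    if not frames:
        return []
    masks: List[Mask] = [first_mask]
    for frame in frames[1:]:
        masks.append(refine(frame, masks[-1]))
    return masks
```

Since `refine` only ever sees one image and one mask, a network in that role could plausibly be trained on static images paired with perturbed versions of their ground-truth masks, which is consistent with the slide's remark about training on static images only.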

56 Qualitative Results

57 Some Open Problems (1/3). It is impossible to get sufficient training data for everything! Then: how to train from insufficient training data? Lack of annotation (costly) - how to train with NO or WEAKER supervision - unsupervised & semi-supervised learning. Lack of necessary data for training (with and without labels): - e.g. autonomous driving: child running across street - data generation methods (computer graphics & generative adversarial networks). Real data distributions are skewed, with an exponential decrease of samples - most classes have close to no training data (how to train from very few samples?)

58 Generative Adversarial Network for Features (F-GAN). Idea: learn to generate features for any object class; generation is conditioned on some class embedding c(y); c(y) can be class attributes, sentences, Word2Vec, etc. Diagram labels: CNN, discriminator, f-CLSWGAN generator, noise z ~ N(0, 1), example attributes (head color: brown; belly color: yellow; bill shape: pointy).

59 Some Open Problems (2/3). How to integrate prior knowledge & end-to-end learning: linked to lack of necessary & sufficient data; prior knowledge helps whenever training data is limited - examples: tracking, pose estimation, scene understanding. Manual modeling vs. automatic learning - manual modeling vs. automatic mining (e.g. from Wikipedia) of prior knowledge - what is the right (structural) model to integrate prior knowledge? (e.g. for scene understanding) - what to learn: prior knowledge? model structure itself? just parameters? something else?

60 13] 3D Scene Understanding. 3D scene analysis for mobile platforms (i.e. robots, cars): mobile observer aims to understand its 3D mobile environment, i.e. traffic, people, etc. Application scenarios: traffic safety and driver assistance, autonomous vehicles, robotics.

61 13] A State-of-the-Art Approach (monocular camera). Diagram labels: image sequence (T-1, T, T+1), Bayesian 3D scene model, object detections, scene tracklets, semantic scene labels, prior information (camera, objects).

62 11] Sample Result including Occlusion Reasoning

63 13] System sample video (different types of vehicles). Message: modeling and reasoning in 3D is powerful, e.g. for occlusion reasoning, inclusion of prior information.

64 Some Open Problems (3/3). Understanding and introspection of deep-learning results. Status quo: - neural networks often considered black boxes - lots of trial-and-error (for network structure, training, etc.) - rather unclear why certain architectures / learning regimes work better. Ideally: - we should understand which architectures work when, and why - deep architectures should allow for introspection: why a certain (in)correct result was obtained, i.e. explain their own reasoning.

65 Current Example from our Group [Park, Hendricks, Akata, Schiele, Darrell, Rohrbach; CVPR18]: Visual and Textual Explanation Interface: VQA

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab. [ICIP 2017] Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab., POSTECH Pedestrian Detection Goal To draw bounding boxes that

More information

Occluded Pedestrian Detection Through Guided Attention in CNNs

Occluded Pedestrian Detection Through Guided Attention in CNNs Occluded Pedestrian Detection Through Guided Attention in CNNs Shanshan Zhang 1, Jian Yang 2, Bernt Schiele 3 1 Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry

More information

ECE 6554:Advanced Computer Vision Pose Estimation

ECE 6554:Advanced Computer Vision Pose Estimation ECE 6554:Advanced Computer Vision Pose Estimation Sujay Yadawadkar, Virginia Tech, Agenda: Pose Estimation: Part Based Models for Pose Estimation Pose Estimation with Convolutional Neural Networks (Deep

More information

Towards 3D Human Pose Estimation in the Wild: a Weakly-supervised Approach

Towards 3D Human Pose Estimation in the Wild: a Weakly-supervised Approach Towards 3D Human Pose Estimation in the Wild: a Weakly-supervised Approach Xingyi Zhou, Qixing Huang, Xiao Sun, Xiangyang Xue, Yichen Wei UT Austin & MSRA & Fudan Human Pose Estimation Pose representation

More information

arxiv: v1 [cs.cv] 20 Nov 2015

arxiv: v1 [cs.cv] 20 Nov 2015 eepcut: Joint Subset Partition and Labeling for Multi Person Pose Estimation Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang, Bjoern Andres, Mykhaylo Andriluka,, Peter Gehler, and Bernt Schiele arxiv:.0v

More information

arxiv: v2 [cs.cv] 26 Apr 2016

arxiv: v2 [cs.cv] 26 Apr 2016 eepcut: Joint Subset Partition and Labeling for Multi Person Pose Estimation Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang, Bjoern Andres, Mykhaylo Andriluka,, Peter Gehler, and Bernt Schiele arxiv:.0v

More information

Combining PGMs and Discriminative Models for Upper Body Pose Detection

Combining PGMs and Discriminative Models for Upper Body Pose Detection Combining PGMs and Discriminative Models for Upper Body Pose Detection Gedas Bertasius May 30, 2014 1 Introduction In this project, I utilized probabilistic graphical models together with discriminative

More information

Human Pose Estimation with Deep Learning. Wei Yang

Human Pose Estimation with Deep Learning. Wei Yang Human Pose Estimation with Deep Learning Wei Yang Applications Understand Activities Family Robots American Heist (2014) - The Bank Robbery Scene 2 What do we need to know to recognize a crime scene? 3

More information

Learning Video Object Segmentation from Static Images Supplementary material

Learning Video Object Segmentation from Static Images Supplementary material Learning Video Object ation from Static Images Supplementary material * Federico Perazzi 1,2 * Anna Khoreva 3 Rodrigo Benenson 3 Bernt Schiele 3 Alexander Sorkine-Hornung 1 1 Disney Research 2 ETH Zurich

More information

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors [Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors Junhyug Noh Soochan Lee Beomsu Kim Gunhee Kim Department of Computer Science and Engineering

More information

Part-Based Models for Object Class Recognition Part 2

Part-Based Models for Object Class Recognition Part 2 High Level Computer Vision Part-Based Models for Object Class Recognition Part 2 Bernt Schiele - schiele@mpi-inf.mpg.de Mario Fritz - mfritz@mpi-inf.mpg.de https://www.mpi-inf.mpg.de/hlcv Class of Object

More information

Part-Based Models for Object Class Recognition Part 2

Part-Based Models for Object Class Recognition Part 2 High Level Computer Vision Part-Based Models for Object Class Recognition Part 2 Bernt Schiele - schiele@mpi-inf.mpg.de Mario Fritz - mfritz@mpi-inf.mpg.de https://www.mpi-inf.mpg.de/hlcv Class of Object

More information

Joint Graph Decomposition & Node Labeling: Problem, Algorithms, Applications Supplement

Joint Graph Decomposition & Node Labeling: Problem, Algorithms, Applications Supplement Joint Graph ecomposition & Node Labeling: Problem, Algorithms, Applications Supplement Evgeny Levinkov 1, Jonas Uhrig 3,4, Siyu Tang 1,2, Mohamed Omran 1, Eldar Insafutdinov 1, Alexander Kirillov 5, Carsten

More information

ArtTrack: Articulated Multi-person Tracking in the Wild

ArtTrack: Articulated Multi-person Tracking in the Wild ArtTrack: Articulated Multi-person Tracking in the Wild Eldar Insafutdinov, Mykhaylo Andriluka, Leonid Pishchulin, Siyu Tang, Evgeny Levinkov, Bjoern Andres, Bernt Schiele Max Planck Institute for Informatics

More information

Deformable Part Models

Deformable Part Models CS 1674: Intro to Computer Vision Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 9, 2016 Today: Object category detection Window-based approaches: Last time: Viola-Jones

More information

Flow-Based Video Recognition

Flow-Based Video Recognition Flow-Based Video Recognition Jifeng Dai Visual Computing Group, Microsoft Research Asia Joint work with Xizhou Zhu*, Yuwen Xiong*, Yujie Wang*, Lu Yuan and Yichen Wei (* interns) Talk pipeline Introduction

More information

Occlusion Patterns for Object Class Detection

Occlusion Patterns for Object Class Detection Occlusion Patterns for Object Class Detection Bojan Pepik1 Michael Stark1,2 Peter Gehler3 Bernt Schiele1 Max Planck Institute for Informatics, 2Stanford University, 3Max Planck Institute for Intelligent

More information

Spatial Localization and Detection. Lecture 8-1

Spatial Localization and Detection. Lecture 8-1 Lecture 8: Spatial Localization and Detection Lecture 8-1 Administrative - Project Proposals were due on Saturday Homework 2 due Friday 2/5 Homework 1 grades out this week Midterm will be in-class on Wednesday

More information

Fully Convolutional Networks for Semantic Segmentation

Fully Convolutional Networks for Semantic Segmentation Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell UC Berkeley Chaim Ginzburg for Deep Learning seminar 1 Semantic Segmentation Define a pixel-wise labeling

More information

Part-Based Models for Object Class Recognition Part 3

Part-Based Models for Object Class Recognition Part 3 High Level Computer Vision! Part-Based Models for Object Class Recognition Part 3 Bernt Schiele - schiele@mpi-inf.mpg.de Mario Fritz - mfritz@mpi-inf.mpg.de! http://www.d2.mpi-inf.mpg.de/cv ! State-of-the-Art

More information

Regionlet Object Detector with Hand-crafted and CNN Feature

Regionlet Object Detector with Hand-crafted and CNN Feature Regionlet Object Detector with Hand-crafted and CNN Feature Xiaoyu Wang Research Xiaoyu Wang Research Ming Yang Horizon Robotics Shenghuo Zhu Alibaba Group Yuanqing Lin Baidu Overview of this section Regionlet

More information

Intro to Deep Learning for Computer Vision

Intro to Deep Learning for Computer Vision High Level Computer Vision Intro to Deep Learning for Computer Vision Bernt Schiele - schiele@mpi-inf.mpg.de Mario Fritz - mfritz@mpi-inf.mpg.de https://www.mpi-inf.mpg.de/hlcv Overview Today Recent successes

More information

Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction

Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction Marc Pollefeys Joined work with Nikolay Savinov, Christian Haene, Lubor Ladicky 2 Comparison to Volumetric Fusion Higher-order ray

More information

Human Pose Estimation using Global and Local Normalization. Ke Sun, Cuiling Lan, Junliang Xing, Wenjun Zeng, Dong Liu, Jingdong Wang

Human Pose Estimation using Global and Local Normalization. Ke Sun, Cuiling Lan, Junliang Xing, Wenjun Zeng, Dong Liu, Jingdong Wang Human Pose Estimation using Global and Local Normalization Ke Sun, Cuiling Lan, Junliang Xing, Wenjun Zeng, Dong Liu, Jingdong Wang Overview of the supplementary material In this supplementary material,

More information

Amodal and Panoptic Segmentation. Stephanie Liu, Andrew Zhou

Amodal and Panoptic Segmentation. Stephanie Liu, Andrew Zhou Amodal and Panoptic Segmentation Stephanie Liu, Andrew Zhou This lecture: 1. 2. 3. 4. Semantic Amodal Segmentation Cityscapes Dataset ADE20K Dataset Panoptic Segmentation Semantic Amodal Segmentation Yan

More information

Illuminating Pedestrians via Simultaneous Detection & Segmentation

Illuminating Pedestrians via Simultaneous Detection & Segmentation Illuminating Pedestrians via Simultaneous Detection & Segmentation Garrick Brazil, Xi Yin, Xiaoming Liu Michigan State University, East Lansing, MI 48824 {brazilga, yinxi1, liuxm}@msu.edu Abstract Pedestrian

More information

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Efficient Segmentation-Aided Text Detection For Intelligent Robots Efficient Segmentation-Aided Text Detection For Intelligent Robots Junting Zhang, Yuewei Na, Siyang Li, C.-C. Jay Kuo University of Southern California Outline Problem Definition and Motivation Related

More information

A Cascaded Inception of Inception Network with Attention Modulated Feature Fusion for Human Pose Estimation

A Cascaded Inception of Inception Network with Attention Modulated Feature Fusion for Human Pose Estimation A Cascaded Inception of Inception Network with Attention Modulated Feature Fusion for Human Pose Estimation Submission ID: 2065 Abstract Accurate keypoint localization of human pose needs diversified features:

More information

Detecting and Parsing of Visual Objects: Humans and Animals. Alan Yuille (UCLA)

Detecting and Parsing of Visual Objects: Humans and Animals. Alan Yuille (UCLA) Detecting and Parsing of Visual Objects: Humans and Animals Alan Yuille (UCLA) Summary This talk describes recent work on detection and parsing visual objects. The methods represent objects in terms of

More information

CAP 6412 Advanced Computer Vision
