JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA

2 ABOUT ME 5th year PhD student in physics @ Stanford by day, deep learning computer vision scientist by night. Intern with Deep Learning Applied Research (Autonomous NVIDIA, Oct-Dec

3 TALK OVERVIEW (1) Problem statement and summary. (2) Dataset and preliminaries. (3) Model motivation. (4) Results and visualizations.

5-6 FROM SINGLE TO MULTITASK LEARNING Putting deep learning to work in the real world. Detection Model ... Object Bounding Boxes. Segmentation Model ... Segmentation Mask. Poor scalability + inefficient use of information!

7-9 FROM SINGLE TO MULTITASK LEARNING Putting deep learning to work in the real world. How do we use one model to perform multiple tasks faster and better? Shared Model ... Object Bounding Boxes, Segmentation Mask (+ edge detection, + surface normals, + distance estimation). How do you relate various tasks to each other in a multi-task neural network?

10 WHAT WE WILL SHOW By ordering tasks based on receptive field and information density, we improve segmentation and detection accuracy by ~2% and ~8%, respectively, over single-task networks. The joint network is robust and easy to tune compared to non-hierarchical baselines.

12 CITYSCAPES DATASET 2975 Training Images @ resolution 1024 x 2048. 20 classes for semantic segmentation, including 8 object classes. Of these 8, 4 are much more represented (car, bicycle, person, rider): the easy classes. Segmentation, bounding box, and edge ground truth can all be generated. Panels: Raw Image; Semantic Seg.; Edge Detection; Bounding Box.
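The slide states that bounding box and edge ground truth can be generated alongside segmentation labels. A minimal sketch of one way to do that, assuming a per-pixel instance-id map where 0 means "no instance" (the function names `boxes_from_instances` and `edges_from_labels` and the toy array are hypothetical, not the speaker's pipeline):

```python
import numpy as np

def boxes_from_instances(instance_ids):
    """Return {instance_id: (x_min, y_min, x_max, y_max)} for each nonzero id."""
    boxes = {}
    for inst in np.unique(instance_ids):
        if inst == 0:  # assumption: 0 marks background / no instance
            continue
        ys, xs = np.nonzero(instance_ids == inst)
        boxes[int(inst)] = (int(xs.min()), int(ys.min()),
                            int(xs.max()), int(ys.max()))
    return boxes

def edges_from_labels(labels):
    """Mark pixels whose right or lower neighbour carries a different label."""
    edges = np.zeros(labels.shape, dtype=bool)
    edges[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    edges[:-1, :] |= labels[:-1, :] != labels[1:, :]
    return edges

# Toy 4 x 6 instance map with two objects (ids 1 and 2).
toy = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 0, 0],
])
toy_boxes = boxes_from_instances(toy)
toy_edges = edges_from_labels(toy)
```

The same edge routine can be applied to a semantic label map instead of an instance map; which of the two the talk used is not stated.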

13 HOW TO TRAIN A SEGMENTATION NETWORK Standard FCN (Shelhamer 2015) architecture: convolutions followed by a deconvolution to retrieve a pixel-dense prediction mask.
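To make the "convolutions followed by a deconvolution" shape flow concrete, here is a rough sketch using the standard output-size formulas for convolution and transposed convolution. The layer configuration (five stride-2 3x3 convolutions and one deconvolution with kernel 2s, stride s, padding s/2) is an assumption chosen for illustration; the talk does not specify the base CNN.

```python
def conv_out(size, kernel, stride, pad):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel, stride, pad):
    """Spatial output size of a transposed convolution along one dimension."""
    return (size - 1) * stride - 2 * pad + kernel

def fcn_shapes(height, width, n_down=5):
    """Return (low-res shape, upsampled shape) for a hypothetical FCN."""
    h, w = height, width
    for _ in range(n_down):  # assumed: 3x3 convs, stride 2, padding 1
        h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    low_res = (h, w)
    stride = 2 ** n_down     # one deconv undoes the total downsampling
    full = (deconv_out(h, 2 * stride, stride, stride // 2),
            deconv_out(w, 2 * stride, stride, stride // 2))
    return low_res, full

low_res, full = fcn_shapes(1024, 2048)
```

With these assumed layers a 1024 x 2048 input shrinks to 32 x 64 before the deconvolution restores the input resolution.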

14 HOW TO TRAIN A DETECTION NETWORK Network outputs confidence that a pixel lies near the center of an object. Points of high confidence produce bounding box coordinates. Confidences are rougher than full segmentation but robust to occlusion.
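A minimal sketch of the decoding step described here: threshold the confidence map and read box coordinates at the surviving positions. The tensor layout (`coords` as H x W x 4 holding x_min, y_min, x_max, y_max), the threshold of 0.5, and the name `decode_boxes` are assumptions for illustration; a real pipeline would also need to merge neighbouring high-confidence pixels that point at the same object, which this sketch omits.

```python
import numpy as np

def decode_boxes(confidence, coords, threshold=0.5):
    """Return [(score, (x_min, y_min, x_max, y_max)), ...] above threshold.

    confidence: (H, W) array of per-pixel object-center confidences.
    coords:     (H, W, 4) array of box coordinates predicted at each pixel.
    """
    ys, xs = np.nonzero(confidence >= threshold)
    return [
        (float(confidence[y, x]), tuple(float(v) for v in coords[y, x]))
        for y, x in zip(ys, xs)
    ]

# Toy 3 x 3 grid: one confident pixel, one weak pixel.
conf = np.zeros((3, 3))
conf[1, 1] = 0.9
conf[0, 2] = 0.3
coords = np.zeros((3, 3, 4))
coords[1, 1] = [10, 20, 50, 80]

detections = decode_boxes(conf, coords)
```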

16 [Architecture diagram] Input (1024 x 2048); Shared Feature Map (from base CNN); Low-Res Seg Predictions (W x H x 20); Obj. Confidence Positions; Bbox Coordinate Positions; Deconv. Loss: $L = \alpha L_{seg} + (1 - \alpha) L_{det}$
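The loss on this slide is a convex combination of the segmentation and detection losses. A small sketch, assuming both losses are already scalar values (the function name `joint_loss` and the example numbers are hypothetical):

```python
def joint_loss(l_seg, l_det, alpha):
    """Weighted multi-task loss: alpha * l_seg + (1 - alpha) * l_det."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * l_seg + (1.0 - alpha) * l_det

# Sweep alpha for fixed example losses: alpha = 1 trains segmentation only,
# alpha = 0 trains detection only.
sweep = {alpha: joint_loss(1.0, 3.0, alpha) for alpha in (0.0, 0.25, 1.0)}
```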

17-24 OUR BASELINE MODEL PERFORMANCE Seg. Weight Det. Weight = α (α controls how much attention we pay to segmentation vs. detection during training)

25-28 A LABEL HIERARCHY ALONG TWO AXES Axes: Required Receptive Field; Density of Information. Tasks placed on the axes (in build order): Object Bounding Boxes; Object Confidence; Semantic Segmentation; Edge Detection (plus).

29-33 [Architecture diagram] Input (1024 x 2048); Shared Feature Map (from base CNN). Task heads in order: Edge; Segmentation; Obj. Confidence; Obj. BBox. Outputs: Low-Res Edge Predictions (W x H x 3), then Deconv; Low-Res Seg Predictions (W x H x 20), then Deconv; Obj. Confidence Positions; Bbox Coordinate Positions. Annotation: Decreasing information density.
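One way to read this diagram is as a chain of heads in which each task sees the shared features plus the prediction of the task before it. The sketch below illustrates that reading only: the wiring, the 1x1-convolution heads with random weights, and the channel counts for confidence (1) and box coordinates (4) are all assumptions. Only the 3-channel edge and 20-channel segmentation outputs come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

def head(features, out_channels):
    """Hypothetical 1x1-conv head: a per-pixel linear map, random weights."""
    weights = rng.standard_normal((features.shape[-1], out_channels)) * 0.01
    return features @ weights

def hierarchical_heads(shared):
    """Chain heads in order of decreasing information density (assumed wiring)."""
    edge = head(shared, 3)                                    # W x H x 3
    seg = head(np.concatenate([shared, edge], axis=-1), 20)   # W x H x 20
    conf = head(np.concatenate([shared, seg], axis=-1), 1)    # assumed 1 channel
    bbox = head(np.concatenate([shared, conf], axis=-1), 4)   # assumed 4 channels
    return {"edge": edge, "seg": seg, "conf": conf, "bbox": bbox}

shared = rng.standard_normal((8, 16, 32))  # toy shared feature map, H x W x C
outputs = hierarchical_heads(shared)
shapes = {name: out.shape for name, out in outputs.items()}
```

The point of the sketch is the dependency order, not the layer types: denser predictions are computed first and made available to sparser ones.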

34-37 Deep Hierarchical Network (DHM) [Architecture diagram] Input (1024 x 2048); Shared Feature Map (from base CNN). Task heads in order: Edge; Segmentation; Obj. Confidence; Obj. BBox. Dilated Convs. Outputs: Low-Res Edge Predictions (W x H x 3), then Deconv; Low-Res Seg Predictions (W x H x 20), then Deconv; Obj. Confidence Positions; Bbox Coordinate Positions. Annotation: Increasing receptive field.
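Dilated convolutions enlarge the receptive field without adding parameters, which is presumably why they appear on the receptive-field axis of this diagram. A short sketch of the standard receptive-field arithmetic follows; the layer stacks compared are hypothetical examples, not the network's actual configuration.

```python
def effective_kernel(kernel, dilation):
    """Span covered by a dilated kernel along one dimension."""
    return kernel + (kernel - 1) * (dilation - 1)

def receptive_field(layers):
    """Receptive field of stacked convs given (kernel, stride, dilation) tuples."""
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        rf += (effective_kernel(kernel, dilation) - 1) * jump
        jump *= stride  # distance between adjacent outputs, in input pixels
    return rf

# Three 3x3 convs, stride 1: plain vs. dilations 1, 2, 4.
plain = receptive_field([(3, 1, 1), (3, 1, 1), (3, 1, 1)])
dilated = receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)])
```

Under these example settings the dilated stack covers 15 input pixels per side against 7 for the plain stack, with the same number of weights.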

39-40 RESULTS: HIGH ROBUSTNESS

42 Panels: Raw Image; Edge Predictions; Bounding Box Predictions; Segmentation Predictions.

43-50 VISUALIZATIONS Panels compare SINGLE NETWORK with DHM (OURS): DETECTION and SEGMENTATION (slides 43, 45, 46, 48-50); SALIENCY (CAR) and SEGMENTATION (slide 44); SALIENCY (BUS) and SEGMENTATION (slide 47).

51 SUMMARY The two hierarchies within our model allow our network to reason about inter-task relationships. Information density: (Seg +) Edge > Seg > Object Conf > Bbox. Receptive field: (Seg +) Edge = Bbox >> Object Conf > Seg. With these relationships wired in, our network is: more accurate; robust to tuning; simultaneously better at fine detail and more instance aware; efficient and scalable (3 tasks, 1 network!).

52 REFERENCES
J. Yao, S. Fidler, and R. Urtasun. Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation. In CVPR.
S. Gidaris and N. Komodakis. Object detection via a multi-region and semantic segmentation-aware CNN model. In ICCV.
B. Hariharan, P. Arbelaez, R. Girshick, and J. Malik. Simultaneous detection and segmentation. In ECCV.
S. Liu, X. Qi, J. Shi, H. Zhang, and J. Jia. Multi-scale patch aggregation (MPA) for simultaneous detection and segmentation. In CVPR.
E. Shelhamer, J. Long, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR.
B. Hariharan, P. Arbelaez, R. Girshick, and J. Malik. Hypercolumns for object segmentation and fine-grained localization. In CVPR.
J. Dai, K. He, and J. Sun. Instance-aware semantic segmentation via multi-task network cascades. In

53 THANK YOU! Special thanks to my internship mentor, Jian Yao; my managers, John Zedlewski and Andrew Tao; and all the wonderful people in DLAR/DLAV. Additional questions/comments:
