Todo before next class

1 Todo before next class
Each project group should submit a short project report (4 pages of presentation slides) covering:
1. Problem definition
2. Related work
3. Preliminary results
4. Future plan
Submission: email to chad.dechant@columbia.edu by April 5.
Note: your slides will be posted on the course website. The submission must be a PDF file named by your group number.

2 Deep Networks for Image Classification and Detection
Liangliang Cao
llcao.net/cu-deeplearning17

3 Outline
- Differences between vision, speech, and NLP
- ImageNet and model adaptation
- Recent trends in industry and academia
- Two recent works: HyperFace, and Mask R-CNN with ResNet

4 Image Recognition is Lucky
Why?
- Data: Images are easier to label than speech/language
- Data: Fei-Fei et al. put a lot of effort into releasing ImageNet
- Platform: Nvidia's cuDNN standardizes the most important computations
- Platform: A number of great toolkits are built on cuDNN

5 ImageNet LSVRC

6 Treasure from ImageNet Dataset
By adapting models trained on ImageNet, we can build a decent classifier with limited data.
- Very few new labels: tune the last layer, or use the last layer's output as features for an SVM. Example code: /gathered/examples/finetune_flickr_style.html
- Enough new labels, or new tasks: tune the whole network
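The "tune the last layer" option can be sketched without any deep learning toolkit. This is a minimal illustration, not the linked example: the feature vectors are hypothetical stand-ins for activations of a frozen ImageNet model, and the names `train_last_layer` and `predict` are made up for this sketch. Only the small logistic-regression layer on top is trained.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_last_layer(features, labels, lr=0.5, epochs=1000):
    """Fit a logistic-regression 'last layer' on frozen feature vectors."""
    dim = len(features[0])
    weights, bias = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi / n
            grad_b += error / n
        # Plain gradient descent; the pretrained layers are never touched.
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

# Hypothetical stand-ins for activations of a frozen ImageNet model.
features = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.0]]
labels = [0, 0, 1, 1]
weights, bias = train_last_layer(features, labels)
predictions = [predict(weights, bias, x) for x in features]
```

With very few labels this keeps the number of trainable parameters small; tuning the whole network would update the feature extractor as well.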

7 More Data, More Computation
- Caltech101: 8K images
- ImageNet: 1.2M images
- Yahoo YFCC: 100M images
Will this trend plateau or keep expanding?
Figure: Nvidia stock price

8 Deep Learning in Industry
- Data cleaning
- Startup examples
- More data
- Better model
- Model evolving

9 Deep Learning for Competition
Kaggle small-scale image recognition
- Adapt several ImageNet models
- Data augmentation
- Study the failure examples, and find ways to conquer them
ImageNet LSVRC
- Fuse complementary features using multi-GPU systems
- Identify the problems of existing models and fix them
Larger scale (e.g., YouTube-8M video)
- Explore new scalable models
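Data augmentation is listed above without detail. As a minimal sketch, assuming an image is a plain list of pixel rows and that corner crops plus mirroring are the transformations of interest (the slides do not say which ones were used), it could look like this:

```python
def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]

def crop(image, top, left, size):
    """Cut a size x size window whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]

def augment(image, size):
    """Return the four corner crops plus a mirrored copy of each."""
    height, width = len(image), len(image[0])
    corners = [(0, 0), (0, width - size),
               (height - size, 0), (height - size, width - size)]
    crops = [crop(image, top, left, size) for top, left in corners]
    return crops + [horizontal_flip(c) for c in crops]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
variants = augment(image, 2)  # one labeled image becomes eight training samples
```

Each variant keeps the original label, which is why augmentation helps most in the small-data setting.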

10 Plan for the remaining time
- Fusing complementary features: HyperFace (Ranjan, Patel, Chellappa, arXiv, 2016)
- Integrating multiple tasks: Mask R-CNN with ResNet (He, Gkioxari, Dollár, Girshick, arXiv, 2017)
- Deeper understanding of the challenges for existing models

11 HyperFace
A Deep Multi-task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition
Rajeev Ranjan, Vishal M. Patel, Rama Chellappa, arXiv

12 Tasks of HyperFace
- Face Detection
- Landmark Localization
- Pose Estimation
- Gender Recognition

13 HyperFace: Basic Idea
- Lower layers respond to edges and corners, and hence have better localization properties.
- Higher layers are class-specific and suitable for semantic recognition tasks such as face recognition and gender recognition.
- Features from lower and higher layers are complementary. Fuse them!
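Fusing feature maps from different depths requires them to share a spatial size first. The sketch below is an illustration only: it uses max pooling to shrink the lower-layer map and then concatenates channels. The slide does not say how HyperFace aligns the sizes, so the pooling step and the function names are assumptions.

```python
def max_pool(channel, k):
    """Downsample one 2D channel with a k x k max-pooling window and stride k."""
    height, width = len(channel), len(channel[0])
    return [[max(channel[i + di][j + dj] for di in range(k) for dj in range(k))
             for j in range(0, width - k + 1, k)]
            for i in range(0, height - k + 1, k)]

def fuse(low_level, high_level):
    """Bring low-level maps to the high-level resolution, then stack channels."""
    k = len(low_level[0]) // len(high_level[0])  # assumes an integer size ratio
    pooled = [max_pool(channel, k) for channel in low_level]
    return pooled + high_level  # channel-wise concatenation

# One 4x4 low-level channel and two 2x2 high-level channels (toy values).
low_level = [[[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12],
              [13, 14, 15, 16]]]
high_level = [[[0.5, 0.1], [0.2, 0.9]],
              [[1.0, 0.0], [0.0, 1.0]]]
fused = fuse(low_level, high_level)
```

The fused tensor carries both the localization cues of the early layer and the semantic cues of the late layers, at the cost of more channels for the following layers to process.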

14 HyperFace Network

15 Baseline

16 Loss function
- Detection
- Landmark location
- Visibility
- Pose
- Gender
- Total
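The formulas for these terms did not survive in the transcript. As a rough sketch of how a multi-task total can be assembled, assume cross-entropy for the two classification tasks (detection, gender), squared error for the regression tasks (visibility, pose), a visibility-masked squared error for landmarks, and a weighted sum for the total. These per-task choices and the unit weights are placeholders for illustration, not values taken from the slides.

```python
import math

def cross_entropy(prob, label):
    """Binary cross-entropy for one prediction; label is 0 or 1."""
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))

def squared_error(pred, target):
    """Mean squared error between two equally long vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def landmark_error(pred, target, visible):
    """Squared error over (x, y) landmarks, skipping invisible points."""
    total = sum(v * ((px - tx) ** 2 + (py - ty) ** 2)
                for (px, py), (tx, ty), v in zip(pred, target, visible))
    return total / (2 * len(target))

def total_loss(losses, weights):
    """Weighted sum of the per-task losses."""
    return sum(weights[task] * value for task, value in losses.items())

losses = {
    "detection": cross_entropy(0.5, 1),
    "landmark": landmark_error([(1.0, 1.0), (5.0, 5.0)],
                               [(0.0, 1.0), (0.0, 0.0)], [1, 0]),
    "visibility": squared_error([1.0, 0.0], [1.0, 0.0]),
    "pose": squared_error([0.1, 0.0, 0.0], [0.0, 0.0, 0.0]),
    "gender": cross_entropy(0.5, 0),
}
weights = {task: 1.0 for task in losses}  # placeholder weights, not from the slides
total = total_loss(losses, weights)
```

Note how the second landmark, marked invisible, contributes nothing even though its prediction is far off; the task weights then control how much each task steers the shared layers.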

17 Procedure
1. Run selective search to get candidate regions
2. Normalize and scale each region
3. Predict the four tasks
4. Refine the prediction based on landmark detections

18 Face Detection Performance
Annotated Faces in the Wild (AFW)

19 Face Detection Performance
Face Detection Dataset and Benchmark (FDDB)

20 Landmark localization
AFW

21 Landmark localization
Annotated Facial Landmarks in the Wild (AFLW)

22 Landmark localization
Annotated Facial Landmarks in the Wild (AFLW)

23 Pose Estimation
AFW

24 Gender Recognition

25 Speed
- Hardware: GTX Titan-X GPUs
- 3 seconds per image in total
- 2s for selective search to generate region proposals
- 0.2s to evaluate the HyperFace network
Questions or comments?
OK, my question: will a better region detector help HyperFace?

26 Mask R-CNN
Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, arXiv

27 History
Selective Search, R-CNN, Fast R-CNN, Faster R-CNN, Residual Network, Mask R-CNN

28 R-CNN and Fast R-CNN
Figure panels: R-CNN; Fast R-CNN
Fast R-CNN is 200x faster than R-CNN in the testing stage.

29 Faster R-CNN → Mask R-CNN
- No longer uses selective search; a network handles the region-proposal task instead
- Adds a segmentation (mask) branch in addition to detection
- RoI pooling → RoI align

30 RoI Pooling Layer
- Typical pooling layer: with a w x h pooling window (and matching stride), the output size is 1/(wh) of the input size.
- RoI pooling layer: the output size is fixed (7x7) no matter how large the input region is.

    // Fragment: hstart/hend/wstart/wend, index, and pool_index are
    // computed for each output bin in code elided from the slide.
    for (int n = 0; n < num_rois; ++n) {
      for (int c = 0; c < channels_; ++c) {
        for (int ph = 0; ph < pooled_height_; ++ph) {
          for (int pw = 0; pw < pooled_width_; ++pw) {
            for (int h = hstart; h < hend; ++h) {
              for (int w = wstart; w < wend; ++w) {
                // Keep the maximum activation inside the bin.
                if (batch_data[index] > top_data[pool_index])
                  top_data[pool_index] = batch_data[index];
              }
            }
          }
        }
      }
    }
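The loop fragment leaves out how each bin's bounds are derived. A self-contained sketch for a single channel follows, assuming the RoI is given as inclusive integer coordinates `(x1, y1, x2, y2)` on the feature map and that bin edges are found with floor/ceil quantization; the function name and signature are made up for illustration.

```python
import math

def roi_max_pool(feature, roi, pooled_h, pooled_w):
    """Max-pool one RoI of a single-channel map into pooled_h x pooled_w bins.

    feature: 2D list; roi: (x1, y1, x2, y2), inclusive integer coordinates.
    """
    height, width = len(feature), len(feature[0])
    x1, y1, x2, y2 = roi
    bin_h = max(y2 - y1 + 1, 1) / pooled_h
    bin_w = max(x2 - x1 + 1, 1) / pooled_w
    clamp = lambda value, limit: min(max(value, 0), limit)
    pooled = []
    for ph in range(pooled_h):
        # Hard quantization: bin edges are snapped to integer rows/columns.
        hstart = clamp(y1 + math.floor(ph * bin_h), height)
        hend = clamp(y1 + math.ceil((ph + 1) * bin_h), height)
        row = []
        for pw in range(pooled_w):
            wstart = clamp(x1 + math.floor(pw * bin_w), width)
            wend = clamp(x1 + math.ceil((pw + 1) * bin_w), width)
            values = [feature[h][w]
                      for h in range(hstart, hend) for w in range(wstart, wend)]
            row.append(max(values) if values else 0)  # empty bins yield 0
        pooled.append(row)
    return pooled

feature = [[4 * r + c for c in range(4)] for r in range(4)]
whole = roi_max_pool(feature, (0, 0, 3, 3), 2, 2)  # 4x4 region -> 2x2
part = roi_max_pool(feature, (1, 1, 3, 3), 2, 2)   # 3x3 region -> 2x2
```

Both calls return a 2x2 grid even though the regions differ in size, which is the point of the layer: later fully connected layers always see the same input shape.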

31 From RoI pooling to RoI align (source code pending)
- RoI pooling is not designed for pixel-to-pixel alignment
- RoI align: use bilinear interpolation instead of hard quantization
- Sample four locations per RoI bin and aggregate them
- The idea of RoI align seems simple, but it may require some effort to implement efficiently on GPUs
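Since the source code was still pending at the time of the lecture, the following is only a CPU sketch of the idea for one channel. It assumes integer coordinates are pixel centers, a 2x2 grid of sample points per bin (four locations), and averaging as the aggregation; none of these details are confirmed by the slide, and the function names are illustrative.

```python
import math

def bilinear(feature, y, x):
    """Bilinearly interpolate a 2D map at a fractional (y, x) location."""
    height, width = len(feature), len(feature[0])
    y = min(max(y, 0.0), height - 1)
    x = min(max(x, 0.0), width - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feature, roi, pooled_h, pooled_w, samples=2):
    """Average samples x samples interpolated points in every RoI bin."""
    x1, y1, x2, y2 = roi  # fractional coordinates are kept, never rounded
    bin_h = (y2 - y1) / pooled_h
    bin_w = (x2 - x1) / pooled_w
    pooled = []
    for ph in range(pooled_h):
        row = []
        for pw in range(pooled_w):
            total = 0.0
            for iy in range(samples):
                for ix in range(samples):
                    y = y1 + ph * bin_h + (iy + 0.5) * bin_h / samples
                    x = x1 + pw * bin_w + (ix + 0.5) * bin_w / samples
                    total += bilinear(feature, y, x)
            row.append(total / (samples * samples))
        pooled.append(row)
    return pooled

feature = [[4.0 * r + c for c in range(4)] for r in range(4)]
point = bilinear(feature, 1.5, 2.25)
aligned = roi_align(feature, (0.5, 0.5, 2.5, 2.5), 2, 2)
```

On this linear toy map, interpolation is exact, so each bin's average equals the map value at the bin center. A GPU version would need the same arithmetic per output element plus a matching backward pass, which is where the implementation effort goes.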

32 Use Residual Network structure Benefits of Residual Network in ImageNet/COCO

33 Why Residual Network?
Problem: Is learning better networks as simple as stacking more layers?
Deep network + residual learning can solve this problem.
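Residual learning means a block outputs F(x) + x, so its layers only have to learn the residual F(x). A toy sketch with two small fully connected layers (the real networks use convolutions; the layer sizes and weights here are made up for illustration):

```python
def linear(weights, bias, x):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, xi) for xi in x]

def residual_block(x, w1, b1, w2, b2):
    residual = linear(w2, b2, relu(linear(w1, b1, x)))   # F(x)
    return relu([r + xi for r, xi in zip(residual, x)])  # F(x) + x

zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
no_bias = [0.0, 0.0]
x = [1.0, 2.0]
passthrough = residual_block(x, zeros, no_bias, zeros, no_bias)
doubled = residual_block(x, identity, no_bias, identity, no_bias)
```

With all-zero weights the block passes a non-negative input through unchanged, which suggests why adding such blocks should not make a network harder to fit than its shallower counterpart.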

34 Residual net

35 Back to Mask R-CNN
Combined cost for:
- Classification
- Detection
- Mask (new)
Training speed: 32-40 hours on an 8-GPU machine to train on COCO data
Testing speed: 200ms per image on a Tesla M40
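Written out, the combined cost is a plain sum of the three terms. The symbols below follow the Mask R-CNN paper rather than the slide, assuming "Detection" refers to the bounding-box regression term:

```latex
L = L_{cls} + L_{box} + L_{mask}
```

Here L_{cls} is the classification loss, L_{box} the box regression loss, and L_{mask} the loss of the new mask branch.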

36 Experiments

37 Summary of this class
Now we have covered a number of vision applications:
- Image classification (programming in class 3)
- Face recognition/detection/alignment
- Object detection/segmentation

38 Any questions so far?
- No good results for your projects?
- Problems with Google Cloud/Paperspace?
- Problems with Keras/TensorFlow/Caffe?
- Others?
