Exploiting scene constraints to improve object detection algorithms for industrial applications

Exploiting scene constraints to improve object detection algorithms for industrial applications PhD Public Defense Steven Puttemans Promotor: Toon Goedemé

2 A general introduction

Object detection? Help smart systems understand and interact with their environment, through computer vision on images Locate and identify objects in an input image or video Robots, UAV s, computer systems, 3

Object detection? 1. collect training images start with training an object model 2. manual annotation 3. smart computer system using machine learning object samples non-object samples 4

Object detection? then use that model to process a new input image 3. smart computer system machine learning + stored object model 4. new input image detect objects 5

The base algorithm Boosted cascades of weak classifiers - Viola&Jones Known from face detection in portable cameras Idea: use weak classifiers to ignore most windows only evaluate complete model on the difficult ones using sliding window approach early rejection principle HAAR LBP 6

Evaluation metrics Performance evaluation of detectors TP = true positive FP = false positive FN = False negative TN = true negative true false true positive negative = no = no = object in in in image, no object detected 7

Evaluation metrics Performance evaluation of detectors Each detection has a certainty score --> if we loop over those scores Precision Recall = Optimal R = Best = P RP = TP set-up / point (TP / (TP + FN) + FP) = = number of of correct Precision actual detections objects 100% 68% in the versus dataset all detection that Recall are given being 100% 82% by the detected system Minimal distance to optimal point 8

Problem statement ACADEMIA Large public datasets Time consuming + expensive Training #days-weeks Training as fast as possible INDUSTRY Top accuracy OR speed High accuracy AND speed In-the-wild Known case + object 9

Our main contributions Make use of application specific knowledge in object detection Boost accuracy of off-the-shelf detectors in industrial context 1 2 3 Keep real-time execution as a hard constraint Minimize the manual labour in training models using active learning Investigate possibilities using deep learning 4 5 6 Evaluate the relation of training data towards accuracy 10

Scene constraints IN GENERAL Aim for invariance against Illumination changes Colour differences Different poses Occlusion Different viewpoints Intra-class variance IN INDUSTRIAL CASES We have knowledge based constraints Fixed background colour Controlled illumination Fixed camera position Known texture Known movement pattern Let s not ignore this, but benefit as much as possible! 11

12 Step 1: Introduce constraints as pre- or post-filters to the off-the-shelf object detector

Object- and application specific constraints General principle: black box + pre- or post-filtering Ideal Post-filter Pre-filter case 13

Object- and application specific constraints Application 1: visual detection/classification of orchid flowers detect as much flowers as possible avoid false positive detections known background known illumination known camera position 14

Object- and application specific constraints Application 1: visual detection/classification of orchid flowers Locations No false positives Enough detections / plant Segment out flowers 15

Object- and application specific constraints Application 1: visual detection/classification of orchid flowers Single detector not fail-proof BUT always enough flowers per plant for classification Majority voting for correct texture class 16

Object- and application specific constraints Application 2: automated walking aid detector two separate trajectories is walker used or not (cane/nothing) known camera set-up known walking aid challenging illumination conditions 17

Object- and application specific constraints Application 2: automated walking aid detector Mask of possible locations Remove false positives Link subsequent frames Connect to closest previous detection 18

Object- and application specific constraints Application 2: automated walking aid detector Walker Non-Walker walker seq 14 2 cane seq 0 6 no aid seq 0 4 Sequence based accuracy of 92,3% given C walker = 0.2! 19

Object- and application specific constraints Application 3: safeguarding privacy by automatic face blurring millions of images captured daily, manually annotated can we automatically safeguard privacy of pedestrians? known fixed camera position weak threshold, auto-remove false detections 20

Object- and application specific constraints Application 3: safeguarding privacy by automatic face blurring 21

22 Step 2: Integrating constraints directly into the object detection process, adapting the existing algorithm

Enhancing algorithms: integration Integrate the constraints directly into the training data Constraint-adapted training data instead of separate filters Enhance the power of existing object detection algorithm 23

Enhancing algorithms: integration Our original base algorithm uses grayscale images for model training, and thus ignores colour information. However colour can be an important feature! How many strawberries in the image? Which ones are (un)ripe? 24

Enhancing algorithms: integration Application 1: automated visual fruit detection for harvest estimation and robotic fruit picking using grayscale training data does not work difference between ripe and unripe fruit known camera position known object colour range 25

Enhancing algorithms: integration Application 1: visual fruit detection for harvest estimation 26

Enhancing algorithms: integration Application 1: visual fruit detection for harvest estimation FP FN Precision Recall TP 27

Enhancing algorithms: integration Application 2: detection of solar panels in aerial images fixed size due to constant flight height a known colour range known structure and texture 28

Enhancing algorithms: integration Application 2: detection of photovoltaic installations Compared 4 algorithms for detecting these solar panels 3) Viola&Jones 4) ACF 1) Pixel based segmentation 2) MSER 29

Enhancing algorithms: integration Application 2: detection of photovoltaic installations Area of 1km x 1km Viola&Jones visueel op stukje beeld niet beste, maar op dataset wel Have to be interpreted No optimized versions GPU acceleration possible 30

31 Step 3: Reducing the manual effort as much as possible

Enhancing algorithms: active learning Remember: Collecting and annotating all this training data is a timeconsuming and costly process for the industry Reduce manual work by integrating an innovative active learning Minimal effort Small batches Let the machine decide From weak to strong Samples that matter 32

Enhancing detection algorithms Application 3: improving open-source face detection RED = OpenCV baseline GREEN = our best model 33

Enhancing detection algorithms Application 3: improving open-source face detection 90% precision 68% recall 40% precision 40% recall closing the gap towards the state-of-the-art 34

35 Step 4: Can deep learning help out?

Step towards deep learning Deep learning seems to be the holy grail for computer vision but need for enormous datasets need for a lot of computing power --> GPGPU / clusters / it takes long to process a decent resolution input image 2015 - explosion off-the-shelf models publicly available Public availability of enormous datasets to reproduce research GPGPU hardware becomes more than affordable Fast execution speeds start getting reported in literature No longer any reason to keep ignoring it 36

Step towards deep learning But what if your have an object class that is not available in literature and thus there is no off-the-shelf detection model? The concept of transfer learning 37

Step towards deep learning But what if your have an object class that is not available in literature and thus there is no off-the-shelf detection model? The concept of transfer learning transfer learning to one specific class 38

Step towards deep learning Application 1: transfer learning and single-pass architectures is it worth the hassle? very limited training data (15-100) does transfer learning work for our cases? based on publicly available frameworks 39

Step towards deep learning Application 1: transfer learning and single-pass architectures Works out-of-the-box Simple test case Only +-60 samples/class 40

Step towards deep learning Application 1: transfer learning and single-pass architectures Seems visually doing its job On frame basis - bad curve? ACF still producing better results? Motion blur / only need to detect object once Not completely convinced 41

Step towards deep learning Application 1: transfer learning and single-pass architectures Works for single-class But also for multi-class Detector + classifier 100% accuracy obtained 42

Step towards deep learning Application 2: boosted cascades versus deep learning So we noticed deep learning is powerful, but does it beat boosted cascades? known object size challenging, e.g. coconut versus oil palm and shadows always under sunny conditions deep learning robust to complete different viewpoints? 43

Step towards deep learning Application 2: boosted cascades versus deep learning Deep learned classification network Best: P=97%,R=88% You want accuracy? --> deep learning You have limited hardware? --> boosted cascades You want limited training time? --> boosted cascades Viola & Jones Best: P=90%,R=80% Train 2 h / Detect 10 min ACF Best: P=90%,R=86% Train 30 min / Detect 5 min 44

Conclusions We proved the importance of object- and scene-specific constraints, and how they influence object detection in industrial applications. We experienced that training data is critical to the whole training process, especially selecting the valuable/important samples. We reckon that we built application-specific object detectors. We do not guarantee they work outside the application context. We prove that you do not always need top-notch detectors, if you use application related knowledge to add extra processing steps. We investigated the difference between classical computer vision and deep learning for object detection and provided a set of lessons learned (only in thesis text) to base your decision on. 45

Thank you for your attention! Any questions? PhD Public Defense Steven Puttemans Promotor: Toon Goedemé