Computer Vision. From traditional approaches to deep neural networks. Stanislav Frolov München,

1 Computer Vision From traditional approaches to deep neural networks Stanislav Frolov München,

2 Outline of this talk What we are going to talk about Computer vision Human vision Traditional approaches and methods Artificial neural networks Summary 2

3 Stanislav Frolov Big Data Engineer Trained deep neural networks for object detection during master's thesis Still fascinated and interested 3

4 What is computer vision General Teach computers how to see Automatic extraction, analysis and understanding of images Infer useful information, interpret and make decisions Automate tasks that human visual system can do One of the most exciting fields in AI and ML 4

5 What is computer vision Motivation Era of pixels Internet consists mostly of images Explosion of visual data Cannot be labeled by humans 5

6 What is computer vision Drivers Two drivers for computer vision explosion Compute (faster and cheaper) Data (more data > algorithms) 6

7 What is computer vision Interdisciplinary field Computer Science Graphs, Algorithms Systems Architecture Engineering Speech, NLP Image Processing Robotics Machine Learning Mathematics Information Retrieval Biology Optics Biological vision Physics Solid-State Physics Neuroscience Psychology Cognitive Sciences 7

8 Synonyms? 8

9 What is computer vision Related fields - image processing Imaging for statistical pattern recognition Image transformations such as pixel-by-pixel operations Contrast enhancement Edge extraction Noise reduction Geometrical and spatial operations (e.g. rotations) 9
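
As an illustration of a pixel-by-pixel operation, here is a minimal sketch of contrast enhancement by min-max stretching. The function name `stretch_contrast` and the 0..255 output range are assumptions made for this example, not something the slide specifies.

```python
def stretch_contrast(gray, out_min=0, out_max=255):
    """Linearly rescale intensities so the darkest pixel maps to out_min
    and the brightest to out_max (min-max contrast stretching)."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in gray]
    scale = (out_max - out_min) / (hi - lo)
    # Each output pixel depends only on the input pixel at the same position.
    return [[round(out_min + (v - lo) * scale) for v in row] for row in gray]


# A low-contrast 2x2 grayscale image (values only span 50..200).
dull = [[50, 100], [150, 200]]
enhanced = stretch_contrast(dull)
```

After stretching, the values span the full output range while keeping their relative order.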

10 What is computer vision Related fields - computer graphics Creates new images from scene descriptions Produces image data from 3D models Inverse of computer vision AR as a combination of both 10

11 What is computer vision Related fields - machine vision Mainly manufacturing applications Image-based automatic inspection, process control, robot guidance Usually employs strong assumptions (colour, shape, light, structure, orientation,...) -> works very well Output often pass/fail or good/bad Additionally numerical/measurement data, counts 11

12 What is computer vision Related fields - AI Create intelligent systems Study computational aspects of intelligence Make computers do things at which, at the moment, people are better Many techniques play an important role (ML, ANNs) Currently does a few things better/faster at scale than humans can Whether it can do everything a human can remains unanswered 12

13 What is computer vision Related fields - summary Related fields have a large intersection Basic techniques used, developed and studied are very similar 13

14 Short trip to human vision 14

15 What is human vision General Two-stage process Eyes take in light reflected off objects and the retina converts 3D objects into 2D images Brain's visual system interprets 2D images and rebuilds a 3D model 15

16 What is human vision Stereoscopic vision Pair of 2D images with slightly different viewpoints allows depth to be inferred Position of nearby objects varies more across the two images than the position of more distant objects 16
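
The relation between image shift (disparity) and distance can be sketched with the standard pinhole stereo formula, depth = focal length x baseline / disparity. This assumes two parallel, rectified cameras; the function name and the numbers below are made up for illustration.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth in metres of a point seen by two parallel, rectified cameras.

    focal_length_px: focal length in pixels
    baseline_m:      distance between the two camera centres in metres
    disparity_px:    horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera pair: 700 px focal length, 10 cm baseline.
near = depth_from_disparity(700.0, 0.1, 35.0)  # large shift between views
far = depth_from_disparity(700.0, 0.1, 7.0)    # small shift between views
```

A larger disparity yields a smaller depth, which matches the observation that nearby objects move more between the two views.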

17 What is human vision Prior knowledge Prior knowledge of relative sizes and depths is often key for understanding and interpretation 17

18 What is human vision Texture pattern Texture and texture changes help with depth perception 18

19 What is human vision Biases and illusions in human perception Shadows make all the difference in interpretation Gradual changes in light are ignored so as not to be misled by shadows 19

20 What is human vision A few more illusions Two arrows with different orientations have the same length 20

21 What is human vision Biases and illusions in human perception Assumptions and familiarity (distorted room) Face recognition bias Up-down orientation bias 21

22 What is human vision Summary Illusions are fun, but the puzzle of understanding human vision is far from complete 22

23 Back to computer vision 23

24 What is computer vision Typical tasks Recognition Localization Detection Segmentation 24

25 What is computer vision Typical tasks Part-based detection Deformable parts model Pose estimation and poselets 25

26 What is computer vision Typical tasks Image captioning (actions, attributes) 26

27 What is computer vision Typical tasks Motion analysis Egomotion (camera) Optical flow (pixels) 27

28 What is computer vision Typical tasks Scene understanding and reconstruction 28

29 What is computer vision Typical tasks Image restoration Colouring black & white photos 29

30 Solving this is useful for many applications 30

31 What is computer vision Typical applications Assistance systems for cars and people Surveillance Navigation (obstacle avoidance, road following, path planning) Photo interpretation Military (smart weapons) Manufacturing (inspection, identification) Robotics Autonomous vehicles (dangerous zones) 31

32 What is computer vision Typical applications Recognition and tracking Event detection Interaction (man-machine interfaces) Modeling (medical, manufacturing, training, education) Organizing (database index, sorting/clustering) Fingerprint and biometrics 32

33 Why so difficult? 33

34 What is computer vision Why it is difficult Occlusion Deformation Scale Clutter Illumination Viewpoint Object pose Tons of classes and variants Often n:1 mapping Computationally expensive Full understanding of biological vision is missing 34

35 System overview 35

36 What is computer vision System overview Input: image(s) + labels Output: Semantic data, labels Digital image pixels usually have three channels [R,G,B] each [ ] + Location[x,y] Digital images are just vectors 36
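
To make "digital images are just vectors" concrete, here is a small sketch that stores an image as nested lists and unrolls it into one flat vector. The 8-bit value range (0..255) and the helper names `pixel_at` and `flatten` are assumptions for this example.

```python
# A tiny 2x2 RGB image: rows of pixels, each pixel is [R, G, B].
# Assumption: 8-bit channels, so every value lies in 0..255.
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]


def pixel_at(image, x, y):
    """Return the [R, G, B] triple at column x, row y."""
    return image[y][x]


def flatten(image):
    """Unroll the image row by row into one flat vector of channel values."""
    return [value for row in image for pixel in row for value in pixel]


vector = flatten(image)  # height * width * 3 numbers
```

A 2x2 image with three channels becomes a vector of 12 numbers; a real photo works the same way, only with millions of entries.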

37 What is computer vision System overview 1. Image acquisition (camera, sensors) 2. Pre-processing (sampling, noise reduction, augmentation) 3. Feature extraction (lines, edges, regions, points) 4. Detection and segmentation 5. Post-processing (verification, estimation, recognition) 6. Decision making -> Ability of a machine to step back and interpret the big picture of those pixels 37

38 Some history 38

39 What is computer vision History 1950s 2D imaging for statistical pattern recognition Theory of optical flow based on a fixed point towards which one moves 39

40 What is computer vision Traditional approaches Image processing Histograms Filtering Stitching Thresholding... 40
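
Two of the listed operations, histograms and thresholding, fit in a few lines. This is a minimal sketch on a grayscale image stored as nested lists; 256 intensity levels and the function names are assumptions for the example.

```python
def histogram(gray, levels=256):
    """Count how many pixels take each intensity value."""
    counts = [0] * levels
    for row in gray:
        for value in row:
            counts[value] += 1
    return counts


def threshold(gray, cutoff):
    """Binarise the image: 1 where the pixel is brighter than cutoff, else 0."""
    return [[1 if value > cutoff else 0 for value in row] for row in gray]


gray = [[10, 200], [10, 90]]
counts = histogram(gray)
mask = threshold(gray, 100)
```

The histogram ignores where pixels are and only records how often each value occurs, while the threshold produces a simple foreground/background mask.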

41 What is computer vision History 1960s Desire to extract 3D structure from 2D images for scene understanding Began at pioneering AI universities to mimic human visual system as stepping stone for intelligent robots Summer vision project at MIT: attach camera to computer and having it describe what it saw 41

42 What is computer vision History: summer vision 1966 Given to 10 undergraduate students an attempt to use our summer workers effectively construction of a significant part of a visual system task can be segmented into sub-problems participate in the construction of a system complex enough to be a real landmark in the development of pattern recognition 42

43 What is computer vision History: summer vision 1966 Goal: analyse scenes and identify objects Structure of system: Region proposal Property lists for regions Boundary construction Match with properties Segment Basic foreground/background segmentation with simple objects (cubes, cylinders, ...) 43

44 What is computer vision History: summer vision 1966 Unlike general intelligence, computer vision seemed tractable Amusing anecdote, but it never aimed to solve computer vision Computer vision today differs from what it was thought to be in

45 What is computer vision History 1970s Formed many algorithms that exist today Edges, lines and objects as interconnected structures 45

46 What is computer vision Traditional approaches Edge detection based on Brightness Gradients Geometry Illumination 46
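
Gradient-based edge detection can be sketched with the Sobel operator, one common choice that the slide does not name. The code below is a plain-Python illustration that leaves border pixels at zero; the helper names are assumptions.

```python
import math

# Sobel kernels: horizontal and vertical brightness derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_kernel(gray, kernel, r, c):
    """Weighted sum of the 3x3 neighbourhood centred on (r, c)."""
    return sum(
        kernel[i][j] * gray[r + i - 1][c + j - 1]
        for i in range(3)
        for j in range(3)
    )


def gradient_magnitude(gray):
    """Edge strength per pixel; border pixels are left at 0."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = apply_kernel(gray, SOBEL_X, r, c)
            gy = apply_kernel(gray, SOBEL_Y, r, c)
            out[r][c] = math.hypot(gx, gy)
    return out


# Dark left half, bright right half: a vertical edge in the middle.
step = [[0, 0, 10, 10] for _ in range(4)]
edges = gradient_magnitude(step)
```

Pixels next to the brightness step get a large response, while a uniform image gives zero everywhere.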

47 What is computer vision Traditional approaches - part based detector Objects composed of features of parts and their spatial relationship Challenge: how to define and combine 47

48 What is computer vision History 1980s More rigorous mathematical analysis and quantitative aspects Optical character recognition Sliding window approaches Usage of artificial neural networks 48

49 What is computer vision Traditional approaches - HOG detection (histogram of oriented gradients) Concept dates from the 1980s but became widely used only in 2005 Create HOG descriptors (object generalizations) One feature vector per object Train with SVM Sliding scales 49

50 What is computer vision Traditional approaches - HOG detection (histogram of oriented gradients) Computation of HOG descriptors: 1. Compute gradients 2. Compute histograms on cells 3. Normalize histograms 4. Concatenate histograms Requires a lot of engineering Must build ensembles of feature descriptors 50
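
The four steps above can be sketched as follows. This is a deliberately simplified illustration, not a full HOG implementation: it uses centred differences for gradients, 9 unsigned orientation bins, hard bin assignment, and normalises each cell on its own instead of over overlapping blocks. All names and parameter choices are assumptions.

```python
import math


def cell_histogram(gray, bins=9):
    """Steps 1-2: gradients, then a magnitude-weighted orientation histogram."""
    h, w = len(gray), len(gray[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gray[r][c + 1] - gray[r][c - 1]
            gy = gray[r + 1][c] - gray[r - 1][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


def l2_normalize(vec, eps=1e-6):
    """Step 3: scale the histogram to (roughly) unit length."""
    norm = math.sqrt(sum(v * v for v in vec) + eps ** 2)
    return [v / norm for v in vec]


def hog_descriptor(cells):
    """Step 4: concatenate the normalised histograms of all cells."""
    descriptor = []
    for cell in cells:
        descriptor.extend(l2_normalize(cell_histogram(cell)))
    return descriptor


vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
horizontal_edge = [[0] * 4, [0] * 4, [10] * 4, [10] * 4]
descriptor = hog_descriptor([vertical_edge, horizontal_edge])
```

The two cells put their energy into different orientation bins, which is the information a classifier such as an SVM would then work with.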

51 What is computer vision History 1990s Significant interaction with computer graphics (rendering, morphing, stitching) Approaches using statistical learning Eigenface (Ghostfaces) through principal component analysis (PCA) 51

52 What is computer vision Traditional approaches - deformable parts model (DPM) Objects constructed from their parts First match whole object, then refine on the parts HOG + part-based + modern features Slow but good at difficult objects Involves many heuristics 52

53 What is computer vision Features Feature points Small area of pixels with certain properties Feature detection Use features for identification Activate if object present Examples: Lines, edges, colours, blobs, Animals, faces, cars,... 53

54 What is computer vision Traditional approaches - classical recognition Init: extract features for objects in different scales, colours, orientations, rotations, occlusion levels Inference: extract features from query image and find closest match in database or train a classifier Computationally expensive (hundreds of features in image, millions in database) and complex due to errors and mismatches 54
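
The "find closest match in database" step can be sketched as a nearest-neighbour search over feature vectors. The three-dimensional features and labels below are invented for illustration; real systems compare hundreds of descriptors per image.

```python
import math


def closest_match(query, database):
    """Return the label whose stored feature vector is nearest to the query
    (Euclidean distance)."""
    return min(database, key=lambda label: math.dist(query, database[label]))


# Hypothetical database: one feature vector per known object.
database = {
    "cat": [0.9, 0.1, 0.3],
    "car": [0.1, 0.8, 0.7],
}
best = closest_match([0.8, 0.2, 0.2], database)
```

A linear scan like this grows with the database size, which hints at why the slide calls the approach computationally expensive.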

55 What is computer vision History Before the new era Bags of features Handcrafted ensembles [Diagram: Input -> Feature Extraction (Feat. 1, Feat. 2, ..., Feat. n) -> Final Decision] 55

56 The new era of computer vision 56

57 Artificial neural networks Fundamentals - artificial neuron Elementary building block Inspired by biological neurons Mathematical function y=f(wx+b) Learnable weights 57
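
A single neuron computing y=f(wx+b) can be written directly. The sigmoid used for f is an assumption for this sketch; the slide leaves the activation function open.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b, activation=sigmoid):
    """y = f(w . x + b): weighted sum of the inputs, plus bias, then activation."""
    return activation(sum(wi * xi for wi, xi in zip(w, x)) + b)


y = neuron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.0)
```

The weights `w` and the bias `b` are the learnable parts; training changes them while the structure stays fixed.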

58 Artificial neural networks Fundamentals - artificial neural networks Collection of neurons organized in layers Universal approximators Fully-connected network here 58
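
A fully-connected network is a chain of layers in which every neuron sees all outputs of the previous layer. Here is a minimal forward-pass sketch; the layer sizes, weights and the choice of ReLU are assumptions for the example.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(x, weights, biases, activation):
    """One fully-connected layer: each row of weights belongs to one neuron."""
    return [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Feed the input through all layers in order."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [0.5], identity),
]
output = forward([2.0, 1.0], layers)
```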

59 Artificial neural networks Fundamentals - training Basically an optimization problem Find minimum of a loss function by an iterative process (training) Designing the loss function is sometimes tricky 59

60 Artificial neural networks Fundamentals - training Simple optimizer algorithm: 1. Forward pass with a batch of data 2. Calculate error between actual and desired output 3. Nudge weights in the right direction, in proportion to the error (the same data would then result in a smaller error) 4. Repeat until convergence 60
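
The loop above can be sketched for the smallest possible model, a single linear neuron trained with gradient descent on the mean squared error. The learning rate, epoch count and toy data are assumptions chosen so that the example converges.

```python
def train(data, learning_rate=0.1, epochs=500):
    """Fit y = w * x + b to (x, target) pairs by gradient descent."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):                 # 4. repeat
        grad_w = grad_b = 0.0
        for x, target in data:
            prediction = w * x + b          # 1. forward pass
            error = prediction - target     # 2. error vs. desired output
            grad_w += 2.0 * error * x / n   # gradient of the mean squared error
            grad_b += 2.0 * error / n
        w -= learning_rate * grad_w         # 3. nudge weights against the gradient
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(data)
```

On this data the parameters approach w = 2 and b = 1. Real networks follow the same pattern, with backpropagation supplying the gradients for every layer.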

61 Artificial neural networks Fundamentals - CNN Local neighborhood contributes to activation Exploit spatial information Hierarchical feature extractors Fewer parameters [Figure labels: input, filters, receptive field, activation] 61

62 Artificial neural networks Fundamentals - CNN Filter of size 3x3 applied to an input of 7x7 62
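
Sliding a 3x3 filter over a 7x7 input without padding and with stride 1 gives a 5x5 activation map. The sketch below shows this in plain Python; stride 1 and no padding are assumptions, as the slide does not state them.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D cross-correlation, as used in CNN layers."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


image = [[1] * 7 for _ in range(7)]   # 7x7 input
kernel = [[1] * 3 for _ in range(3)]  # 3x3 filter
activation = conv2d(image, kernel)    # 5x5 output
```

Each output value depends only on a 3x3 neighbourhood, and the same nine weights are reused at every position, which is where the parameter saving comes from.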

63 Artificial neural networks Fundamentals - pooling Max-pooling Dimension reduction/adaption Existence is more important than location 63
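
Max-pooling keeps only the strongest activation in each window. A minimal sketch, assuming the common 2x2 window with stride 2:

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep the maximum of every size x size window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool(feature_map)
```

The 4x4 map shrinks to 2x2, and the exact position of each maximum inside its window is discarded, in line with "existence is more important than location".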

64 Artificial neural networks Fundamentals - pooling Zero-padding Controlling dimensions 64
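
Zero-padding surrounds the input with zeros so that the output size of a convolution can be controlled. The sketch below pads an image and computes the resulting size with the usual formula (n + 2 * pad - k) // stride + 1; the function names are assumptions.

```python
def zero_pad(image, pad):
    """Surround the image with a border of `pad` zeros on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded


def conv_output_size(n, k, pad=0, stride=1):
    """Side length of the output for an n x n input and a k x k filter."""
    return (n + 2 * pad - k) // stride + 1


padded = zero_pad([[1, 2], [3, 4]], pad=1)
```

With a 3x3 filter, a 7x7 input shrinks to 5x5 without padding, while one ring of zeros keeps it at 7x7.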

65 Artificial neural networks Fundamentals - general network architecture [Diagram: Input image -> convolutional layers -> ... -> Final decision] 65

66 Artificial neural networks Fundamentals - hierarchical feature extractors First layers Deeper layers Activations for: Lines, edges, blobs, colours,... Parts of abstract objects Abstract objects 66

67 Modern history of object recognition 67

68 Benchmark Datasets - PASCAL VOC Classification and detection 27k images 20 classes person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, tv/monitor 68

69 Benchmark Datasets - ImageNet Challenges on a subset of ImageNet 14M labeled images 20k object categories ILSVRC* usually on 1k categories including 90 out of 120 dog breeds *ImageNet Large Scale Visual Recognition Challenge 69

70 Artificial neural networks Roadmap - AlexNet ILSVRC 2012 winner by a large margin, cutting the top-5 error from 25% to 16% Proved the effectiveness of CNNs and kicked off a new era 8 layers, 650k neurons, 60M parameters 70

71 Artificial neural networks Roadmap - ZFNet ILSVRC 2013 winner with a best top-5 error of 11.6% Like AlexNet but using smaller 7x7 kernels in the first layer (instead of 11x11) to keep more information in deeper layers 71

72 Artificial neural networks Roadmap - OverFeat ILSVRC 2013 localization winner Uses AlexNet on multi-scale input images with sliding window approach Accumulates bounding boxes for final detection (instead of non-max suppression) 72
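
For comparison, the non-max suppression that most detectors use (and that OverFeat replaces by accumulating boxes) can be sketched as a greedy filter on intersection over union (IoU). The box format (x1, y1, x2, y2) and the 0.5 threshold are assumptions for this example.

```python
def area(box):
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedily keep the best-scoring box and drop boxes overlapping it too much.
    Returns the indices of the kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

The second box overlaps the first one heavily and is suppressed, while the third box, which lies elsewhere, survives.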

73 Artificial neural networks Roadmap - RCNN (region based CNN) 2k proposals generated by selective search SVM trained for classification Multi-stage pipeline 73

74 Artificial neural networks Roadmap - VGGNet Not a winner but famous due to simplicity and effectiveness Replace large-kernel convolutions by stacking several small-kernel convolutions 74
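
The benefit of stacking small kernels can be checked with simple arithmetic: several stride-1 3x3 layers cover the same receptive field as one large kernel but need fewer weights. The sketch assumes equal input and output channel counts and ignores biases; 64 channels is an arbitrary example value.

```python
def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions of size kernel."""
    return layers * (kernel - 1) + 1


def conv_weights(kernel, channels, layers=1):
    """Weights of stacked kernel x kernel convolutions with `channels`
    input and output channels each (biases ignored)."""
    return layers * kernel * kernel * channels * channels


field = stacked_receptive_field(3, layers=3)  # same view as one 7x7 kernel
small = conv_weights(3, 64, layers=3)         # three 3x3 layers
large = conv_weights(7, 64)                   # one 7x7 layer
```

Three 3x3 layers see a 7x7 region with 27 weights per channel pair instead of 49, and they add two extra non-linearities along the way.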

75 Artificial neural networks Roadmap - InceptionNet (GoogLeNet) ILSVRC 2014 winner Stacks up inception modules 22 layers, 5M parameters 75

76 Artificial neural networks Roadmap - Fast RCNN Jointly learns classification and bounding-box regression (region proposals still come from a separate heuristic) Employs region of interest (RoI) pooling, which allows convolutional computations to be reused across proposals 76

77 Artificial neural networks Roadmap - YOLO (you only look once) Directly predicts all objects and classes in one shot Very fast Processes images at ~40 FPS on a Titan X GPU First real-time state-of-the-art detector Divides input images into multiple grid cells which are then classified 77

78 Artificial neural networks Roadmap - ResNet (Microsoft) ILSVRC 2015 winner with a 3.6% error rate (human performance is 5-10%) Employs residual blocks, which allow very deep networks (hundreds of layers) to be built Additional identity mapping 78
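
The idea of a residual block with an identity mapping fits in one line: the block outputs F(x) + x, so the layers only have to learn the residual F. In this sketch `residual_fn` is a hypothetical stand-in for the block's convolutional layers.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x: the learned residual plus the identity shortcut."""
    return [fx + xi for fx, xi in zip(residual_fn(x), x)]


def zero_residual(x):
    """A block that has learned nothing yet: F(x) = 0."""
    return [0.0 for _ in x]


def half_residual(x):
    """A made-up residual function for illustration: F(x) = 0.5 * x."""
    return [0.5 * xi for xi in x]


unchanged = residual_block([1.0, 2.0], zero_residual)
adjusted = residual_block([1.0, 2.0], half_residual)
```

If the residual is zero the block simply passes its input through, which is one intuition for why adding more such blocks does not have to make a network harder to train.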

79 Artificial neural networks Roadmap - MultiBox Not a recognition network A region proposal network Popularized prior/anchor boxes (found through clustering) to predict offsets Much better strategy than starting the predictions with random coordinates Since then heuristic approaches have gradually faded out and been replaced 79
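
Predicting offsets relative to a prior box can be sketched as below. The parameterisation (centre shifts scaled by the prior's size, log-scale width and height) follows the convention popularised by Faster R-CNN and is an assumption here; the slide does not say which encoding MultiBox itself uses.

```python
import math


def decode_box(prior, offsets):
    """Turn predicted offsets into a box, relative to a prior/anchor box.

    prior:   (cx, cy, w, h) centre and size of the prior box
    offsets: (tx, ty, tw, th) values predicted by the network
    """
    cx, cy, w, h = prior
    tx, ty, tw, th = offsets
    return (
        cx + tx * w,        # shift the centre by a fraction of the prior width
        cy + ty * h,        # shift the centre by a fraction of the prior height
        w * math.exp(tw),   # rescale the width
        h * math.exp(th),   # rescale the height
    )


prior = (10.0, 10.0, 4.0, 2.0)
same = decode_box(prior, (0.0, 0.0, 0.0, 0.0))
moved = decode_box(prior, (0.5, -1.0, 0.0, 0.0))
```

Zero offsets reproduce the prior box, so the network starts from a sensible guess and only has to learn small corrections.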

80 Artificial neural networks Roadmap - Faster RCNN Fast RCNN with heuristic region proposal replaced by region proposal network (RPN) inspired by MultiBox RPN shares full-image convolutional features with the detection network (cost-free region proposal) RPN uses attention mechanism to tell where to look ~5 FPS on a Tesla K40 GPU End-to-end training 80

81 Artificial neural networks Roadmap - SSD (single shot multibox detector) SSD leverages the Faster RCNN's RPN to directly classify objects inside each prior box (similar to YOLO) Predicts category scores and box offsets for a fixed set of default bounding boxes Fixes the predefined grid cells used in YOLO by using multiple aspect ratios Produces predictions of different scales ~59 FPS 81

82 Artificial neural networks TensorFlow object detection API TensorFlow: open-source software library for machine learning applications TensorFlow Object Detection API: a collection of pretrained models Construct, train and deploy object detection models 82

83 Summary 83

84 Summary Human vs machine Humans are good at understanding the big picture Neural networks are good at details But they can be fooled... 84

85 Summary Computer vision is still difficult Need a large amount of data Lots of engineering Trial and error Long training time Still lots of hyperparameter tuning No general network (generalization not answered) Little mathematical foundation 85

86 Summary Computer vision is hard Despite all of these advances, the dream of having a computer interpret an image at the same level as a human remains unrealized 86

Thank You Stanislav Frolov Big Data Engineer sfrolov@inovex.

87 Thank You Stanislav Frolov Big Data Engineer inovex GmbH Lindberghstraße München
