New approaches to pattern recognition and automated learning

Technology Forum 2015
Johannes Zuegner, STEMMER IMAGING GmbH, Puchheim, Germany

OUTLINE
- Introduction
  - Description of the task
  - What exactly does pose estimation mean?
- Presentation of two approaches
  - Image features & Bag of Words classification
  - Presentation of the search classifier
  - Direct comparison of the two approaches
- Summary and outlook
  - Evaluation of the current state
  - Future developments
17 November 2015

DESCRIPTION OF THE TASK
The task of pattern recognition: searching for and finding pre-learned objects
- Example applications: counting parts, pick & place, etc.
Figure: pattern image and scene image [Blendswap]

- The mission often goes beyond merely finding an object:
  - 2D: the exact orientation and scale of the objects are of interest
  Figure: objects at orientations 0°, 45° and 135° (labels: GE 10 SE-128 M12, GE 10 AS-96 M12, GE 10 CX-32 M12)
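In 2D, such a pose can be written as a similarity transform: a rotation angle, a uniform scale and a translation. The minimal Python sketch below maps the corners of a pattern into the scene for a given pose; the function name and the example pose values are hypothetical and do not correspond to a CVB interface.

```python
import math

def similarity_transform(points, angle_deg, scale, tx, ty):
    """Rotate by angle_deg (counter-clockwise in a y-up frame), scale uniformly, then translate."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)
            for x, y in points]

# Corners of a 10 x 10 pattern centred on its origin
corners = [(-5, -5), (5, -5), (5, 5), (-5, 5)]
# Hypothetical search result: 45 degrees, scale 1.5, object centre at (100, 50)
for x, y in similarity_transform(corners, 45, 1.5, 100, 50):
    print(f"({x:.2f}, {y:.2f})")
```

Note that image coordinates usually have the y axis pointing down, so a counter-clockwise rotation in this y-up convention appears clockwise on screen.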

  - 3D: a rigid transform [R,t] with 6 degrees of freedom
  Figure: [R,t] relates the object's coordinate frame (X, Y, Z) in the 3D scene to the 2D image plane of the camera
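The six degrees of freedom are three for the rotation R and three for the translation t; a point X in object coordinates becomes X_c = R·X + t in camera coordinates and is then projected onto the image plane. The sketch below assumes an ideal pinhole camera without lens distortion, with a hypothetical focal length f (in pixels) and principal point (cx, cy); to keep it short, R is a rotation about Z only, whereas a general R rotates about all three axes.

```python
import math

def rot_z(angle_deg):
    """3x3 rotation matrix about the Z axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rigid_transform(R, t, p):
    """Apply X_c = R * X + t to a single 3D point."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def project_pinhole(p, f, cx, cy):
    """Project a camera-frame point onto the image plane (ideal pinhole, no distortion)."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point lies behind the camera")
    return (f * x / z + cx, f * y / z + cy)

# Hypothetical pose: 30 degrees about Z, object origin 500 mm in front of the camera
R = rot_z(30)
t = [10.0, -20.0, 500.0]
# Hypothetical camera: f = 1000 px, principal point (320, 240)
for corner in [(0, 0, 0), (50, 0, 0), (50, 50, 0), (0, 50, 0)]:
    u, v = project_pinhole(rigid_transform(R, t, corner), 1000.0, 320.0, 240.0)
    print(f"{corner} -> ({u:.1f}, {v:.1f})")
```

Because [R,t] is rigid, distances on the object are preserved in 3D; only the projection changes the apparent size and shape in the image.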


Summary: the pose of an object refers to its concrete position, orientation and scale. In the following, two approaches to pattern recognition and pose estimation are presented.

IMAGE FEATURES & BAG OF WORDS
- Extraction of image features with SIFT, KAZE & binary feature descriptors
- Extraction of feature points in corner-like image structures
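"Corner-like" means the intensity changes strongly in two directions. SIFT and KAZE use their own scale-space detectors for this; the sketch below swaps in the simpler Harris corner response to illustrate the idea on a tiny synthetic image. The constant k = 0.04 and the 3 x 3 summation window are assumptions, and this is not how SIFT or KAZE locate keypoints.

```python
def harris_response(img, k=0.04):
    """Harris response R = det(M) - k * trace(M)^2 for interior pixels.

    img is a list of rows of grey values; M is the structure tensor of
    central-difference gradients summed over a 3 x 3 window.
    """
    h, w = len(img), len(img[0])
    # Central-difference gradients (left at zero on the border)
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    response = [[0.0] * w for _ in range(h)]
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ix, iy = gx[y + dy][x + dx], gy[y + dy][x + dx]
                    sxx += ix * ix
                    syy += iy * iy
                    sxy += ix * iy
            response[y][x] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    return response

# 13 x 13 image: dark background, bright 5 x 5 square from (4, 4) to (8, 8)
img = [[255 if 4 <= x <= 8 and 4 <= y <= 8 else 0 for x in range(13)]
       for y in range(13)]
response = harris_response(img)
peak = max(max(row) for row in response)
# Locations (x, y) of the strongest response: the four corners of the square
print([(x, y) for y in range(13) for x in range(13) if response[y][x] == peak])
```

Flat regions give a response of zero and straight edges a negative one, so only corner-like points survive a positive threshold.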

- Calculation of descriptors: the "footprint" of each feature point
- Bag of Visual Words: the descriptors are accumulated like a dictionary
Figure: Bag of Words [Blendswap/Wikipedia/Wikia]
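A common way to build such a dictionary is to cluster the training descriptors, so that each cluster centre becomes one "visual word". The sketch below uses Lloyd's k-means with a deterministic farthest-point seeding on toy 2-dimensional descriptors; real SIFT descriptors have 128 dimensions, and the clustering method, k and the data are assumptions for illustration.

```python
import math

def farthest_point_init(descriptors, k):
    """Deterministic seeding: start with the first descriptor, then repeatedly
    add the descriptor farthest from all centres chosen so far."""
    centres = [list(descriptors[0])]
    while len(centres) < k:
        nxt = max(descriptors, key=lambda d: min(math.dist(d, c) for c in centres))
        centres.append(list(nxt))
    return centres

def kmeans(descriptors, k, iterations=20):
    """Cluster descriptor vectors into k visual words with Lloyd's algorithm."""
    centres = farthest_point_init(descriptors, k)
    for _ in range(iterations):
        # Assignment step: each descriptor joins its nearest centre
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            nearest = min(range(k), key=lambda i: math.dist(d, centres[i]))
            clusters[nearest].append(d)
        # Update step: move each centre to the mean of its members
        for i, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster ran empty
                centres[i] = [sum(col) / len(members) for col in zip(*members)]
    return centres

# Toy descriptors forming three well-separated groups
descriptors = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0),
               (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
               (9.8, 0.1), (10.0, 0.0), (10.1, 0.2)]
vocabulary = kmeans(descriptors, k=3)
print(sorted(tuple(round(v, 2) for v in c) for c in vocabulary))
```

Random or k-means++ seeding is more usual in practice; the deterministic seeding here only keeps the toy example reproducible.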

Applying the Bag of Words to recognition tasks:
- Matching of the extracted features
Figure: features of the scene image matched to Cluster 1, Cluster 2 and Cluster 3 of the Bag of Words

- Evaluation of the histogram
Figure: histogram of the probabilities p for clusters C1, C2 and C3
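Evaluating the histogram means assigning every scene descriptor to its nearest visual word and counting, which yields the normalised values p(C1), p(C2), p(C3). The slides do not say how histograms are compared; the sketch below uses histogram intersection as one illustrative choice, with a made-up three-word vocabulary and toy descriptors.

```python
import math

def bow_histogram(descriptors, vocabulary):
    """Normalised histogram of nearest visual words for a set of descriptors."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)), key=lambda i: math.dist(d, vocabulary[i]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)

def histogram_intersection(p, q):
    """Similarity in [0, 1] for two normalised histograms (1 = identical)."""
    return sum(min(a, b) for a, b in zip(p, q))

vocabulary = [(0.1, 0.1), (5.0, 5.0), (10.0, 0.1)]   # C1, C2, C3
model = bow_histogram([(0, 0), (5, 5), (5, 4.8), (10, 0)], vocabulary)
scene = bow_histogram([(0.2, 0.1), (4.9, 5.2), (9.9, 0.3), (10, 0)], vocabulary)
print(model, scene, histogram_intersection(model, scene))
```

The histogram says which words occur, not where; that is why the point correspondences are still needed for the pose.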

- Use of the point correspondences to estimate the pose
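The slides do not say which transform model or estimator turns the correspondences into a pose. A minimal sketch, assuming a 2D similarity transform (rotation, uniform scale, translation) and outlier-free matches; in practice a robust scheme such as RANSAC would typically wrap this fit. Writing points as complex numbers, dst ≈ a·src + b with a = scale·e^(i·angle) has a closed-form least-squares solution.

```python
import cmath
import math

def fit_similarity(src, dst):
    """Least-squares 2D similarity transform dst ~ a * src + b on complex points.

    Returns (angle_deg, scale, tx, ty). Assumes at least two distinct source
    points and correspondences without outliers.
    """
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    zm, wm = sum(z) / len(z), sum(w) / len(w)
    num = sum((zi - zm).conjugate() * (wi - wm) for zi, wi in zip(z, w))
    den = sum(abs(zi - zm) ** 2 for zi in z)
    a = num / den          # a = scale * exp(i * angle)
    b = wm - a * zm        # translation
    return math.degrees(cmath.phase(a)), abs(a), b.real, b.imag

# Pattern points and their noise-free matches: 30 deg rotation, scale 2, shift (5, -3)
src = [(0, 0), (1, 0), (0, 1), (2, 3)]
ang = math.radians(30)
dst = [(2 * (x * math.cos(ang) - y * math.sin(ang)) + 5,
        2 * (x * math.sin(ang) + y * math.cos(ang)) - 3) for x, y in src]
print(fit_similarity(src, dst))   # approximately (30.0, 2.0, 5.0, -3.0)
```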

Advantages of this technique:
- Robust search results for a wide variety of poses

- Robustness to variations in lighting
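Part of this robustness comes from how descriptors are built: they use intensity differences, which cancel an additive brightness offset, and are normalised to unit length, which cancels a multiplicative contrast change. The sketch below shows SIFT-style normalisation (unit length, entries clipped at 0.2 as in Lowe's SIFT, then renormalised) applied to a hypothetical 1D toy descriptor rather than a real 128-dimensional SIFT descriptor.

```python
import math

def unit(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def normalise_descriptor(vec, clip=0.2):
    """SIFT-style normalisation: unit length, clip large entries, renormalise."""
    return unit([min(x, clip) for x in unit(vec)])

def toy_descriptor(profile):
    """Hypothetical 1D descriptor: absolute intensity differences along a profile."""
    return normalise_descriptor([abs(b - a) for a, b in zip(profile, profile[1:])])

profile = [10, 12, 20, 40, 41, 43]
# Same structure under a hypothetical lighting change: gain 1.8, offset +25
relit = [1.8 * p + 25 for p in profile]
d1, d2 = toy_descriptor(profile), toy_descriptor(relit)
print(max(abs(a - b) for a, b in zip(d1, d2)))   # tiny: only floating-point rounding
```

The clipping step limits the influence of single very large gradients, which are typical of non-linear lighting effects such as highlights.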

Disadvantages of this technique:
- Requires objects with distinctive image structures
- High processing time
Figure: processing time in ms for point extraction, descriptor calculation and clustering; values shown: 1435, 890 and 35 [Stemmer/Rewe]

- License costs of SIFT
- The alternatives are less precise or only marginally faster
Motivation to search for an alternative approach

PRESENTATION OF THE SEARCH CLASSIFIER
Classic approach in pattern recognition:
- Finding a pre-learned pattern by window search
- A metric decides whether the pattern was found or not (e.g. correlation, regression)
Figure: pattern image and scene image
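As an illustration of classic window search, the sketch below evaluates normalised cross-correlation (NCC), one of the metrics the slide mentions, at every window position of a tiny grey-value scene. It is plain Python for readability; the acceptance threshold of 0.9 and the test images are hypothetical.

```python
import math

def ncc(scene, pattern, ox, oy):
    """Normalised cross-correlation of pattern with the scene window at (ox, oy)."""
    ph, pw = len(pattern), len(pattern[0])
    a = [scene[oy + y][ox + x] for y in range(ph) for x in range(pw)]
    b = [pattern[y][x] for y in range(ph) for x in range(pw)]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return num / den if den else 0.0   # flat windows carry no structure

def window_search(scene, pattern, threshold=0.9):
    """Evaluate every window position; return (found, (score, x, y), comparisons)."""
    sh, sw = len(scene), len(scene[0])
    ph, pw = len(pattern), len(pattern[0])
    best, comparisons = None, 0
    for oy in range(sh - ph + 1):
        for ox in range(sw - pw + 1):
            score = ncc(scene, pattern, ox, oy)
            comparisons += 1
            if best is None or score > best[0]:
                best = (score, ox, oy)
    found = best is not None and best[0] >= threshold
    return found, best, comparisons

pattern = [[0, 9, 0],
           [9, 9, 9],
           [0, 9, 0]]
scene = [[1] * 8 for _ in range(6)]
# Paste a brighter copy of the pattern at x = 4, y = 2 (NCC ignores gain and offset)
for y in range(3):
    for x in range(3):
        scene[2 + y][4 + x] = 2 * pattern[y][x] + 10
print(window_search(scene, pattern))
```

The comparison counter makes the cost of exhaustive search explicit: it grows with the number of window positions, and each comparison touches every pattern pixel.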

A new approach in pattern recognition: learn how to find the object
- Automated learning in CVB Polimago: generation of random learning examples
- For every learning example:
  - extraction of features using an MRF (Multi-Resolution Filter) and regression (Tikhonov)
  - saving of the underlying transformation
Figure labels: pattern, database, learning image, zero position
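The slide names Tikhonov regression but gives no details of Polimago's features or training. The sketch below only shows Tikhonov (ridge) regression itself in closed form, w = (XᵀX + λI)⁻¹Xᵀy, fitted to randomly generated learning examples: each row of X stands in for a feature vector and y for one transformation parameter. The data, the feature dimension and λ are hypothetical, and this is not Polimago's implementation.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def tikhonov_fit(X, y, lam):
    """Closed-form Tikhonov (ridge) regression: w = (X^T X + lam * I)^-1 X^T y."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(d)]
    return solve(A, b)

# Hypothetical learning examples: 3 features each, target = one pose parameter
rng = random.Random(1)
true_w = [2.0, -1.0, 0.5]
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
y = [sum(wi * f for wi, f in zip(true_w, row)) for row in X]
w = tikhonov_fit(X, y, lam=1e-3)
print([round(v, 3) for v in w])   # close to [2.0, -1.0, 0.5]
```

The regularisation weight λ trades fitting accuracy against stability: larger values shrink the learned weights and make the solution less sensitive to noisy or nearly redundant features.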

- Thanks to the automated learning stage, the classifier can learn a very large number of poses
- Additional advantage: the processing time can be reduced

Example: an image window of 268 x 252 pixels and an object size of 64 x 64 pixels; search with correlation:
- without image pyramid: 33768 comparisons
- two-level image pyramid: 2110 comparisons
- three-level image pyramid: 527 comparisons
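Each pyramid step halves the width and height of both image and pattern, so the number of window positions drops by roughly a factor of four per step. The figures for the two- and three-level pyramids equal 33768 divided by 4² and 4³ (rounded down). The sketch below reproduces this arithmetic, taking the 33768 comparisons from the slide as given, and adds 2 x 2 averaging as one possible way to build a pyramid level (an assumption; the slides do not say how the pyramid is built). It counts only the search at the coarsest level and ignores the refinement at finer levels.

```python
FULL_RES_COMPARISONS = 33768  # value quoted on the slide for the search without a pyramid

def downsample(img):
    """One pyramid step: average non-overlapping 2 x 2 blocks (odd edge row/column dropped)."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, len(img[0]) - 1, 2)]
            for y in range(0, len(img) - 1, 2)]

def coarsest_level_comparisons(full_res, halvings):
    """Approximate window positions left after halving width and height `halvings` times."""
    return full_res // 4 ** halvings

for halvings in range(4):
    print(halvings, coarsest_level_comparisons(FULL_RES_COMPARISONS, halvings))

# Image size per level for the 268 x 252 window from the example
level = [[0] * 268 for _ in range(252)]
for halvings in range(1, 4):
    level = downsample(level)
    print(f"level {halvings}: {len(level[0])} x {len(level)} pixels")
```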

COMPARISON OF THE TWO APPROACHES
Qualitative comparison

Criterion | Bag of Words with SIFT features | Search classifier
Invariance to geometric transformations | fully affine (ASIFT); perspective (PSIFT) | fully affine (training); perspective conceivable
Invariance to variations in lighting | yes (normalization) | yes (training)
Extraction of features | depends on corner-like structures | arbitrary structures
Recognition of multiple objects | no (yes with extensions) | yes
Use of negative samples | no | yes
Processing time | high | low

Quantitative comparison: what precision can be expected?
- Comparison of Polimago with a geometric pattern matcher (CVB ShapeFinder):
  - 1/10 pixel precision in positioning
  - 0.1° precision in orientation
Figure: per-test errors of the search classifier and ShapeFinder over 100 tests; one plot shows the rotation error, the other the position error as Euclidean distance (x-axis: number of tests)
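The two error measures in the plots can be computed per test as sketched below: the Euclidean distance between estimated and reference position, and the angular difference wrapped so that, for example, 359.9° versus 0.1° counts as 0.2° rather than 359.8°. The function names and sample values are hypothetical.

```python
import math

def position_error(est, ref):
    """Euclidean distance in pixels between estimated and reference position."""
    return math.dist(est, ref)

def rotation_error(est_deg, ref_deg):
    """Absolute angular difference in degrees, wrapped to the range [0, 180]."""
    diff = (est_deg - ref_deg + 180.0) % 360.0 - 180.0
    return abs(diff)

# Hypothetical test results: (estimated x, y, angle) vs. (reference x, y, angle)
tests = [((100.03, 50.01, 359.95), (100.0, 50.0, 0.0)),
         ((20.0, 30.06, 45.02), (20.02, 30.0, 45.0))]
for (ex, ey, ea), (rx, ry, ra) in tests:
    print(f"{position_error((ex, ey), (rx, ry)):.3f} px, {rotation_error(ea, ra):.3f} deg")
```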

- Comparison with an established measurement system:
  - PTB-certified ground truth available [GOM & Engineeringcapacity]

  - Calculation of the three Euler angles [Stemmer]
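The slides do not state which Euler-angle convention is used. A minimal sketch, assuming the common Z-Y-X (yaw, pitch, roll) convention R = Rz(yaw)·Ry(pitch)·Rx(roll), that builds a rotation matrix and recovers the three angles from it. Near pitch = ±90° (gimbal lock) the decomposition is not unique, and this sketch does not handle that case.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def rotation_zyx(yaw, pitch, roll):
    """R = Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in degrees."""
    z, y, x = map(math.radians, (yaw, pitch, roll))
    Rz = [[math.cos(z), -math.sin(z), 0], [math.sin(z), math.cos(z), 0], [0, 0, 1]]
    Ry = [[math.cos(y), 0, math.sin(y)], [0, 1, 0], [-math.sin(y), 0, math.cos(y)]]
    Rx = [[1, 0, 0], [0, math.cos(x), -math.sin(x)], [0, math.sin(x), math.cos(x)]]
    return matmul(Rz, matmul(Ry, Rx))

def euler_zyx(R):
    """Recover (yaw, pitch, roll) in degrees; assumes |pitch| < 90 degrees."""
    yaw = math.atan2(R[1][0], R[0][0])
    pitch = math.atan2(-R[2][0], math.hypot(R[0][0], R[1][0]))
    roll = math.atan2(R[2][1], R[2][2])
    return tuple(math.degrees(a) for a in (yaw, pitch, roll))

R = rotation_zyx(30.0, -20.0, 60.0)   # hypothetical pose with 60 degrees of roll
print(euler_zyx(R))                   # approximately (30.0, -20.0, 60.0)
```

Comparing angles from the pose estimate with those from a reference system is only meaningful if both use the same convention and axis order.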

  - 5° measurement accuracy up to 60° of tilt [Stemmer]

SUMMARY
Evaluation of the current state
- Robust recognition results for pre-learned objects
- Pose estimation of one or several objects in parallel
- Low processing times, suitable for real-time tracking applications
- Integrated in Common Vision Blox 2016
Future developments
- Speed-up of the classifier's learning stage (GPU, SSE)
- Preparation for the Linux / ARM platforms

DO YOU HAVE ANY QUESTIONS?
Come and join our LinkedIn group EUROPEAN VISION TECHNOLOGY FORUM and meet our experts.

Thank you for your attention!
STEMMER IMAGING GmbH
Gutenbergstraße 9-13
82178 Puchheim, Germany
Phone: +49 89 80902-744
Fax: +49 89 80902-116
j.zuegner@stemmer-imaging.de
www.stemmer-imaging.de
Your contact person: Johannes Zügner