University of Cambridge Engineering Part IIB Module 4F12 - Computer Vision and Robotics Mobile Computer Vision


Matthew Johnson, November 2010

[Title figure: the recognition pipeline. A phone client extracts features from an image and queries a master database of images and labels on a web server to find the closest match.]

Applied Computer Vision

Computer Vision is not only an exciting field of research; it is also one of the key new technologies driving innovation in business and industry today. Just as human vision is a fundamental and incredibly rich channel of information which we use for a huge array of activities, advances in computer vision are creating new opportunities for innovation in the way we interact with our world through digital devices. In this lecture, we will focus on ways in which the concepts and techniques you have been exposed to so far in this course are being used today in the field of image recognition and search on embedded devices.

Impaired Vision

The first thing that becomes clear as one begins to work on a ubiquitous computing device like a mobile phone is that one can no longer take anything for granted. For example, most mobile phones in use today do not have any hardware support for division. This means that any division operation is translated in assembly into a long series of other instructions, thus making it incredibly costly. Here is a partial list of the challenges posed by mobile devices:

- Limited processor hardware (slow floating point operations, lack of functionality)
- Small cache (i.e. limited stack space for procedures)
- Slow access to a small supply of Random Access Memory (RAM)
- Costly disk access
- Arcane versions of known programming languages

How do we overcome these challenges?

Client/Server Model

[Figure: client/server architecture. The phone client sends each image to a web server, which runs the image feature algorithm, matches against the master database of images and labels, and reports the closest match back to the user interface.]

The most common and, at first, only way of overcoming these challenges on mobile devices was to offload any significant computation to a remote server. The system on the phone becomes a thin client (meaning that it does no real computation) that sends the image to the server, which computes feature vectors, matches them against a database of known images, and returns the information on the match. In order to add new images to this system, a second system interacts with the server to upload images and labels.

Client/Server Model, cont.

This arrangement solves the problem of the mobile device's limited capabilities, but has several problems of its own:

- If the file server stops functioning, the entire system fails
- If communication to the file server is lost, the entire system fails
- The server(s) have to perform massive amounts of parallel computation and database indexing
- Large quantities of data (i.e. images) require lots of time and bandwidth to transfer
- There is a significant delay in response to user actions

Smart Clients

[Figure: smart client architecture. The phone client runs the image feature algorithm and finds the closest match against a local copy of the database, while the web server maintains the master database of images and labels.]

An alternative approach is to embrace and overcome the difficulties of performing vision on mobile devices, through research into modifications to existing algorithms and into entirely new algorithms that can solve computer vision problems with the tools available and within the limits imposed. This is a smart client model, because the client does some or all of the computation. In this model, the client performs feature vector extraction and matching locally, which means that it must maintain a full or partial copy of the master database on the device.

Smart Clients, cont.

We can see quite clearly how a smart client system, while it has to overcome the challenges of running in a limited environment, also has many benefits:

- If the server malfunctions, clients can still operate with their current database
- Similarly, if communication to the server is lost, the system still operates
- All computation is done by the phones locally, thereby greatly reducing server-based computation
- All communication with the server consists of feature vectors and metadata, much smaller than images
- There is an immediate response to user actions

Clearly there are great gains to be had, but first we must overcome the limitations imposed by mobile platforms.

Smart Client: System Design

If we are going to achieve Computer Vision on a mobile phone, we must first come to terms with reality. That reality dictates that we have no large constructs in memory, no division, and no floating point numbers. Most of the standard Computer Vision algorithms are simply not available for use. Instead, we will have to rely on more efficient algorithms, and where those are unavailable, on approximations. A crucial requirement for this task is knowing the literature. We must understand the basic underlying principles of a technique so that we know what can be jettisoned without reducing performance.

We are going to design a logo recognition system that can run on a mobile phone. It is going to be based on using a distance transform on the output of an edge detector to match logo contours. Thus, this system will contain three parts:

- Edge Detector
- Distance Transform
- Contour Fragment Matching

Logo Recognition

Logo recognition is a key task for many mobile vision applications. Restaurants, shops and other businesses take great care to have distinctive logos that are prominently displayed, making them ideal targets for user localization and value-added experiences.

[Figure: the recognition pipeline applied to an example image: edge detection, then the distance transform, then convolution with the logo template.]

We will use Canny edge detection, the Euclidean distance transform, and a simple threshold-based contour recognizer. In the course of developing this system, we will examine the main techniques by which computer vision algorithms can be adapted to work on an embedded device like a mobile phone.

Problems

Our chosen techniques pose some very difficult problems to solve. The Canny edge detection algorithm poses the biggest challenge. It works by following the peaks of the first derivative of a Gaussian convolved over the image to find continuous edge contours. In practice, it is implemented using the following steps, almost all of which pose intractable problems in their original forms:

- Convolution with a Gaussian: floating point arithmetic
- Computing derivatives in the horizontal and vertical directions: floating point arithmetic
- Computing edge magnitude: square root computation
- Non-maximum suppression: floating point arithmetic, arctangent computation
- Hysteresis: recursion, redundant memory access

The Euclidean distance transform consists of finding, at each pixel in the image, the Euclidean distance to the nearest edge. A naïve implementation is computation intensive, but we will examine a technique that makes it very efficient.

Gaussian Approximation

When convolving an image with a Gaussian, one must perform 2k floating point multiplications per pixel, where k is the size of the Gaussian kernel, typically 6σ. The standard way of optimizing this is to convert the floating point values to integer approximations, with an integer division at the end. Unfortunately, this method of approximation is still too computationally intensive, requiring 2k integer multiplications and 2 integer divisions. The tool we want to use is called a box filter approximation, as discussed by Rau and McClellan in [4]. A box filter simply sets a pixel to the mean of its neighbouring pixels. One interesting property of a box filter in one dimension is that multiple applications approximate a Gaussian:

[Figure: the impulse response of a one-dimensional box filter applied i = 0, 1, 2 and 3 times, converging towards a Gaussian.]

Gaussian Approximation, cont.

Another property is that the box filter is separable (that is, filtering in the horizontal and vertical directions separately is equivalent to filtering with a two-dimensional square kernel), and as such it can be used to filter in two dimensions while losing none of its efficiency. Even more interesting is that, by using a rolling sum, it can be computed using one addition, one subtraction and one division per pass per pixel. As it takes about 3 passes to get a good approximation, this results in 6 additions, 6 subtractions and 6 divisions per pixel to convolve in the horizontal and vertical directions, regardless of the size of the kernel. Six divisions sounds like a limitation, but if the size of the kernel is a power of 2, then we can use bit shifting instead of division, leaving a very efficient Gaussian approximation whose outputs are integers. Because the outputs are integers, we have removed the floating point arithmetic from the derivative computation. Similarly, if we store edge magnitude as the magnitude squared (i.e. gx² + gy²) then we avoid the square root computation, as all that matters is the ordering of edges by magnitude, not the actual magnitude values.
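To make the rolling sum concrete, here is a minimal sketch of one horizontal box-filter pass in C. The function name, the clamped border handling, and the assumption that the kernel width is 2^shift (so the division becomes a right shift) are illustrative choices, not details from the lecture:

    #include <stdint.h>

    /* One horizontal box-filter pass with a rolling sum: one add, one
     * subtract and one shift per pixel, independent of kernel size.
     * Assumes the kernel width is 2^shift; borders are clamped. */
    static void box_filter_row(const uint8_t *src, uint8_t *dst,
                               int width, int shift)
    {
        int ksize  = 1 << shift;
        int radius = ksize / 2;
        uint32_t sum = 0;

        /* Prime the sum with the window centred on pixel 0. */
        for (int i = -radius; i < ksize - radius; i++) {
            int j = i < 0 ? 0 : (i >= width ? width - 1 : i);
            sum += src[j];
        }
        for (int x = 0; x < width; x++) {
            dst[x] = (uint8_t)(sum >> shift);  /* mean = sum / 2^shift */
            int add = x + ksize - radius;      /* pixel entering window */
            int sub = x - radius;              /* pixel leaving window */
            if (add >= width) add = width - 1;
            if (sub < 0)      sub = 0;
            sum += src[add];
            sum -= src[sub];
        }
    }

Three such passes per direction give the Gaussian approximation described above; applying the same function along columns (or transposing between passes) handles the vertical direction.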

Non-Maximum Suppression

In order to suppress non-maximum edge signals, we must compare every pixel to those perpendicular to its edge direction. Some simple trigonometry allows us to avoid a full arctangent computation and simply use gx and gy to perform the interpolation directly. While this is exact, it still requires floating point computation, which is too expensive. Therefore, we will use another approximation trick: the lookup table.

[Figure: the four quantized edge directions, numbered 0 to 3, each with its pair of perpendicular neighbours to compare against.]

In our system so far, gx and gy are integers. Thus, we can use them to index a two-dimensional array. As there are only 4 possible comparison sets, each array entry is an integer from 0 to 3 indicating which set to compare. We fill this array during an initialization stage and thus avoid all floating point operations at runtime.
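A minimal sketch in C of how such a table might be built. The sector boundaries (22.5° steps), the table size, and the assumption that gradients have been shifted down into a small range are all illustrative; floating point is acceptable here because this runs once at initialization, not per pixel:

    #include <math.h>
    #include <stdint.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    #define GMAX 64   /* assumed gradient range after downshifting */

    /* dir_lut[gy + GMAX][gx + GMAX] holds the comparison set (0..3)
     * for a pixel with gradient (gx, gy). */
    static uint8_t dir_lut[2 * GMAX + 1][2 * GMAX + 1];

    void init_direction_lut(void)
    {
        for (int gy = -GMAX; gy <= GMAX; gy++) {
            for (int gx = -GMAX; gx <= GMAX; gx++) {
                /* Edge direction as an angle in [0, 180) degrees. */
                double a = atan2((double)gy, (double)gx) * 180.0 / M_PI;
                if (a < 0.0) a += 180.0;
                uint8_t set;
                if (a < 22.5 || a >= 157.5) set = 0;  /* horizontal     */
                else if (a < 67.5)          set = 1;  /* first diagonal */
                else if (a < 112.5)         set = 2;  /* vertical       */
                else                        set = 3;  /* other diagonal */
                dir_lut[gy + GMAX][gx + GMAX] = set;
            }
        }
    }

At runtime, non-maximum suppression then reduces to one table lookup per pixel followed by two integer comparisons against the appropriate neighbours.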

Hysteresis

The biggest challenge is hysteresis. During hysteresis, we connect together contours using low and high magnitude thresholds. Each contour consists of edgels (single edges which have passed the non-maximum suppression stage) with a magnitude greater than the lower threshold that are joined together (using 8-way connectivity), with at least one edgel in the contour having a magnitude greater than the higher threshold. The way this is done is by starting at an edgel above the higher threshold, and recursively determining which lower-threshold pixels are connected to it.

[Figure: recursively following a contour outwards from a seed edgel.]

This recursion can require many levels of nesting, which is not feasible given the limitations of a mobile phone. Thus, we need a different way of doing this. Thankfully, there is a tool in computer vision which will solve this problem for us very efficiently.

Connected Components

Connected components analysis takes a binary image (pixels are either 1 or 0), finds which "on" pixels are connected together, and labels each group of connected pixels (i.e. each component) with a unique label. When implemented correctly the technique is very fast, and thus will provide what we need with a considerable speedup over the recursive method.

[Figure: a binary image input and the corresponding labelled components.]
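As a sketch of how this can be done without recursion, here is a classical two-pass labelling with union-find in C. The details (8-way connectivity via the four already-visited neighbours, path halving in the find) follow standard practice rather than any specific implementation from the lecture:

    #include <stdint.h>
    #include <stdlib.h>

    static int find_root(int *parent, int x)
    {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];   /* path halving */
            x = parent[x];
        }
        return x;
    }

    static void unite(int *parent, int a, int b)
    {
        a = find_root(parent, a);
        b = find_root(parent, b);
        if (a != b) parent[b] = a;
    }

    /* Label the 8-connected components of binary image img (0/1).
     * labels must hold w*h ints; "on" pixels get labels >= 1. */
    void connected_components(const uint8_t *img, int *labels,
                              int w, int h)
    {
        int *parent = malloc(sizeof(int) * (w * h + 1));
        int next = 1;

        /* Pass 1: provisional labels, merging with the four
         * neighbours already seen in raster order (W, NW, N, NE). */
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int i = y * w + x;
                labels[i] = 0;
                if (!img[i]) continue;
                int nbr[4] = {
                    x > 0              ? labels[i - 1]     : 0,
                    x > 0     && y > 0 ? labels[i - w - 1] : 0,
                    y > 0              ? labels[i - w]     : 0,
                    x < w - 1 && y > 0 ? labels[i - w + 1] : 0
                };
                int lab = 0;
                for (int k = 0; k < 4; k++)
                    if (nbr[k] && (!lab || nbr[k] < lab)) lab = nbr[k];
                if (!lab) { lab = next++; parent[lab] = lab; }
                labels[i] = lab;
                for (int k = 0; k < 4; k++)
                    if (nbr[k]) unite(parent, lab, nbr[k]);
            }
        }
        /* Pass 2: flatten provisional labels to their roots. */
        for (int i = 0; i < w * h; i++)
            if (labels[i]) labels[i] = find_root(parent, labels[i]);
        free(parent);
    }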

Connected Components, cont.

If we create a binary image from the edgels with magnitudes greater than the lower threshold (indicated by L in the figure below), then as described by Liu and Haralick [3] we can use connected components to find potential contours within the image by identifying all L pixels which are connected. Then, we simply exclude components which do not contain an edgel whose magnitude is greater than the higher threshold (indicated by H). This achieves the same effect as recursive hysteresis, while being far more efficient in computation and requiring no recursion whatsoever.

[Figure: thresholded maximum edges (L = above the lower threshold, H = above the higher threshold), the connected components found among them, and the extracted contours; only components containing at least one H edgel are kept.]

Euclidean Distance Transform

As mentioned previously, a naïve Euclidean distance transform is too computationally expensive for our needs. We are instead going to use a technique developed by Bailey in [1] which achieves the same accuracy using only integers and a minimum of computation. For a moment, let us reduce the distance transform to a one-dimensional problem. In one dimension, the distance from every index to the nearest edge index can be computed very quickly and efficiently in two passes: a forward pass records the distance to the most recent edge on the left, and a backward pass takes the minimum of that and the distance to the nearest edge on the right.

[Figure: a one-dimensional row of pixels with edges marked, the running distances after the forward pass, and the final distances after the backward pass.]
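A sketch of the two-pass 1D computation in C; the names and the INF sentinel are illustrative:

    #include <stdint.h>

    #define INF 0x7fff   /* larger than any possible row distance */

    /* dist[i] becomes the distance from pixel i to the nearest edge
     * in the row; edges[i] is nonzero at edge pixels. */
    void distance_1d(const uint8_t *edges, int16_t *dist, int n)
    {
        int16_t d = INF;
        /* Forward pass: distance to the nearest edge on the left. */
        for (int i = 0; i < n; i++) {
            if (edges[i]) d = 0;
            else if (d < INF) d++;
            dist[i] = d;
        }
        /* Backward pass: take the minimum with the distance to the
         * nearest edge on the right. */
        d = INF;
        for (int i = n - 1; i >= 0; i--) {
            if (edges[i]) d = 0;
            else if (d < INF) d++;
            if (d < dist[i]) dist[i] = d;
        }
    }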

Efficient Euclidean DT

Thus, for each row in an image we can find the distance from each pixel to the nearest edge in that row very efficiently. This is, naturally, insufficient. Think now of storing the squared distance instead of the distance in each row pixel. Plotted along the row, these values (0, 1, 4, 9, 16, 25, ...) trace out a slice of a paraboloid of revolution whose centre is at the edge pixel.

[Figure: the same rows as before, now holding squared distances; each edge contributes a parabola of squared distances centred on it.]

The paraboloid centred at each edge extends across the entire image, intersecting with the paraboloids of other edges at various points. Each paraboloid determines the area of influence of its edge. By evaluating all the paraboloids at a pixel, we can determine which edge is nearest to that pixel.

Efficient Euclidean DT, cont.

If we now examine the columns of our image after storing the squared distance in each row pixel, we see a record of all the edges which can affect that column. By calculating the value of each paraboloid at each index, we can see which has the least value, and thus find the true squared distance to an edge at each row in the column, as in the example below:

  Initial:  9  4 36 81 25 16 25 100 49 16
  Final:    5  4  5  8 13 16 17  20 17 16

At first, this seems like an O(N²) operation, but by scanning the column in order and keeping track of where the influence changes from one paraboloid to the next, it becomes O(N). Thus, the total complexity of the transform remains linear in the number of pixels.
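One well-known way to realize this linear-time column scan is the lower-envelope algorithm of Felzenszwalb and Huttenlocher, sketched below in C. Bailey's formulation in [1] differs in detail (in particular, it avoids the floating point intersection computation), but the idea of tracking where one parabola's influence ends and the next begins is the same:

    #include <stdlib.h>

    /* For one column, f[q] holds the squared within-row distance at
     * row q; on return d[q] is the true squared distance. O(n). */
    void dt_column(const int *f, int *d, int n)
    {
        int   *v = malloc(sizeof(int) * n);         /* envelope parabolas */
        float *z = malloc(sizeof(float) * (n + 1)); /* influence bounds   */
        int k = 0;
        v[0] = 0;
        z[0] = -1e20f;
        z[1] =  1e20f;

        for (int q = 1; q < n; q++) {
            float s;
            for (;;) {
                /* Row where the parabola from q overtakes the last
                 * parabola in the envelope. */
                s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k]))
                    / (float)(2 * (q - v[k]));
                if (k > 0 && s <= z[k]) k--;
                else break;
            }
            k++;
            v[k] = q;
            z[k] = s;
            z[k + 1] = 1e20f;
        }
        k = 0;
        for (int q = 0; q < n; q++) {
            while (z[k + 1] < q) k++;   /* advance to the parabola in
                                         * influence at row q */
            d[q] = (q - v[k]) * (q - v[k]) + f[v[k]];
        }
        free(v);
        free(z);
    }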

Thin Client: Descriptor Compression

While we have to some extent discounted server-based methods, there have been several advances in descriptor compression in the past few years which make a server-based model increasingly feasible for large-scale deployment. Descriptor compression is an area of research that concentrates on encoding an image descriptor (e.g. a gradient histogram, SURF, SIFT) at a lower bitrate. The techniques used fall into two categories: quantization and entropy coding. We are going to look briefly at a generalized method for descriptor compression proposed by Johnson in [2]. In this method, any image descriptor meeting simple requirements can be compressed for optimized matching, resulting in significant savings in both storage and computational requirements while improving performance.

[Figure: the compression pipeline. A descriptor algorithm turns an image patch into a grid of floating point histograms; a compression function F maps each value to a small integer code; descriptors are then compared by looking up per-index costs in a distance matrix D and summing them (in the example, 1 + 9 + 0 + 1 = 11).]

Huffman Tree Encoding

The technique used in [2] for compression is Huffman tree encoding. Huffman trees are specialized binary trees in which each node has a value equal to the sum of its children. In this context, they can be used as a lossy method of encoding normalized distributions (like intensity or orientation histograms). First, the values in the distribution (a) are used to construct a Huffman tree (b). Each value is then approximated by the negative power of 2 corresponding to its depth in the tree (d). In this way, the distribution still sums to 1 but can be represented using N log(N) bits (c), where N is the length of the distribution.

[Figure: (a) a normalized distribution 0.421, 0.157, 0.234, 0.188; (b) the Huffman tree built from it, with internal nodes 0.345, 0.579 and 1.000; (c) the depth codes; (d) the approximated distribution 0.500, 0.125, 0.250, 0.125.]
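A minimal sketch in C of the quantization step: build a Huffman tree over the bin values and replace each value with 2^-depth. The O(N²) pair selection and the fixed MAXN are illustrative simplifications; by the Kraft equality for a full binary tree, the outputs sum exactly to 1:

    #define MAXN 64

    /* depths[i] = depth of leaf i in a Huffman tree built over the
     * bin values w[0..n-1] (n <= MAXN). Leaves are nodes 0..n-1,
     * internal nodes are appended as they are created. */
    static void huffman_depths(const double *w, int n, int *depths)
    {
        double weight[2 * MAXN];
        int    parent[2 * MAXN];
        int    alive[2 * MAXN];
        int    total = n;

        for (int i = 0; i < n; i++) {
            weight[i] = w[i];
            alive[i]  = 1;
            parent[i] = -1;
        }
        /* Merge the two lightest live nodes n-1 times. */
        for (int m = 0; m < n - 1; m++) {
            int a = -1, b = -1;
            for (int i = 0; i < total; i++) {
                if (!alive[i]) continue;
                if (a < 0 || weight[i] < weight[a]) { b = a; a = i; }
                else if (b < 0 || weight[i] < weight[b]) { b = i; }
            }
            weight[total] = weight[a] + weight[b];
            alive[total]  = 1;
            parent[total] = -1;
            alive[a] = alive[b] = 0;
            parent[a] = parent[b] = total;
            total++;
        }
        /* A leaf's depth is the number of links up to the root. */
        for (int i = 0; i < n; i++) {
            int d = 0;
            for (int j = i; parent[j] >= 0; j = parent[j]) d++;
            depths[i] = d;
        }
    }

    /* Replace each bin of a normalized histogram with 2^-depth. */
    void huffman_quantize(const double *hist, double *out, int n)
    {
        int depths[MAXN];
        huffman_depths(hist, n, depths);
        for (int i = 0; i < n; i++)
            out[i] = 1.0 / (double)(1 << depths[i]);
    }

On the example above, the four bins 0.421, 0.157, 0.234, 0.188 get depths 1, 3, 2, 3 and become 0.500, 0.125, 0.250, 0.125.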

Distance Lookup Matrix

One advantage of using Huffman tree encoding is that it limits the possible values at each index of the distribution. As most distance metrics (e.g. Minkowski norms, Kullback-Leibler divergence) can be separated on a per-index basis, this means that we can pre-compute all possible per-index distance scores in a lookup table. As the majority of computation time in an image retrieval application is spent computing distances between descriptors, this optimization results in sizeable savings in computational requirements.

[Figure: two distributions x and y are compressed into codes c_x and c_y, then compared by looking up each pair of codes in the distance matrix D and summing, e.g. 9 + 0 + 4 + 1 = 14.]
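A sketch in C of the matching side, assuming each compressed bin stores its Huffman depth (1..MAXDEPTH) and using a squared difference of the decoded values as an example separable metric; the fixed-point scale and the names are illustrative:

    #include <stdint.h>

    #define MAXDEPTH 8   /* assumed maximum Huffman depth */

    static uint32_t D[MAXDEPTH + 1][MAXDEPTH + 1];

    /* Precompute all per-index costs once at startup. Code c decodes
     * to 2^-c; costs are stored as fixed point (scaled by 2^16). */
    void init_distance_matrix(void)
    {
        for (int a = 1; a <= MAXDEPTH; a++) {
            for (int b = 1; b <= MAXDEPTH; b++) {
                double diff = 1.0 / (1 << a) - 1.0 / (1 << b);
                D[a][b] = (uint32_t)(diff * diff * 65536.0);
            }
        }
    }

    /* The distance between two compressed descriptors is then just a
     * sum of table lookups: no floating point at match time. */
    uint32_t descriptor_distance(const uint8_t *cx, const uint8_t *cy,
                                 int n)
    {
        uint32_t dist = 0;
        for (int i = 0; i < n; i++)
            dist += D[cx[i]][cy[i]];
        return dist;
    }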

Further Reading

References

[1] D. Bailey. An efficient Euclidean distance transform. In Proc. of IWCIA 2004, pages 394-408, 2004.
[2] M. Johnson. Generalized descriptor compression for storage and matching. In Proc. of British Machine Vision Conf., Aberystwyth, Wales, August 2010.
[3] G. Liu and R. M. Haralick. Two practical issues in Canny's edge detector implementation. In Proc. of Int. Conf. on Pattern Recognition, volume 3, page 3680, 2000.
[4] R. Rau and J. McClellan. Efficient approximation of Gaussian filters. IEEE Trans. on Signal Processing, 45:468-471, February 1997.