
SYMPOSIUM ON VIRTUAL AND AUGMENTED REALITY 2007 MINICOURSE

REAL-TIME PATTERN RECOGNITION USING THE OPENCV LIBRARY

João Paulo Lima, Thiago Farias, Eduardo Apolinário, Guilherme Moura, Daliton Silva, Veronica Teichrieb, Judith Kelner

MAY/JUNE 2007

Contents

1. Introduction
1.1. Motivation
1.2. License
1.3. Installation
1.4. Documentation
2. Functionalities
2.1. Image enhancement and feature selection
2.1.1. Smoothing
2.1.2. Edge detection
2.1.3. Corner detection
2.2. Segmentation
2.2.1. Thresholding
2.2.2. Line detection
2.2.3. Contour detection
2.2.4. Connected components labeling
2.3. Object tracking
2.3.1. Template matching
2.3.2. CamShift
2.3.3. Optical flow
2.4. Object detection
3. Example applications
3.1. Square detection
3.2. CamShift demo
3.3. Kanade Lucas tracker
3.4. Face detection
4. Final considerations
References
Appendix

1. Introduction

OpenCV is an open source software library for offline and real time image processing [1]. It can be applied in many different areas, such as human-computer interaction (HCI), object identification, segmentation and recognition, face recognition, gesture recognition, motion tracking, ego motion, motion understanding and structure from motion (SfM), and mobile robotics. Table 1 shows a more detailed description of the functions supported by OpenCV. The library has been developed by Intel since 2001, with its first release version distributed in October 2006. There are OpenCV versions targeting both the Windows and Linux operating systems.

Table 1. OpenCV main functions

Image data manipulation: allocation, release, copying, setting, conversion
Image and video I/O: file and camera based input, image/video file output
Matrix and vector manipulation and linear algebra routines: products, solvers, eigenvalues, singular value decomposition (SVD)
Various dynamic data structures: lists, queues, sets, trees, graphs
Basic image processing: filtering, edge detection, corner detection, sampling and interpolation, color conversion, morphological operations, histograms, image pyramids
Structural analysis: connected components, contour processing, distance transform, various moments, template matching, Hough transform, polygonal approximation, line fitting, ellipse fitting, Delaunay triangulation
Camera calibration: finding and tracking calibration patterns, calibration, fundamental matrix estimation, homography estimation, stereo correspondence
Motion analysis: optical flow, motion segmentation, tracking
Object recognition: eigen-methods, hidden Markov models (HMM)
Basic GUI: display image/video, keyboard and mouse handling, scroll-bars
Image labeling: line, conic, polygon, text drawing

OpenCV's source code is fully available for download and anyone is free to modify it, as long as the license clauses are followed. OpenCV is written in C/C++, and provides support for developers that use Microsoft Visual Studio, Eclipse Project and C++ Builder (when using Windows), and makefiles (when using Linux).

OpenCV can be divided in four modules: cv, cvaux, cxcore and highgui. The cv module contains the main functions and can be considered the library's heart. The cvaux module, as its name suggests, implements auxiliary functions that complement OpenCV's use. The cxcore module is responsible for data structures and linear algebra operations. Finally, the highgui module provides support for GUI functions, such as showing a window containing images captured by a webcam.

1.1. Motivation

Real time pattern recognition is an important task in applications from diverse areas, such as Augmented Reality, Robotics and HCI. OpenCV provides out of the box functionalities for building user interfaces, loading images and capturing video, for example. The OpenCV library is an open source solution for computer vision that provides many functionalities adequate for the types of application listed before. Since it has a well defined and simple programming interface, it can be easily integrated into existing projects.

A considerable problem involving pattern recognition is the heavy processing load that this type of task demands. Because of that, OpenCV's implementation contains many optimizations that result in good performance. Since OpenCV is implemented by Intel, there is a series of optimizations specially designed for the processors it produces. Intel has been making large scale investments in OpenCV development. Many Intel researchers dedicate their work to the development of new OpenCV functionalities, which are incorporated into the library every time a new version is released. As an example, the cvGoodFeaturesToTrack function, based on Carlo Tomasi's work, can be cited.

1.2. License

OpenCV's license allows modifying and freely distributing its source code and binaries, as long as some conditions are obeyed. Among these conditions, the following can be cited:
- The source code and/or binaries distributed must contain all the information listed in OpenCV's original license;
- The name of Intel or any of its partners cannot be used to promote applications that utilize the OpenCV library without previous written authorization.

1.3. Installation

OpenCV can be installed through a wizard available for download on the project page, or it can be obtained directly from its CVS repository, following the instructions provided there. In both cases the source code and some precompiled DLLs are copied. In order to recompile or modify any OpenCV module, one has to open it using one of its supported IDEs.

In sequence, OpenCV's step-by-step installation using the wizard is described. Figure 1 shows the screen that appears when the OpenCV installation starts. By clicking on the Next button, the use license is shown. In order to agree with the listed terms, the user must check the option I accept the agreement and click on the Next button again.

The following three screens of the installation process show the standard installation configuration (the place on the hard disk where OpenCV should be installed and the addition of the installation directory to the PATH environment variable). Figure 1 illustrates the first of these screens. After confirming all options, OpenCV is copied to the hard disk and is available for use.

Figure 1. OpenCV's first installation screen

In order to recompile the OpenCV source code using Microsoft Visual Studio, it is necessary to open the opencv.sln file, located inside the _make folder. This way, the entire development environment is configured for a correct code compilation. In sequence, the user must click on the Build Solution option from the Build menu, as shown in Figure 2. Each OpenCV module is compiled into two link libraries (.dll and .lib). They integrate OpenCV with projects developed by the user.

Figure 2. Compiling the OpenCV source code

1.4. Documentation

OpenCV has a complete documentation and many related forums and discussion groups, resulting in efficient support for the library users. The installation application automatically copies a series of user support documents. Among these documents, opencvman_old.pdf deserves special attention, since it is the library's reference manual. In this document a detailed description of all functions implemented by the OpenCV modules can be found, together with example code. The manual also explains general concepts related to OpenCV, such as the specific data types defined and common implementation guidelines used in image processing. An HTML documentation is also included in the OpenCV installation; it is a simpler and more frequently updated version of the library reference manual.

A discussion group about OpenCV is also available. When users register themselves (registration on the discussion group is free), they gain access to an extensive question and answer database involving OpenCV practical use situations. They also have the possibility of posting questions that could not be solved using only the existing documentation. There is a great chance that such a question will be answered soon, since OpenCV's developer community has a large number of active members.

2. Functionalities

This chapter explains a number of OpenCV functions related to several steps of the image processing pipeline. Functions related to preprocessing, segmentation, representation, and recognition are put into context and their usage is explained.

2.1. Image enhancement and feature selection

The main objective of image enhancement is to process an image so that the result is more suitable than the original image for a set of specific applications [2]. There are basically two approaches for image enhancement: spatial domain methods and frequency domain methods. The term spatial domain refers to the image plane itself, and approaches in this category are based on direct manipulation of image pixels. Frequency domain techniques are based on modifying the Fourier transform of an image. The OpenCV library handles only a subset of spatial domain techniques. Besides that, in this section, some functions related to feature selection (e.g. edges and corners) are presented.

2.1.1. Smoothing

Smoothing filters are used for blurring and noise reduction. Blurring is used in preprocessing steps, such as the removal of small details from an image. Noise reduction can be accomplished by blurring with a linear filter or by nonlinear filtering. An example of the use of smoothing is shown in Figure 3.

Figure 3. Original picture (left), a uniform kernel (center) and a Gaussian kernel (right)

In the OpenCV library, there is a function related to smoothing named cvSmooth. Since the smoothing operation is nothing more than a convolution with a specific matrix, it is also possible to use the function cvFilter2D. Both functions are explained below.

void cvSmooth( const CvArr* src, CvArr* dst, int smoothtype, int param1, int param2, double param3);

The src argument is the source image, dst is the destination image and smoothtype is the type of smoothing to be applied (all the types are summarized in Table 2).

Table 2. OpenCV smoothing types

CV_BLUR_NO_SCALE: summation over a param1 x param2 pixel neighborhood.
CV_BLUR: summation over a param1 x param2 pixel neighborhood with subsequent scaling by 1/(param1 x param2).
CV_GAUSSIAN: convolution of the image with a param1 x param2 Gaussian kernel.
CV_MEDIAN: median of the param1 x param1 neighborhood (i.e. a square neighborhood).
CV_BILATERAL: application of a bilateral 3x3 filtering with color sigma=param1 and space sigma=param2, as described in [3].

The only catch in the use of the function cvSmooth is when the Gaussian smoothtype is used, because param3 indicates the Gaussian sigma (i.e., the standard deviation). If it is zero, sigma is calculated from the kernel size according to the following formula: sigma = (n/2 - 1)*0.3 + 0.8, where n is param1 or param2, depending on the kernel orientation (horizontal or vertical).
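As a minimal illustration of the signature above, the following sketch loads a grayscale image, applies a 5x5 Gaussian smoothing and saves the result; the file names and the kernel size are placeholders chosen only for this example.

#include <cv.h>
#include <highgui.h>

int main( void )
{
    /* "input.png" is a placeholder file name; 0 = load as 8-bit grayscale */
    IplImage* src = cvLoadImage( "input.png", 0 );
    if( !src )
        return -1;

    /* destination image with the same size and depth as the source */
    IplImage* dst = cvCreateImage( cvGetSize( src ), src->depth, src->nChannels );

    /* 5x5 Gaussian kernel; param3 = 0 makes sigma be derived from the kernel size */
    cvSmooth( src, dst, CV_GAUSSIAN, 5, 5, 0 );

    cvSaveImage( "smoothed.png", dst );

    cvReleaseImage( &src );
    cvReleaseImage( &dst );
    return 0;
}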

Another function related to convolutions is cvFilter2D, which convolves an arbitrary linear kernel with an image. Its signature is:

void cvFilter2D( const CvArr* src, CvArr* dst, const CvMat* kernel, CvPoint anchor);

The src argument is the source image, dst is the destination image and kernel is the convolution kernel, a single-channel floating point matrix. The anchor indicates the relative position of a filtered point within the kernel; the anchor point must lie within the kernel. The special default value (-1,-1) means that it is at the kernel center.

2.1.2. Edge detection

Edge detection techniques are inherently easy to implement and have a low computational complexity. As an example of the application of edge detection in tracking, RAPiD [4] is often cited, and can be seen in Figure 4.

Figure 4. Some points are sampled along model edges (left), and these points are joined and used to infer pose (center). Occlusion can be treated in a robust way (right).

This subsection presents the OpenCV functions related to edge detection. The Sobel, Laplacian and Canny operators are described.

Sobel operator

The Sobel operator performs a 2D spatial gradient measurement on an image, emphasizing regions of high spatial gradient that correspond to edges. Typically it is used to find the approximate absolute gradient magnitude at each point in an input grayscale image. An example of the use of the Sobel operator is illustrated in Figure 5.

Figure 5. Grayscale image (left), Sobel x-gradient image (center) and Sobel y-gradient image (right)

In theory, the operator consists of a pair of 3x3 convolution masks, as shown in Figure 6. One mask is simply the other rotated by 90 degrees.

Figure 6. Sobel convolution masks

These masks are designed to respond maximally to edges running vertically and horizontally relative to the pixel grid, one mask for each of the two perpendicular orientations. The masks can be applied separately to the input image, producing separate measurements of the gradient component in each orientation (Gx and Gy). These can then be combined to find the absolute magnitude of the gradient at each point and the orientation of that gradient. The gradient magnitude is given by:

|G| = sqrt(Gx^2 + Gy^2)

Typically, though, an approximate magnitude is computed in a much faster way using:

|G| = |Gx| + |Gy|

In the OpenCV library, the Sobel operator is implemented via the function cvSobel, and its signature is:

void cvSobel( const CvArr* src, CvArr* dst, int xorder, int yorder, int aperture_size);

The src argument is the source image, dst is the destination image, xorder is the order of the x derivative and yorder is the order of the y derivative. The parameter aperture_size is the size of the extended Sobel kernel; it must be 1, 3, 5 or 7. In all cases except 1, an aperture_size x aperture_size kernel will be used to calculate the derivative. There is also a special value for aperture_size, CV_SCHARR, that corresponds to a 3x3 Scharr filter [5].
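A short sketch of cvSobel usage follows; the file names are placeholders. For an 8-bit input, the destination images are created here with 16-bit signed depth so that negative derivative values are not truncated, and cvConvertScaleAbs is used only to obtain a displayable 8-bit result.

#include <cv.h>
#include <highgui.h>

int main( void )
{
    /* "input.png" is a placeholder; 0 = load as grayscale */
    IplImage* gray = cvLoadImage( "input.png", 0 );
    if( !gray )
        return -1;

    /* 16-bit signed destinations avoid overflow of the 8-bit input range */
    IplImage* dx = cvCreateImage( cvGetSize( gray ), IPL_DEPTH_16S, 1 );
    IplImage* dy = cvCreateImage( cvGetSize( gray ), IPL_DEPTH_16S, 1 );

    cvSobel( gray, dx, 1, 0, 3 );   /* first derivative in x, 3x3 kernel */
    cvSobel( gray, dy, 0, 1, 3 );   /* first derivative in y, 3x3 kernel */

    /* convert back to 8 bits (absolute value) just for visualization */
    IplImage* edges = cvCreateImage( cvGetSize( gray ), IPL_DEPTH_8U, 1 );
    cvConvertScaleAbs( dx, edges, 1, 0 );
    cvSaveImage( "sobel_x.png", edges );

    cvReleaseImage( &gray );
    cvReleaseImage( &dx );
    cvReleaseImage( &dy );
    cvReleaseImage( &edges );
    return 0;
}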

Laplacian operator

The Laplacian is a 2D isotropic measure of the second spatial derivative of an image. The Laplacian of an image highlights regions of rapid intensity change and is therefore often used for edge detection.

The Laplacian is frequently applied to an image that has first been smoothed with an approximated Gaussian filter in order to reduce its sensitivity to noise; the two variants will be described together here. The operator normally takes a single graylevel image as input and produces another graylevel image as output.

The Laplacian L(x, y) of an image with pixel intensity values I(x, y) is given by:

L(x, y) = ∂²I/∂x² + ∂²I/∂y²

This can be approximated by a convolution filter. Since the input image is represented as a set of discrete pixels, it is necessary to find a discrete convolution kernel that can approximate the second derivatives in the definition of the Laplacian. Two commonly used small kernels are shown in Figure 7.

Figure 7. Two commonly used discrete approximations to the Laplacian filter

The OpenCV function that implements the Laplacian of an image is cvLaplace, and its signature is:

void cvLaplace( const CvArr* src, CvArr* dst, int aperture_size);

The src argument is the input image, dst is the resulting image, and aperture_size has the same meaning as for the Sobel operator.

Canny operator

The Canny operator was designed to be an optimal edge detector (according to particular criteria; there are other detectors around that also claim to be optimal with respect to slightly different criteria). It takes a grayscale image as input, and produces as output an image showing the positions of tracked intensity discontinuities. An example of the Canny operator is shown in Figure 8. As can be seen, the edges of the image were highlighted.

Figure 8. Raw image (left), and Canny operator resulting image (right)

The Canny operator works in a multi-stage process. First of all the image is smoothed by Gaussian convolution. Then, a simple 2D first derivative is applied to the smoothed image to highlight regions with high first spatial derivatives. Edges give rise to ridges in the gradient magnitude image. The algorithm then tracks along the top of these ridges and sets to zero all pixels that are not actually on the ridge top, so as to give a thin line in the output; this process is known as nonmaximal suppression. The tracking process exhibits hysteresis controlled by two thresholds, T1 and T2, with T1 > T2. Tracking can only begin at a point on a ridge higher than T1. Tracking then continues in both directions out from that point until the height of the ridge falls below T2. This hysteresis helps to ensure that noisy edges are not broken up into multiple edge fragments.

The Canny operator is implemented in OpenCV via the function cvCanny and its signature is as follows:

void cvCanny( const CvArr* image, CvArr* edges, double threshold1, double threshold2, int aperture_size);

The image argument is the input image, edges is the image that will store the results, threshold1 and threshold2 are the two thresholds used by the operator, and aperture_size is the aperture parameter, just as with the Sobel operator.
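The following minimal sketch applies the operator to a placeholder grayscale image; the two hysteresis thresholds and the 3x3 aperture are illustrative values, not values prescribed by the library.

#include <cv.h>
#include <highgui.h>

int main( void )
{
    /* "input.png" is a placeholder file name; 0 = load as grayscale */
    IplImage* gray = cvLoadImage( "input.png", 0 );
    if( !gray )
        return -1;

    IplImage* edges = cvCreateImage( cvGetSize( gray ), IPL_DEPTH_8U, 1 );

    /* hysteresis thresholds T2 = 50 and T1 = 150, 3x3 Sobel aperture */
    cvCanny( gray, edges, 50, 150, 3 );

    cvSaveImage( "canny.png", edges );

    cvReleaseImage( &gray );
    cvReleaseImage( &edges );
    return 0;
}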

2.1.3. Corner detection

One of the computer vision approaches available in the literature to extract certain kinds of features from an image is corner detection or, more generally, interest point detection. Figure 9 shows an interest point based markerless augmented reality application [6]. Using some interest points, this algorithm is able to track the position and orientation of an object, in this case a computer. As can be noted in Figure 9 (top left), most of the interest points are localized around corners; Figure 9 (bottom left) and (bottom right) show that the algorithm is robust to rotation and translation, since the board position is tracked correctly.

Figure 9. The wireframe is matched with the computer (top left), eight points are tracked (top right), and a board inside the computer is tracked (bottom left and bottom right)

Some uses of feature detection include motion detection, tracking, image mosaicking, panorama stitching, 3D modeling, and object recognition. In the OpenCV library, there are three ways of detecting interest points: (1) directly calculating eigenvectors and eigenvalues, (2) using the Harris operator and (3) using the Good Features to Track function. Alternatives (1) and (2) are concerned with detecting corners in the image; although (3) is based on (2), the class of features it is able to handle is larger than just corners. These alternatives are described in sequence.

Eigenvectors and eigenvalues

The majority of feature detection algorithms are based on the calculation of eigenvalues and eigenvectors, and this operation is implemented in OpenCV via two functions, cvCornerEigenValsAndVecs and cvCornerMinEigenVal. Both functions are related to the covariation matrix of derivatives around a certain pixel. The covariation matrix is of the form:

C = | (dI/dx)^2        (dI/dx)(dI/dy) |
    | (dI/dx)(dI/dy)   (dI/dy)^2      |

where I(x, y) is the intensity of the pixel at coordinates (x, y). The function cvCornerEigenValsAndVecs calculates both eigenvalues and eigenvectors of the covariation matrix, while cvCornerMinEigenVal just calculates and stores its eigenvalues.

The signature of cvCornerEigenValsAndVecs is:

void cvCornerEigenValsAndVecs( const CvArr* image, CvArr* eigenvv, int block_size, int aperture_size);

The image argument is the input image, eigenvv is an image that stores the results and must be 6 times wider than image, block_size is the neighborhood size, and aperture_size is the aperture parameter, just as with the Sobel operator.

The function signature of cvCornerMinEigenVal is:

void cvCornerMinEigenVal( const CvArr* image, CvArr* eigenval, int block_size, int aperture_size);

The parameters are the same as for the cvCornerEigenValsAndVecs function, except for eigenval: this function just calculates and stores the minimal eigenvalue associated with each pixel, therefore eigenval has the same size as image, whereas the output of cvCornerEigenValsAndVecs is 6 times wider because it stores both eigenvalues and both eigenvectors.

Harris corner detector

The Harris corner detector computes the locally averaged moment matrix calculated from the image gradients, and then combines the eigenvalues of the moment matrix to compute a corner strength; the maximum values of this result indicate the corner positions. It is also based on the covariation matrix. The key for unlocking the power of this matrix lies in its eigenvalues: when the matrix has two large eigenvalues, this corresponds to two separate principal directions in the underlying image gradient. The corner response is calculated quickly and efficiently using a simple equation and stored in each image pixel:

R = det(C) - k*(trace(C))^2

where k is a tunable parameter which determines how edge-phobic the response of the algorithm is. The result obtained by applying the algorithm to a picture can be seen in Figure 10. In Figure 10 (right) it is possible to see some white regions; these are local maxima that indicate corners in those regions. It should be noted that the Harris corner detector is also able to find edges, as can be seen in the dark gray regions of Figure 10.

Figure 10. Grayscale image (left), and the result of applying the Harris corner detector to the image (right)

The function signature is shown below:

void cvCornerHarris( const CvArr* image, CvArr* harris_response, int block_size, int aperture_size, double k);

The image argument is the input image and harris_response is the image that will store the Harris detector result. The parameter block_size is the neighborhood size, as discussed for eigenvectors and eigenvalues. The aperture_size is the same as for the Sobel operator. Finally, k is the tunable parameter mentioned above.

Good Features to Track

Shi and Tomasi's work on feature tracking is implemented in OpenCV via the function cvGoodFeaturesToTrack. It is strongly based on the Harris corner detection operator. The basic idea of the algorithm is to monitor the quality of image features during tracking by using a measure of feature dissimilarity that quantifies the change of appearance of a feature between the first and the current frame. In Figure 11, the features found in the left image are enhanced and shown in the right one.

Figure 11. The first frame of the sequence (left), and the features selected according to the established criterion (right)

There are some concerns about this function. One of them is the need for two temporary floating-point 32-bit images of the same size as the input image. Therefore, in applications where memory is a scarce resource, some problems may appear when using this function.

The function signature is shown below:

void cvGoodFeaturesToTrack( const CvArr* image, CvArr* eig_image, CvArr* temp_image, CvPoint2D32f* corners, int* corner_count, double quality_level, double min_distance, const CvArr* mask, int block_size, int use_harris, double k);

The image argument is the source single-channel image, and eig_image and temp_image are temporary floating-point 32-bit images of the same size as image. The parameter corners is a previously allocated structure to store the detected corners, and corner_count is the number of detected corners. The parameter quality_level is a number indicating the minimal accepted quality of image corners, and min_distance is the minimal Euclidean distance between corners. The parameter mask is the region of interest; if NULL is passed, the whole image is used. The parameter block_size is the same as for the Harris corner detector function. It is possible to use the Harris corner detector by setting use_harris to any value different from zero. At last, k is the free parameter of the Harris operator.

2.2. Segmentation

The task of analyzing an image in order to distinguish specific elements is called segmentation. The following subsections explain some techniques used for accomplishing this goal.

2.2.1. Thresholding

This technique consists in separating the objects of interest from the image background. It is often done on grayscale images, but can also be applied to other formats, such as color images. A common practice when thresholding color images is to consider the sum of the color components of each pixel. Thresholding is done by estimating level ranges that determine the pixels that belong to the image background and to the objects of interest. As can be seen in Figure 12 (left), when there is only one object to segment in an image f(x, y), a threshold level T is specified and two level ranges are defined: f(x, y) > T and f(x, y) ≤ T. This operation is named single-level thresholding and the result is a binary image that distinguishes the pixels that belong to each level range. In multilevel thresholding, there are n objects to segment, requiring the use of different threshold levels T1, ..., Tn, as shown in Figure 12 (right). The result of this operation is a grayscale image with n distinct levels.

Figure 12. Single-level (left) and multi-level (right) thresholding

Thresholding can also be classified as global or local. In global thresholding, the same threshold levels Tk are used for all pixels in the image. In local thresholding, the threshold levels Tk depend on local properties of the pixels, such as the average level of their neighborhood. Global thresholding is more adequate for images where the objects have a constant illumination. Figure 13 shows an example where an object with constant illumination is properly segmented using global thresholding, while the results obtained with local thresholding are not satisfactory. Local thresholding is more indicated for cases where the objects' illumination is variable, since it takes into account local image features. In Figure 14, local thresholding segments more pixels from the object than global thresholding, due to the variable nature of the objects' illumination.

Figure 13. Source image where the object has a constant illumination (left), global thresholding result (center) and local thresholding result (right)

Figure 14. Source image where objects have a variable illumination (left), global thresholding result (center) and local thresholding result (right)

Many image processing algorithms used in real time applications do not handle color images. As a result, the original image has to be converted to a more suitable format, such as binary. This is where thresholding takes place.

An example of the use of thresholding in real time pattern recognition is present in the marker tracking performed by the ARToolKit augmented reality library [7]. The color images captured by a camera are analyzed in order to segment the dark pixels relative to the markers from the background. A global single-level thresholding is applied to the source image, with the threshold level T being specified by the user. A common value for T is 150. Considering a color pixel (rp, gp, bp) from the source image, if (rp + gp + bp)/3 < T, then the pixel is classified as a marker pixel, else it is classified as a background pixel. Figure 15 shows the results obtained by ARToolKit when thresholding an input frame.

Figure 15. Global thresholding used in ARToolKit: source color image (left) and global thresholding result (right)

OpenCV implements single-level thresholding and offers both a global and a local thresholding function for grayscale images. Global thresholding is implemented by the function cvThreshold, which has the following signature:

void cvThreshold(const CvArr* src, CvArr* dst, double threshold, double max_value, int threshold_type);

The src argument is the grayscale image to be thresholded. The dst argument is where the resulting binary image will be stored. The global threshold value T is specified by the threshold argument. The max_value argument is the level that will be used to distinguish the object pixels from the background pixels. The threshold_type argument determines which function will be used in the thresholding operation. The five available thresholding types and their respective functions are presented in Table 3, and their visual representation is illustrated in Figure 16.

Table 3. OpenCV thresholding types and their functions

CV_THRESH_BINARY: dst(x,y) = max_value if src(x,y) > threshold, 0 otherwise
CV_THRESH_BINARY_INV: dst(x,y) = 0 if src(x,y) > threshold, max_value otherwise
CV_THRESH_TRUNC: dst(x,y) = threshold if src(x,y) > threshold, src(x,y) otherwise
CV_THRESH_TOZERO: dst(x,y) = src(x,y) if src(x,y) > threshold, 0 otherwise
CV_THRESH_TOZERO_INV: dst(x,y) = 0 if src(x,y) > threshold, src(x,y) otherwise

Figure 16. OpenCV thresholding types visual representation
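As an illustration of global thresholding, the sketch below binarizes a placeholder grayscale image with T = 150, the typical ARToolKit value mentioned above; pixels above the threshold receive max_value = 255.

#include <cv.h>
#include <highgui.h>

int main( void )
{
    /* "input.png" is a placeholder; 0 = load as 8-bit grayscale */
    IplImage* gray = cvLoadImage( "input.png", 0 );
    if( !gray )
        return -1;

    IplImage* bin = cvCreateImage( cvGetSize( gray ), IPL_DEPTH_8U, 1 );

    /* global single-level thresholding: pixels above 150 become 255, the rest become 0 */
    cvThreshold( gray, bin, 150, 255, CV_THRESH_BINARY );

    cvSaveImage( "binary.png", bin );

    cvReleaseImage( &gray );
    cvReleaseImage( &bin );
    return 0;
}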

Local thresholding is implemented by the function cvAdaptiveThreshold, which has the following signature:

void cvAdaptiveThreshold(const CvArr* src, CvArr* dst, double max_value, int adaptive_method, int threshold_type, int block_size, double param1);

The src, dst, max_value and threshold_type arguments are the same as for cvThreshold. The only local thresholding types available are CV_THRESH_BINARY and CV_THRESH_BINARY_INV. The threshold level T is computed for each pixel of the input image. The adaptive_method argument determines how T is calculated. If it is CV_ADAPTIVE_THRESH_MEAN_C, then T is the mean of the block_size x block_size pixel neighborhood, subtracted by param1. If it is CV_ADAPTIVE_THRESH_GAUSSIAN_C, then T is a Gaussian weighted sum of the block_size x block_size pixel neighborhood, subtracted by param1. This means that the nearest pixels will have a bigger influence on the result than the farthest ones.

2.2.2. Line detection

In order to detect straight lines in an image, the first step is to perform an edge enhancement operation on the input image. Then, a threshold is applied to the result. The binary output of this operation can be used for the detection of lines. Some real time markerless 3D tracking techniques rely on the detection of line segments in the image [8]. A wireframe model of the object to be tracked is projected onto the image using an estimated camera pose. The projection and the image lines are compared in order to calculate the current camera pose. Figure 17 illustrates the process.

Figure 17. Detected lines on the image (left) and estimated pose of the car (right)

The operator used for performing line detection is the Hough transform. The idea behind the Hough transform is to decrease the computational complexity of line detection by using a line representation in the parameter space rather than in the xy plane. Considering a point (xi, yi) in the xy plane, there is an infinite number of lines passing through it. These lines have the form yi = a*xi + b. Rearranging the equation in terms of the parameters a and b, the resulting equation is b = -xi*a + yi. This is equivalent to a line in the parameter space. Considering another point (xj, yj), the set of lines that passes through it is represented in the parameter space by b = -xj*a + yj.

The intersection of these two lines in the parameter space, at a point (a', b'), determines the parameters of the line that passes through both points (xi, yi) and (xj, yj) in the xy plane. Figure 18 shows graphics that illustrate this idea.

Figure 18. Line representation on the xy plane (left) and on the parameter space (right)

In the representation used by the algorithm, each axis of the parameter space is subdivided into ranges of equal size, as shown in Figure 19, generating cells named accumulators. Each accumulator (a, b) in the parameter space represents a line in the xy plane.

Figure 19. Accumulators in the parameter space

The first step of the algorithm is to set all accumulators to zero. Then, the input image is scanned for edge points. For each edge point (xi, yi), the parameters (a, b) of all the lines that pass through it are evaluated by using all the values allowed by the subdivision in the equation b = -xi*a + yi. The corresponding accumulator for each parameter pair is incremented. After all edge points are treated, the values in the accumulators will be the number of edge points contained in the corresponding line. A threshold can then be applied to select the lines with a higher number of points.

The use of the equation b = -x*a + y for representing lines leads to a problem when detecting vertical lines, since a tends to infinity. Due to this, the polar representation is preferred, which has the form ρ = x cosθ + y sinθ, where ρ is the distance between the line and the origin and θ is the angle between the x-axis and the line normal.

The execution time of the Hough transform can be reduced without significant loss in the quality of the results by using a probabilistic approach [9]. Instead of utilizing all edge points, only a random fraction of these points is considered.

Line detection is performed by the cvHoughLines2 OpenCV function:

CvSeq* cvHoughLines2( CvArr* image, void* line_storage, int method, double rho, double theta, int threshold, double param1, double param2 );

The image argument is the binary image from which the lines will be retrieved. The line_storage argument is a container for the detected lines data. The Hough transform variant to be used is specified in the method argument. Table 4 presents the available Hough transform methods. The rho and theta arguments are the distance and angle resolution, respectively. The threshold argument determines the minimum number of points needed by a line. The param1 and param2 arguments are only used if method is CV_HOUGH_PROBABILISTIC or CV_HOUGH_MULTI_SCALE. In the probabilistic Hough transform, param1 is the minimum line length and param2 is the maximum distance between segments lying on the same line that does not cause their joining. In the multi-scale Hough transform, param1 is the divisor for rho and param2 is the divisor for theta.

Table 4. OpenCV Hough transform methods

CV_HOUGH_STANDARD: classical Hough transform. Retrieves all lines detected. Each line is represented by the distance to the origin (ρ) and the angle between the x-axis and the line normal (θ).
CV_HOUGH_PROBABILISTIC: probabilistic Hough transform. Retrieves all line segments detected. Each segment is represented by its starting and ending points.
CV_HOUGH_MULTI_SCALE: multi-scale variant of the classical Hough transform. Retrieves all lines detected in the same way as CV_HOUGH_STANDARD.
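A brief sketch of the probabilistic variant is given below; the file name and numeric parameters are illustrative. Here line_storage is passed as a CvMemStorage, and each returned element of the sequence is a pair of CvPoint structures holding the segment endpoints.

#include <cv.h>
#include <highgui.h>
#include <stdio.h>

int main( void )
{
    /* "input.png" is a placeholder; 0 = load as grayscale */
    IplImage* gray = cvLoadImage( "input.png", 0 );
    if( !gray )
        return -1;

    IplImage* edges = cvCreateImage( cvGetSize( gray ), IPL_DEPTH_8U, 1 );
    cvCanny( gray, edges, 50, 150, 3 );

    CvMemStorage* storage = cvCreateMemStorage( 0 );

    /* probabilistic Hough: 1 pixel and 1 degree resolution, at least 50 votes,
       minimum segment length 30, maximum gap 10 (all illustrative values) */
    CvSeq* lines = cvHoughLines2( edges, storage, CV_HOUGH_PROBABILISTIC,
                                  1, CV_PI / 180, 50, 30, 10 );

    int i;
    for( i = 0; i < lines->total; i++ )
    {
        /* each element is an array of two CvPoint: the segment endpoints */
        CvPoint* pts = (CvPoint*)cvGetSeqElem( lines, i );
        printf( "segment %d: (%d,%d) -> (%d,%d)\n",
                i, pts[0].x, pts[0].y, pts[1].x, pts[1].y );
    }

    cvReleaseMemStorage( &storage );
    cvReleaseImage( &gray );
    cvReleaseImage( &edges );
    return 0;
}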

2.2.3. Contour detection

After applying an image enhancement operator for edge detection and thresholding the resulting image, the contours can be extracted from the image. In order to perform this task, two steps need to be covered: contour tracing and contour representation. In contour tracing, the existing contours are followed in the image. In contour representation, the contours are described in a meaningful way. Contour detection is widely used in real time pattern recognition solutions. In ARToolKit, the contours relative to the markers present in the input frame need to be segmented in order to enable recognition and pose estimation [7]. Figure 20 illustrates the operation.

Figure 20. Contour detection used in ARToolKit: source color image (left), enhanced edges (center) and detected contours highlighted on the image (right)

OpenCV uses the Suzuki algorithm to perform contour tracing [10]. In this algorithm, at first, the upper left contour pixel is found. Then, the neighborhood of the first pixel is checked in clockwise direction to find the next pixel of the contour. From then on, the search for the other pixels of the contour is done in anti-clockwise direction and ends when the first two pixels of the contour are found again.

OpenCV performs contour representation using two different description types: chain codes and polygonal representation. Chain codes consist in a sequence of numbers that determine in which neighborhood of a contour pixel the next contour pixel is found. Figure 21 shows the codes for each neighborhood direction (left) and an example of a contour represented by a chain code (right). It can be noted that choosing a different starting point for the contour can give different chain code representations. This can be avoided by shifting the numbers of the chain code in a way that results in the integer of minimum magnitude.

Figure 21. Chain codes for each neighborhood direction (left) and a chain code representation of a contour (right)

The polygonal representation is a sequence of vertices that, once linked together, symbolizes the contour essence. Figure 22 exemplifies this type of representation, in which there is a tradeoff between contour fidelity and codification overhead.

Figure 22. Polygonal representation

OpenCV provides several ways to organize the contours retrieved from an image. For example, the contours can be stored in a tree, where a contour C1 is a parent of a contour C2 if and only if C1 contains C2. Figure 23 shows how nested contours can be described by a tree.

Figure 23. OpenCV hierarchical representation of contours

The function used to extract contours from a binary image is called cvFindContours and is defined as follows:

int cvFindContours( CvArr* image, CvMemStorage* storage, CvSeq** first_contour, int header_size, int mode, int method, CvPoint offset);

The image argument is the binary image from which the contours will be retrieved. The storage argument is a container for contour data. The first_contour argument is where a pointer to the first detected contour will be available after the function call. The header_size argument is the size of the contour structure. The mode argument determines how the contours should be organized. Table 5 presents the available contour retrieval modes and their description. The method argument specifies the contour representation to be used. Table 6 describes the available representation methods. The offset argument is used to shift the retrieved points by an explicit amount of pixels. The function returns the number of contours found in the image.

Table 5. OpenCV contour retrieval modes

CV_RETR_EXTERNAL: retrieve only the extreme outer contours.
CV_RETR_LIST: retrieve all the contours in a list.
CV_RETR_CCOMP: retrieve the connected components of the image (see topic 2.2.4).
CV_RETR_TREE: retrieve all the contours hierarchically in a tree.

Table 6. OpenCV contour representation methods

CV_CHAIN_CODE: chain code representation.
CV_CHAIN_APPROX_NONE: polygon representation where all contour pixels are returned as vertices.
CV_CHAIN_APPROX_SIMPLE: polygon representation where only the ending points of horizontal, vertical or diagonal segments of the contour are returned as vertices.
CV_CHAIN_APPROX_TC89_L1 and CV_CHAIN_APPROX_TC89_KCOS: two variants of the Teh-Chin polygon representation [11]. Only the contour pixels with high curvature are returned as vertices.
CV_LINK_RUNS: polygon representation where only the ending points of horizontal segments of the contour are returned as vertices.

After extracting the contours of an image, they can be handled in order to get information such as area and polygonal approximation. The contour area is calculated using the cvContourArea function, described next:

double cvContourArea( const CvArr* contour, CvSlice slice);

The area of the contour specified in the contour argument is returned by the function. If only a section of interest of the contour has to be considered in the calculation, it can be determined using the slice argument.

The polygonal approximation of a contour is obtained using the cvApproxPoly function:

CvSeq* cvApproxPoly(const void* src_seq, int header_size, CvMemStorage* storage, int method, double parameter, int parameter2);

The src_seq argument is the contour to be approximated. The header_size and storage arguments have the same purpose as the ones from cvFindContours.

The only currently acceptable value for the method argument is CV_POLY_APPROX_DP, which corresponds to the Douglas-Peucker polygon approximation algorithm [12]. The parameter argument determines the tolerance value ε to be used by the algorithm. The parameter2 argument specifies whether the hierarchical organization should be respected or whether the contour is closed or not.

The Douglas-Peucker algorithm is based on the distance between a vertex and an edge segment, and on a tolerance ε. To start the algorithm, two extreme points of the polygon are connected. This connection defines the first edge to be used. Then, the distance between each remaining vertex and this edge is tested. If there are distances bigger than ε, then the vertex with the biggest distance from the edge is added to the simplification. This process continues recursively for each edge of the current step until all distances between the vertices of the original polyline and the simplification are within the tolerance distance. Figure 24 illustrates the process.

Figure 24. Douglas-Peucker polygon approximation algorithm
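The sketch below combines cvFindContours and cvApproxPoly on a placeholder binary image; the tolerance of 2% of each contour's perimeter is an arbitrary choice for illustration, and the loop only walks the top level of the contour tree.

#include <cv.h>
#include <highgui.h>
#include <stdio.h>
#include <math.h>

int main( void )
{
    /* "binary.png" is a placeholder for an already thresholded image; 0 = grayscale */
    IplImage* bin = cvLoadImage( "binary.png", 0 );
    if( !bin )
        return -1;

    CvMemStorage* storage = cvCreateMemStorage( 0 );
    CvSeq* contours = NULL;

    /* note: cvFindContours modifies the input image */
    int n = cvFindContours( bin, storage, &contours, sizeof(CvContour),
                            CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE, cvPoint( 0, 0 ) );
    printf( "%d contours found\n", n );

    CvSeq* c;
    for( c = contours; c != NULL; c = c->h_next )  /* walk the top level only */
    {
        /* Douglas-Peucker approximation with epsilon = 2% of the contour perimeter */
        CvSeq* poly = cvApproxPoly( c, sizeof(CvContour), storage,
                                    CV_POLY_APPROX_DP,
                                    cvContourPerimeter( c ) * 0.02, 0 );
        printf( "contour with %d points approximated by %d vertices, area %.1f\n",
                c->total, poly->total, fabs( cvContourArea( poly, CV_WHOLE_SEQ ) ) );
    }

    cvReleaseMemStorage( &storage );
    cvReleaseImage( &bin );
    return 0;
}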

2.2.4. Connected components labeling

Labeling of a binary image refers to the act of assigning a unique value to pixels belonging to the same connected region. After thresholding the input image, neighboring pixels that belong to objects of interest are associated with a label. Knowledge of the connected components of an image is very useful for automated recognition processes, and this operation is often used in real time pattern recognition applications. Labeling is present in the image processing pipeline of the ARToolKit library [7]. The connected components of the thresholded image are labeled in order to identify the regions relative to the marker borders and inner template, as can be seen in Figure 25.

Figure 25. Connected components labeling used in ARToolKit: thresholded image (left) and labeling results (right)

The principle of the labeling algorithm consists in scanning the binary image from the upper left pixel looking for object pixels. For each object pixel, its neighboring pixels that have already been scanned are examined. There are three options for the labeling of the pixel:
- If there are no object pixels among the examined pixels, a new label is created and assigned to the current pixel;
- If there is one and only one object pixel among the examined pixels, the label of this pixel is assigned to the current pixel;
- If there is more than one object pixel among the examined pixels, one of the labels is assigned to the current pixel and an equivalence between the different labels is created.

When all the object pixels are labeled, the equivalent labels are grouped in equivalence classes, each one with a unique label. The image is then scanned once again to solve the equivalences and define the connected components.

In OpenCV, labeling is performed using the cvFindContours function with CV_RETR_CCOMP set as the mode argument. The connected components are organized in a hierarchical manner. The external contours of the components are put on the first level of the tree. The contours of the holes present in a component are put on the second level.

2.3. Object tracking

The OpenCV library comes bundled with some object tracking functions. This tutorial covers three techniques which may be implemented using these functions. Template matching uses a simple template to scan the image, storing the results. CamShift is an adaptive algorithm which tries to find an object based on its histogram back-projection. Optical flow is a technique that uses the flow of each pixel to track the object. Other combinations can be performed in order to achieve different results.

2.3.1. Template matching

Template matching is one of the simplest ways of finding an object within a picture. The goal of this technique is to scan all the pixels of the image and find every instance of a specific object described by a template. An application of template matching for real time pattern recognition can be found in some markerless augmented reality systems, such as the one shown in Figure 26 and described in [13].

The difference between a region of the image and a reference template is minimized. After that, the parameters of a function that warps the template into the target image are calculated, so that tracking can be done.

Figure 26. 3D tracking with template matching - green lines mark the template image (left) and augmented scene (right)

OpenCV provides such a feature through the cvMatchTemplate function:

void cvMatchTemplate( const CvArr* image, const CvArr* templ, CvArr* result, int method );

All the comparison results are stored in the result variable, which can be interpreted like an image. The image argument stands for the original image, and templ is the object which is going to be detected. The appearance of the result image is related to the template matching: the closer the matched template, the higher the stored pixel value, which will be seen as a bright pixel. The method argument tells the function which method to use in the comparison; the methods supported by OpenCV for comparing the template and the image are detailed in Table 7.

Table 7. Template matching methods

CV_TM_SQDIFF: the squared difference between the template and the image.
CV_TM_SQDIFF_NORMED: the normalized squared difference.
CV_TM_CCORR: the cross correlation between template and image.
CV_TM_CCORR_NORMED: the normalized cross correlation.
CV_TM_CCOEFF: the correlation coefficient.
CV_TM_CCOEFF_NORMED: the normalized correlation coefficient.
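A minimal sketch of the function is shown below; both file names are placeholders. cvMinMaxLoc is used to locate the best match, which for the correlation-based methods is the global maximum of the result map (for the squared difference methods it would be the minimum).

#include <cv.h>
#include <highgui.h>
#include <stdio.h>

int main( void )
{
    /* "image.png" and "template.png" are placeholder file names */
    IplImage* img   = cvLoadImage( "image.png", 1 );
    IplImage* templ = cvLoadImage( "template.png", 1 );
    if( !img || !templ )
        return -1;

    /* the result map is (W - w + 1) x (H - h + 1), single-channel 32-bit float */
    CvSize rsize = cvSize( img->width - templ->width + 1,
                           img->height - templ->height + 1 );
    IplImage* result = cvCreateImage( rsize, IPL_DEPTH_32F, 1 );

    cvMatchTemplate( img, templ, result, CV_TM_CCOEFF_NORMED );

    /* for the correlation-based methods the best match is the global maximum */
    double min_val, max_val;
    CvPoint min_loc, max_loc;
    cvMinMaxLoc( result, &min_val, &max_val, &min_loc, &max_loc, NULL );
    printf( "best match at (%d, %d), score %.3f\n", max_loc.x, max_loc.y, max_val );

    cvReleaseImage( &img );
    cvReleaseImage( &templ );
    cvReleaseImage( &result );
    return 0;
}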

Figure 27 shows an example of a template used to match against the input image illustrated in Figure 28 (top). Figure 28 (bottom) shows the results obtained by matching this template with the input image.

Figure 27. Template used by the template matching technique

Figure 28. Example of template matching usage: original image (top) and template matching result (bottom)

2.3.2. CamShift

CamShift stands for Continuously Adaptive Mean Shift and is an iterative algorithm [14]. It uses the mean shift algorithm, which iterates to find the object center given its 2D color probability distribution image. Then, the algorithm calculates the object's size and orientation. The algorithm's workflow is shown in Figure 29, and its steps are described in sequence.

Figure 29. CamShift workflow

The CamShift algorithm is described step by step next:
1. Set the calculation region of the probability distribution to the whole image.
2. Choose the initial location of the 2D mean shift search window.
3. Calculate the color probability distribution in the 2D region centered at the search window location, in a region of interest (ROI) slightly larger than the mean shift window size.
4. Run the mean shift algorithm [14] to find the search window center. Store the moment (area or size) and center location.
5. For the next video frame, center the search window at the mean location stored in Step 4 and set the window size to a function of the moment found there. Go to Step 3.

The CamShift algorithm is implemented in OpenCV by the cvCamShift function:

int cvCamShift( const CvArr* prob_image, CvRect window, CvTermCriteria criteria, CvConnectedComp* comp, CvBox2D* box);

The prob_image parameter is the back projection of the object histogram, which can be calculated by the cvCalcBackProject function, as shown in Figure 30. The window parameter is the initial search window which the algorithm will use in the iterations. The criteria parameter is used to tell the algorithm when it should stop searching for the next window: the user can specify whether the criterion is the number of iterations or an epsilon threshold for the windows similarity. The comp parameter stands for a connected component which will contain information about the converged search window; the window coordinates can be retrieved from the comp->rect field and the sum of all pixels inside the window from the comp->area field. The box parameter contains the object size and orientation at the end of the algorithm. Figure 31 shows an example of face tracking using CamShift.
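The sketch below condenses one tracking iteration, under the assumption that the caller has already extracted the hue plane of the current frame and computed the object's hue histogram (for instance with cvCvtColor, cvSplit and cvCalcHist); the termination criteria values are illustrative.

#include <cv.h>

/* one CamShift iteration: "hue" is the hue plane of the current frame,
   "hist" is the object's hue histogram and "window" is the previous search window
   (all assumed to be prepared by the caller) */
CvBox2D track_object( IplImage* hue, CvHistogram* hist, CvRect* window )
{
    IplImage* backproject = cvCreateImage( cvGetSize( hue ), IPL_DEPTH_8U, 1 );

    /* probability image: how well each pixel's hue matches the object histogram */
    cvCalcBackProject( &hue, backproject, hist );

    CvConnectedComp comp;
    CvBox2D box;

    /* stop after 10 iterations or when the window moves by less than 1 pixel */
    cvCamShift( backproject, *window,
                cvTermCriteria( CV_TERMCRIT_EPS | CV_TERMCRIT_ITER, 10, 1 ),
                &comp, &box );

    *window = comp.rect;   /* reuse the converged window in the next frame */
    cvReleaseImage( &backproject );
    return box;            /* oriented box with the object size and orientation */
}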

Figure 30. Histogram back projection example

Figure 31. CamShift face tracking example

2.3.3. Optical flow

Optical flow is a technique for measuring the velocity of image pixels by comparing them with a previous frame. The displacement of these pixels over time can be used to estimate camera movement in applications like the one shown in Figure 32, which performs 3D face tracking [15].

Figure 32. Optical flow used for 3D face tracking

There are several algorithms to implement optical flow, and OpenCV implements four of them, which are explained next.

Lucas & Kanade technique

The optical flow task is reduced to a linear system by applying the optical flow equation to a group of adjacent pixels and assuming that all of them have the same velocity [16].

This technique is fast enough to be used in real time, because it does not process the whole image. The OpenCV function for this technique is cvCalcOpticalFlowLK, which is used by a sample described in Chapter 3.

void cvCalcOpticalFlowLK( const CvArr* prev, const CvArr* curr, CvSize win_size, CvArr* velx, CvArr* vely );

The prev and curr arguments correspond to the two temporally adjacent frames used to calculate the velocity. The win_size parameter is the size of the averaging window used for grouping the pixels. The velx and vely parameters are, respectively, the horizontal and vertical components of the optical flow for every pixel. They have the same size as the input images.

OpenCV also implements a pyramidal approach of the Lucas & Kanade algorithm [17]. The overall pyramidal tracking algorithm works as follows: first, the optical flow is computed at the deepest pyramid level Lm. Then, the result of that computation is propagated to the upper level in the form of an initial guess for the pixel displacement. Given that initial guess, the refined optical flow is computed at that level, and so on up to level 0 (the original image). The OpenCV function for this technique is called cvCalcOpticalFlowPyrLK.

void cvCalcOpticalFlowPyrLK( const CvArr* prev, const CvArr* curr, CvArr* prev_pyr, CvArr* curr_pyr, const CvPoint2D32f* prev_features, CvPoint2D32f* curr_features, int count, CvSize win_size, int level, char* status, float* track_error, CvTermCriteria criteria, int flags );

The prev and curr arguments are the two frames used for the optical flow calculation. The prev_pyr and curr_pyr arguments are buffers used by the algorithm to store the pyramid images. The prev_features and curr_features arguments are the arrays of points that will be tracked: the first one contains the points, and the second one will store the newly found positions. The count argument is the number of points to be tracked. The win_size parameter stands for the size of the search window at each pyramid level, and level stands for the maximum pyramid level. The status parameter is an array which contains 1 if the corresponding feature has been successfully tracked and the flow calculated, and 0 otherwise. The track_error argument is an optional parameter which contains the differences between patches around the original and moved points. The criteria parameter is used to tell the algorithm when it should stop searching for the next window. The flags parameter can be set to save some processing time, since it is possible to assume that one pyramid buffer is already calculated. The available flags are listed in Table 8.

Table 8. Pyramidal optical flow flags

CV_LKFLOW_PYR_A_READY: the pyramid for the first frame is precalculated before the call.
CV_LKFLOW_PYR_B_READY: the pyramid for the second frame is precalculated before the call.
CV_LKFLOW_INITIAL_GUESSES: the array curr_features contains initial coordinates of the features before the function call.
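The following sketch combines cvGoodFeaturesToTrack (Subsection 2.1.3) with the pyramidal tracker, assuming two consecutive 8-bit grayscale frames are available; the number of features, window size, pyramid depth and termination criteria are illustrative values.

#include <cv.h>
#include <stdio.h>

#define MAX_FEATURES 100

/* tracks corners detected in "prev" into "curr"; both are 8-bit grayscale frames */
void track_features( IplImage* prev, IplImage* curr )
{
    CvSize size = cvGetSize( prev );

    /* temporary buffers required by cvGoodFeaturesToTrack */
    IplImage* eig  = cvCreateImage( size, IPL_DEPTH_32F, 1 );
    IplImage* temp = cvCreateImage( size, IPL_DEPTH_32F, 1 );

    CvPoint2D32f prev_features[MAX_FEATURES], curr_features[MAX_FEATURES];
    int count = MAX_FEATURES;

    /* select up to MAX_FEATURES strong corners, at least 10 pixels apart */
    cvGoodFeaturesToTrack( prev, eig, temp, prev_features, &count,
                           0.01, 10, NULL, 3, 0, 0.04 );

    /* pyramid buffers and per-feature tracking status */
    IplImage* prev_pyr = cvCreateImage( size, IPL_DEPTH_8U, 1 );
    IplImage* curr_pyr = cvCreateImage( size, IPL_DEPTH_8U, 1 );
    char status[MAX_FEATURES];

    cvCalcOpticalFlowPyrLK( prev, curr, prev_pyr, curr_pyr,
                            prev_features, curr_features, count,
                            cvSize( 10, 10 ), 3, status, NULL,
                            cvTermCriteria( CV_TERMCRIT_ITER | CV_TERMCRIT_EPS, 20, 0.03 ),
                            0 );

    int i;
    for( i = 0; i < count; i++ )
        if( status[i] )
            printf( "feature %d moved to (%.1f, %.1f)\n",
                    i, curr_features[i].x, curr_features[i].y );

    cvReleaseImage( &eig );
    cvReleaseImage( &temp );
    cvReleaseImage( &prev_pyr );
    cvReleaseImage( &curr_pyr );
}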

Horn & Schunck technique

This is a function for finding the optical flow pattern which assumes that the apparent velocity of the brightness pattern varies smoothly almost everywhere in the image [18]. The Horn & Schunck optical flow technique is implemented by OpenCV through the cvCalcOpticalFlowHS function:

void cvCalcOpticalFlowHS( const CvArr* prev, const CvArr* curr, int use_previous, CvArr* velx, CvArr* vely, double lambda, CvTermCriteria criteria );

The use_previous parameter denotes whether or not to use the previously calculated velocity field. The lambda parameter is called the Lagrangian multiplier. It must be smaller for a noisy image and larger for a clean and accurate image. The prev, curr, velx, vely and criteria arguments are the same as explained in the Lucas & Kanade technique topic.

Block matching technique

This technique does not use an optical flow equation directly. Instead, it works more like a pattern matching technique. It takes a block inside the first image and tries to find another block of the same size in the second image that is similar to the first one. This algorithm gives an approximate result compared to the other techniques. It is implemented by the cvCalcOpticalFlowBM function:

void cvCalcOpticalFlowBM( const CvArr* prev, const CvArr* curr, CvSize block_size, CvSize shift_size, CvSize max_range, int use_previous, CvArr* velx, CvArr* vely );

The block_size and shift_size parameters denote the characteristics of the block used. The max_range parameter is the size of the neighborhood of pixels around the block that is scanned. The prev, curr, use_previous, velx and vely arguments are the same as explained in the previous topics.

2.4. Object detection

Object detection is a feature bundled with the OpenCV library. It uses a classifier, proposed by Paul Viola [19] and Rainer Lienhart [20], which can identify many different objects. The process begins by selecting a few hundred sample views of the desired object in many angles and illumination conditions. This set of images is called the positive samples. Then, another set of images must be chosen, containing arbitrary images that do not include the selected object. This set is called the negative samples. Afterwards, both sets are used for training a cascade of boosted classifiers.

The word "cascade" in the classifier context means that the resulting classifier consists of several simpler classifiers that are applied subsequently to a region of interest until at some stage the candidate is rejected or all the stages are passed. The word "boosted" means that the classifiers at every stage of the cascade are themselves complex and built out of basic classifiers using one of four different boosting techniques (weighted voting). Currently, Discrete AdaBoost, Real AdaBoost, Gentle AdaBoost and LogitBoost are supported [21].

After the classifier is trained, it can be used to detect the object in a region of interest. The classifier is designed so that it can detect objects of different sizes by resizing the region of interest. OpenCV has a complete face detection project, detailed in Chapter 3, which implements the cascade classifier described above. This project can be easily adapted to any object, since a new set of images can be used in the training stage without modifying much of the code. Figure 33 illustrates an example of an object detection application [22].

Figure 33. Component detection example

There are some important functions that should be detailed for a better understanding. The OpenCV function cvLoad is used to import the XML file generated during the training stage:

void* cvLoad( const char* filename, CvMemStorage* memstorage, const char* name, const char** real_name);

The single parameter that must be passed to this function when loading a cascade is filename, which must contain the path of the XML file. The other arguments do not need to be specified when loading a cascade file.

The cvHaarDetectObjects function returns a sequence of rectangles, one for every object detected:

CvSeq* cvHaarDetectObjects(const CvArr* image, CvHaarClassifierCascade* cascade, CvMemStorage* storage, double scale_factor, int min_neighbors, int flags, CvSize min_size);

The image parameter stands for the image where the objects must be detected. The cascade parameter receives the result returned by the cvLoad function. During the search, potential rectangle candidates are stored in the storage parameter. The scale_factor is the factor by which the search window is scaled in every subscan. The min_neighbors parameter works by grouping neighboring rectangles. The only flag currently supported by the flags parameter is CV_HAAR_DO_CANNY_PRUNING, which uses a Canny edge detector (see Subsection 2.1.2) to reject some image regions that do not contain the searched object. min_size stands for the minimum detection window size.
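A minimal sketch of the detection step is given below; the cascade file and input image names are placeholders (OpenCV ships several trained face cascades in its data folder), and the numeric parameters are illustrative.

#include <cv.h>
#include <highgui.h>
#include <stdio.h>

int main( void )
{
    /* both file names below are placeholders used only for illustration */
    CvHaarClassifierCascade* cascade =
        (CvHaarClassifierCascade*)cvLoad( "haarcascade_frontalface_default.xml",
                                          NULL, NULL, NULL );
    IplImage* img = cvLoadImage( "people.png", 1 );
    if( !cascade || !img )
        return -1;

    CvMemStorage* storage = cvCreateMemStorage( 0 );

    /* scale the search window by 1.1 per subscan, require 3 neighboring hits,
       prune with Canny, and ignore windows smaller than 30x30 pixels */
    CvSeq* faces = cvHaarDetectObjects( img, cascade, storage,
                                        1.1, 3, CV_HAAR_DO_CANNY_PRUNING,
                                        cvSize( 30, 30 ) );

    int i;
    for( i = 0; i < (faces ? faces->total : 0); i++ )
    {
        /* each element starts with the CvRect of the detected object */
        CvRect* r = (CvRect*)cvGetSeqElem( faces, i );
        printf( "face %d at (%d, %d), size %dx%d\n", i, r->x, r->y, r->width, r->height );
    }

    cvReleaseMemStorage( &storage );
    cvReleaseImage( &img );
    return 0;
}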
3. Example applications

Some code samples are also distributed in the default package delivered by Intel. Inside the samples subfolder, located in the installation folder, there are source files that may help a beginner OpenCV user. Among these files, some samples deserve a closer look, taking into account their relevance for pattern detection and their use of concepts previously presented in this tutorial. Therefore, the samples regarding square detection, the CamShift object tracking algorithm, the Kanade Lucas tracker and face detection are described in the following subsections.
Square detection

The square detection sample attempts to find and highlight squares contained in loaded pictures. In order to find squares, this sample implements a function that returns a sequence containing the vertices of the detected squares. First, the sequence that will carry the result is created, making it possible to add vertices to it throughout the function. After that, some filters are applied to enhance edges, considering several thresholds. At each threshold, the cvFindContours function splits the image into contours and stores them in a sequence object. For each contour, the cvApproxPoly function calculates an approximated polygon based on the result output by cvFindContours. If some conditions are satisfied (such as having four vertices and a convex contour), the detected polygon is accepted as a square. Figure 34 illustrates two samples with squares highlighted by this demo.

Figure 34. Square detection sample.
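The core of that idea can be sketched as follows. This is a simplification rather than the shipped sample: the real demo also varies thresholds and checks corner angles, and the function name, area threshold and approximation accuracy below are illustrative.

#include <cv.h>
#include <math.h>

/* Hedged sketch: collect the vertices of convex quadrilaterals found in an
   8-bit single-channel edge/threshold image. */
CvSeq* find_quads( IplImage* binary, CvMemStorage* storage )
{
    CvSeq* contours = 0;
    CvSeq* quads = cvCreateSeq( 0, sizeof(CvSeq), sizeof(CvPoint), storage );

    /* cvFindContours modifies its input, so pass a working copy if needed */
    cvFindContours( binary, storage, &contours, sizeof(CvContour),
                    CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE, cvPoint( 0, 0 ) );

    for( ; contours != 0; contours = contours->h_next )
    {
        /* approximate the contour with a polygon; the accuracy is a
           fraction of the contour perimeter */
        CvSeq* poly = cvApproxPoly( contours, sizeof(CvContour), storage,
                                    CV_POLY_APPROX_DP,
                                    cvContourPerimeter( contours ) * 0.02, 0 );

        /* keep convex quadrilaterals with a non-negligible area */
        if( poly->total == 4 &&
            fabs( cvContourArea( poly, CV_WHOLE_SEQ ) ) > 500 &&
            cvCheckContourConvexity( poly ) )
        {
            int i;
            for( i = 0; i < 4; i++ )
                cvSeqPush( quads, (CvPoint*)cvGetSeqElem( poly, i ) );
        }
    }
    return quads;  /* four vertices per accepted square */
}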
CamShift demo

This sample shows how to apply the cvCamShift function to track a pattern in images acquired from a webcam, based on information given by a histogram. The histogram contains the color information of a pattern, and the demo uses it to find an area of the image that matches the histogram data. To make tracking possible, the user must give as input to the application a subarea of the captured frame. This area is taken as the basis for computing a hue histogram, which serves as the pattern searched for over the whole frame. A histogram sample, representing the face detected in Figure 36, can be seen in Figure 35; this histogram serves as a color identity. The functions used for creating and calculating the histogram are cvCreateHist and cvCalcHist, respectively.
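A sketch of that step is given below. It is not the demo's exact code; the bin count and variable names are illustrative, and the selection rectangle is assumed to come from the user's mouse input.

#include <cv.h>

/* Hedged sketch: build a hue histogram from a user-selected region of a
   BGR frame. */
CvHistogram* make_hue_histogram( IplImage* frame, CvRect selection )
{
    int h_bins = 30;
    float h_range[] = { 0, 180 };        /* hue range of OpenCV's 8-bit HSV */
    float* ranges[] = { h_range };
    CvHistogram* hist;

    IplImage* hsv = cvCreateImage( cvGetSize( frame ), IPL_DEPTH_8U, 3 );
    IplImage* hue = cvCreateImage( cvGetSize( frame ), IPL_DEPTH_8U, 1 );

    cvCvtColor( frame, hsv, CV_BGR2HSV );
    cvSplit( hsv, hue, 0, 0, 0 );        /* keep only the hue plane */

    /* restrict the computation to the selected area and fill the bins */
    cvSetImageROI( hue, selection );
    hist = cvCreateHist( 1, &h_bins, CV_HIST_ARRAY, ranges, 1 );
    cvCalcHist( &hue, hist, 0, 0 );
    cvResetImageROI( hue );

    cvReleaseImage( &hsv );
    cvReleaseImage( &hue );              /* the histogram keeps its own bins */
    return hist;
}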
Figure 35. Hue histogram.

To find the pattern in a frame, one must compute the back projection of the hue plane using the histogram. The back projection is an image in which each position holds the probability of belonging to an object like the one being tracked. To calculate the back projection of the image based on a histogram, the user must call cvCalcBackProject. The back projection result is illustrated in Figure 36 (right). Given the back projection, the CamShift algorithm is applied to return the track box, an oriented rectangle that surrounds the matched area. The CamShift algorithm is implemented by the cvCamShift function; it searches for the largest area of connected neighbors with a high probability of being part of the tracked object. An example of the CamShift sample applied to a face can be found in Figure 36 (left).

Figure 36. CamShift detecting a face (left) and back projection of the image (right).
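One tracking iteration built from these two functions might look like the sketch below. This is an illustration only: hue extraction is assumed to have been done as in the histogram sketch above, and the function and variable names are placeholders.

#include <cv.h>

/* Hedged sketch: back-project the hue plane with the pattern histogram and
   run CamShift around the previous search window. */
CvBox2D track_step( IplImage* hue, CvHistogram* hist, CvRect* search_window )
{
    CvConnectedComp comp;
    CvBox2D box;
    IplImage* backproject = cvCreateImage( cvGetSize( hue ), IPL_DEPTH_8U, 1 );

    /* each back projection pixel receives the histogram value of its hue,
       i.e. a likelihood of belonging to the tracked pattern */
    cvCalcBackProject( &hue, backproject, hist );

    /* CamShift returns the oriented rectangle surrounding the matched area */
    cvCamShift( backproject, *search_window,
                cvTermCriteria( CV_TERMCRIT_EPS | CV_TERMCRIT_ITER, 10, 1 ),
                &comp, &box );
    *search_window = comp.rect;   /* reuse the found window in the next frame */

    cvReleaseImage( &backproject );
    return box;
}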
Kanade Lucas tracker

The Kanade Lucas tracker is a common feature tracker implemented using a pyramidal approach, which uses optical flow to compute the new positions of the selected features. A feature can be understood as a point in a mostly rigid scene that may be matched with another point in the next frame, preserving the semantic value of being the same 3D point of the real scene. To generate some features to populate the application and test the behavior of the tracker, a function called cvGoodFeaturesToTrack can be used. This function selects the strongest corners in the image to be the features that will be tracked by the application.
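As a rough illustration (not the sample's own code; the quality and distance values are just typical choices, and the function name is invented for the sketch), corner selection with cvGoodFeaturesToTrack could look like this:

#include <cv.h>

/* Hedged sketch: pick up to max_corners strong corners from an 8-bit
   grayscale image. */
int select_features( IplImage* gray, CvPoint2D32f* corners, int max_corners )
{
    int count = max_corners;

    /* work buffers required by the function */
    IplImage* eig  = cvCreateImage( cvGetSize( gray ), IPL_DEPTH_32F, 1 );
    IplImage* temp = cvCreateImage( cvGetSize( gray ), IPL_DEPTH_32F, 1 );

    /* quality_level = 0.01 keeps corners whose score is at least 1% of the
       strongest one; min_distance = 10 enforces spacing between corners */
    cvGoodFeaturesToTrack( gray, eig, temp, corners, &count,
                           0.01, 10, 0, 3, 0, 0.04 );

    cvReleaseImage( &eig );
    cvReleaseImage( &temp );
    return count;   /* number of corners actually found */
}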
To refine the position of the selected corners, cvFindCornerSubPix may be used in addition to cvGoodFeaturesToTrack. A result of this function can be verified in Figure 37.

Figure 37. Frame showing the "good features to track".

In the main loop, the program tests whether the user has added a new feature manually, and also refines the position of this point to match the closest corner in the image. This is needed because corners are easier to track and facilitate the optical flow calculation. To calculate the next position of the features found in the last frame, the cvCalcOpticalFlowPyrLK function is used. This function implements the Kanade Lucas tracker and needs some temporary images to represent the pyramidal approach. They act as subsampled images that help the algorithm track optical flows with large movements. The result of this function is the new position of the points given as parameters.
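A single tracking step with this function can be sketched as follows. This is a simplification of what the sample does: pyramid buffer reuse between frames and the related flags are omitted, and the window size and iteration criteria are illustrative.

#include <cv.h>

/* Hedged sketch: propagate `count` features from the previous frame to the
   current one with the pyramidal Lucas-Kanade tracker. */
void track_features( IplImage* prev_gray, IplImage* curr_gray,
                     CvPoint2D32f* prev_pts, CvPoint2D32f* curr_pts,
                     int count, char* status )
{
    /* temporary pyramid buffers used internally by the tracker */
    IplImage* prev_pyr = cvCreateImage( cvGetSize( prev_gray ), IPL_DEPTH_8U, 1 );
    IplImage* curr_pyr = cvCreateImage( cvGetSize( curr_gray ), IPL_DEPTH_8U, 1 );

    cvCalcOpticalFlowPyrLK( prev_gray, curr_gray, prev_pyr, curr_pyr,
                            prev_pts, curr_pts, count,
                            cvSize( 10, 10 ),  /* search window per level */
                            3,                 /* number of pyramid levels */
                            status,            /* 1 where the flow was found */
                            0,                 /* per-feature error (unused) */
                            cvTermCriteria( CV_TERMCRIT_ITER | CV_TERMCRIT_EPS,
                                            20, 0.03 ),
                            0 );               /* flags */

    cvReleaseImage( &prev_pyr );
    cvReleaseImage( &curr_pyr );
}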
Face detection

The face detection sample uses a Haar classifier to recognize a face pattern. The demo loads a pretrained Haar classifier and uses it to detect the objects in the current frame. The classifier used in this demo is a CvHaarClassifierCascade, which differs from a single classifier by cascading multiple Haar classifiers. To load the classifier, the sample uses the cvLoad function and then casts the result to the CvHaarClassifierCascade type. This classifier is used as a parameter of the cvHaarDetectObjects function, with which the user can find all the regions that match the objects the cascade has been trained for. As return value, the cvHaarDetectObjects function outputs a sequence of rectangles containing the matched objects. A good snapshot of a runtime instance of face detection is shown in Figure 38.
Figure 38. Haar classifier detecting faces.

4. Final considerations

This tutorial has shown that OpenCV is a sophisticated computer vision library containing a large number of functions related to pattern recognition. These functions are the basis for any image processing project that aims to detect a specific or a general pattern. The OpenCV base functions for pattern detection are easy to use and follow the same structure, sharing common parameters offered by the library. Beyond the framework's simplicity, OpenCV comes with an interface library that speeds up the prototyping of a test or application using OpenCV. This library is called HighGUI, and a template code sample using it is shown in the Appendix of this tutorial.

5. References

[1] G. Bradski and V. Pisarevsky. Intel's Computer Vision Library: applications in calibration, stereo, segmentation, tracking, gesture, face and object recognition, Proceedings of the Conference on Computer Vision and Pattern Recognition, Hilton Head Island, USA.
[2] R. Gonzalez and R. Woods. Digital Image Processing. Prentice Hall.
[3] C. Tomasi and R. Manduchi. Bilateral Filtering for Gray and Color Images, Proceedings of the International Conference on Computer Vision, Bombay, India.
[4] C. Harris. Tracking with Rigid Objects. MIT Press.
[5] H. Scharr. Digitale Bildverarbeitung und Papier: Texturanalyse mittels Pyramiden und Grauwertstatistiken am Beispiel der Papierformation. Diplomarbeit, Fakultät für Physik und Astronomie, Ruprecht-Karls-Universität Heidelberg.
[6] M. Uenohara and T. Kanade. Vision-Based Object Registration for Real-Time Image Overlay. Journal of Cognitive Neuroscience 3, 71-86.
[7] H. Kato and M. Billinghurst. Marker Tracking and HMD Calibration for a Video-Based Augmented Reality Conferencing System, Proceedings of the Workshop on Augmented Reality, San Francisco, USA.
