Object Move Controlling in Game Implementation Using OpenCV

Professor: Dr. Ali Arya
Reported by: Farzin Farhadi-Niaki, Lindsay Coderre
Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada

I. INTRODUCTION

Computer vision is a rapidly growing field, partly as a result of both cheaper and more capable cameras, partly because of affordable processing power, and partly because vision algorithms are starting to mature. OpenCV itself has played a role in the growth of computer vision by enabling thousands of people to do more productive work in vision. With its focus on real-time vision, OpenCV helps students and professionals efficiently implement projects and jump-start research by providing them with a computer vision and machine learning infrastructure that was previously available only in a few mature research labs.

II. METHODOLOGY

Computer vision is the transformation of data from a still or video camera into either a decision or a new representation. All such transformations are done to achieve some particular goal. The input data may include contextual information such as "the camera is mounted in a car" or "the laser range finder indicates an object is 1 meter away". The decision might be "there is a person in this scene" or "there are 14 tumour cells on this slide". A new representation might mean turning a colour image into a grayscale image or removing camera motion from an image sequence.

For robotics, we need object recognition (what) and object location (where):

a) Object recognition: OpenCV offers a wide range of approaches to detecting an object, including convolution/filters, thresholds, histograms and matching, contours, and efficient nearest-neighbour matching that recognizes objects using large learned databases of objects.

b) Object location: Likewise, OpenCV provides various techniques for finding an object's location, e.g. background subtraction (to find moving objects), corner finding, optical flow, mean-shift and CAMSHIFT tracking, structure from motion using SIFT descriptors and SURF gradient histogram grids, or simply finding the edge of the object and checking the location of the object's centre moment by moment as it moves, pixel by pixel.

A. Convolution

Convolution is the basis of many of the transformations. In the abstract, this term means something we do to every part of an image. What a particular convolution "does" is determined by the form of the convolution kernel being used. This kernel is essentially just a fixed-size array of numerical coefficients along with an anchor point in that array, which is typically located at the center. The size of the array is called the support of the kernel.

We can express this procedure in the form of an equation. If we define the image to be I(x, y), the kernel to be G(i, j) (where 0 <= i <= M_i - 1 and 0 <= j <= M_j - 1), and the anchor point to be located at (a_i, a_j) in the coordinates of the kernel, then the convolution H(x, y) is defined by the following expression:

H(x, y) = sum_{i=0..M_i-1} sum_{j=0..M_j-1} I(x + i - a_i, y + j - a_j) G(i, j)
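As a quick illustration (our own sketch, not part of the project code), the C-API function cvFilter2D() implements exactly the operation H(x, y) above; the sharpening kernel and file handling below are illustrative assumptions.

#include <cv.h>
#include <highgui.h>

int main( int argc, char** argv )
{
    IplImage* src = cvLoadImage( argv[1] );   // any colour image
    IplImage* dst = cvCloneImage( src );

    // A 3x3 sharpening kernel; cvPoint(-1,-1) anchors at the kernel centre.
    float vals[9] = {  0, -1,  0,
                      -1,  5, -1,
                       0, -1,  0 };
    CvMat kernel = cvMat( 3, 3, CV_32FC1, vals );

    cvFilter2D( src, dst, &kernel, cvPoint( -1, -1 ) );

    cvNamedWindow( "Filtered", 1 );
    cvShowImage( "Filtered", dst );
    cvWaitKey( 0 );
    return 0;
}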

B. Canny

The most significant new dimension to the Canny algorithm is that it tries to assemble the individual edge-candidate pixels into contours. These contours are formed by applying a hysteresis threshold to the pixels. This means that there are two thresholds, an upper and a lower. If a pixel has a gradient larger than the upper threshold, it is accepted as an edge pixel; if a pixel is below the lower threshold, it is rejected. If the pixel's gradient is between the thresholds, it is accepted only if it is connected to a pixel that is above the high threshold.

void cvCanny(
    const CvArr* img,
    CvArr*       edges,
    double       lowThresh,
    double       highThresh,
    int          apertureSize = 3
);

The cvCanny() function expects an input image, which must be grayscale, and an output image, which must also be grayscale.

C. Threshold

double cvThreshold(
    CvArr* src,
    CvArr* dst,
    double threshold,
    double max_value,
    int    threshold_type
);

Frequently we have done many layers of processing steps and want either to make a final decision about the pixels in an image or to categorically reject those pixels below or above some value while keeping the others. The OpenCV function cvThreshold() accomplishes these tasks. The basic idea is that an array is given, along with a threshold, and then something happens to every element of the array depending on whether it is below or above the threshold. The cvThreshold() function handles only 8-bit or floating-point grayscale source images.

Threshold type         Operation
CV_THRESH_BINARY       dst_i = (src_i > T) ? M : 0
CV_THRESH_BINARY_INV   dst_i = (src_i > T) ? 0 : M
CV_THRESH_TRUNC        dst_i = (src_i > T) ? T : src_i
CV_THRESH_TOZERO       dst_i = (src_i > T) ? src_i : 0
CV_THRESH_TOZERO_INV   dst_i = (src_i > T) ? 0 : src_i
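To make the two calls concrete, here is a minimal sketch (our own, not from the project) that runs cvCanny() and cvThreshold() on a grayscale image; the threshold values and file names are illustrative.

#include <cv.h>
#include <highgui.h>

int main( int argc, char** argv )
{
    // Both cvCanny() images must be single-channel 8-bit.
    IplImage* gray   = cvLoadImage( argv[1], CV_LOAD_IMAGE_GRAYSCALE );
    IplImage* edges  = cvCreateImage( cvGetSize( gray ), IPL_DEPTH_8U, 1 );
    IplImage* binary = cvCreateImage( cvGetSize( gray ), IPL_DEPTH_8U, 1 );

    // Hysteresis: gradients above 150 are edges; those between 50 and 150
    // survive only if connected to a pixel above the high threshold.
    cvCanny( gray, edges, 50, 150, 3 );

    // Global binary threshold: dst_i = (src_i > 100) ? 255 : 0.
    cvThreshold( gray, binary, 100, 255, CV_THRESH_BINARY );

    cvSaveImage( "edges.png", edges );
    cvSaveImage( "binary.png", binary );
    return 0;
}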

Fig 1. Each threshold type corresponds to a particular comparison operation between the i-th source pixel (src_i) and the threshold (denoted in the table by T). Depending on the relationship between the source pixel and the threshold, the destination pixel dst_i may be set to 0, src_i, or max_value (denoted in the table by M).

D. Adaptive Threshold

There is a modified threshold technique in which the threshold level is itself variable. In OpenCV, this method is implemented in the cvAdaptiveThreshold() function:

void cvAdaptiveThreshold(
    CvArr* src,
    CvArr* dst,
    double max_val,
    int    adaptive_method = CV_ADAPTIVE_THRESH_MEAN_C,
    int    threshold_type  = CV_THRESH_BINARY,
    int    block_size      = 3,
    double param1          = 5
);

cvAdaptiveThreshold() allows for two different adaptive threshold types depending on the setting of adaptive_method. In both cases the adaptive threshold T(x, y) is set on a pixel-by-pixel basis by computing a weighted average of the b-by-b region around each pixel location minus a constant, where b is given by block_size and the constant is given by param1. If the method is set to CV_ADAPTIVE_THRESH_MEAN_C, then all pixels in the area are weighted equally. If it is set to CV_ADAPTIVE_THRESH_GAUSSIAN_C, then the pixels in the region around (x, y) are weighted according to a Gaussian function of their distance from that center point.

The adaptive threshold technique is useful when there are strong illumination or reflectance gradients that you need to threshold relative to the general intensity gradient. This function handles only single-channel 8-bit or floating-point images, and it requires that the source and destination images be distinct.
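The comparison in Fig 2 can be reproduced with a short sketch like the one below (the block size and constant are our own assumptions): a global threshold next to a mean-based adaptive threshold.

#include <cv.h>
#include <highgui.h>

int main( int argc, char** argv )
{
    IplImage* gray     = cvLoadImage( argv[1], CV_LOAD_IMAGE_GRAYSCALE );
    IplImage* global   = cvCreateImage( cvGetSize( gray ), IPL_DEPTH_8U, 1 );
    IplImage* adaptive = cvCreateImage( cvGetSize( gray ), IPL_DEPTH_8U, 1 );

    // Global threshold at a fixed level.
    cvThreshold( gray, global, 100, 255, CV_THRESH_BINARY );

    // Adaptive: T(x, y) is the equally weighted mean of the 15-by-15
    // block around each pixel, minus the constant 5.
    cvAdaptiveThreshold( gray, adaptive, 255, CV_ADAPTIVE_THRESH_MEAN_C,
                         CV_THRESH_BINARY, 15, 5 );

    cvSaveImage( "global.png", global );
    cvSaveImage( "adaptive.png", adaptive );
    return 0;
}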

Fig 2. Binary threshold versus adaptive binary threshold: the input image (top) was turned into a binary image using a global threshold (lower left) and an adaptive threshold (lower right); raw image courtesy of Kurt Konolige.

E. Contours

Although algorithms like the Canny edge detector can be used to find the edge pixels that separate different segments in an image, they do not tell us anything about those edges as entities in themselves. The next step is to assemble those edge pixels into contours. cvFindContours() is a convenient OpenCV function that does exactly this for us. Specifically, by assigning memory storages, OpenCV functions gain access to memory when they need to construct new objects dynamically; we then use sequences (something similar to generic container classes), which are the objects generally used to represent contours. With those concepts in hand, we can get into contour finding.

A contour is a list of points that represents a curve in an image. Contours are represented in OpenCV by sequences in which every entry in the sequence encodes information about the location of the next point on the curve. The function cvFindContours() computes contours from binary images. It can take images created by cvCanny(), which have edge pixels in them, or images created by functions like cvThreshold() or cvAdaptiveThreshold(), in which the edges are implicit as boundaries between positive and negative regions.

Drawing a contour on the screen using the cvDrawContours() function is the next step. Here we create a window with an image in it. A trackbar sets a simple threshold, and the contours in the thresholded image are drawn. The image is updated whenever the trackbar is adjusted.
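A sketch of this window-plus-trackbar demonstration, closely following the contour example in [1] (variable names and parameters here are illustrative):

#include <cv.h>
#include <highgui.h>

IplImage*     g_image   = NULL;
IplImage*     g_gray    = NULL;
int           g_thresh  = 100;
CvMemStorage* g_storage = NULL;

// Called whenever the trackbar moves: re-threshold, then find and draw contours.
void on_trackbar( int pos )
{
    if( g_storage == NULL )
    {
        g_gray    = cvCreateImage( cvGetSize( g_image ), 8, 1 );
        g_storage = cvCreateMemStorage( 0 );
    }
    else
        cvClearMemStorage( g_storage );

    CvSeq* contours = 0;
    cvCvtColor( g_image, g_gray, CV_BGR2GRAY );
    cvThreshold( g_gray, g_gray, g_thresh, 255, CV_THRESH_BINARY );
    cvFindContours( g_gray, g_storage, &contours );
    cvZero( g_gray );
    if( contours )
        cvDrawContours( g_gray, contours, cvScalarAll( 255 ),
                        cvScalarAll( 128 ), 100 );
    cvShowImage( "Contours", g_gray );
}

int main( int argc, char** argv )
{
    g_image = cvLoadImage( argv[1] );
    cvNamedWindow( "Contours", 1 );
    cvCreateTrackbar( "Threshold", "Contours", &g_thresh, 255, on_trackbar );
    on_trackbar( 0 );
    cvWaitKey();
    return 0;
}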

III. DISCUSSIONS

OpenCV, the Open Source Computer Vision library, is used in our project so that movement captured by the camera is translated into movement of objects within the game interface. A basic framework is used as the foundation for creating any simple game interface. In terms of code, three classes were created: Camera, Game, and Object. Together they contain all of the basic functions that such games require.

A. Camera Class

In this project, using OpenCV, we have tried to implement a simulation of object-move detection (Fig. 3) that could be used for game control.

Fig 3. The camera result (including four windows).

The implementation in the Camera class proceeds through the following steps, sketched in code after this list:

1) Capturing the image as a frame from a camera.

2) Converting the image/frame to an 8-bit grayscale image so that it is compatible with filters like Canny.

3) Creating trackbars for the upper and lower thresholds (Fig. 4).

Fig 4. Trackbars for the upper and lower thresholds.

4) Finding the edge of the object using the Canny or Threshold (Adaptive Threshold) functions (Fig. 5).

Fig 5. Edge of the object.

5) Creating the dynamic structure and sequence for the contours, and finding the current location (x, y) of the object's centre after finding its contours.

6) Finally, transferring the object's location (x, y) to the Game class to control the object's movements on the screen.
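A minimal sketch of this pipeline (the function and variable names are our own, not the project's actual Camera class):

#include <cv.h>
#include <highgui.h>

int g_lowThresh  = 50;    // would be wired to the trackbars of step 3
int g_highThresh = 150;   //   via cvCreateTrackbar()

// Steps 1-6: grab a frame, find the object's contour, report its centre (u, v).
void trackObject( CvCapture* capture, CvMemStorage* storage, int* u, int* v )
{
    IplImage* frame = cvQueryFrame( capture );                // 1) capture a frame
    IplImage* gray  = cvCreateImage( cvGetSize( frame ), 8, 1 );
    IplImage* edges = cvCreateImage( cvGetSize( frame ), 8, 1 );

    cvCvtColor( frame, gray, CV_BGR2GRAY );                   // 2) 8-bit grayscale
    cvCanny( gray, edges, g_lowThresh, g_highThresh, 3 );     // 4) edge detection

    CvSeq* contours = 0;
    cvClearMemStorage( storage );
    cvFindContours( edges, storage, &contours );              // 5) contours

    if( contours )
    {
        CvRect box = cvBoundingRect( contours, 0 );           // centre of the first contour
        *u = box.x + box.width  / 2;                          // 6) (u, v) handed to the
        *v = box.y + box.height / 2;                          //    Game class by the caller
    }

    cvReleaseImage( &gray );
    cvReleaseImage( &edges );
}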

B. Object Class

The Object class fundamentally contains the functions to draw a box and a circle. Both of these functions take parameters describing the object's size, position, and colour. Depending on the game, other variables and functions are added, such as a velocity, if the object moves continually around the game, or detectCollision(), if the object is going to interact with other objects.

void Object::drawCircle( CvArr* dst, int centerX, int centerY, int radius, int b, int g, int r )
{
    cvCircle( dst, cvPoint( centerX, centerY ), radius, cvScalar( b, g, r ), -1 );
}

C. Game Class

The Game class contains all of the game logic. It initializes all of the objects and the camera, creates the game interface window, and draws the scene. Any other game properties can also be added, such as a score or a function to continually animate an object around the screen. Two of the more important functions are the movement functions. They describe how an object's movement is controlled by the user, whether in every direction or only vertically or horizontally. These functions use the u and v values output by the Camera class, where u and v become the x and y positions of the designated object, as sketched below.
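As an example, a movement function in this style might look like the following (hypothetical names; the real Game class may differ):

// Maps the camera output (u, v) onto the position of the designated object.
// Restricting to one axis gives paddle-style (horizontal-only) control.
void Game::moveObject( Object& obj, int u, int v, bool horizontalOnly )
{
    obj.x = u;          // horizontal control from the camera
    if( !horizontalOnly )
        obj.y = v;      // full two-dimensional control
}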

IV. RESULTS

To explore the possibilities of this framework, some games were developed.

A. Pong

Pong was created first, to explore the classes (Fig. 6). Two paddles are controlled by the user to hit a ball back and forth. The game also keeps score of how many times a paddle lets the ball past. Some extra functions were added; for example, the ball takes on spin when it hits a paddle: depending on where the ball hits, the y velocity of the ball increases or decreases.

void spinBall( Object paddle )
{
    int   ballLocation;
    float ballPercent;

    ballLocation = ball.cy - paddle.y;                        // where the ball hit, relative to the paddle top
    ballPercent  = (float)ballLocation / (float)paddle.height;
    ballPercent  = ballPercent * 10;
    ball.yVel    = (int)(ballPercent + 0.5);                  // rounded to the nearest integer
}

Fig 6. Pong game result.

B. Maze

Another game created was a Maze (Fig. 7). A ball moves within a path. The background image of the path is analyzed and its colour values are stored in a matrix. Before the ball moves, it evaluates the desired destination based on the u and v values from the camera. If the value in the matrix indicates that the colour is not black, the ball moves to the new location. This way the ball can only travel on the path.

Setting the matrix:

for( int i = 0; i < bg->width; i++ )
    for( int j = 0; j < bg->height; j++ )
    {
        CvScalar pixelValue = cvGet2D( bg, j, i );            // cvGet2D() takes (row, col)
        cvmSet( matrix, i, j, pixelValue.val[0] );            // stored as (x, y)
    }
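A sketch of the move check described above (member names are illustrative): since the matrix was filled with cvmSet( matrix, i, j, ... ) where i is the x coordinate, the lookup below uses cvmGet( matrix, u, v ).

// The ball moves to (u, v) only if the stored path colour there is not black.
void Game::tryMoveBall( int u, int v )
{
    if( cvmGet( matrix, u, v ) > 0 )   // non-black pixel => still on the path
    {
        ball.cx = u;
        ball.cy = v;
    }
}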

Fig 7. Maze game result.

C. Third Game

There are endless possibilities for simple games using this framework. A ball could collect boxes by adding detectCollision() for each box, together with a visibility Boolean that determines whether the box remains on screen after it collides with the ball (Fig. 8); a sketch of such a test is given after Fig. 8. A breakout-style game could be made using an animated ball, a paddle with a horizontal spin function, and a set of boxes with the detectCollision() and visibility-Boolean combination. The game logic for an individual game can be made into its own class, and members can be added to the Object class to make the game as optimal as possible.

Fig 8. Third game result.
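A hypothetical detectCollision() along these lines, an axis-aligned overlap test between the ball and a box (all member names are assumptions):

// Returns 1 and hides the box if the ball's bounding square overlaps it.
int detectCollision( Object* ball, Object* box )
{
    if( !box->visible )
        return 0;

    int overlap = ball->cx + ball->radius > box->x              &&
                  ball->cx - ball->radius < box->x + box->width &&
                  ball->cy + ball->radius > box->y              &&
                  ball->cy - ball->radius < box->y + box->height;

    if( overlap )
        box->visible = 0;   // the collected box disappears from the screen
    return overlap;
}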

V. CONCLUSION

Computer vision applications are growing rapidly, from product inspection to image and video indexing on the Web, to medical applications, and even to local navigation on Mars. OpenCV is also growing to accommodate these developments. One of the key new development areas for OpenCV is robotic perception. This effort focuses on 3D perception as well as combined 2D-plus-3D object recognition, since the combination of data types makes for better features for use in object detection, segmentation, and recognition. Robotic perception relies heavily on 3D sensing, so efforts are under way to extend camera calibration, rectification, and correspondence to multiple cameras and to camera-plus-laser-rangefinder combinations.

Creating capable robots subsumes most fields of computer vision and artificial intelligence, from accurate 3D reconstruction to tracking, identifying humans, object recognition, and image stitching, and on to learning, control, planning, and decision making. Any higher-level task, such as planning, is made much easier by rapid and accurate depth perception and recognition. It is in these areas especially that OpenCV hopes to enable rapid advance by encouraging many groups to contribute and use ever better methods to solve the difficult problems of real-world perception, recognition, and learning. OpenCV will, of course, support many other areas as well, from image and movie indexing on the Web to security systems and medical analysis. The wishes of the general community will heavily influence OpenCV's direction and growth.

There is a worldwide community of interactive artists who use OpenCV so that viewers can interact with their art in dynamic ways. The routines most commonly used for this purpose are face detection, optical flow, and tracking. The focused effort on improving object recognition will allow different modes of interacting with art, because objects can then be used as modal controls. With the ability to capture 3D meshes, it may also be possible to "import" the viewer into the art and so allow the artist to gain a better feel for recognizing user action; this, in turn, could be used to enhance dynamic interaction. The needs and desires of the artistic community for using computer vision will receive enhanced priority in OpenCV's future.

A group of manufacturers is aiming to develop cell-phone projectors, which are perfect for robots because most cell phones are lightweight, low-energy devices whose circuits already include an embedded camera. This opens the way for close-range portable structured light, and thereby accurate depth maps, which are just what we need for robot manipulation and 3D object scanning.

Computer vision has a rich future ahead, and it seems likely to be one of the key enabling technologies for the 21st century. Likewise, OpenCV seems likely to be (at least in part) one of the key enabling technologies for computer vision. Endless opportunities for creativity and profound contribution lie ahead.

REFERENCES

[1] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. Sebastopol, CA: O'Reilly Media, 2008.