Creating Images Using Objects

Kurt Lawrence

Bachelor of Science (Honours) in Computer Science with Mathematics

The University of Bath

May 2010

This dissertation may be made available for consultation within the University Library and may be photocopied or lent to other libraries for the purposes of consultation.

Signed:

Creating Images Using Objects

Submitted by: Kurt Lawrence

COPYRIGHT

Attention is drawn to the fact that copyright of this dissertation rests with its author. The Intellectual Property Rights of the products produced as part of the project belong to the University of Bath (see ). This copy of the dissertation has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with its author and that no quotation from the dissertation and no information derived from it may be published without the prior written consent of the author.

Declaration

This dissertation is submitted to the University of Bath in accordance with the requirements of the degree of Bachelor of Science in the Department of Computer Science. No portion of the work in this dissertation has been submitted in support of an application for any other degree or qualification of this or any other university or institution of learning. Except where specifically acknowledged, it is the work of the author.

Signed:

Abstract

As humans we possess the ability to create recognisable images and shapes from everyday objects like a piece of string or some paper clips. Using our imagination we can easily see a piece of string, visualise it bent into the shape of a square, then act on our vision. Although a computer wouldn't be able to imagine its own images, we can research whether it would be possible for it to create images from objects. Therefore the aim of this project was to see whether it was possible for a computer to mimic this art style and how well it performs given different objects and images. To achieve this we reviewed many existing techniques in relevant fields of research, such as Background Subtraction and Texture Generation. This project documents the development and testing of the numerous functions necessary to create some image from an object. The project also explores the limitations and benefits of our approach.

Contents

1 Introduction
1.1 Aim
1.2 Objectives
1.3 Motivation
1.4 Dissertation Structure

2 Literature Survey
2.1 Non Photo-Realistic Rendering
2.2 Background Subtraction
2.2.1 What is Background Subtraction?
2.2.2 Background Subtraction Techniques
2.2.3 Background Subtraction Summary
2.3 Texture Synthesis
2.3.1 What is Texture Synthesis?
2.3.2 Texture Synthesis Techniques
2.3.3 Texture Synthesis Summary
2.4 Conclusion

3 Implementation
3.1 Choice of Language
3.2 General Design
3.3 Getting Object
3.3.1 Classifying Foreground
3.3.2 Cropping Foreground
3.4 Generating Object
3.4.1 Splitting Up Object
3.4.2 Finding Segment Relationships
3.4.3 Finding Probability Distribution
3.4.4 Generating More Of An Object
3.4.5 Reducing Blockiness
3.5 Creating Image From Object
3.5.1 Find Boundary of Image
3.5.2 Compute New Pixel Positions
3.5.3 Anti-Aliasing

4 Results
4.1 Obtaining Object Tests
4.1.1 Test 1: Simple Polygons
4.1.2 Test 2: Real Objects
4.1.3 Test 3: Choice of Threshold
4.2 Generating Object Tests
4.2.1 Test 4: Probability Distribution Table
4.2.2 Test 5: Tiling Segments
4.2.3 Test 6: Blending Segments
4.3 Creating Image Tests
4.3.1 Test 7: Creating Images Using An Object
4.3.2 Test 8: Creating Images Using A Single Object
4.3.3 Test 9: Anti-Aliasing / Final Results

5 Discussion
5.1 Introduction
5.2 Problems Identified and Potential Solutions
5.2.1 Threshold Problem
5.2.2 Boundary Problem
5.2.3 Aliasing Problem
5.3 Limitations
5.3.1 Object Choice
5.3.2 Image Choice
5.3.3 Output Image
5.4 Alternate Directions

6 Conclusions
6.1 Did We Achieve Our Aim?
6.2 Did We Complete Our Objectives?
6.3 Future Work
6.4 What We Have Learnt?

A File Listing
B User Documentation

List of Figures

1.1 Real life depiction of what we want to achieve
2.1 Rudolf II (Holy Roman Emperor) painted as Vertumnus, Roman God of the seasons, c. 1590
2.2 Flow diagram of Background Subtraction
2.3 Example of Image Quilting
2.4 Illustration of Modifications
2.5 Illustration of Synthesizing one pixel at a time
3.1 Overview of design
3.2 Processes required to get an object
3.3 Example of Frame Differencing
3.4 Example of Cropping
3.5 Processes required to generate more of an object
3.6 Simple Example of Splitting the Object into Segments
3.7 Simple Example of Blending
3.8 Processes required to create image from object
3.9 Example of Interpolating Colours
4.1 Examples of simple polygon frame difference tests
4.2 Examples of object frame difference tests
4.3 Variation of Threshold
4.4 Examples of Generating Object
4.5 Examples of Blending the Generated Object
4.6 Examples of Creating Images Given Rope and Chain
4.7 Further Examples of Creating Images Given Rope and Chain
4.8 Example of Paper Clip Fail
4.9 Examples of Creating Images Given A Rigid Object
4.10 Some Anti-Aliased Results Using Rope And Chain
4.11 Some Anti-Aliased Results Using Paper Clips And Matchsticks
5.1 Initial aim compared to result

List of Tables

4.1 Probability Distribution Table for Rope

Chapter 1

Introduction

1.1 Aim

This project proposes to research whether it is possible for a computer to create an image using an object or collection of objects. As humans we possess the ability to create recognisable images and shapes from everyday objects like a piece of string. Although a computer wouldn't be able to imagine its own images, we will explore whether it would be possible for a computer to mimic this art style and how well it performs given different objects and images.

Figure 1.1: Real life depiction of what we want to achieve

1.2 Objectives

To achieve our aim there are a number of issues which need to be addressed. In particular there are three main problems: getting our object, generating more of the object and using the object to create an image.

In real life we are able to just pick up and use our object straight away. Whether it

be a piece of string or some paperclips, no prior work needs to be done to make sure it's ready for use. This is not the case when creating a computer depiction of this technique. If we were to load an image of our object onto the computer there would always be a background associated with it; this is impractical and would hinder results later in the process. So we must find some way of segmenting our object from the background. In general, segmentation is unsolved and very hard, so this project will focus on a highly constrained blue-screen version to get our object, called Background Subtraction. An example of this in relation to our problem is common in weather programs: to keep the weatherperson in front of the map, they need to be segmented out. It makes sense to look at current applications of this segmentation technique, problems faced by it and specific algorithms used to do it, in order to decide the best way to approach our problem.

Once we have our object, to be able to use it we must make sure we have enough to make the image we want. This problem is also common in real life; an example of this is if you wanted to make a square out of two paperclips. This isn't achievable unless you have four paperclips. So we must find a way of generating more of our object given a finite sample. For this we will use ideas discussed in the field of texture propagation. Texture propagation is commonly used in image processing; for example, if a piece of an image is lost we can propagate the surrounding texture to fill the gap. There are a number of techniques that exist to return this desired outcome, but most of these are more advanced than the project needs, given the time allowed. So we will look at the simpler techniques available to generate more of our object.

When we have generated enough of our object to work with, we will then create some image out of it. As humans, most of us possess the capability of imagining something like a piece of string moulded into a shape and acting on our visualisation. A computer doesn't have this faculty; this is our third problem. We will have to give the function both our generated object and a template image for it to create. To achieve this we will use some forward or backward mapping function, where pixels from our object are mapped to pixels in our image, hopefully producing good results.

1.3 Motivation

Undertaking this project will be worthwhile as the aim has never been achieved before. Currently there exists a vast amount of work in the areas necessary to complete this project: segmentation, texture propagation and image transformations. However these different components have not been combined in order to create images out of objects. Many artistic styles have been mimicked by a computer in the past, for example impressionism; however, the art style we highlight here has not. This project will form a good basis for creating more detailed images similar to the creations of the artist Giuseppe Arcimboldo, who was renowned for creating faces out of objects like books or fruit.

This project will also be both interesting and challenging, with numerous problems to be faced. By overcoming these problems we will learn many more techniques and gain a better

understanding of the field of Computer Vision as a whole. The knowledge gained can be applied in our future projects and endeavours.

1.4 Dissertation Structure

This section provides an overview of the chapters of this dissertation, which document the work that has been undertaken in this project.

Chapter 1 - Introduction: This chapter summarises the motivations and aims of this project.

Chapter 2 - Literature Review: Reviews literature relevant to the components involved in creating an image out of an object. In particular it looks at existing techniques and decides the most suitable for segmenting our object and generating more of our object to use.

Chapter 3 - Implementation: This chapter describes the design and implementation process of the model and the incorporation of the aspects and techniques chosen in the literature review.

Chapter 4 - Results: This chapter details the testing conducted on each of the components.

Chapter 5 - Discussion: This chapter discusses the results obtained, specifically which parts didn't work and how the choice of techniques contributed. Potential solutions to the problems identified are proposed and different approaches to the problem are discussed.

Chapter 6 - Conclusion: This chapter concludes the project by summarising its strengths and weaknesses to determine how successful it was. Things we could have done differently and possible future work on the project are also considered.

Chapter 2

Literature Survey

This project proposes to research whether it is possible for a computer to create an image using an object or collection of objects. But before embarking on its development it is useful to conduct research into the work carried out in the area previously. This is in order to fully understand the problem, give the project more direction and ensure that actually doing the work is worthwhile. It is necessary to find the most applicable techniques for obtaining our object and generating more of it. The third problem, creating an image from the object using transformation mappings, is fairly common knowledge and prior research isn't necessary.

2.1 Non Photo-Realistic Rendering

This project could be categorised closest to the Non-photorealistic rendering (NPR) field. (Vilanova, n.d.) defines NPR as the means of generating imagery that does not aspire to realism. This is unlike normal rendering, which tries to make computer images look as real as possible. NPR from photographs is of growing importance as artists and others want to create computer graphics in diverse styles, but the diversity of styles NPR supports is limited by algorithm design. NPR is inspired by artistic styles such as painting, drawing, technical illustration, and animated cartoons. The idea is also beginning to be used more extensively in the film industry, with films like 300, A Scanner Darkly and Sin City adopting a lot of the techniques.

Currently there exist applications in NPR for filtering an image to mimic a particular artistic style or artist like Van Gogh. (Hertzmann, Jacobs, Oliver, Curless and Salesin, 2001) uses this idea to solve the problem of image analogies. This is the idea that given a pair of images A1 and A2 (the unfiltered and filtered source images), along with some additional unfiltered target image B1, we are able to synthesize a new filtered target image B2, such that the relation between A1 and A2 is equivalent to the relation between B1 and B2, i.e. they have the same style. This idea makes it possible to copy most artistic

styles, but in general the idea of creating images out of objects is currently not included in contemporary NPR, and to include it will require us to use new methods.

Segmentation has also been widely used in this field, but not to obtain objects to create art with. Instead it's more commonly used to move from low level vision to mid and high level vision, in particular to get information about an image that can be applied further. To segment an image is to break it into parts given some semantic meaning. Segmentation algorithms rely on coherence: they assume that pixels in a segment all have more or less the same property. This is usually colour, but pixels could also be classed according to texture or intensity.

Combining ideas from these fields would be a new concept in Computer Vision and worthwhile to do. The outcome wouldn't fit directly into the Non Photo-Realistic field, as our main aim is to create an image using an object, not filter an existing image. However with the time allowed we would be unable to achieve this, therefore we will be giving the computer an image to modify. Doing this would allow us to create images using objects and form a platform to begin to mimic pieces of art like Arcimboldo's Vertumnus. This painting used fruit as objects to create a portrait of Rudolf II, shown below.

Figure 2.1: Rudolf II (Holy Roman Emperor) painted as Vertumnus, Roman God of the seasons, c. 1590

2.2 Background Subtraction

As previously identified, the first problem we are required to solve is that of obtaining the object. In real life we are able to just pick up and use our object straight away. A computer image of our object will be accompanied by a background, which will not be practical when creating images out of it. So we must find some way to get our object on its own - segmentation. In general, segmentation is unsolved and very hard, so this project will focus on a technique called Background Subtraction.

2.2.1 What is Background Subtraction?

According to (McIvor, 2000) the name Background Subtraction comes from the simple technique of subtracting the observed image from the estimated image and thresholding the result to generate the objects of interest. This definition is also shared by (Piccardi, 2004), who states the rationale in the approach is that of detecting the difference between the current frame and a reference frame to find the desired foreground object. (Sen-Ching, Cheung and Kamath, 2004) believes that although there are many algorithms, in general they all follow the same pattern of processing.

Figure 2.2: Flow diagram of Background Subtraction

The four major steps in a Background Subtraction algorithm according to (Sen-Ching et al., 2004) are preprocessing, background modelling, foreground detection, and data validation. Preprocessing first takes the video frames captured and changes the input video into a more processible format. Next, background modelling uses the initial video frames to create a statistical description that is representative of the scene with no foreground objects of interest. Foreground detection identifies changes in the video frames which do not fit in with the background model; these changes will be the foreground. (Deane, n.d.) describes this stage as essentially a binary classification problem where each pixel is assigned the value 1 for foreground and 0 for background. Data validation then checks the found foreground to detect and get rid of any false matches.

Background Subtraction is used in many emerging video applications, such as video surveillance, traffic monitoring, and human detection, to name a few (Sen-Ching et al., 2004). Although there are many varying techniques for Background Subtraction, it still remains, in general, unsolved. This is because there are many challenges in developing a good Background Subtraction algorithm. (Deane, n.d.) highlights a number of these problems; however most of them aren't applicable to the problem this project will be tackling. The aim of this project is to use a Background Subtraction technique to segment a single object from a flat colour background rather than segmenting an object from a video. This process will essentially

follow the same steps but will not face problems like relocation of background objects, a non-static background, or noise. However this simpler process may still suffer from changes in illumination, shadows and similar background or foreground colours. To combat this, when acquiring the images the user could make sure to pick the ones with the least probability of facing these problems.

2.2.2 Background Subtraction Techniques

Many methods exist for Background Subtraction, each with different strengths and weaknesses in terms of performance and computational requirements. These different methods tend to differ in how they construct the background model and how they decide on the foreground. As there are so many methods it would be unnecessary to look at all of them, as there isn't enough time and some aren't applicable to the project. So here we will look at Frame Differencing, the basic building block of most methods. Although a lot of the methods are a bit more advanced than what we need, it will benefit the project to look at them, mainly because the methods vary in background model construction, foreground detection and complexity. Therefore it will be beneficial to see the extremes of the implementations available in order to choose the most suitable method for the project.

Frame Differencing

According to (Sen-Ching et al., 2004) the non-recursive method called Frame Differencing is arguably the simplest form of Background Subtraction. The background model is just the previous frame, and the foreground is detected by comparing the current frame to the previous frame. In practice the current frame is simply subtracted from the previous frame, and if the difference in pixel values for a given pixel is big enough, the pixel is considered part of the foreground.

|I_t(x, y) - B_t(x, y)| > T    (2.1)

In equation 2.1, I_t denotes the pixel intensity in the current frame at position (x, y) and B_t denotes the pixel intensity in the background model (previous frame) at (x, y). T is the threshold value, and this value determines which pixels are foreground and which are background.

(Benton, n.d.) highlights that this method does have two major advantages. Firstly, the computational load is very low, as the computation only deals with two frames, the current and previous. Secondly, this method allows the background model to change every new frame, so it can adapt to changes in the background much faster than any other method. However there are a number of problems attributed to this method. (Sen-Ching et al., 2004) highlights that since it uses only a single previous frame, Frame Differencing may not be able to identify the interior pixels of a large, uniformly-coloured moving object. This is commonly known as the aperture problem. But we won't face this problem in this project, as the method won't be applied to video. However a relevant problem this project may face is the calculation of the threshold. As the threshold is usually determined empirically, the performance of the method is very sensitive to it.
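A minimal sketch of Frame Differencing as described above is given below, assuming two same-sized photographs on disk (the file names and thresholds are illustrative; rgb2gray requires the Image Processing Toolbox). It also shows the normalised-statistics variant of equation 2.2, discussed next.

```matlab
% Frame Differencing sketch: plain background frame vs frame with the object.
background = double(rgb2gray(imread('background.png')));   % illustrative file names
current    = double(rgb2gray(imread('object_on_background.png')));

T = 10;                                      % empirically chosen threshold
mask = abs(current - background) > T;        % equation (2.1), applied per pixel

% Normalised-statistics variant (equation (2.2) below); the threshold now
% counts standard deviations, so a much smaller illustrative value is used.
d  = abs(current - background);
Tn = 2;
maskNorm = (d - mean(d(:))) / std(d(:)) > Tn;

imshow(mask);                                % white = foreground, black = background
```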

(Deane, n.d.) acknowledges the problem that one threshold may not be applicable to every situation and suggests a modification to the Frame Differencing method: a variation that incorporates normalised statistics.

(|I_t(x, y) - B_t(x, y)| - µ_d) / σ_d > T    (2.2)

Here µ_d is the mean and σ_d the standard deviation of |I_t(x, y) - B_t(x, y)| over all locations (x, y). This allows us to use the same threshold value for every computation, as by normalising the statistics we map to the same range of values regardless of the image.

The modest computational load and low complexity of the Frame Differencing method is well suited to the project aim, segmenting a single object from a flat colour background. The only worry with this method is the quality of the result: (Sen-Ching et al., 2004) observes that, with the appropriate parameters, Frame Differencing is significantly worse than all the other methods when tested for accuracy. For our simple task it may produce a good enough result, but it is wise to look at more complex methods.

Median Filtering

Median Filtering is one of the most commonly-used non-recursive background modelling techniques. To create the background model, the method takes a median of all the frames in the buffer for each pixel in the frame. Then (as with Frame Differencing) the background model is subtracted from the current frame and thresholded to determine the foreground pixels. In this project there will only be one previous frame to use, so we have two options here: either we use this frame N times, creating a greater memory load, or we use just one frame to create the background model, which is just Frame Differencing. But if we were to try and segment an object from a video, Median Filtering would be very helpful. According to (Sen-Ching et al., 2004) non-recursive techniques are highly adaptive, as they do not depend on the history beyond those frames stored in the buffer. But storing a large number of frames could also be detrimental to the process, because of the memory required to store them.

Mixture of Gaussians

(Sen-Ching et al., 2004) also highlights a second class of techniques - recursive. Recursive techniques recursively update a single background model based on each input frame. As a result, input frames from the distant past can have an effect on the current background model. Compared with non-recursive techniques, recursive techniques require less storage, as they don't have to store numerous previous frames, only the background model. However if the model contains any error it takes a long time to disappear. The Mixture of Gaussians method is classed as recursive and is highly complex. MoG is also more robust, as it can handle multi-modal distributions. For instance, snow falling against a blue sky has two modes - snow and sky. MoG can filter out both.

The Mixture of Gaussians method is different to the previous methods, as the background model isn't a frame of values. Instead, the background model is parametric. Generally, each pixel in the image is modelled separately by a mixture of k Gaussians:

P(I_t) = Σ_{i=1}^{k} w_{i,t} η(I_t; µ_{i,t}, σ_{i,t})    (2.3)

In general a Gaussian distribution consists of a mean, a standard deviation and a weight. In the above equation I_t represents the colour of the pixel (x, y) in image I at time t. The mean µ is an educated guess of the pixel value in the next frame. The weight w_{i,t} and standard deviation σ of each component are measures of the confidence in that guess. The foreground is detected by firstly sorting all the components in the mixture into decreasing order of probability of being background. This is so that importance is placed on components with the most evidence and lowest variance, which can therefore be assumed to be background. Then, to determine if a pixel is part of the background, we compare it to the Gaussian components tracking it. If the pixel value is within a scaling factor of a background component's standard deviation σ, it is considered part of the background. Otherwise, it's foreground. Essentially this method forms clusters from the Gaussian distribution of each pixel. When comparing a new frame to these clusters we can see which parts fit in with the current clusters and which don't, giving us the changes and therefore the foreground.

2.2.3 Background Subtraction Summary

In this part of the Literature Review we have looked at how to segment our object from the background, so it is ready to use. In particular we looked at the idea of Background Subtraction. We found that most applications of the Background Subtraction method work with video and are therefore a bit outside our scope. However it was still beneficial to look at the basic structure of the method and simple applications of it. In general we found that it is necessary to create a background model and then compare the latest frame against it to detect the foreground (object). For this, in our project, the Frame Differencing method will most probably be enough to do the job. This is because it doesn't need to deal with light changes, moving objects and other similar problems. After looking at this area it is easy to see there still exist problems, and this project will try to go some way in identifying more and suggesting solutions.

2.3 Texture Synthesis

Once we have our object, to be able to use it we must make sure we have enough to make the image we want. This problem is also common in real life; an example of this is if you wanted to make a square out of a piece of string. You must first make sure the string is long enough to make the shape. So we must find a way of generating more of our object

given a finite sample. For this we will use ideas discussed in the field of texture synthesis, specifically the techniques Image Quilting and Markov Models.

2.3.1 What is Texture Synthesis?

A texture within an image is a portion which has a consistent statistical pattern and appears to represent the same type of material. Examples of this are grass on a lawn, words on a page, bricks in a wall or, in our case, an object like a piece of string. (Efros and Freeman, 2001) suggests that texture propagation is the ability to take a sample of texture and generate an unlimited amount of image data which, while not exactly like the original, will be perceived by humans to be the same texture.

Texture synthesis is a very prominent and active research topic in Computer Vision and has numerous potential applications, mainly in the image processing field. For example, if parts of an image are lost, a texture synthesis algorithm could be used to fill in the holes with the surrounding texture. It could also be used in special effects, in particular removing a foreground image and filling the gap left with the surrounding texture. The project, although not doctoring images, will be creating new images using the same ideas.

2.3.2 Texture Synthesis Techniques

There are a number of basic approaches to texture synthesis on which most techniques are based. (Lefebvre and Hoppe, 2005) mention a number of these in their Parallel Controllable Texture Synthesis research paper. (Kwatra, Schodl, Essa, Turk and Bobick, 2003) goes further and categorizes the techniques into three groups.

The first class uses a fixed number of parameters within a compact parametric model to describe a variety of textures. This is the idea of statistical property matching, where relevant texture properties are determined. This model could be a histogram (Heeger and Bergen, 1995) or wavelet features (Portilla and Simoncelli, 2000), for example. The image is then synthesized according to the matching properties.

The second class of texture synthesis methods is non-parametric, which means that rather than having a fixed number of parameters, they use a collection of exemplars to model the texture. This is basically texture synthesis by generating a pixel at a time. This technique uses the conditional probability of pixel values to synthesize the new image, but it requires a search of the entire input for each output pixel, so it's not very quick.

The third, most recent class of techniques generates textures by copying whole patches from the input. This class is based on the simplest approach to texture synthesis, tiling. This approach places random blocks from the sample texture together to form the bigger texture. The result of this is rarely satisfactory, as there will be a lot of boundary errors between the tiles and the image will be highly repetitive.

So after looking at the different classes of texture synthesis and early approaches, it would be wise to look more in depth at the most recent class of techniques.

The techniques Image Quilting and Markov chains will be looked at in this literature review. It will be beneficial to see the two implementations available in order to choose the most suitable method for the project.

Image Quilting

According to (Kwatra et al., 2003), Image Quilting is the process by which small blocks from the sample image are copied to the output image, similarly to the tiling approach. But unlike tiling it uses an overlap to try and reduce the block-like outcome. We could use this technique to initially fill a polygon with texture; from this we could begin to use it for more purposeful applications in the image processing field.

Figure 2.3: Example of Image Quilting

The algorithm for Image Quilting is as follows (a sketch of the tile-selection step follows the list):

1. Read in a source image.
2. Take input for the following parameters: the tile size, the amount of overlap between tiles, an error tolerance, and the destination image size.
3. Seed the output image.
4. Fill in new tiles:
5. For each possible tile position in the source image, find the sum of squared differences (SSD) between that tile and the existing destination image, in the overlap region.
6. Find all tile positions that have SSD errors less than (1 + tolerance) times the best SSD error.
7. Pick a random tile from the above, and copy it to the destination image.
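Below is a hedged sketch of the tile-selection step (steps 5-7), assuming the source texture is held as an RGB array of doubles and matching only the left-edge overlap for clarity; the function and variable names are illustrative, not from an existing implementation.

```matlab
% Score every candidate tile by the SSD of its left strip against the overlap
% region already placed in the destination, then pick randomly among the
% near-best candidates (within (1 + tolerance) of the best SSD).
function tile = pickTile(source, leftOverlap, tileSize, overlap, tolerance)
    [H, W, ~] = size(source);
    nY = H - tileSize + 1;  nX = W - tileSize + 1;
    ssd = inf(nY, nX);
    for y = 1:nY
        for x = 1:nX
            cand = source(y:y+tileSize-1, x:x+overlap-1, :); % candidate's left strip
            d = cand - leftOverlap;                          % same size as the strip
            ssd(y, x) = sum(d(:).^2);        % SSD over all three colour channels
        end
    end
    best = min(ssd(:));
    [ys, xs] = find(ssd <= (1 + tolerance) * best);  % all near-best positions
    k = randi(numel(ys));                            % choose one of them at random
    tile = source(ys(k):ys(k)+tileSize-1, xs(k):xs(k)+tileSize-1, :);
end
```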

So, in general terms, say we have a source image of some string texture and we wish to propagate this texture to fill a square. Parameters are first inputted. The size of the block is the only parameter controlled by the user; the block must be big enough to capture the relevant patterns in the texture, but small enough so that the interaction between these structures is left up to the algorithm and not just one block. In the experiment in (Efros and Freeman, 2001), the width of the overlap edge was 1/6 of the size of the block and the error tolerance was set to be within 0.1 times the error of the best matching block. These values seemed to work well in their examples, so we will use them too. Next we can begin creating the new image/texture. We start by copying a random tile to the top left corner of our square. The remaining tiles are then filled in, in raster order, so from left to right and top to bottom. To decide which tile to put next we compare our current tiles with all possibilities by computing the sum of squared differences over the region where they will overlap. In our case we could do this by finding the total SSD over all three colour channels. The algorithm then chooses a random tile according to these comparisons, with the better matches more likely. This process is repeated to fill the entire square.

After implementing this algorithm there still exist boundary errors. (Efros and Freeman, 2001) proposes a technique called the Minimum Error Boundary Cut. This essentially allows ragged edges between tiles to minimize the boundary error. They transform the overlap region between a new patch and the already synthesized texture into an error map, and then use Dijkstra's algorithm to find a lowest cost path through the error map from one boundary to another. We would have to do the minimum cut twice per tile, once along the left edge and once along the top. This reduces boundary errors all around the tile.

Figure 2.4: Illustration of Modifications

There are numerous advantages and disadvantages to using Image Quilting. Firstly it produces excellent results on structured patterns, which is the case for string. It is also a fairly fast process. However even with the minimum cut algorithm there still exist boundary errors, and it suffers a lot of repetition. For this project we probably won't face these errors, or they won't be very noticeable. But if we did, possible solutions to the boundary problem include having different shaped tiles, similar to the (Song, Rosin, Hall and Collomosse, 2008) paper. The repetition problem may benefit from allowing the program to rotate or flip the tiles randomly.

Markov Chains

A formal definition of the Markov property (Rui, 2004) is: let X = {X_n}, n = 0, ..., N, be a sequence of random variables taking values s_k ∈ N. If

P(X_m = s_m | X_0 = s_0, ..., X_{m-1} = s_{m-1}) = P(X_m = s_m | X_{m-1} = s_{m-1})

then X fulfils the Markov property. Informally this just means the future is independent of the past given the present. In general a Markov Chain is made up of three components:

- A state space S = {s_1, s_2, ..., s_n}
- An initial state a_0
- A probability distribution A

It will be easier to understand the process by looking at an example. A famous, well cited application is the idea of generating English-looking text using N-grams, proposed by (Shannon, 1948). Initially a Markov Model must be built; to do this he read in a sample of the language. Then, for each letter, the probability distribution of the next letter is determined by building probability tables. So if we look at the letter t, in a book it will be followed by the letter a more often than by d. Therefore the probability will be higher for a and it is more likely to get chosen. In this example the letters are the state space, the initial state would be t, and the probability distribution would be a table showing that a is more probable than d to follow t. This can then be repeated at each letter to generate English-looking text. To get better results, make the states larger, i.e. whole words, but to do this the Markov Model will need a bigger sample.

This example is very similar to the project, growing a piece of string. We start with a sample string and break it into segments. Next choose a random piece and construct a table of probabilities. These probabilities will be worked out, similarly to Image Quilting, by computing the sum of squared differences between the segments. Now choose the next part based on these probabilities and repeat.
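A toy sketch of Shannon's idea with single-letter states is shown below; the sample text is purely illustrative, and a real model would be built from a much larger corpus.

```matlab
% Build a table of next-character counts from a sample, then walk the chain:
% the next state depends only on the current state (the Markov property).
sample = 'the theory of the thing is there';
chars  = unique(sample);
counts = zeros(numel(chars));                % counts(i,j): chars(j) follows chars(i)
for k = 1:numel(sample)-1
    i = find(chars == sample(k));
    j = find(chars == sample(k+1));
    counts(i, j) = counts(i, j) + 1;
end

state = find(chars == 't');                  % initial state
out = chars(state);
for k = 1:40                                 % generate 40 characters
    p = counts(state, :) / sum(counts(state, :));  % probability distribution row
    state = find(rand <= cumsum(p), 1);            % sample the next state
    out(end+1) = chars(state);               %#ok<SAGROW>
end
disp(out)
```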

(Efros and Leung, 1999) suggests another way to use the Markov property to generate texture - synthesizing one pixel at a time. An illustration of this is shown below.

Figure 2.5: Illustration of Synthesizing one pixel at a time

To synthesize a pixel p, rather than constructing a model, we directly search the input image for all similar neighbourhoods to produce a probability table. These similar neighbourhoods are illustrated by the blue boxes in the sample image. We then randomly choose a neighbourhood and take its centre to be the newly synthesized pixel. However, since our sample image is finite, an exact neighbourhood match might not be present. So we find the best match using the SSD error over all three colour channels, and take all samples within some distance from that match. This is to make sure the new pixel agrees with its closest neighbours.

There are numerous advantages and disadvantages to using Markov chaining for texture propagation. Firstly it is conceptually simple and has the ability to model a wide range of real world textures, making it a very popular choice in image processing. It can be applied to many tasks, ranging from occlusion fill-in, to growing texture directly on a surface, and even to synthesizing video clips (for example, a flame). But on the other hand it's a fairly greedy process and pretty slow.

2.3.3 Texture Synthesis Summary

This section of the Literature Review has looked at possible ways to use our segmented object to create an image. In particular we looked at the notions of Image Quilting and Markov Chains. After looking at these two differing approaches, this project will benefit from combining ideas from both techniques: we will tile using a Markov Model. After looking at this area it is easy to see there are still problems, and this project will try to go some way in identifying more and suggesting solutions.

2.4 Conclusion

This review has demonstrated that it would be worthwhile and interesting to explore a computerised way to create images using objects. In general the idea of combining the notions of Background Subtraction and texture synthesis doesn't seem to be very prominent. Furthermore we haven't found work on creating images from string, or indeed any object, using these processes. Therefore it would be worthwhile to the field to try to do it, and to document what problems we face and how we overcome them.

The first section of the review looked at the idea of segmenting the object from a background, in particular Background Subtraction. We have decided to initially use the Frame Differencing technique, mainly because of its low complexity, and its accuracy seems to be reasonable for our problem. If we find that it isn't good enough, we have also looked at other methods which could be implemented should they need to be.

The second section of the review looked at the idea of using the segmented object, in particular Image Quilting and Markov Chains. We have decided to combine ideas from both and tile using a Markov Model, mainly because this is more suited to the idea of using objects to create images. If we find that it isn't good enough, we have also looked at other methods which could be implemented should they need to be.

So overall the project proposes to research whether it is possible to take an object and create a recognisable image from it. Background Subtraction will be used to get the object, more of the object will be generated by tiling given a Markov Model, and then this object will be transformed into some image. This will be worthwhile to the field, and interesting and challenging for us to complete.

Chapter 3

Implementation

This section is concerned with describing how the techniques identified in the literature review were put into practice in our project. It will describe why certain decisions were taken, for example why one method was used in preference to another.

3.1 Choice of Language

The language that was chosen for the implementation of this project was Matlab. Matlab is a numerical computing environment that has been designed specifically for the fast implementation of ideas in the mathematics community. We also worked with a lot of images in this project, and Matlab is very well suited to image manipulation on account of its large library of functions. On the other hand, Matlab's ease of implementing mathematical ideas comes with fairly slow execution speed. We could have used the language C if execution time had been deemed the most important factor. However the opportunity of an environment suited to mathematical ideas and image manipulation, with a large library of functions, outweighed the need for fast execution speed. So Matlab was judged to be a suitable choice of language.

3.2 General Design

Throughout this project we have identified three components; this didn't change for the implementation. We implemented the components in stages, in an incremental manner. This provided the opportunity to ensure that each piece of functionality was working as expected before moving on to implement the next section. This decision made the detection of bugs and the consequent solving of them much simpler, as it narrowed down their possible locations.

Figure 3.1: Overview of design

During implementation, each part was coded separately from the preceding sections, each in its own function and file, to keep logically unrelated functionality separate. The organization of the system enabled a very easy iterative implementation technique, where each section could easily be excluded from the execution and tested separately. The idea behind this was that if each section was working correctly, the whole system would work correctly. Furthermore it greatly reduced the amount of time dedicated to finding bugs.

3.3 Getting Object

As previously identified, the first problem we were required to solve is that of obtaining the object. In real life we are able to just pick up and use our object straight away. But a computer image of our object will be accompanied by a background, which will not be practical when creating images out of it. So we had to find some way to get our object without its background. In the literature review we identified Frame Differencing, a form of Background Subtraction, as the method we would use to obtain our object. This part of our solution could be split into simpler processes, making the implementation easier to do and improve.

Figure 3.2: Processes required to get an object

3.3.1 Classifying Foreground

We found in the literature review that to compute the difference between frames it is necessary to have a background model and a test frame with something in front of the background. So for our project we loaded in an image of a plain background and an image of our object on the background. The next step in Frame Differencing was to find and classify the foreground or object pixels. To do this we compared the pixels at every location in each image. Initially we were comparing the Red, Green and Blue (RGB) values of each pixel, working in the RGB colour space. But we are working with hundreds and possibly thousands of pixels at a time, so this was fairly impractical - it took too long to compute. To combat this problem we decided to convert our images to greyscale. RGB space works with three different values per pixel, each ranging from 0 (black) to 255 (white), whereas in greyscale the three values are alike. This allows us to work with one intensity value per pixel rather than three, thus reducing the computational load.

According to equation 2.1, to classify each pixel we must do so in relation to some threshold. This essentially means that if the pixel value changes by a significant amount we can assume it is new and classify it as foreground. We had to make a decision here as to what the threshold value should be to get optimal results. Choosing a low value, although it would work for some examples, wouldn't work for all. For example, if we loaded two photographs as our frames, there may be some difference in light between the two. If the threshold was too low these pixels would be picked up as foreground, something we didn't want. Conversely, if we chose too high a threshold, pixels genuinely belonging to the object would be missed and classed as background. We had to find a balance between being too vague and too precise, but this was very hard to do due to the wide-ranging inputs. We found a threshold of ten to be a good start, but given the ambiguity of inputs this may not work all of the time. We classified our image by colouring a pixel black if its intensity hadn't changed and white if it had. So the area that is coloured white will be where our object was in the frame.

Figure 3.3: Example of Frame Differencing

3.3.2 Cropping Foreground

At this point we still hadn't got our object in a form which we were able to work with in the next section. So we needed to use our previous classification as a guide to trim our object, essentially cutting it out of the background. This was a fairly simple task to achieve. Using the guide we iterated down all the rows, finding where the first and last white pixels appear, and marked these as the top and bottom edges of our object. We repeated this idea with columns to find the left and right edges, and cropped the original image accordingly. This functionality also gave us another reason not to set the threshold value to a small number: the likelihood of getting anomalies in the data would have greatly increased and the image would have been cropped incorrectly. So editing the threshold was easier than changing the trim functionality.

Figure 3.4: Example of Cropping

Figure 3.4 depicts the general functionality of this part of the system. We have taken the guide and put the red lines where the first white pixel is found. We then applied the red rectangle to the image of the object and cut it out of the background, thus overcoming the first problem of obtaining the object. Although we had our object, we were still not in a position where we could generate an image.
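A sketch of this bounding-box crop is given below, reusing the mask and current variables from the Frame Differencing sketch in section 2.2.2; it is a simplified illustration rather than the project's exact code.

```matlab
% Find the first/last rows and columns that contain white (foreground)
% pixels, then crop the original frame to that bounding box.
rows = find(any(mask, 2));                   % rows with at least one white pixel
cols = find(any(mask, 1));                   % likewise for columns
object = current(rows(1):rows(end), cols(1):cols(end));
imshow(uint8(object));
```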

3.4 Generating Object

We now had the object but were unable to create some recognisable image out of it. This was due to the second problem we identified: we may not have enough of our object to create anything meaningful. This is similar to the real life problem of trying to make a square from two paper clips - it just isn't possible. In the literature review we decided to tile our object using a Markov Model, mainly because this combination made it easier to generate more of our object. Furthermore the ideas seem to blend together well. Tiling puts different segments of a given sample together, according to some guide or randomly, to create a larger sample. We introduce the Markov Model as the guide for tiling the segments, allowing us to create a larger sample which has been generated randomly based on some probability yet still looks realistic. Similar to the previous part, this part of our solution could be split into simpler processes, making the implementation easier to do and improve. The processes that will be required to generate more of an object are shown in the figure below.

Figure 3.5: Processes required to generate more of an object

3.4.1 Splitting Up Object

The first task we identified as essential to generating more of the object was to split it into a number of segments. We could have just repeated the object over and over again to generate more; however this wouldn't have looked very realistic, as there would have been no randomness. A human would have been able to tell this was happening by finding like patterns in the repeated sections. So we needed to find some way to generate randomly whilst still looking realistic. The Markov Model was ideal for this. For this computation we required a number of states and relationships; in this case the segments were the states and their bond strengths were the relationships. A key decision that we made here was what number of segments to use. We had already decided one segment wouldn't work, so next we tried using every column of pixels as a segment. This wasn't functional either, as the computational load and time to set up the probability distribution were very large. If the number of segments is small, it is more obvious what is happening when looking at the result, but it is quicker to compute. With a larger number of segments the result is too random and it takes a long time to compute. The correct choice of segment size is relative to the object size. With the objects we decided to use, the choice that suited us was a smaller number of segments; anywhere between ten and twenty gave fairly good results, and these were computed quickly.

Figure 3.6: Simple Example of Splitting the Object into Segments

3.4.2 Finding Segment Relationships

Now we had a given number of segments, we had to find a way of choosing the next segment to generate so that the result looks fairly good. To achieve this, we needed to set up our initial Markov Model consisting of states and relationships. We had previously found our states, the segments of our object, so what remained were the relationships between them. The relationship between two segments was signified by a value indicating how well their edges go together. So for the example in Figure 3.6 we can see that segment 1 would have a very strong bond with segment 2, as they are connected in the sample. However segment 1 will probably have a weaker bond should it be connected with segment 3 or itself. To calculate this bond value we compared each segment's right column of pixels with each segment's left column to find the closest match. Originally we just found the difference between the two columns and took an average; however this did not give a wide enough range of results. In particular there wasn't a huge amount of difference between the best and worst match. So rather than just find the mean of the difference we found the mean of the squared difference, thus giving us a wider range of results and giving the worst matches a much higher value.

3.4.3 Finding Probability Distribution

Next we aimed to construct a probability distribution from the bond values. This distribution allocates each segment a probability of being chosen given the current segment. So, looking at Figure 3.6 again, suppose we had segment 1 and were looking at which segment to choose next. Segment 2 would have a higher probability of being chosen next than 1 or 3, as the relationship between 1 and 2 is strongest. Creating a probability distribution from the matrix of bond values at this point wasn't directly possible. The matrix consisted of a range of values, zero being the best bond and large numbers being the worst. We needed to turn this around so that the better the bond the higher the value, and the weaker the bond the smaller the value. This was simple to do; we just found the weakest bond (highest value) for each segment and subtracted all values for that segment from it. The weakest bond was now zero. As we now had the range of values intuitively meaning the correct thing, we could work out the percentage chance of each segment being chosen given another. From these percentages it was an easy step to assign each segment a range of values between 0 and 1, the size of its range dependent on its probability of being chosen. We had obtained our probability distribution for the segments of the object, so we could begin thinking about putting the object generation into practice and generating more of the sample object.
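The sketch below pulls the last three subsections together, assuming the cropped greyscale object from section 3.3.2 is held in the variable object; the segment count, variable names and the use of implicit expansion (Matlab R2016b or later) are all illustrative choices.

```matlab
% Split the object into nSeg vertical segments (the states of the model).
nSeg = 12;
w    = floor(size(object, 2) / nSeg);
segs = cell(1, nSeg);
for s = 1:nSeg
    segs{s} = object(:, (s-1)*w+1 : s*w);
end

% Bond values: mean squared difference between adjoining edge columns.
bond = zeros(nSeg);                          % bond(i,j): cost of j following i
for i = 1:nSeg
    for j = 1:nSeg
        d = segs{i}(:, end) - segs{j}(:, 1); % right edge of i vs left edge of j
        bond(i, j) = mean(d.^2);             % squaring widens the range
    end
end

% Invert so the strongest bond gets the largest value, then normalise each
% row into probabilities and take a cumulative sum for sampling.
score = max(bond, [], 2) - bond;             % weakest bond in each row becomes zero
P     = score ./ sum(score, 2);              % per-row probability distribution
cdf   = cumsum(P, 2);                        % ranges in [0,1], one row per state
```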

3.4.4 Generating More Of An Object

At this point we had done most of the calculations required to generate more of the object; all that remained was to put them into practice. To tile the segments according to the probability distribution we first had to randomly choose a start segment. We knew the number of segments we could choose from, so we picked a random number between zero and the number of segments. This start segment gave us an initial state on which to run the Markov Model. The Markov property, in relation to our project, states that the next segment we choose only depends on the current segment and its probability distribution, or bond strengths, with other segments. It doesn't matter what segments we have chosen previously; the past is irrelevant. With this in mind we then chose a random number between 0 and 1, found which segment this signified in the probability distribution of the current segment, and joined the two. We continued this process until we had a long enough object to create an image from. We successfully did this, but it took a very long time to create an amount big enough.

3.4.5 Reducing Blockiness

After getting our generated object we found the quality wasn't brilliant, as it was still easy to see where the segments joined; it was very blocky. In the literature review we discovered the Minimum Error Boundary Cut algorithm for reducing a block-like outcome. However, given the time constraints and the quality of results needed, we decided not to use it. Instead we implemented a form of blending between the segments. We inserted eight new columns of pixels between the two segments. Each column was created using some weighting of the two segments' edge columns. This weighting favoured segment 1 more on the left and segment 2 more on the right, thus making the transition between segments less severe.

Figure 3.7: Simple Example of Blending

Implementing this blending function improved the results somewhat, yet they still weren't perfect, and a human would be able to tell the difference between the real and generated objects. At this point the generated object was long enough to create an image from, but its other dimension was far too big and would hinder the aim. So the object had to be scaled down in order to progress to the next step. With the object scaled down it was much harder to tell the difference between the real and the fake, as the images were further away and therefore less detailed. The decision was made here to scale down the sample object rather than the generated object. This improved the computation time greatly, solving the problem highlighted in the previous section. Furthermore doing this caused very little detriment to the overall quality of the result. We were now in a much better position to be thinking about creating an image from our object.
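A sketch of the generation loop, continuing from the distribution sketch above, is shown below; the eight blended columns mirror the description, while the number of generated segments and everything else is illustrative.

```matlab
% Grow the object: start from a random segment, repeatedly sample the next
% segment from the cumulative distribution (only the current state matters),
% and blend eight interpolated columns across each join to soften the seam.
nBlend = 8;
state  = randi(nSeg);                        % random start segment
grown  = segs{state};
for step = 1:30                              % grow 30 more segments (illustrative)
    next  = find(rand <= cdf(state, :), 1);  % sample next state from its row
    left  = segs{state}(:, end);             % edge columns either side of the join
    right = segs{next}(:, 1);
    for b = 1:nBlend                         % weighting slides from left to right
        a = b / (nBlend + 1);
        grown(:, end+1) = (1 - a) * left + a * right;
    end
    grown = [grown, segs{next}];             %#ok<AGROW>
    state = next;
end
imshow(uint8(grown));
```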

3.5 Creating Image From Object

We have now generated enough of our object to work with, so we can create some image out of it. As humans, most of us possess the capability of imagining something like a piece of string moulded into a shape and acting on our visualisation. A computer doesn't have this faculty; this was our third problem. We had to take both our generated object and a template image and create our new image out of these. To achieve this we decided to use some forward or backward mapping function, where pixels from our object are mapped to pixels in our image. This part of our solution could be split into simpler processes, making the implementation easier to do and improve.

Figure 3.8: Processes required to create image from object

3.5.1 Find Boundary of Image

To fit the generated object to a path we first needed to load in an image to use as a guide. Most of the images we used, we created in Microsoft Paint. The binary images we created were mainly silhouettes of shapes and recognisable images in black and white. After loading the image we had to find a way of getting the outline of the shape. To do this, rather than reinvent the wheel, we used a built-in Matlab function called bwtraceboundary. This function takes a binary image as its input, and outputs a chained list of pixel coordinates which lie on the image boundary. This is worked out by noting that nonzero pixels belong to an object in the image and zero pixels constitute the background. Using this list of boundary coordinates we were easily able to map a row of pixels from the object to the boundary line. We could also map strips of pixels to each point on the boundary. For horizontal lines this worked perfectly; however whenever the line bent or was vertical the outcome was very poor. This was because the strips weren't rotating accordingly.
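A minimal sketch of this boundary-tracing step is shown below; the file name is illustrative, the template is assumed to be an RGB black-and-white silhouette, and bwtraceboundary and imbinarize come from the Image Processing Toolbox.

```matlab
% Trace the outline of a binary silhouette into a chained list of [row col]
% coordinates, starting from the first object pixel found.
template = imbinarize(rgb2gray(imread('silhouette.png')));
[r, c]   = find(template, 1);                   % first object pixel, column-major
boundary = bwtraceboundary(template, [r c], 'N');
plot(boundary(:, 2), boundary(:, 1));           % quick visual check of the path
axis ij equal
```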

3.5.2 Compute New Pixel Positions

To rotate the strips by the right amount and fit them to the path we had to find some transformation matrix to map points in the source image to new positions in the target image.

(u, v) = M (x, y)    (3.1)

The coordinates (x, y) are some position in the object, and (u, v) is the pixel position on the boundary of the image we wish to map to. M is the transformation matrix necessary to get from (x, y) to (u, v). We needed to find the correct M for each strip and map the pixels accordingly; this is called forward mapping. To achieve forward mapping we first had to set up the reference frames that we map between. To construct a reference frame you need to look at a line of points and find a centre point, a vector along its width and a vector along its length.

The first reference frame was easy to set up as it was our object. We found the centre line of the object; this was just a horizontal line bisecting the object. Now for each pixel on the centre line we could work out the width and length vectors by looking at the next and previous pixels. To calculate the length vector we just take two successive points on the line, say Point1 = (a, b) and Point2 = (c, d). The vector between them is F = Point2 - Point1 = (c - a, d - b) = (z1, z2); this vector is the length vector. For the straight centre line we have for the object, it is easy to see this would be F = (1, 0). Next we had to find the width vector. If you imagine walking along the centre line, the width vector is the direction of your left arm if you stick it out to the side. It's easy to show the direction in relation to the length vector is A = (-z2, z1). So in our object example A = (0, 1). So now we had a centre point (x, y), a length vector F and a width vector A. By normalising the vectors to make them unit vectors we had constructed the first reference frame.

The second reference frame we needed was that of the image boundary. As this wasn't a straight line it was harder, but it still followed the same principle. The boundary points we worked out previously could be used as the centre line. Each point on the boundary we used as a new centre point, and we constructed the length and width vectors the same way we discussed previously. However we later found the results weren't ideal. So we decided to widen the range of pixels to subtract; this would give a more general idea of how the line curved. So rather than taking Point1 to be the current pixel, we set it to be the pixel position three steps before the centre point. Likewise we set Point2 to be the position three steps in front of the centre along the boundary. We could vary this value to give more refined results, but found the value three to give good enough results for most objects. Again we normalised our vectors and therefore obtained our second reference frame. We were now almost in a position to work out the transformation matrix M.

M = referenceFrame1 · inverse(referenceFrame2)    (3.2)

To be able to compute this equation we needed to make the first reference frame a square matrix. To do this we just homogenised the coordinates.
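A sketch of constructing one such reference frame at boundary point k is given below, with the three-step offset described above; the homogeneous layout (frame columns F, A and the centre point) is an assumption about how the frames might be packed, not the project's exact convention.

```matlab
% Build a homogeneous 3x3 reference frame at boundary point k, estimating
% the local direction from points 'offset' steps either side (offset = 3).
function frame = referenceFrame(boundary, k, offset)
    p1 = boundary(max(k - offset, 1), :);        % a few steps behind the centre
    p2 = boundary(min(k + offset, end), :);      % a few steps ahead
    F  = p2 - p1;  F = F / norm(F);              % unit length vector
    A  = [-F(2), F(1)];                          % width vector, 90 degrees left
    c  = boundary(k, :);                         % centre point
    frame = [F(1) A(1) c(1);                     % homogenised coordinates
             F(2) A(2) c(2);
             0    0    1   ];
end
```

The per-strip transformation of equation 3.2 would then be formed by combining the object's frame with the boundary frame, e.g. M = frame1 / frame2 in Matlab (matrix right division, equivalent to multiplying by the inverse).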

All that remained at this point was to apply the relevant transformation matrix to each pixel in the object. By doing this we got new pixel positions for every point in the object; we mapped these to a blank canvas, and this resulted in the image being created out of the object.

Anti-Aliasing

We had successfully fitted our object to the shape; however, the quality was hindered by aliasing. This was because we had decided to use forward mapping rather than backward mapping. Forward mapping has two main disadvantages as a computational procedure: gaps and overlaps. Depending on the specific spatial transform function, some output pixels may not receive any input image pixel; these are the gaps. Some output pixels may also receive more than one input image pixel; these are the overlaps. Our results didn't suffer from overlaps, as we implemented a check on each map to see if it would overlap, but they did suffer from gaps. Due to the way we had constructed our reference frames, and the time constraints, implementing backward mapping instead was not a viable option. So instead we interpolated the value of the gaps by looking at surrounding pixels. To do this we looked through all the pixels to see if any were black, indicating background or an occurrence of aliasing. If we find a black pixel we look at pairings of pixels surrounding it to get an average colour to replace it with. For example, if the pixels above and below the black pixel aren't black, we work out an average pixel colour from them; if they are black, we try the left and right pixels, and if those are black too we try the diagonals. Running this interpolating function once removes most of the aliasing that had occurred previously. Running it more than once continues to improve the results, but doing it too many times causes errors to occur and begins to colour the background in.

Figure 3.9: Example of Interpolating Colours
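A minimal sketch of this gap-filling pass, assuming an RGB canvas in which pure black marks either background or an unfilled gap; fillGaps is a hypothetical name:

    function out = fillGaps(img)
        % Replace interior black pixels with the average of the first
        % opposing pair of non-black neighbours: vertical, horizontal,
        % then the two diagonals.
        out = img;
        [h, w, ~] = size(img);
        pairs = {[-1 0; 1 0], [0 -1; 0 1], [-1 -1; 1 1], [-1 1; 1 -1]};
        for y = 2:h-1
            for x = 2:w-1
                if any(img(y, x, :)), continue; end      % not a gap pixel
                for k = 1:numel(pairs)
                    o  = pairs{k};
                    c1 = img(y + o(1,1), x + o(1,2), :);
                    c2 = img(y + o(2,1), x + o(2,2), :);
                    if any(c1(:)) && any(c2(:))          % both neighbours coloured
                        out(y, x, :) = (double(c1) + double(c2)) / 2;
                        break;                           % first valid pair wins
                    end
                end
            end
        end
    end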

Chapter 4

Results

In order to determine whether the implementation of the system has been a success, it was essential that we tested the system thoroughly by giving it different inputs and variables. The aim of the dissertation was to be able to create a recognisable image from an object; however, the testing process is not limited to the tests required to achieve that overall aim. Throughout the implementation stage the system was tested as each component was completed, which proved useful in speeding up the testing of the complete system. The testing carried out during the implementation phase was very much based around ensuring the system was returning the correct images and numbers from certain calculations.

We had three components to test: obtaining an object, generating more of an object and creating an image from an object. In order to ascertain whether the implementation is functioning correctly, it was important to test these three sections in a way that best highlights any problems. In the first component we expected to be able to segment an object from a background given two frames; this part should output the original frame cropped around the object. In the second component we expected to be able to generate more of an object given some sample of it; this should result in a longer object that looks somewhat realistic. The final component is where we achieve the aim of the dissertation: given some object and image, this part should create the image out of the object.

This section provides the evidence to see whether the system has met the requirements that were initially set out for the project. It also identifies areas where the algorithm does not perform as expected, and provides output images that can be used to evaluate why this is happening along with possible solutions.

4.1 Obtaining Object Tests

The first component we were required to implement was that of obtaining the object. If we were to load an image of our object there would always be a background associated with it; this is impractical and would hinder results later in the process. So we had to find some way of segmenting our object from the background, and in the literature review we identified Frame Differencing as the technique we would use. For each test we gave the system a background frame to use as the model and a second frame with the object on the background. We expected the system to highlight any differences between the images; essentially these parts would be the foreground, or object. After highlighting the object it should proceed to crop the original object frame around the foreground. Furthermore, we implemented a background modifier into the system, which colours background pixels black. We did this because when creating images from the objects we will be doing so on a black background, so we also expect the output object to be surrounded by a black background.

Test 1: Simple Polygons

The first test we performed on this component was simply to see how it performed given different colours and shapes. We used a block colour as the background and a simple polygon as the object. The output we expected to see was some foreground guide where the polygon is coloured white and the background black. Then, using this guide, a figure cropped around the polygon with the background pixels coloured black should be output.

Figure 4.1: Examples of simple polygon frame difference tests
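A minimal sketch of the frame-differencing step these tests exercise, assuming RGB frames, hypothetical file names, and the fixed threshold of ten used in these examples:

    bg  = double(imread('background.png'));      % background model frame
    obj = double(imread('object.png'));          % frame with the object present
    fg  = max(abs(obj - bg), [], 3) > 10;        % largest channel difference vs threshold
    masked = imread('object.png');
    masked(repmat(~fg, [1 1 3])) = 0;            % background modifier: paint black
    [r, c] = find(fg);                           % bounding box of the foreground
    cropped = masked(min(r):max(r), min(c):max(c), :);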

As we can see, our first batch of tests was successful and the polygons were correctly cropped. The only slight failure was the third test, where the polygon retained a red outline. We think this was due to the choice of threshold, which we fixed at ten for these examples. We will discuss the impact of varying the threshold later.

Test 2: Real Objects

As the first tests on simple polygons worked successfully, our second set of tests involved actually obtaining the objects. Again we used a block colour as the background model, but this time used an actual object rather than a polygon. As before, we expect some foreground guide and a cropped object with a black background to be output. For the following examples we used either a black or white background depending on the object frame.

Figure 4.2: Examples of object frame difference tests

As you can see, the results were as expected. The only problem that arose was, as in the first set of tests, that of which threshold to choose. In the chain example the threshold seems to be too large, as parts of the actual chain are being marked as background; this could also be because the object has parts which are the same colour as the background. On the other hand, in the paperclip example the threshold seems to be too small, as a white outline is visible around the object.

Test 3: Choice of Threshold

Although we have effectively obtained our objects, there are still some minor errors which could be ironed out; in particular that of the threshold. So in this set of tests we look at the impact of varying the threshold on the paperclip example.

Figure 4.3: Variation of Threshold

We can see from this that too low a threshold is a detriment to the results, because it doesn't get rid of all of the background. Setting the threshold to ten gives the best result in this case; the results seemed to plateau after this value. However, this optimal choice of threshold, while it may work well with the paperclip, may not work so well with other objects. Due to the variation in possible objects and backgrounds it is impossible to select a general threshold.

4.2 Generating Object Tests

The second component we were required to implement was that of generating more of an object. If we were to use our object to create a recognisable image, most of the time we wouldn't be able to, as we don't have enough of it. So we had to find some way of generating a larger amount of the object given a sample. In the literature review we decided to implement a tiling approach based on a Markov Model. Each test takes an object found in the previous component and uses it as the sample, and we expect the system to output a longer version of that sample. To do this we also need to make sure the probability distribution table is set up correctly. We also implemented a blending function, which needs to be tested.

The functionality of generating more of a sample object isn't relevant for all classes of objects. For example, a single object like a paper clip doesn't need to be generated, and would look bad if it was; for single objects it is more applicable to just tile the object. So this component is aimed at objects like rope and chain, which have some pattern and can be generated to look realistic.

Test 4: Probability Distribution Table

The first test we performed on this component was to see if the calculations being done were correct; in particular we looked at the probability distribution table. After inputting our object the system should cut it into segments and measure the bonds between them. The bonds then guide how the probability distribution is allocated. We expect to see the biggest allocation given to segments with strong bonds; in particular, segments which are next to each other when split up.

Table 4.1: Probability Distribution Table for Rope, with rows giving the current segment (1 to 9) and columns the probability of choosing each next segment (1 to 9)

This table shows us how probabilities are distributed for the segment choice. The first row of values depicts the probabilities of choosing each next segment (1 to 9) given the current segment (1). This is the implementation of the Markov Model. We can see the results were as expected by looking at the bold diagonal: these bold values are where the biggest distributions occur and, as expected, this is for segments that were next to each other when split. The final row is fairly evenly distributed, as there was no neighbour segment for segment nine when it was split. We can also see that the dominant distribution isn't so big that it is the only choice that is made. This allows for an element of randomness in the eventual output, hopefully making the outcome more realistic and less regimented.
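The segment choice driven by this table can be sketched as inverse-CDF sampling over a row of the matrix; P below stands in for Table 4.1, and nextSegment is a hypothetical name:

    function nxt = nextSegment(P, cur)
        % P is a row-stochastic matrix: P(i, j) is the probability of
        % tiling segment j immediately after segment i.
        cdf = cumsum(P(cur, :));        % cumulative distribution for this row
        nxt = find(rand() <= cdf, 1);   % sample the index of the next segment
    end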

Test 5: Tiling Segments

As we are sure the probability distribution is being calculated correctly, we can now begin testing the tiling of segments based on this distribution. Given some sample object, we expect this component to use the Markov Model to tile the segments and output a longer version of the sample object. This generated object should look somewhat realistic and maintain the pattern of the original. In essence, if a human were given both the sample object and the generated version, they should struggle to tell the difference between the two.

Figure 4.4: Examples of Generating Object

From these examples we can see that this component has done what it was supposed to do: generate more of the sample. However, the quality isn't the greatest, as you can quite clearly see in the rope example that it isn't real. The boundaries where the segments join are where the problem lies.

Test 6: Blending Segments

To combat the boundary problem we implemented a blending function to lessen the severity of the connections. Given the generated objects, we expect the same output, only with some blending pixel columns between each segment. This should improve the realism of the generated object.

Figure 4.5: Examples of Blending the Generated Object
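One way to realise such blending, sketched here as a linear cross-fade over the adjoining columns of two uint8 segment images segA and segB (the width k and the variable names are assumptions):

    k = 8;                                          % number of blended columns
    left  = double(segA(:, end-k+1:end, :));        % last k columns of segment A
    right = double(segB(:, 1:k, :));                % first k columns of segment B
    w = reshape(linspace(0, 1, k), 1, k);           % ramp from A's colours to B's
    blend = bsxfun(@times, left, 1 - w) + bsxfun(@times, right, w);
    joined = [segA(:, 1:end-k, :), uint8(blend), segB(:, k+1:end, :)];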

These results show that implementing the blending has improved the output somewhat; however, there are still some issues to address. It seemed to work better on the rope example, where there wasn't much variation in colour. The chain example, though, has two fairly extreme colours, gold and silver, so the quick change in colour still highlights where the boundary is; while the result is improved, it is still not ideal.

4.3 Creating Image Tests

The final component we were required to implement was that of using the generated object to create a recognisable shape or image. To do this we decided to use a forward mapping function, where pixels from our object were mapped to pixels in the target image. Each test takes an object and some template image to map to. The object can either be one generated in the previous component or just a single object segmented in the first component. The guide should be some binary image depicting a silhouette of a shape or image. We expect the system to fit the object to the outline of the silhouette, giving the impression that the object has been arranged in a form which resembles an image. Once the object has been fitted to the image we can use the implemented anti-aliasing function, which will improve the quality of the final result by colouring in the black pixels dotted throughout the output.

Test 7: Creating Images Using An Object

The first test we performed on this component was to see how well it performed given objects generated previously, like the rope and chain examples. Given some silhouette and the generated object, we expect the object to bend into the guide shape. To do this an appropriate rotation and translation should be worked out for each pixel and then applied accordingly. The eventual result should look realistic and handle things like overlaps and the transforms robustly.

Figure 4.6: Examples of Creating Images Given Rope and Chain

Figure 4.7: Further Examples of Creating Images Given Rope and Chain

As we can see, the results were as expected: the object was successfully transformed into the outline in the guide image. The only problem visible in the results was that of aliasing. Where the pixels have been mapped to their new positions in the target image, there still exist some pixels that don't have a colour mapped to them. We implemented an anti-aliasing function, which is tested later in this section.

Test 8: Creating Images Using A Single Object

The second test was to see how well this component worked given a single object like a paper clip. So, given some template and an object like a paper clip, we expect the system to tile the object along the outline of the image.

Figure 4.8: Example of Paper Clip Fail

As we can see, this result shows that some adaptation is required.

Although the paper clip has been correctly mapped to the image, the paper clips are being bent. In real life this wouldn't happen, so we had to implement some modification so that this component could also work realistically with rigid objects. To do this we just changed the calculation of the rotation transform: rather than apply a different rotation to each column of pixels, we keep the same rotation transformation for the whole of the object.

Figure 4.9: Examples of Creating Images Given A Rigid Object

The above results were obtained using an image of a paper clip and of a matchstick. Although the shape isn't as close to the guide image as before, it looks a lot more realistic this way. If you were to look at these results you might well think they had been completed by a human, something our system aspired to achieve. However, the results are still hampered by the aliasing issue.
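The rigid-object modification can be sketched as deriving one rotation from the boundary direction at the placement point and applying it to every pixel of the object; contour and the three-step offset follow the earlier frame construction, and the variable names are assumptions:

    % i: index of the placement point along the traced contour ([row col] rows)
    d = contour(i+3, :) - contour(i-3, :);            % local boundary direction
    theta = atan2(d(1), d(2));                        % angle of travel at this point
    R = [cos(theta) -sin(theta);                      % one rotation matrix shared by
         sin(theta)  cos(theta)];                     % every pixel of the object
    % Because R is constant across the object, the paper clip is placed
    % at an angle but never bent.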

Test 9: Anti Aliasing / Final Results

As we found in previous tests, aliasing in our results was an issue, so the following tests make sure the anti-aliasing functionality works. Given our aliased image, the system should interpolate the colours where aliasing occurs. This should make the output much more realistic and smoother. We also make note of how many times we need to interpolate to obtain the best results; we expect there to be some limit beyond which it begins colouring in the background, something we don't want.

Figure 4.10: Some Anti-Aliased Results Using Rope And Chain

The anti-aliasing component has done as we expected: it successfully found the black holes and filled them in with the surrounding colours. We ran the interpolater twice to achieve these results; any more and it would have been to the detriment of the results.
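In terms of the fillGaps sketch given earlier, this amounts to a small fixed number of passes:

    for pass = 1:2             % two passes removed the holes in Figure 4.10;
        img = fillGaps(img);   % further passes start to colour the background
    end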

As this had worked, we could now test it on the rigid-body modified system, expecting the same as before. We had to be careful not to over-interpolate examples like the paper clip: if we did too much it would colour in the paper clip and it wouldn't look like one any more.

Figure 4.11: Some Anti-Aliased Results Using Paper Clips And Matchsticks

The results of this final test were as expected and are the final outcome of this project. We have successfully achieved the aim of creating an image out of an object. The end result also looked very realistic, adhering to a further aim of the project.

Chapter 5

Discussion

5.1 Introduction

As can be seen from the tests performed in the previous section, the images obtained by running the system with a variety of different parameters were often very good. The results showed that, as expected, some images created using objects were of a very high standard, and that for other images the results were not as good as hoped. It is just as important, if not more so, to look at the images where the outcome fell below the level set out for the system. These images highlight areas of failure, and so provide a good insight into the characteristics of inputs that cause the system to fail. This section identifies which parts of the solution didn't work so well and considers other ways we could have tackled the problem in order to avoid these failures, given the chance to do the project again.

5.2 Problems Identified and Potential Solutions

After testing the system it was easy to see that there were three fundamental problems, one for each component, which hindered the quality of the results. When obtaining the object, the choice of threshold was the main area of discrepancy. In the component which dealt with generating the object, the main problem for the visual results was the visible boundaries between segments. The final component's main flaw was the aliasing issue, or gaps, in the results. It is also fairly apparent that these problems arose from poor decisions made during the literature review and implementation stages. Here we highlight these three problems, how the techniques chosen ultimately failed, and the techniques we should have chosen and would choose given the chance to do the project again.

Threshold Problem

The first component's task was to obtain an object by removing its background. To do this we compared pixels from the background model and the object frame and classified them as foreground or background based on some threshold value. If the difference is within the threshold, the pixel in the object frame is classified as background, as it hasn't changed significantly enough; if the difference is outside the threshold, the pixel has changed by a sufficient amount to be seen as a foreground object. The choice of value for the threshold is essential in getting the optimal result. Choose too large a value and it is possible for foreground pixels to be classed as background; too small a value and it is possible for background pixels to be classed as foreground. When testing, we found the value ten to be a good start, and it worked well for numerous objects and backgrounds. However, it didn't work for all objects. In Figure 4.2 you can see that the threshold value of ten didn't work as well in the crayon example as it did in the matchstick example: the matchstick guide is very tight to the object, whereas the crayon gives the appearance of some glow around the object. Furthermore, with a threshold of ten, in the chain example some of the object was classed as background; not what we want.

A possible solution to this threshold problem was actually identified in the literature review, but we decided it wouldn't be necessary in our project; in hindsight this assumption was wrong. (Deane, n.d.) proposed incorporating normalised statistics into the Frame Differencing calculations, allowing the same threshold value to be used for every computation. By normalising the statistics we change to the same range of pixel intensity values regardless of the input images. This would allow us to create a more generic system, though we still think some trial and error would be needed to find the best result.
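A sketch of what such normalised differencing might look like; the exact statistics proposed by (Deane, n.d.) are not reproduced here, so both the form and the cut-off below are assumptions:

    d = abs(double(obj) - double(bg));
    z = (d - mean(d(:))) / std(d(:));   % differences in standard deviations
    fg = max(z, [], 3) > 1;             % one fixed cut-off for any input range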

Boundary Problem

The second component's task was to generate a larger amount of an object given a sample. To do this we split our sample object up and constructed a probability distribution for choosing the next segment, then tiled segments together based on these probabilities. Although in theory the better-matched segments were tiled together most frequently, in practice this wasn't always the case. When poor matches were tiled together, the boundaries between them are clearly visible, as can be seen in Figure 4.4. We attempted to address this problem by implementing some blending between the segments, but in honesty this was just a quick fix given the time constraints, and the results still weren't ideal; Figure 4.5 shows that boundaries are still visible.

Again, we identified a potential solution to this problem earlier in the project. When researching texture generation techniques in the literature review we highlighted Image Quilting as a possible approach; this is essentially tiling with some overlap. By implementing this method we feel we may have got improved results. We could have measured the overlaps to get the bond strengths and hence the probability distribution in the Markov Model. This would be better than comparing just the two edges, but the computational load would be a lot greater. The results would also be improved by using the Image Quilting method together with the Minimum Error Boundary Cut algorithm proposed by (Efros and Freeman, 2001). This algorithm uses Dijkstra's algorithm to find a lowest-cost path through the overlap area and would reduce the harshness of where the segments join. However, we still don't think this would solve the problem completely. For example, if we look closely at the chain generation in Figure 4.5 we can see that a silver link transforms into a gold link; these rapid colour changes would signify fakery under closer scrutiny.

Aliasing Problem

The third component's task was to create some recognisable image out of an object or collection of objects. To do this we first found a path in our template image to map to. Then we found the reference frames of our target and source images and the transformation needed to obtain one given the other. For each source pixel we worked out its position in the target image using these transformation matrices and mapped the colours and intensities accordingly; this technique is called forward mapping. Unfortunately, we found a number of problems with forward mapping. First, the resultant coordinates in the target image are real-valued, whereas pixels are addressed by integer coordinates, so we had to round our results to the nearest integer to know which pixel to colour in. Secondly, many transformations resulted in pixels being missed out and thus not receiving any colour. This led to holes appearing in the target images, as seen in the examples in Figure 4.6. We wrote a function to fill in these holes, but again this was just a quick fix due to the time constraints of the project.

It was an error not to review techniques in this area in the literature review. Doing so would have helped us foresee this problem and approach the solution differently. A better solution would have been to use backward mapping. In backward mapping we iterate over the target image rather than the source image. For each pixel we have integer coordinates, which we multiply by the inverse transformation matrix to obtain the corresponding pixel in the source image; we then colour the target pixel with the colour of that source pixel. This approach does not suffer from the aliasing issues of forward mapping, because we are guaranteed to visit and colour each pixel in the target image. By implementing backward rather than forward mapping we would have achieved better final results and also cut down on the amount of code written. The results probably still wouldn't be perfect, but they would be much better.
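A minimal sketch of that backward-mapping loop, shown for a single strip's transform M; the source and target arrays and their names are assumptions:

    Minv = inv(M);                                % inverse of the strip transform
    for v = 1:size(target, 1)
        for u = 1:size(target, 2)
            p = Minv * [u; v; 1];                 % corresponding source position
            x = round(p(1)); y = round(p(2));     % nearest source pixel
            if x >= 1 && x <= size(source, 2) && y >= 1 && y <= size(source, 1)
                target(v, u, :) = source(y, x, :);  % every target pixel is visited
            end
        end
    end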

5.3 Limitations

After testing the system it was easy to see that there were some limitations on what the system could take as input without ruining the results. Here we highlight these limitations, question why they are limitations, and identify potential solutions to them.

Object Choice

One of the main limitations found in the final system is that of what objects it can take as input. Firstly, the orientation of the object must agree with how the components generate the images. Objects are currently required to be in a landscape view so that they can be generated correctly, particularly for the texture generation section. For example, if we input the rope example but first rotated it so that it was no longer landscape, the output of the texture generation would no longer look like a rope. To combat this limitation we could implement a test whereby the input is flagged if the object's width is smaller than its height, but this may omit valid objects from the system. There isn't really a solution to this other than to tell the user to input objects in this format.

A further limitation relating to object choice is essentially the characteristics of the chosen object. Our system can handle objects with characteristics similar to rope and chain: a long single object with some pattern that can be recognised and replicated. We also modified it to accept objects like paper clips: single rigid objects with no generation required. But there are so many different types of object that it would be very hard to keep creating modifications to handle all of them. Our system could probably handle single, bendy objects as well, by bypassing the generation component and creating the image using the unmodified component. As there are so many possible characteristics, it would be impractical to implement every type; to implement just the main ones, as we have done in this project, is sufficient.

Image Choice

The second main limitation on our final solution is that of the image we create. Currently the system accepts only a single binary image: a silhouette of a single shape. Although this produced good results, we would have liked to create multiple images on the same canvas. To do this we would have had to find some way of handling multiple silhouettes in the same image. The Matlab function we use, bwtraceboundary, only found the chain code of one outline. We think it would be possible to modify this function to find multiple outlines. There exists a second Matlab function called bwboundaries, which we looked at as it does outline multiple objects in an image. However, we found that it didn't chain-code the outline of each object, but rather found coordinates column by column. Again we think there is scope to modify this code to suit our needs. Another image input we would have liked to cater for is photos, or other images that aren't binary. For this we would need a very good segmentation algorithm; however, given the difficulty of this task, few exist that would work to the standard we would require for this project.

Output Image

The final limitation of note is that of the output image. Currently the system only creates it on a black background. It would be nice if the background could vary a bit.

This shouldn't be too hard to do: setting the canvas to the desired background and mapping the object onto it as before would work fine. However, if the object had holes in it, like the paper clip, black background would show through behind the objects. Changing the first component to fill the background with the same colour as in the output image would solve this problem. Yet this would only work with block colours; if we wished to use textured backgrounds it would be a lot harder, mainly because the system would have to match up all of the texture patterns, and if it didn't the results would be poor. So if we were to do it again it would be viable to use different coloured backgrounds in the final images, but textured backgrounds would be too hard.

5.4 Alternate Directions

We have achieved our initial aims well; however, we could have looked at our problem from a different angle, and these alternate directions could also be followed up as future work. As we now know it is possible for a computer to create images using objects, it may be worthwhile to try to create a more user-focused application to do this. Ideally it would be good to create something which would allow a user to load in a photo of their choice. The application could then segment suitable regions, after which the user could select a particular object to fit to each segment, similar to painting by numbers. The objects could be stored in some library, and this library could be extended by inputting more objects. This would be a more practical use of the system, now that our research has proved it is possible. A further direction we could have taken, or could still take, would be to create images out of 3D objects. This would require a lot more time than was available in this project, but by doing it the final image would be able to incorporate things like overlapping objects more realistically.

Chapter 6

Conclusions

6.1 Did We Achieve Our Aim?

For our problem we were required to see whether it was possible for a computer to create recognisable images using objects. We found that it was, and successfully created numerous images, including a boat, a palm tree and a guitar, out of a piece of rope, chain, paper clips and matchsticks. We can see that our results look similar to the real-life example shown in the aim.

Figure 6.1: Initial aim compared to result

Initially we decided to break our system up into three components which could work separately but ultimately work together. We did this because we thought it would be easier to test and fit together if we made the problem as simple as possible. In hindsight this was a good move: we made many tiny errors, and looking through all of the code for the system would have been very time consuming. By breaking the process into sections we were easily able to pinpoint each error and correct it.


EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 WRI C225 Lecture 04 130131 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Histogram Equalization Image Filtering Linear

More information

Automatic Shadow Removal by Illuminance in HSV Color Space

Automatic Shadow Removal by Illuminance in HSV Color Space Computer Science and Information Technology 3(3): 70-75, 2015 DOI: 10.13189/csit.2015.030303 http://www.hrpub.org Automatic Shadow Removal by Illuminance in HSV Color Space Wenbo Huang 1, KyoungYeon Kim

More information

MAPLOGIC CORPORATION. GIS Software Solutions. Getting Started. With MapLogic Layout Manager

MAPLOGIC CORPORATION. GIS Software Solutions. Getting Started. With MapLogic Layout Manager MAPLOGIC CORPORATION GIS Software Solutions Getting Started With MapLogic Layout Manager Getting Started with MapLogic Layout Manager 2011 MapLogic Corporation All Rights Reserved 330 West Canton Ave.,

More information

Digital Image Processing. Prof. P.K. Biswas. Department of Electronics & Electrical Communication Engineering

Digital Image Processing. Prof. P.K. Biswas. Department of Electronics & Electrical Communication Engineering Digital Image Processing Prof. P.K. Biswas Department of Electronics & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Image Segmentation - III Lecture - 31 Hello, welcome

More information

(Refer Slide Time: 0:51)

(Refer Slide Time: 0:51) Introduction to Remote Sensing Dr. Arun K Saraf Department of Earth Sciences Indian Institute of Technology Roorkee Lecture 16 Image Classification Techniques Hello everyone welcome to 16th lecture in

More information

Progress Report of Final Year Project

Progress Report of Final Year Project Progress Report of Final Year Project Project Title: Design and implement a face-tracking engine for video William O Grady 08339937 Electronic and Computer Engineering, College of Engineering and Informatics,

More information

CSE528 Computer Graphics: Theory, Algorithms, and Applications

CSE528 Computer Graphics: Theory, Algorithms, and Applications CSE528 Computer Graphics: Theory, Algorithms, and Applications Hong Qin State University of New York at Stony Brook (Stony Brook University) Stony Brook, New York 11794--4400 Tel: (631)632-8450; Fax: (631)632-8334

More information

Lets assume each object has a defined colour. Hence our illumination model is looks unrealistic.

Lets assume each object has a defined colour. Hence our illumination model is looks unrealistic. Shading Models There are two main types of rendering that we cover, polygon rendering ray tracing Polygon rendering is used to apply illumination models to polygons, whereas ray tracing applies to arbitrary

More information

Motion Texture. Harriet Pashley Advisor: Yanxi Liu Ph.D. Student: James Hays. 1. Introduction

Motion Texture. Harriet Pashley Advisor: Yanxi Liu Ph.D. Student: James Hays. 1. Introduction Motion Texture Harriet Pashley Advisor: Yanxi Liu Ph.D. Student: James Hays 1. Introduction Motion capture data is often used in movies and video games because it is able to realistically depict human

More information

Segmentation of Images

Segmentation of Images Segmentation of Images SEGMENTATION If an image has been preprocessed appropriately to remove noise and artifacts, segmentation is often the key step in interpreting the image. Image segmentation is a

More information

[2006] IEEE. Reprinted, with permission, from [Wenjing Jia, Huaifeng Zhang, Xiangjian He, and Qiang Wu, A Comparison on Histogram Based Image

[2006] IEEE. Reprinted, with permission, from [Wenjing Jia, Huaifeng Zhang, Xiangjian He, and Qiang Wu, A Comparison on Histogram Based Image [6] IEEE. Reprinted, with permission, from [Wenjing Jia, Huaifeng Zhang, Xiangjian He, and Qiang Wu, A Comparison on Histogram Based Image Matching Methods, Video and Signal Based Surveillance, 6. AVSS

More information

Interface Metaphors used by Irfanview32

Interface Metaphors used by Irfanview32 Interface Metaphors used by Irfanview32 What is Irfanview32 and how did I come to use it? Irfanview32 is a graphics viewer with some image manipulation and conversion features. It is offered as freeware

More information

Artifacts and Textured Region Detection

Artifacts and Textured Region Detection Artifacts and Textured Region Detection 1 Vishal Bangard ECE 738 - Spring 2003 I. INTRODUCTION A lot of transformations, when applied to images, lead to the development of various artifacts in them. In

More information

Applying Synthetic Images to Learning Grasping Orientation from Single Monocular Images

Applying Synthetic Images to Learning Grasping Orientation from Single Monocular Images Applying Synthetic Images to Learning Grasping Orientation from Single Monocular Images 1 Introduction - Steve Chuang and Eric Shan - Determining object orientation in images is a well-established topic

More information

THE preceding chapters were all devoted to the analysis of images and signals which

THE preceding chapters were all devoted to the analysis of images and signals which Chapter 5 Segmentation of Color, Texture, and Orientation Images THE preceding chapters were all devoted to the analysis of images and signals which take values in IR. It is often necessary, however, to

More information

EE 701 ROBOT VISION. Segmentation

EE 701 ROBOT VISION. Segmentation EE 701 ROBOT VISION Regions and Image Segmentation Histogram-based Segmentation Automatic Thresholding K-means Clustering Spatial Coherence Merging and Splitting Graph Theoretic Segmentation Region Growing

More information

Part 3: Image Processing

Part 3: Image Processing Part 3: Image Processing Image Filtering and Segmentation Georgy Gimel farb COMPSCI 373 Computer Graphics and Image Processing 1 / 60 1 Image filtering 2 Median filtering 3 Mean filtering 4 Image segmentation

More information