Photosketcher: interactive sketch-based image synthesis

Mathias Eitz 1, Ronald Richter 1, Kristian Hildebrand 1, Tamy Boubekeur 2 and Marc Alexa 1
1 TU Berlin  2 Telecom ParisTech/CNRS

Figure 1: Photosketcher: realistic images from user sketches using large image collection queries and interactive image compositing.

Abstract

We introduce Photosketcher, an interactive system for progressively synthesizing novel images using only sparse user sketches as input. Compared to existing approaches for synthesizing images from parts of other images, Photosketcher works exclusively on the image content; no keywords or other metadata associated with the images are required. Users sketch the rough shape of a desired image part and we automatically search a large collection of images for images containing that part. The search is based on a bag-of-features approach using local descriptors for translation-invariant part retrieval. The compositing step again is based on user scribbles: from the scribbles we predict the desired part using Gaussian Mixture Models and compute an optimal seam using Graphcut. Optionally, Photosketcher lets users blend the composite image in the gradient domain to further reduce visible seams. We demonstrate that the resulting system allows interactive generation of complex images.

1 Introduction

Suppose you are thinking of an image depicting a real scene. How would you create a digital representation of this image? If the scene exists in reality it could be photographed; however, this is expensive and time-consuming. With the advent of large open image collections it is possible that a similar photograph has already been taken and made available. Yet, it is to be expected that any existing photograph is only an approximation of your imagination. It seems that for a more complete rendition of the mental image it is necessary to create it with the help of a computer, either by directly drawing it, or by creating a 3D scene that can be rendered into an image. Both solutions require talent, skill, and experience. We propose a system that combines aspects of all of the solutions above: it is based on photographs but lets the user interactively search and compose them, allowing anyone to create complex pictures without any particular artistic skill. Our approach, named Photosketcher, combines two main ideas:

1. Instead of searching for a single image that perfectly fits the requirements, the image is interactively composed out of parts of images. The idea is that it is more likely that each part of the desired scene appears in some picture in the collection than that a single, perfectly matching image exists. This has been shown to work well for filling holes in given images [Hays and Efros 2007] or stitching together a single image from a series of photographs with similar content [Agarwala et al. 2004].

2. All user interaction with Photosketcher is based on simple sketches and scribbles: the image parts are queried using a sketching interface, requiring the user to enter only a few rough feature lines of the desired structures. The compositing interface relies on sparse user scribbles that provide initial hints about the part to be selected.
Overall, synthesizing novel imagery with our system only requires two simple interaction steps: 1) sketching outlines that depict the desired scene element; as a result, Photosketcher presents a small set of images that match the input; and 2) selecting the best matching result; Photosketcher adds this image to the current intermediate result using foreground extraction and matting techniques. Users repeat those two steps until the result eventually fits their intentions and contains all desired scene elements (see Fig. 1). We give a more detailed explanation of this pipeline in Sec. 3. Note that the system described in this paper is an extended version of the synthesis system proposed by Eitz et al. [2009]. While we follow the original idea of synthesizing images from sparse sketches only, this paper differs in the following aspects: compared to the original single-page version, we now have sufficient space to describe all pipeline steps in full detail; we use Gaussian Mixture Models (GMMs) for learning an image model; we use a bag-of-features sketch-based search rather than global descriptors; and we present an additional user study evaluating the retrieval quality.

Challenge: sketch-based search The first major technical challenge in the Photosketcher pipeline is the search for images resembling the user sketch. While searching image collections given example photographs is a common problem, we consider queries using sketched feature lines as input an even more difficult problem. Since the information contained in a sketch is sparser than in a photograph, a larger variety of images can be matched and the search will be less precise. We now formulate several desired properties of a search engine using sketched feature lines as input: a) Despite the sparse input, the search should be reasonably precise and the matches perceptually similar to the sketch. b) It should be invariant to the absolute position of the sketch on the canvas.

This is important as we expect users to search for parts of images, and the absolute position of the desired part in a collection image is irrelevant. c) It should be tolerant against local as well as global deformations, since we expect users' sketches not to exactly depict the outlines and proportions of photographed objects. d) The search should be fast, yielding interactive retrieval rates even on collections possibly containing millions of images. To achieve these goals, we propose to base the search on a bag-of-features approach using local features encoded as histograms of gradient orientation [Eitz et al. 2011]. This approach has been shown to outperform other current approaches and, due to its use of local features, it is translation invariant as well as tolerant against local and global deformations. Finally, using inverted indices for the search yields the desired fast retrieval rates even on large collections. We describe this approach in more detail in Sec. 4.

Challenge: composing images The second major challenge in Photosketcher is to form a realistic image given the parts retrieved by sketching. This is extremely difficult as humans are very sensitive to several artifacts that arise from a naive composite: if the parts differ in texture, color or illumination, a visible seam around the composition destroys the illusion of a realistic image. We address this by trying to hide the seam in regions where it is difficult to notice, using Graphcut optimization. Retrieved parts might also be perspectively distorted with respect to the currently assembled image, such that the impression of a realistic image would be ruined. We do not try to fix this algorithmically but rather rely on the sheer amount of data available in today's image collections: as the number of images in the collection is further increased, the probability that it contains the desired part under the correct perspective also increases. Rather than forcing users to accept an automatic composite (which may or may not fit their intention), our key idea is to start from an automatically generated composite and provide scribble-based interfaces that allow interactively improving the automatically generated seam, if so desired. We describe the user interface in more detail in Sec. 3. The resulting system can be used by novice users to quickly generate compound images (see Sec. 6). The power of the system stems from exploiting the vast amount of existing images, which offsets obvious deficits in search and composition. We present more detailed conclusions in Sec. 7.

2 Related work

The idea that all user input is based on sketching only is what makes Photosketcher novel and different from similar approaches, such as Sketch2Photo [Chen et al. 2009], which uses keywords and sketches. To put our work into context, we discuss related work in chronological order and highlight major similarities as well as differences in each case. Graphcut Textures [Kwatra et al. 2003] introduces the first approach for synthesizing novel texture images by copying irregularly shaped parts, optimized using Graphcut, from multiple source textures. All of the following systems (including Photosketcher) make use of a variant of this technique in order to avoid visible seams in the final composite image/texture. Agarwala et al. [2004] introduce a framework for interactive digital photomontage (IDP).
Given a stack of photographs and image objectives that encode the properties that are desired in the final composite, IDP makes use of Graphcut optimization to find a composite that best fulfills all objectives. While the compositing stage in Photosketcher has been inspired by Graphcut Textures as well as IDP, the main difference lies in the type of images used to form the composite: in IDP as well as Graphcut Textures the source images are closely related and depict the same scene or object but with slightly different properties such as depth-of-field, facial expression or illumination. In Photosketcher this is very different: parts typically are selected from a large number of unrelated images, which makes compositing them into a seamless result image harder. Rother et al. [2006] define an automatic algorithm for assembling representative elements from a set of unrelated images into a convincing collage (AutoCollage). Generating the collage is formulated as a set of optimization problems, which include: automatically selecting the regions of interest from the input images, tightly packing the extracted regions into a collage such that important areas such as faces remain visible, and finally making the seams between the collage elements visually unobtrusive. AutoCollage is similar to Photosketcher with regard to the composition step in that unrelated image content is to be assembled into a final result image. However, it differs in its final goal: while in AutoCollage the result is a tightly packed collage of image elements, our aim is to generate a realistic image that in the optimal case would be indistinguishable from a real photograph. Johnson et al. [2006] present Semantic Photo Synthesis, a system specifically designed for synthesizing photographs rather than collages from parts of existing images. Their main idea is to create an image that is semantically similar to the desired one by placing text labels on a canvas. Those labels indicate rough position and category of the desired part. Similarly to all other systems, the final composite image is generated using Graphcut optimization. Photo Clip Art [Lalonde et al. 2007] lets users insert new objects such as cars or humans into existing photographs. Differently from all other systems, the central idea of Photo Clip Art is to avoid modifying the inserted object. Instead, the idea is to let users only choose from objects that already have the required properties (such as lighting, perspective and resolution). In case such objects exist, the resulting composite images are extremely convincing. The main difference between Photo Clip Art and Photosketcher therefore lies in the type of image data required: while Photo Clip Art relies on a medium-sized collection of images that contains pre-segmented and annotated object instances, Photosketcher relies on the variety that comes with a much larger image collection (without any metadata). Chen et al. [2009] propose Sketch2Photo, which uses a text-labelled sketch indicating the layout of a desired picture as input. This annotated sketch is used to initially perform a keyword search on large online image collections (such as Google Images). For each element, the resulting set of pictures is filtered to retain only those images that are easy to automatically composite; these are pasted at the location indicated by the text-labelled sketch.
The main difference to Photosketcher is that Sketch2Photo does not perform a purely visual search: each object drawn by the user must be annotated with a keyword, and a classical text-based search is then performed. Second, the system is not interactive and may require hours to synthesize novel images, while Photosketcher provides interactive output, such that users can edit and modify the pictures in real time. In case a computer-generated rendition of the desired scene is already given, Johnson et al. [2011] propose the CG2Real system, which replaces a CG image with parts of images from a large image collection. The resulting composite image is structurally similar to the original CG image but appears more realistic to the human observer. Overall, Semantic Photo Synthesis [Johnson et al. 2006], Photo Clip Art [Lalonde et al. 2007] as well as Sketch2Photo [Chen et al. 2009] are the systems that are most closely related to Photosketcher. While all systems share the same overall goal of synthesizing realistic novel photographs, to the best of our knowledge Photosketcher is the first one that interactively works on large, untagged image collections.

3 Sketching

The aim of Photosketcher is to provide the user with an easy-to-use interface for interactively composing an image from parts of existing images. The fundamental principle is a loop, in which a user queries a large collection of images for matches to an outline sketch, chooses a query result, extracts a region from the chosen image, and pastes it into the current intermediate result (see Fig. 2). An important interface question is how to query the database. While using text queries alone would be too imprecise, example-based querying is also not an option as the required examples are typically not at hand. Instead, we argue that sketching feature lines is both a natural and simple way to quickly communicate the mental concept of a shape. Additionally, using only information contained in the images themselves rather than additional metadata (e.g. keywords) makes Photosketcher work on arbitrary image collections. Note that we only allow binary sketches, i.e. feature lines do not carry color information. We initially experimented with colored strokes, but users found this additional input dimension difficult to exploit.

In particular, users perform the following steps in one loop of the Photosketcher image generation process:

sketch and retrieve: users sketch binary outlines to define the desired shape of the content. We use this sketch to query a large image collection. The result of the query for shapes is a small set of pictures with similar structure, from which users select the desired one.

transform and select: once a query result is chosen, users roughly draw a closed boundary around the desired part using a thick pen. This defines an initial selection which the user then translates and scales until the part is in the right position.

extract and composite: from the rough selection in the previous step we learn a model of desired and undesired image parts and automatically compute an initial composition that takes the learned model into account. Using a stroke-based interface, users can iteratively refine the composition by drawing scribbles that indicate that some parts should additionally be included in or removed from the final composite image.

Once the part is merged into one image, the user continues with sketching the next desired object or part. This loop repeats until the user is satisfied with the result. We illustrate this workflow in Fig. 2. We let users (optionally) select an initial background image using an initial sketch-and-retrieve step before the actual compositing loop begins. We use a Cintiq 21UX touch-sensitive monitor as the input device, such that users are able to sketch their queries naturally, as if using pen and paper.

Figure 2: Photosketcher workflow: the user starts by sketching a part of the desired content, the system retrieves best matches from a large collection of images and presents those to the user. The user selects the best match and uses interactive compositing to create an updated version of the desired image. This process is iterated until the user is satisfied with the composite image.
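The loop just described maps naturally onto a small driver skeleton. The following is a minimal sketch, not the authors' implementation; all callback names and signatures are our own placeholders for the interactive components.

```python
from typing import Callable, Optional, Sequence

Image = object   # placeholder types; the paper does not prescribe an API
Sketch = object

def photosketcher_loop(
    next_sketch: Callable[[], Optional[Sketch]],           # None once satisfied
    retrieve: Callable[[Sketch], Sequence[Image]],         # BoF search (Sec. 4)
    pick: Callable[[Sequence[Image]], Image],              # user picks a match
    composite: Callable[[Optional[Image], Image], Image],  # GMM + Graphcut (Sec. 5)
) -> Optional[Image]:
    canvas: Optional[Image] = None     # optionally seeded with a background image
    while (sketch := next_sketch()) is not None:
        matches = retrieve(sketch)     # small set of structurally similar images
        part = pick(matches)           # transform and select
        canvas = composite(canvas, part)  # extract and composite
    return canvas
```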
4 Searching

We propose using a bag-of-features (BoF) approach [Sivic and Zisserman 2003] for the sketch-based retrieval pipeline of Photosketcher. BoF search is a popular technique for example-based image retrieval and, due to its use of small local features, is inherently translation invariant. By quantizing the local features to a small finite set of visual words, the search can be accelerated using an inverted index, a well-proven technique from text retrieval that easily handles huge collections. In particular, we use the approach of Eitz et al. [2011] that generalizes BoF search to sketch-based image retrieval and therefore fulfills our requirements defined in Sec. 1. We now give a quick overview of the components of a typical BoF image retrieval system:

Sampling in image (scale) space. This defines local pixel coordinates from which features are extracted. Often, an important algorithmic aspect is to make detection of those coordinates repeatable: in a transformed version of the original image (rigid or perspective transformation, different lighting etc.) the sampler should primarily detect corresponding coordinates.

Feature extraction and representation. Given a local coordinate, a feature is extracted that represents the image content within a small local window around that coordinate. The feature should be distinctive, i.e. perceptually different local regions should result in clearly discernible features.

Quantization of features. In order to speed up retrieval as well as to achieve invariance to small local deformations, local features are quantized, yielding, depending on the application, a visual vocabulary containing between 1,000 and 250,000 visual words. Each image is then represented by a (sparse) histogram of visual word occurrences.

Search using inverted indices. An inverted index is typically constructed in an offline process and maps from a visual word to the list of images in which that word appears. This can significantly speed up the search process as only those images need to be considered that have at least one visual word in common with the query image.
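As an illustration of the last two components, the toy sketch below builds an inverted index over quantized word lists and ranks images by a tf-idf dot product. It is our own minimal rendition, not the paper's implementation; a production system would also normalize the tf-idf vectors.

```python
import math
from collections import Counter, defaultdict

# Images and the query sketch are bags of visual-word ids (ints after quantization).
def build_index(image_words: dict):
    index = defaultdict(list)               # word -> [(image_id, term_freq)]
    df = Counter()                          # document frequency per word
    for img, words in image_words.items():
        for w, tf in Counter(words).items():
            index[w].append((img, tf))
            df[w] += 1
    return index, df, len(image_words)

def query(index, df, n_images, sketch_words, top=36):
    scores = defaultdict(float)
    for w, q_tf in Counter(sketch_words).items():
        idf = math.log(n_images / (1 + df[w]))   # inverse document frequency
        for img, tf in index.get(w, ()):         # only images sharing a word
            scores[img] += q_tf * tf * idf * idf # dot product of tf-idf vectors
    return sorted(scores, key=scores.get, reverse=True)[:top]
```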

In the following paragraphs, we quickly review the stages of the retrieval pipeline applied in Photosketcher (which is similar to the one in Eitz et al. [2011]) and highlight where the sketch-based search differs from a typical example-based retrieval system.

Edge extraction One important point is the pre-processing step required to extract the edge and contour information from the photographs in the image collection. This is an offline process and done only once for each image in the collection. Ideally, the resulting edge images would be perceptually very close to user sketches and retain only those edges that users are likely to sketch. This is a difficult problem and we resort to the standard approach of extracting Canny edges from the images.

Sampling We perform random sampling in image space, using 500 random samples per sketch/edge image. Note that we do not apply interest point detection, as the sketches typically are very sparse and only very few interest points could be detected. We also do not apply any scale-space detection in sketch space, although this could be an interesting direction for future work. We discard all sample points that would result in an empty feature being extracted.

Feature representation We use the best performing feature representation from Eitz et al. [2011]: the SHoG descriptor, essentially a histogram of oriented gradients representation of the local image region. We use 4x4 spatial and 8 orientational bins. This makes the descriptor very similar to the SIFT descriptor, except that no information about edge direction is stored (it is not available in binary sketches; only orientations from 0 to pi can be discerned). Also, gradient magnitude is regarded as constant for edges/sketch lines.

Learning a visual vocabulary Given the local features from all collection images, we randomly sample 1 million local features and use them for learning a visual vocabulary by performing standard k-means clustering. This approach ensures that the visual vocabulary is learned from a wide variety of images and is thus general enough to handle arbitrary user sketches. We use k = 1000 clusters.

Inverted index Given the visual vocabulary, we quantize each local feature to its nearest visual word (as measured using the standard Euclidean distance metric). As is standard for BoF approaches, we build an inverted index that maps from each visual word to the list of images containing that word.

Query given user sketch While running the Photosketcher system, the inverted index as well as the visual vocabulary are accessible to the system. Given a user sketch, we perform the following steps to retrieve similar images: a) sampling and extracting local features; b) quantization of all features against the visual vocabulary; and c) lookup in the inverted index and ranking using tf-idf weights [Sivic and Zisserman 2003; Eitz et al. 2011]. The most expensive step is the quantization stage; we speed this up by quantizing all features in parallel.
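To make the descriptor and quantization stages concrete, here is a simplified orientation-histogram descriptor in the spirit of SHoG together with nearest-word quantization. The binning and window handling are our simplifications for illustration, not the exact code of Eitz et al. [2011].

```python
import numpy as np

def local_descriptor(edge_img: np.ndarray, y: int, x: int, size: int = 32) -> np.ndarray:
    """128-D descriptor: 4x4 spatial cells x 8 orientation bins over [0, pi).

    Assumes the sample point lies at least size/2 pixels from the border.
    """
    half = size // 2
    patch = edge_img[y - half:y + half, x - half:x + half].astype(float)
    gy, gx = np.gradient(patch)
    mag = np.ones_like(patch)                 # edge images: constant magnitude
    mag[(gx == 0) & (gy == 0)] = 0.0          # ignore empty pixels
    ori = np.mod(np.arctan2(gy, gx), np.pi)   # orientation only, direction discarded
    obin = np.minimum((ori / np.pi * 8).astype(int), 7)
    hist = np.zeros((4, 4, 8))
    cell = size // 4
    for i in range(size):
        for j in range(size):
            hist[i // cell, j // cell, obin[i, j]] += mag[i, j]
    d = hist.ravel()
    n = np.linalg.norm(d)
    return d / n if n > 0 else d

def quantize(desc: np.ndarray, vocabulary: np.ndarray) -> int:
    """Nearest visual word under Euclidean distance (vocabulary: k x 128)."""
    return int(np.argmin(np.linalg.norm(vocabulary - desc, axis=1)))
```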
5 Compositing

Assuming a reasonably matching image S for a user sketch has been found, the last key step in Photosketcher is to progressively assemble the object(s) underlying the sketch with the existing background image B. The difficulty in the compositing step lies in finding a tradeoff between two conflicting requirements:

1. Segmenting the semantically relevant content from the queried image. It would be visually disturbing to cut away parts of the object a user wants to paste into the final image.

2. Creating a composition that is perceptually plausible, i.e. that introduces no visible artifacts along the seams.

In Photosketcher, semantically important regions are specified using scribbles: initially, users roughly trace the border of the desired object with a thick pen. This partitions S into three regions: a desired region Omega_D that must definitely come from S, an undesired region Omega_U that must not come from S, as well as a boundary region Omega_B for which we are not yet sure whether its pixels belong to Omega_D or Omega_U (see Fig. 3). Our job now is to search for a seam in Omega_B that results in an optimal composition.

Once an initial composition is found (which is typically not yet perfect), users can sketch additional small scribbles to progressively correct errors in the initial result. We make two types of scribbles available for that task: one constrains the pixels lying under it to be included in Omega_D, the other constrains its underlying pixels to come from Omega_U. This procedure is similar to that used in several existing image segmentation approaches, but differs in that we also need to take the second requirement into account when computing the segmentation of S. Overall, our compositing algorithm consists of three steps: first, given the user scribbles, we learn Gaussian Mixture Models (GMMs) that we employ to predict whether pixels in Omega_B should rather belong to Omega_D or Omega_U. Second, given the prediction from the GMMs, we use Graphcut to find a binary segmentation of S that results in an optimal composite image. And third, we optionally perform a blending step that smoothes out potentially remaining disturbing luminance differences along the seam between S and B. We now describe those three components in detail.

Gaussian Mixture Model A GMM is a parametric representation of an arbitrary-dimensional probability density function (pdf) as a linear combination of Gaussians. In Photosketcher, we learn two GMMs that represent a model of the desired and undesired parts of S. Intuitively, the idea is that we have two models of S: for each pixel n in S they give us the probability (actually the probability density) that n should be in the composite image. Remember that in Photosketcher the user scribbles define a trimap and partition S into three disjoint regions: one that should definitely be in the composite image, one that should definitely not be, and an unknown region (see Fig. 3). The main idea is to learn models of the color distribution in those regions given the trimap: we learn a model of the desired region by sampling pixels from Omega_D, and similarly for the undesired region by sampling from Omega_U. Finally, this allows us to make a prediction about the pixels in the unknown region Omega_B. A probability density function modeled as a linear combination of k Gaussians is defined as

P(n \mid \Theta) = \sum_{i=1}^{k} p_i \, \mathcal{N}(n \mid \mu_i, \Sigma_i),   (1)

with \sum_{i=1}^{k} p_i = 1, and \mu and \Sigma denoting the mean vector and covariance matrix of a multivariate Gaussian defined as

\mathcal{N}(n \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^M \,\lvert\Sigma\rvert}} \exp\left[-\frac{1}{2}(n-\mu)^T \Sigma^{-1} (n-\mu)\right].

The unknown parameters of a GMM (which we need to estimate) are \Theta = [(p_1, \mu_1, \Sigma_1), \ldots, (p_k, \mu_k, \Sigma_k)]. In Photosketcher we work on 3-dimensional pixel values in the L*a*b* color space (i.e. M = 3), such that the Euclidean distance between L*a*b* triplets corresponds closely to their perceptual distance. We randomly draw m = 4000 samples from both Omega_D and Omega_U and use those samples to learn the parameters of two GMMs. We use k = 5 Gaussian components for each model and perform standard expectation-maximization [Bishop 2006] to estimate \Theta_D and \Theta_U. We denote by P_D(n | \Theta_D) the learned model of the desired image part and by P_U(n | \Theta_U) the model of the undesired part. Finally, we are now able to predict for any unknown pixel x in Omega_B whether it rather belongs to Omega_D or Omega_U. This is done by simply plugging x into Eqn. (1) given \Theta_D and \Theta_U, respectively. We visualize the result of such a prediction in Fig. 3. In the remainder of this paper we write P_D(n) as a shorthand for P_D(n | \Theta_D).
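The color-model step translates directly into a few lines with scikit-learn, which here stands in for the paper's own EM implementation; the sample data below is synthetic stand-in data, not from the system.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_region_model(lab_samples: np.ndarray, k: int = 5) -> GaussianMixture:
    """Fit a k-component full-covariance GMM to (m, 3) L*a*b* samples."""
    return GaussianMixture(n_components=k, covariance_type="full").fit(lab_samples)

# 4000 random samples from the desired / undesired regions, as in the paper.
rng = np.random.default_rng(0)
desired = rng.normal([60, 10, 20], 5, size=(4000, 3))     # stand-in data
undesired = rng.normal([30, -5, -10], 5, size=(4000, 3))  # stand-in data
gmm_d, gmm_u = fit_region_model(desired), fit_region_model(undesired)

unknown = rng.normal([50, 5, 10], 10, size=(10, 3))       # pixels from Omega_B
log_pd = gmm_d.score_samples(unknown)   # log density under the "desired" model
log_pu = gmm_u.score_samples(unknown)   # log density under the "undesired" model
prefers_desired = log_pd > log_pu       # per-pixel prediction, cf. Fig. 3
```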

Figure 3: The initial user-drawn boundary curve (blue) partitions S into three disjoint regions. We train two Gaussian Mixture Models using samples from Omega_U (crosses) and Omega_D (circles). This allows predicting for any pixel whether it rather belongs to Omega_D (middle) or Omega_U (right). Redder values encode higher probabilities.

GraphCut We treat finding those regions in S that yield a semantically as well as visually plausible composition as a binary pixel labeling problem. Each pixel p in S is assigned a label l_p \in \{0, 1\} that describes whether p should be taken from the source image S (l_p = 0) or the background image B (l_p = 1) in order to form an optimal composite image. Our task is to find a labeling that is a good compromise between both requirements. We find such a labeling by minimizing the cost function

E(S, B) = E_d(S) + \lambda E_s(S, B).   (2)

E_d(S) is a function of all pixels in image S and encodes our first requirement of including all semantically relevant object parts. Intuitively, E_d(S) should return a small value if all important parts are included in the composition and a high value if semantically important parts are missing. E_s(S, B) is a function of the pixels of both S and B and encodes the second requirement, i.e. that seams should run through regions where they produce the least amount of artifacts in the final composite image. Again, this should return high values for visible seams and low values otherwise. More specifically, we define the data term as

E_d(S) = \sum_{p \in S} \left[ -(1 - l_p) \ln P_D(p) - l_p \ln P_U(p) \right],   (3)

where P_D(p) and P_U(p) are the probability (densities) of pixel p belonging to Omega_D and Omega_U, respectively (see Eqn. (1)). Since we later perform an additional blending step in the gradient domain, we use the cost function proposed for this problem by Hays and Efros [2007]. To simplify our notation, we first define a single-channel intermediate image I that encodes the sum of squared differences between S and B, i.e. the pixel-wise squared Euclidean distance between S and B. Let N(I) denote the set of unordered pairs \{p, q\} of 4-neighboring pixels in I. Then we define our smoothness term as

E_s(S, B) = \tilde{E}(I) = \sum_{\{p,q\} \in N(I)} \lvert l_p - l_q \rvert \, (I_p - I_q)^2.   (4)

Note that in Eqn. (3) and Eqn. (4) the terms (1 - l_p), l_p and |l_p - l_q| are used as indicator functions: they can take values of either 0 or 1. Their job is to select one of the two terms in Eqn. (3) depending on whether pixel p gets labeled as 0 or 1. Similarly, the inner term of Eqn. (4) contributes a non-zero value only if two neighboring pixels {p, q} are separated by the seam, i.e. get assigned different labels. To summarize: Eqn. (3) makes sure that the seam respects the learned model for the desired and undesired region, while Eqn. (4) tries to force the seam through regions where the cut is visually least disturbing.

We efficiently find the labeling that minimizes Eqn. (2) using Graphcut [Boykov and Kolmogorov 2004]. More specifically, we define a graph G = (V, E) where V is the set of mn + 2 vertices corresponding to the pixels in S plus two additional terminal nodes, the source S and the sink T. We define the edges in G to correspond to the 4-connectivity of the image, plus, for each node, two edges to the terminal nodes S and T. Associated with each neighborhood edge is a weight computed from the term (I_p - I_q)^2 in Eqn. (4). Associated with each edge to S and T is the weight -\ln P_D(p) and -\ln P_U(p), respectively (see Eqn. (3)).
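This graph construction can be written compactly with the PyMaxflow bindings of the Boykov-Kolmogorov solver cited above. The sketch below is our own illustration, and it already folds in the scribble constraints described in the next paragraph; the terminal-capacity convention and the shifting of log densities to non-negative capacities are implementation choices the paper does not spell out.

```python
import numpy as np
import maxflow  # pip install PyMaxflow: Boykov-Kolmogorov max-flow/min-cut

def composite_labels(log_pd, log_pu, I, must_d, must_u, lam=1.0):
    """Return boolean labels (True = take pixel from B) minimizing Eqn. (2).

    log_pd/log_pu: per-pixel log densities from the two GMMs; I: the
    single-channel squared-difference image; must_d/must_u: scribble masks.
    """
    h, w = I.shape
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes((h, w))

    # Smoothness term, Eqn. (4): weight (I_p - I_q)^2 on 4-neighbor edges.
    right = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 0]])
    down = np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0]])
    wh = np.zeros((h, w)); wh[:, :-1] = lam * (I[:, :-1] - I[:, 1:]) ** 2
    wv = np.zeros((h, w)); wv[:-1, :] = lam * (I[:-1, :] - I[1:, :]) ** 2
    g.add_grid_edges(nodes, weights=wh, structure=right, symmetric=True)
    g.add_grid_edges(nodes, weights=wv, structure=down, symmetric=True)

    # Data term, Eqn. (3): shift -log densities so capacities are non-negative
    # (a global shift changes the energy only by a constant).
    d_cost, u_cost = -log_pd, -log_pu
    shift = min(d_cost.min(), u_cost.min())
    d_cost, u_cost = d_cost - shift, u_cost - shift
    big = 1e9  # stands in for the infinite scribble weights
    cap_src = np.where(must_d, big, u_cost)  # cutting it (label 1) pays -ln P_U
    cap_snk = np.where(must_u, big, d_cost)  # cutting it (label 0) pays -ln P_D
    g.add_grid_tedges(nodes, cap_src, cap_snk)

    g.maxflow()
    return g.get_grid_segments(nodes)  # True: sink side, i.e. label 1 (from B)
```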
To incorporate the user constraints coming from the scribbles, we replace the weights of the edges to S by infinity if p is contained in Omega_D, and similarly for the edges to T if p is contained in Omega_U. To compute the minimum cut in G, which in turn gives us a labeling that minimizes Eqn. (2), we use the Graphcut implementation of Boykov and Kolmogorov [2004]. To further reduce visible seams that might result from pasting the resulting region onto the canvas, we perform an additional blending step.

Blending In Photosketcher, we provide the user with three options for blending the selected object into the existing image:

Direct copy of the pixels labeled as foreground to the background. This simplest option is extremely fast and performs astonishingly well in many cases, especially when the seam can be hidden in highly textured regions of the background image.

Feathering, i.e. alpha blending of the pixels within a small band of k pixels along the seam.

Poisson blending [Pérez et al. 2003], i.e. compositing in the gradient domain instead of directly in the spatial domain. Working in the gradient domain retains the relative luminances of the pasted pixels but adapts the absolute luminance to that of the background along the seam. This option can be helpful if the foreground and background are similarly textured along the seam but are differently illuminated.

Following our previous notation and the notation in Pérez et al. [2003], we denote by \Omega the region in S that has been labeled with 0s in the Graphcut step, i.e. the foreground region. We denote by \partial\Omega the boundary of \Omega. S_\Omega then denotes the set of actual pixel values of S lying in the region \Omega. We now wish to find a set of unknown, optimized pixel values \tilde{S}_\Omega that have similar gradients (in a least-squares sense) as the pixels in S_\Omega but at the same time minimize color offsets along \partial\Omega, which would otherwise be visually disturbing. This is commonly formulated as a quadratic optimization problem:

\tilde{S}_\Omega = \arg\min_{\tilde{S}_\Omega} \lVert \nabla\tilde{S}_\Omega - \nabla S_\Omega \rVert_2^2 \quad \text{with} \quad \tilde{S}_{\partial\Omega} = B_{\partial\Omega},   (5)

and its minimizer is known to be the solution of the Poisson equation with Dirichlet boundary conditions [Pérez et al. 2003]. The resulting system matrix is symmetric and positive definite, which means we can solve for \tilde{S}_\Omega very efficiently by computing a Cholesky decomposition of the system matrix and performing back-substitution. We reuse the decomposition for all color channels and only recompute it if \Omega changes.
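For reference, the Poisson solve can be written in a few lines with a sparse direct factorization. The sketch below is our own; it uses SciPy's LU-based `factorized` where the paper uses a Cholesky decomposition, and it assumes the region Omega does not touch the image border.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import factorized

def poisson_blend_channel(src, dst, mask):
    """Solve Eqn. (5) for one color channel.

    src: pasted region S, dst: background B, mask: True inside Omega.
    Assumes the mask does not touch the image border.
    """
    h, w = mask.shape
    idx = -np.ones((h, w), dtype=int)
    inside = np.argwhere(mask)              # row-major order, matches idx[mask]
    idx[mask] = np.arange(len(inside))
    A = sp.lil_matrix((len(inside), len(inside)))
    b = np.zeros(len(inside))
    for k, (y, x) in enumerate(inside):
        A[k, k] = 4.0                       # discrete Laplacian, 4-neighborhood
        for yy, xx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            b[k] += float(src[y, x]) - float(src[yy, xx])  # guidance gradients
            if mask[yy, xx]:
                A[k, idx[yy, xx]] = -1.0
            else:
                b[k] += float(dst[yy, xx])  # Dirichlet boundary: match B
    # The factorization depends only on the mask and could be cached across
    # color channels, as the paper does.
    solve = factorized(A.tocsc())
    out = dst.astype(float).copy()
    out[mask] = solve(b)
    return out
```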

Figure 4: Final compositing results created with the Photosketcher system. The user's input sketches are shown in the top row, chosen search results below and the final composite on the right. Bottom right: can you determine the number of components used? (Solution: four: mountain, water, pyramid and clouds.) Note that users do not always choose images that match the input sketches (e.g. bottom row: door and umbrella).

6 Results

We have implemented a prototype system and show results in Fig. 1 and Fig. 4. The user interface is exclusively operated by sketches and scribbles, for searching and compositing, respectively. The use of simple outlines for querying the database as well as the use of scribbles in the compositing step has proven intuitive in our experiments with the Photosketcher system. We now discuss results of all parts of the Photosketcher pipeline.

Image collection and pre-processing We use a collection of 1.5 million images downloaded from Flickr, stored in JPEG format. The complete collection takes approximately 400 GB of disk space. Computing descriptors, learning the visual vocabulary and computing the inverted index takes approximately 24 hours on a single modern 8-core machine (all steps have been parallelized such that all cores are fully utilized). We extract 500 random samples from each image and use a visual vocabulary of 1000 visual words.

Evaluation of sketch-based retrieval We have performed an objective evaluation of the underlying sketch-based retrieval system described in Sec. 4. Using a set of 20 typical user sketches, we automatically generated the top 36 matches for each sketch (we also show 36 matches in our GUI). Showing a sketch along with a corresponding retrieval result, we asked seven users (including two of the authors) to give binary relevance ratings, determining whether a result is a good match or not. Our evaluation shows that on average 29.1% of the images are considered good matches. In Fig. 6 we show more detailed evaluation plots: there seems to be a clear difference in what images users consider relevant. While participant #1 was very strict and considered only 9.2% relevant, participant #7 considered 52.9% of the images to be relevant matches (the authors are #4 and #5). Retrieval quality also clearly depends on the input sketches: while sketch #9 generated 57.9% relevant matches, sketch #10 generated only 4.4%. While those numbers show that there is much room for improvement in precision (as expected), they also tell us that among the results we display in the GUI several relevant results appear, and users are typically able to spot and select those very quickly. Additionally, since our system is interactive, users can simply modify their sketches to retrieve new results in case none of the matches is satisfying (an important option that is not reflected in our evaluation numbers). We also show several sample retrieval results in Fig. 1 and Fig. 4.

Figure 5: Comparison of blending options in Photosketcher, from left to right: copy, feathering, gradient domain. Note that gradient-domain blending is able to hide the artifact between the two birds but also drastically changes their overall color.
Interactivity Overall, our experiments with Photosketcher show that one key aspect of its design is the interactivity of all pipeline steps: a) re-sketching to find better matches; b) optimizing a cut using scribbles; and c) choosing among several alternative blending methods to further hide visible seams. All those steps provide interactive response times: our sketch-based search typically responds in under 500 ms, the iterative compositing in between about 250 ms and 1 s (see Tab. 1 for more detailed response times). Note that both the GMM training stage (learning Theta_D and Theta_U from Omega_D and Omega_U) as well as the Graphcut step can be accelerated by reusing previous solutions once an initial solution has been found. The idea is that an additional user scribble typically encodes only local constraints and therefore should evoke mostly local changes in the new solution. In this case, we do not expect the model to change a lot and start the GMM training stage from the previously learned Theta_D and Theta_U. Typically, only a few iterations are needed until the GMM converges to the new solution.
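This reuse of previous solutions maps directly onto warm-started EM. The snippet below shows the idea with scikit-learn's `GaussianMixture` (standing in for the paper's own EM implementation) on synthetic stand-in data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
samples = rng.normal(size=(4000, 3))   # stand-in L*a*b* samples from Omega_D
gmm_d = GaussianMixture(n_components=5, covariance_type="full",
                        warm_start=True, max_iter=100)
gmm_d.fit(samples)                     # full EM for the initial solution

# A new scribble changes the training set only locally ...
samples = np.vstack([samples, rng.normal(0.5, 1.0, size=(200, 3))])
gmm_d.set_params(max_iter=10)          # only a few refinement iterations needed
gmm_d.fit(samples)                     # EM continues from the previous Theta_D
```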

Table 1: Photosketcher pipeline stage timings (in ms).

stage        time (ms)
search       300
mask         < 1
gmm-train    80 (initial), 10 (reused)
gmm-predict  -
graphcut     - (initial), 100 (reused)
blend        < 1 (copy), 10 (feathering), 500 (poisson)

Figure 6: Evaluation of the sketch-based retrieval stage: we show the percentage of relevant results in the top 36 matches. Left: average percentage by participant (on average, participant 1 considered fewer results relevant than e.g. participant 3). Right: average percentage per sketch (our system returns fewer relevant results for sketch 10 than for e.g. sketch 11).

Compositing All results in this paper have been generated using the compositing interface proposed in Sec. 5. We train our GMMs from L*a*b* color triplets; we found this to work slightly better than using RGB triplets, probably due to the perceptually more uniform colorspace. We also found that the GMMs (using only pixel data for training) are often able to successfully make predictions about large, smooth regions. However, the prediction often fails at the borders of those regions. On close inspection, borders often have a very different color distribution than their adjacent regions (typically darker), and therefore a GMM is unable to predict those borders correctly in case corresponding training data has not been available. The compositing step also suffers from imperfections in the images returned from the search: when searching for an object with a given shape, we do not consider the background around that object. A matching background texture and color is crucial, though, for achieving good results when employing Poisson blending. In a similar vein, it would be useful to align and snap the selected image objects to the current background.

7 Discussion and conclusion

Overall, the compositing step is an extremely difficult task due to the mostly very dissimilar images that users try to merge. Our compositing step relies heavily on the Gaussian Mixture Models: if their prediction is incorrect, this incorrect data is used as a prior for the Graphcut stage and the resulting seam will not be optimal. We usually use 5 components for the GMMs; however, an automatic selection could be desirable: if the image contains large, smooth areas, fewer components would be appropriate, while image areas containing detail and a wide variety of colors might require more than 5 components in order to be accurately represented. Compared to GrabCut [Rother et al. 2004], we find that learning full covariance matrices is possible in an interactive setting using a reasonable number of training samples (4000 in our case). Note that our system (on purpose) does not automatically filter out search results that are hard to composite (as e.g. Sketch2Photo does [Chen et al. 2009]). Again we argue that interactivity is key: since all pipeline steps are interactive, our strategy is to give complete control to our users and let them decide whether an automatic segmentation is acceptable, further manual work is required, or another image needs to be selected.

While we do not require any metadata coming along with the images, constructing a search index (see Sec. 4) in an offline precomputation step along with nearest-neighbor search during retrieval can be considered a sort of automatic tagging. Ideally, the descriptors describing our images in high-dimensional feature space would form semantically closely related clusters; in its current form, the retrieval stage frequently does not respect semantic closeness (which can be observed in Fig. 4).

Failure cases We observed failure cases mostly in three situations:

The image collection does not contain any object semantically close to the searched one. We admit that it is difficult to determine whether the desired image is not in the collection or whether the sketch-based image retrieval step was the limiting factor. Note, however, that geometric similarity is almost always matched.

Abstract sketches with a very specific semantic meaning attached to them, such as stick figures, do not result in meaningful query results. This is because Photosketcher only measures the geometric similarity between a sketch and a picture when providing it as a query result.

Photosketcher relies on the compositing step to generate convincing results. Often this is an extremely difficult task, e.g. when background textures do not match or the lighting of the part to be inserted is too different from that of the existing image.

Future work A possible direction for future work in sketch-based retrieval could be to consider perceptual gradient orientations, as long structures in a given direction are perceived as more important, even if unstructured gradients define another direction in the same area (e.g. tree root contours vs. grass direction). The Canny edge detector has problems in such cases, and we believe that generating high-quality line drawings from the image collection is essential for successful sketch-based retrieval. Extending sketch-based retrieval systems to handle abstract or symbolic sketches and more semantic information is definitely an interesting and important direction for future work. An interesting observation is the dependence of the system on the collection content. Our collection contains very few objects in a simple frontal view (e.g. the front side of a house, the side view of a car). However, most users tend to sketch objects from these points of view and will find that only few images match their sketch, simply because there are no objects in the collection with silhouettes as sketched by the user. We believe that making the system return perspectively projected views derived from a simple frontal view would help boost sketch-based retrieval performance tremendously.
A final aspect for future work would be to consider the existing image when searching for parts to be added; this could be realized by additionally ranking the retrieved objects according to the estimated composite quality. This is an approach also taken in Photo Clip Art [Lalonde et al. 2007]. Note that this would be much harder in Photosketcher, since the image collection used is much larger and there is no pre-segmentation of all images into objects from which the desired properties could be precomputed.

Conclusion Photosketcher makes a first step towards the creation of new image content from simple sketches of feature lines, exploiting large online image collections without the need for additional metadata besides the raw images. In case high-quality metadata is available, it can be exploited, and existing systems such as Photo Clip Art and Sketch2Photo [Lalonde et al. 2007; Chen et al. 2009] have shown extremely convincing results using this approach. The final composite images generated by Photosketcher are very promising, despite the fact that two inherently difficult tasks (finding images according to sketches and compositing unrelated images) are combined into a single system. We found that interactivity is a critical component for the usability of the system: users explore the space of possible pictures progressively and typically go through an intensive try-and-test session where all these steps might be repeated several times. In such a context, we believe that Photosketcher offers an advantage over offline systems such as Sketch2Photo [Chen et al. 2009].

References

AGARWALA, A., DONTCHEVA, M., AGRAWALA, M., DRUCKER, S., COLBURN, A., CURLESS, B., SALESIN, D., AND COHEN, M. 2004. Interactive Digital Photomontage. ACM Transactions on Graphics 23, 3.

BISHOP, C. 2006. Pattern Recognition and Machine Learning. Springer.

BOYKOV, Y., AND KOLMOGOROV, V. 2004. An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 9.

CHEN, T., CHENG, M.-M., TAN, P., SHAMIR, A., AND HU, S.-M. 2009. Sketch2Photo: Internet Image Montage. ACM Transactions on Graphics 28, 5, 124:1-124:10.

EITZ, M., HILDEBRAND, K., BOUBEKEUR, T., AND ALEXA, M. 2009. PhotoSketch: a sketch based image query and compositing system. In ACM SIGGRAPH 2009 Talk Program.

EITZ, M., HILDEBRAND, K., BOUBEKEUR, T., AND ALEXA, M. 2011. Sketch-Based Image Retrieval: Benchmark and Bag-of-Features Descriptors. IEEE Transactions on Visualization and Computer Graphics, PrePrint.

HAYS, J., AND EFROS, A. A. 2007. Scene completion using millions of photographs. ACM Transactions on Graphics 26, 3, 4:1-4:7.

JOHNSON, M., BROSTOW, G., SHOTTON, J., ARANDJELOVIC, O., KWATRA, V., AND CIPOLLA, R. 2006. Semantic Photo Synthesis. Computer Graphics Forum 25, 3.

JOHNSON, M., DALE, K., AVIDAN, S., PFISTER, H., FREEMAN, W. T., AND MATUSIK, W. 2011. CG2Real: Improving the Realism of Computer Generated Images using a Large Collection of Photographs. IEEE Transactions on Visualization and Computer Graphics 17, 9.

KWATRA, V., SCHÖDL, A., ESSA, I., TURK, G., AND BOBICK, A. 2003. Graphcut Textures: Image and Video Synthesis Using Graph Cuts. ACM Transactions on Graphics 22, 3.

LALONDE, J.-F., HOIEM, D., EFROS, A. A., ROTHER, C., WINN, J., AND CRIMINISI, A. 2007. Photo Clip Art. ACM Transactions on Graphics 26, 3, 3:1-3:10.

PÉREZ, P., GANGNET, M., AND BLAKE, A. 2003. Poisson Image Editing. ACM Transactions on Graphics 22, 3.

ROTHER, C., KOLMOGOROV, V., AND BLAKE, A. 2004. GrabCut: Interactive Foreground Extraction using Iterated Graph Cuts. ACM Transactions on Graphics 23, 3.

ROTHER, C., BORDEAUX, L., HAMADI, Y., AND BLAKE, A. 2006. AutoCollage. ACM Transactions on Graphics 25, 3.

SIVIC, J., AND ZISSERMAN, A. 2003. Video Google: A Text Retrieval Approach to Object Matching in Videos. In IEEE International Conference on Computer Vision.


More information

BSB663 Image Processing Pinar Duygulu. Slides are adapted from Selim Aksoy

BSB663 Image Processing Pinar Duygulu. Slides are adapted from Selim Aksoy BSB663 Image Processing Pinar Duygulu Slides are adapted from Selim Aksoy Image matching Image matching is a fundamental aspect of many problems in computer vision. Object or scene recognition Solving

More information

First Steps in Hardware Two-Level Volume Rendering

First Steps in Hardware Two-Level Volume Rendering First Steps in Hardware Two-Level Volume Rendering Markus Hadwiger, Helwig Hauser Abstract We describe first steps toward implementing two-level volume rendering (abbreviated as 2lVR) on consumer PC graphics

More information

Edge and local feature detection - 2. Importance of edge detection in computer vision

Edge and local feature detection - 2. Importance of edge detection in computer vision Edge and local feature detection Gradient based edge detection Edge detection by function fitting Second derivative edge detectors Edge linking and the construction of the chain graph Edge and local feature

More information

intro, applications MRF, labeling... how it can be computed at all? Applications in segmentation: GraphCut, GrabCut, demos

intro, applications MRF, labeling... how it can be computed at all? Applications in segmentation: GraphCut, GrabCut, demos Image as Markov Random Field and Applications 1 Tomáš Svoboda, svoboda@cmp.felk.cvut.cz Czech Technical University in Prague, Center for Machine Perception http://cmp.felk.cvut.cz Talk Outline Last update:

More information

This work is about a new method for generating diffusion curve style images. Although this topic is dealing with non-photorealistic rendering, as you

This work is about a new method for generating diffusion curve style images. Although this topic is dealing with non-photorealistic rendering, as you This work is about a new method for generating diffusion curve style images. Although this topic is dealing with non-photorealistic rendering, as you will see our underlying solution is based on two-dimensional

More information

Segmenting Scenes by Matching Image Composites

Segmenting Scenes by Matching Image Composites Segmenting Scenes by Matching Image Composites Bryan C. Russell 1 Alexei A. Efros 2,1 Josef Sivic 1 William T. Freeman 3 Andrew Zisserman 4,1 1 INRIA 2 Carnegie Mellon University 3 CSAIL MIT 4 University

More information

Mosaics. Today s Readings

Mosaics. Today s Readings Mosaics VR Seattle: http://www.vrseattle.com/ Full screen panoramas (cubic): http://www.panoramas.dk/ Mars: http://www.panoramas.dk/fullscreen3/f2_mars97.html Today s Readings Szeliski and Shum paper (sections

More information

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009 Analysis: TextonBoost and Semantic Texton Forests Daniel Munoz 16-721 Februrary 9, 2009 Papers [shotton-eccv-06] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context

More information

IMA Preprint Series # 2153

IMA Preprint Series # 2153 DISTANCECUT: INTERACTIVE REAL-TIME SEGMENTATION AND MATTING OF IMAGES AND VIDEOS By Xue Bai and Guillermo Sapiro IMA Preprint Series # 2153 ( January 2007 ) INSTITUTE FOR MATHEMATICS AND ITS APPLICATIONS

More information

Image Segmentation Using Iterated Graph Cuts Based on Multi-scale Smoothing

Image Segmentation Using Iterated Graph Cuts Based on Multi-scale Smoothing Image Segmentation Using Iterated Graph Cuts Based on Multi-scale Smoothing Tomoyuki Nagahashi 1, Hironobu Fujiyoshi 1, and Takeo Kanade 2 1 Dept. of Computer Science, Chubu University. Matsumoto 1200,

More information

Image Analysis Lecture Segmentation. Idar Dyrdal

Image Analysis Lecture Segmentation. Idar Dyrdal Image Analysis Lecture 9.1 - Segmentation Idar Dyrdal Segmentation Image segmentation is the process of partitioning a digital image into multiple parts The goal is to divide the image into meaningful

More information

Fast Image Stitching and Editing for Panorama Painting on Mobile Phones

Fast Image Stitching and Editing for Panorama Painting on Mobile Phones Fast Image Stitching and Editing for Panorama Painting on Mobile Phones Yingen Xiong and Kari Pulli Nokia Research Center 955 Page Mill Road, Palo Alto, CA 94304, USA {yingen.xiong, kari.pulli}@nokia.com

More information

Image Resizing Based on Gradient Vector Flow Analysis

Image Resizing Based on Gradient Vector Flow Analysis Image Resizing Based on Gradient Vector Flow Analysis Sebastiano Battiato battiato@dmi.unict.it Giovanni Puglisi puglisi@dmi.unict.it Giovanni Maria Farinella gfarinellao@dmi.unict.it Daniele Ravì rav@dmi.unict.it

More information

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS

SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS SUMMARY: DISTINCTIVE IMAGE FEATURES FROM SCALE- INVARIANT KEYPOINTS Cognitive Robotics Original: David G. Lowe, 004 Summary: Coen van Leeuwen, s1460919 Abstract: This article presents a method to extract

More information

Multiple Model Estimation : The EM Algorithm & Applications

Multiple Model Estimation : The EM Algorithm & Applications Multiple Model Estimation : The EM Algorithm & Applications Princeton University COS 429 Lecture Nov. 13, 2007 Harpreet S. Sawhney hsawhney@sarnoff.com Recapitulation Problem of motion estimation Parametric

More information

Enhanced Hemisphere Concept for Color Pixel Classification

Enhanced Hemisphere Concept for Color Pixel Classification 2016 International Conference on Multimedia Systems and Signal Processing Enhanced Hemisphere Concept for Color Pixel Classification Van Ng Graduate School of Information Sciences Tohoku University Sendai,

More information

Announcements. Image Segmentation. From images to objects. Extracting objects. Status reports next Thursday ~5min presentations in class

Announcements. Image Segmentation. From images to objects. Extracting objects. Status reports next Thursday ~5min presentations in class Image Segmentation Announcements Status reports next Thursday ~5min presentations in class Project voting From Sandlot Science Today s Readings Forsyth & Ponce, Chapter 1 (plus lots of optional references

More information

Learning to Recognize Faces in Realistic Conditions

Learning to Recognize Faces in Realistic Conditions 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Image Blending and Compositing NASA

Image Blending and Compositing NASA Image Blending and Compositing NASA CS194: Image Manipulation & Computational Photography Alexei Efros, UC Berkeley, Fall 2016 Image Compositing Compositing Procedure 1. Extract Sprites (e.g using Intelligent

More information

Bias-Variance Trade-off (cont d) + Image Representations

Bias-Variance Trade-off (cont d) + Image Representations CS 275: Machine Learning Bias-Variance Trade-off (cont d) + Image Representations Prof. Adriana Kovashka University of Pittsburgh January 2, 26 Announcement Homework now due Feb. Generalization Training

More information

Segmentation. Separate image into coherent regions

Segmentation. Separate image into coherent regions Segmentation II Segmentation Separate image into coherent regions Berkeley segmentation database: http://www.eecs.berkeley.edu/research/projects/cs/vision/grouping/segbench/ Slide by L. Lazebnik Interactive

More information

Robotics Programming Laboratory

Robotics Programming Laboratory Chair of Software Engineering Robotics Programming Laboratory Bertrand Meyer Jiwon Shin Lecture 8: Robot Perception Perception http://pascallin.ecs.soton.ac.uk/challenges/voc/databases.html#caltech car

More information

Gradient Domain Image Blending and Implementation on Mobile Devices

Gradient Domain Image Blending and Implementation on Mobile Devices in MobiCase 09: Proceedings of The First Annual International Conference on Mobile Computing, Applications, and Services. 2009, Springer Berlin / Heidelberg. Gradient Domain Image Blending and Implementation

More information

Large Scale 3D Reconstruction by Structure from Motion

Large Scale 3D Reconstruction by Structure from Motion Large Scale 3D Reconstruction by Structure from Motion Devin Guillory Ziang Xie CS 331B 7 October 2013 Overview Rome wasn t built in a day Overview of SfM Building Rome in a Day Building Rome on a Cloudless

More information

University of Cambridge Engineering Part IIB Module 4F12 - Computer Vision and Robotics Mobile Computer Vision

University of Cambridge Engineering Part IIB Module 4F12 - Computer Vision and Robotics Mobile Computer Vision report University of Cambridge Engineering Part IIB Module 4F12 - Computer Vision and Robotics Mobile Computer Vision Web Server master database User Interface Images + labels image feature algorithm Extract

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

Image Base Rendering: An Introduction

Image Base Rendering: An Introduction Image Base Rendering: An Introduction Cliff Lindsay CS563 Spring 03, WPI 1. Introduction Up to this point, we have focused on showing 3D objects in the form of polygons. This is not the only approach to

More information

Broad field that includes low-level operations as well as complex high-level algorithms

Broad field that includes low-level operations as well as complex high-level algorithms Image processing About Broad field that includes low-level operations as well as complex high-level algorithms Low-level image processing Computer vision Computational photography Several procedures and

More information

Digital Image Processing. Prof. P. K. Biswas. Department of Electronic & Electrical Communication Engineering

Digital Image Processing. Prof. P. K. Biswas. Department of Electronic & Electrical Communication Engineering Digital Image Processing Prof. P. K. Biswas Department of Electronic & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Lecture - 21 Image Enhancement Frequency Domain Processing

More information

Visualization 2D-to-3D Photo Rendering for 3D Displays

Visualization 2D-to-3D Photo Rendering for 3D Displays Visualization 2D-to-3D Photo Rendering for 3D Displays Sumit K Chauhan 1, Divyesh R Bajpai 2, Vatsal H Shah 3 1 Information Technology, Birla Vishvakarma mahavidhyalaya,sumitskc51@gmail.com 2 Information

More information

Su et al. Shape Descriptors - III

Su et al. Shape Descriptors - III Su et al. Shape Descriptors - III Siddhartha Chaudhuri http://www.cse.iitb.ac.in/~cs749 Funkhouser; Feng, Liu, Gong Recap Global A shape descriptor is a set of numbers that describes a shape in a way that

More information

Video Operations in the Gradient Domain. Abstract. these operations on video in the gradient domain. Our approach consists of 3D graph cut computation

Video Operations in the Gradient Domain. Abstract. these operations on video in the gradient domain. Our approach consists of 3D graph cut computation Video Operations in the Gradient Domain 1 Abstract Fusion of image sequences is a fundamental operation in numerous video applications and usually consists of segmentation, matting and compositing. We

More information

AUTONOMOUS IMAGE EXTRACTION AND SEGMENTATION OF IMAGE USING UAV S

AUTONOMOUS IMAGE EXTRACTION AND SEGMENTATION OF IMAGE USING UAV S AUTONOMOUS IMAGE EXTRACTION AND SEGMENTATION OF IMAGE USING UAV S Radha Krishna Rambola, Associate Professor, NMIMS University, India Akash Agrawal, Student at NMIMS University, India ABSTRACT Due to the

More information

Evaluation of GIST descriptors for web scale image search

Evaluation of GIST descriptors for web scale image search Evaluation of GIST descriptors for web scale image search Matthijs Douze Hervé Jégou, Harsimrat Sandhawalia, Laurent Amsaleg and Cordelia Schmid INRIA Grenoble, France July 9, 2009 Evaluation of GIST for

More information

CSE 252B: Computer Vision II

CSE 252B: Computer Vision II CSE 252B: Computer Vision II Lecturer: Serge Belongie Scribes: Jeremy Pollock and Neil Alldrin LECTURE 14 Robust Feature Matching 14.1. Introduction Last lecture we learned how to find interest points

More information

University of Florida CISE department Gator Engineering. Clustering Part 4

University of Florida CISE department Gator Engineering. Clustering Part 4 Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

More information

Midterm Examination CS 534: Computational Photography

Midterm Examination CS 534: Computational Photography Midterm Examination CS 534: Computational Photography November 3, 2016 NAME: Problem Score Max Score 1 6 2 8 3 9 4 12 5 4 6 13 7 7 8 6 9 9 10 6 11 14 12 6 Total 100 1 of 8 1. [6] (a) [3] What camera setting(s)

More information

PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing

PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing Barnes et al. In SIGGRAPH 2009 발표이성호 2009 년 12 월 3 일 Introduction Image retargeting Resized to a new aspect ratio [Rubinstein

More information

Efficient Rendering of Glossy Reflection Using Graphics Hardware

Efficient Rendering of Glossy Reflection Using Graphics Hardware Efficient Rendering of Glossy Reflection Using Graphics Hardware Yoshinori Dobashi Yuki Yamada Tsuyoshi Yamamoto Hokkaido University Kita-ku Kita 14, Nishi 9, Sapporo 060-0814, Japan Phone: +81.11.706.6530,

More information

Graph-Based Superpixel Labeling for Enhancement of Online Video Segmentation

Graph-Based Superpixel Labeling for Enhancement of Online Video Segmentation Graph-Based Superpixel Labeling for Enhancement of Online Video Segmentation Alaa E. Abdel-Hakim Electrical Engineering Department Assiut University Assiut, Egypt alaa.aly@eng.au.edu.eg Mostafa Izz Cairo

More information

ELEC Dr Reji Mathew Electrical Engineering UNSW

ELEC Dr Reji Mathew Electrical Engineering UNSW ELEC 4622 Dr Reji Mathew Electrical Engineering UNSW Review of Motion Modelling and Estimation Introduction to Motion Modelling & Estimation Forward Motion Backward Motion Block Motion Estimation Motion

More information

Segmentation and Grouping

Segmentation and Grouping CS 1699: Intro to Computer Vision Segmentation and Grouping Prof. Adriana Kovashka University of Pittsburgh September 24, 2015 Goals: Grouping in vision Gather features that belong together Obtain an intermediate

More information

Region-based Segmentation and Object Detection

Region-based Segmentation and Object Detection Region-based Segmentation and Object Detection Stephen Gould Tianshi Gao Daphne Koller Presented at NIPS 2009 Discussion and Slides by Eric Wang April 23, 2010 Outline Introduction Model Overview Model

More information

Multiple Model Estimation : The EM Algorithm & Applications

Multiple Model Estimation : The EM Algorithm & Applications Multiple Model Estimation : The EM Algorithm & Applications Princeton University COS 429 Lecture Dec. 4, 2008 Harpreet S. Sawhney hsawhney@sarnoff.com Plan IBR / Rendering applications of motion / pose

More information

human vision: grouping k-means clustering graph-theoretic clustering Hough transform line fitting RANSAC

human vision: grouping k-means clustering graph-theoretic clustering Hough transform line fitting RANSAC COS 429: COMPUTER VISON Segmentation human vision: grouping k-means clustering graph-theoretic clustering Hough transform line fitting RANSAC Reading: Chapters 14, 15 Some of the slides are credited to:

More information

EE795: Computer Vision and Intelligent Systems

EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 WRI C225 Lecture 02 130124 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Basics Image Formation Image Processing 3 Intelligent

More information

Lecture 10: Semantic Segmentation and Clustering

Lecture 10: Semantic Segmentation and Clustering Lecture 10: Semantic Segmentation and Clustering Vineet Kosaraju, Davy Ragland, Adrien Truong, Effie Nehoran, Maneekwan Toyungyernsub Department of Computer Science Stanford University Stanford, CA 94305

More information

Novel Lossy Compression Algorithms with Stacked Autoencoders

Novel Lossy Compression Algorithms with Stacked Autoencoders Novel Lossy Compression Algorithms with Stacked Autoencoders Anand Atreya and Daniel O Shea {aatreya, djoshea}@stanford.edu 11 December 2009 1. Introduction 1.1. Lossy compression Lossy compression is

More information

Texture. Frequency Descriptors. Frequency Descriptors. Frequency Descriptors. Frequency Descriptors. Frequency Descriptors

Texture. Frequency Descriptors. Frequency Descriptors. Frequency Descriptors. Frequency Descriptors. Frequency Descriptors Texture The most fundamental question is: How can we measure texture, i.e., how can we quantitatively distinguish between different textures? Of course it is not enough to look at the intensity of individual

More information

Short Survey on Static Hand Gesture Recognition

Short Survey on Static Hand Gesture Recognition Short Survey on Static Hand Gesture Recognition Huu-Hung Huynh University of Science and Technology The University of Danang, Vietnam Duc-Hoang Vo University of Science and Technology The University of

More information

A Hillclimbing Approach to Image Mosaics

A Hillclimbing Approach to Image Mosaics A Hillclimbing Approach to Image Mosaics Chris Allen Faculty Sponsor: Kenny Hunt, Department of Computer Science ABSTRACT This paper presents a hillclimbing approach to image mosaic creation. Our approach

More information

Clustering Part 4 DBSCAN

Clustering Part 4 DBSCAN Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

More information

An Integrated System for Digital Restoration of Prehistoric Theran Wall Paintings

An Integrated System for Digital Restoration of Prehistoric Theran Wall Paintings An Integrated System for Digital Restoration of Prehistoric Theran Wall Paintings Nikolaos Karianakis 1 Petros Maragos 2 1 University of California, Los Angeles 2 National Technical University of Athens

More information

Markov Random Fields and Segmentation with Graph Cuts

Markov Random Fields and Segmentation with Graph Cuts Markov Random Fields and Segmentation with Graph Cuts Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem Administrative stuffs Final project Proposal due Oct 30 (Monday) HW 4 is out

More information

Video Google faces. Josef Sivic, Mark Everingham, Andrew Zisserman. Visual Geometry Group University of Oxford

Video Google faces. Josef Sivic, Mark Everingham, Andrew Zisserman. Visual Geometry Group University of Oxford Video Google faces Josef Sivic, Mark Everingham, Andrew Zisserman Visual Geometry Group University of Oxford The objective Retrieve all shots in a video, e.g. a feature length film, containing a particular

More information

A System of Image Matching and 3D Reconstruction

A System of Image Matching and 3D Reconstruction A System of Image Matching and 3D Reconstruction CS231A Project Report 1. Introduction Xianfeng Rui Given thousands of unordered images of photos with a variety of scenes in your gallery, you will find

More information

Texture. CS 419 Slides by Ali Farhadi

Texture. CS 419 Slides by Ali Farhadi Texture CS 419 Slides by Ali Farhadi What is a Texture? Texture Spectrum Steven Li, James Hays, Chenyu Wu, Vivek Kwatra, and Yanxi Liu, CVPR 06 Texture scandals!! Two crucial algorithmic points Nearest

More information

Segmentation and Grouping

Segmentation and Grouping Segmentation and Grouping How and what do we see? Fundamental Problems ' Focus of attention, or grouping ' What subsets of pixels do we consider as possible objects? ' All connected subsets? ' Representation

More information