Volumetric Shape Grammars for Image Segmentation and Shape Estimation


Elias Mahfoud, Dept. of Computer Science, University of North Carolina at Charlotte, Charlotte, NC. Andrew Willis, Dept. of Elect. and Comp. Engineering, University of North Carolina at Charlotte, Charlotte, NC.

Abstract: In this paper, we present a technique that uses 3D shape grammars to segment and estimate rectilinear shapes in non-rectified images. While others have proposed the use of shape grammars for segmentation, i.e., assigning labels to rectangular 2D regions within rectified images, the proposed method is innovative for the following reasons: (1) it uses projections of 3D shapes to define non-rectangular segmentation regions, and (2) the approach also estimates the unknown shape parameters of the imaged object. As others have done in the past, we require user interaction to learn probability distributions for classifying pixels from the input image into each of the unknown classes. We then use a 3D shape grammar to hypothesize 3D models for the imaged object. A search procedure hypothesizes different 3D models by modifying the shape and pose parameters of the 3D shape grammar. The parameters of the hypothesized model that best fits the input image provide a segmentation of the image into semantic parts and shape estimates for each of the segmented parts. The key difference between the proposed approach and previous approaches is the 3D nature of our shape generation and estimation process, which represents an advancement over existing approaches that are currently restricted to 2D representations. We describe the method and show segmentation and estimation results using non-rectified images.

I. INTRODUCTION

Segmentation, the process of assigning semantic labels to objects within images, has been a core computer vision problem for more than 30 years.
Shape grammars provide a way to specify dynamic geometric models that result in structured semantic labeling and segmentation with plausible parameters. This has numerous applications in the games and movie industries, in mapping applications such as Google Maps and Bing Maps, and in the reconstruction of archaeological sites from images. Several attempts have been made to use shape grammars as models for objects to aid in solving segmentation and estimation problems [1], [2]. The majority of these methods [3], [4], [5], [6] use rectified images of building facades as input and apply two-dimensional grammars to segment, i.e., divide, the input image into semantically meaningful rectangular regions, e.g., windows, walls, doors, etc. Other methods such as that described in [7] use non-shape grammars to segment non-rectified images, yet such grammars have representation capabilities limited to Manhattan-style buildings (i.e., buildings with faces aligned to the world axes) and require calibrated oblique-angle images (i.e., images whose pixels are mapped to actual longitude and latitude coordinates). Our proposed method addresses the constraints of having to use rectified images and of limiting the shape space to Manhattan-style buildings by using a generic volumetric grammar and a free camera model. We created PSML (Procedural Shape Modeling Language), a volumetric shape grammar language that generates hierarchies of terminal and non-terminal solid shapes with associated semantic labels. The models generated by this language are volumetric and can be viewed from any angle. Some terminals might be partially or fully occluded by other terminals. Our system is capable of rendering image masks of isolated terminals, taking into account occlusions by other terminals with different labels. The image masks can then be used to select regions from pixel-level probabilities and feed them into an optimization algorithm.
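The occlusion-aware mask rendering described above can be sketched in a few lines. The following toy Python example is our own illustration, not the actual PSML/jmonkey renderer; all names are hypothetical. It rasterizes depth-ordered screen rectangles with a z-buffer so that each label's mask contains only its visible (unoccluded) pixels.

```python
# Toy sketch of occlusion-aware mask rendering (hypothetical helper,
# not the paper's renderer): a z-buffer decides which labeled terminal
# is visible at each pixel, and one binary mask is built per label.

def render_label_masks(terminals, width, height):
    """terminals: list of (label, depth, x0, y0, x1, y1) axis-aligned
    screen rectangles; a smaller depth is closer to the camera."""
    depth = [[float("inf")] * width for _ in range(height)]
    label = [[None] * width for _ in range(height)]
    for name, z, x0, y0, x1, y1 in terminals:
        for y in range(y0, y1):
            for x in range(x0, x1):
                if z < depth[y][x]:          # closer surface wins
                    depth[y][x] = z
                    label[y][x] = name
    masks = {}
    for name, *_ in terminals:
        masks[name] = [[1 if label[y][x] == name else 0
                        for x in range(width)] for y in range(height)]
    return masks

# Two overlapping boxes; "front" occludes part of "back", so the mask
# for "back" is zero wherever "front" covers it.
masks = render_label_masks(
    [("back", 2.0, 0, 0, 4, 4), ("front", 1.0, 2, 0, 6, 4)], 8, 4)
```

Selecting pixel-level probabilities for a label then amounts to keeping only the positions where that label's mask is 1.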
Any optimization method can be used on the resulting pixel probabilities to find the best fit between the generated grammar and the input image. The contribution of this paper lies mainly in the introduction of such a modular framework, in which individual modules can be easily replaced by more advanced and complicated subsystems. This yields a highly customizable optimization system capable of producing plausible image segmentations from non-rectified images.

II. RELATED WORK

Procedural modeling is an important topic in computer graphics, where procedural methods are often used to automatically generate very large and detailed models of buildings and cities. Many different types of procedural methods have been devised. Early examples used L-systems to automatically generate 3D plant models [8] and simulate their interaction with the environment [9]. Subsequent work developed shape grammars for modeling cities [10] and buildings [11]. Other work used tensor fields instead of grammars for modeling street networks [12], where a tensor refers to a symmetric and traceless matrix used to guide the generated streets. One recent and very popular method for generating buildings using shape grammars is CGA (Computer Generated Architecture) [13]. CGA is capable of generating highly detailed building shells which include windows, doors, and walls. (We are not claiming a shape grammar representation or model contribution in this paper.)

Figure 1: A diagram demonstrating the high-level modules in the optimization engine.

Methods which apply shape grammars to help solve computer vision segmentation problems are relatively new, and several approaches have been proposed. Teboul et al. have published a sequence of articles that use two-dimensional grammars based on CGA to represent rectified facades of urban-style buildings. Their first approach used randomized forests to learn a distribution which expresses the likelihood of each grammar object for each image pixel. An optimization procedure based on random walks uses the shape grammar to hypothesize a segmentation of the image into a collection of rectangular tiles, each of which is assigned a grammar object. A score for each hypothesized segmentation is taken as the likelihood of the observed image pixels given the segmentation hypothesized by the shape grammar [5]. Later work described in [6] improved the performance of the optimization process by using reinforcement learning, which was shown to achieve better results in less computing time on their test data. However, as mentioned previously, these approaches are limited to 2D grammars and are constrained to work only on rectified images. Our method does not require rectified images and, as such, can be applied to a wider range of problems. Another approach was proposed by Vanegas [7] which uses 3D grammars to estimate Manhattan-style buildings from aerial images. Although this approach is not limited to rectified images and 2D grammars, it restricts the problem to calibrated oblique-angle images. In addition, the grammar used to represent buildings produces very simple building shapes with no details such as windows and doors. Our proposed method utilizes a more generic grammar capable of generating complex shapes with different terminal labels, allowing for optimal solutions over all labels. III.
APPROACH

Our approach formulates the problem as a Bayesian inference problem where we seek the segmentation that maximizes the likelihood of the observed data, i.e., as a maximum likelihood estimation (MLE) problem. The input to the system is a 2D image D, a shape grammar (in PSML format; see III-A for details) having parameters θg, and a set of view parameters θv. The output is a segmentation of the image, provided as a collection of pixel labels, L, an estimate of the object shape, θg, and, if appropriate, an estimate of the view from which the image was generated, θv. While it is theoretically possible to estimate all of these parameters in the proposed framework, this article assumes that the view parameters, θv, are known a priori and are not estimated for any of the results presented in this article. A flow chart of the estimation process is shown in figure 1.

Figure 2: Some examples of models generated using PSML.

The system is divided into several subsystems that collectively perform the estimation:
- a supervised learning module that generates pixel-level probabilities;
- a shape grammar which takes in grammar variables, θg, and produces 3D shapes S(θg);
- a rendering engine that synthesizes mask images from the created shape S(θg);
- an optimization procedure that searches for the collection of parameters, (ˆθg, ˆL), that maximizes the likelihood of the observed data.

It is worth noting that each subsystem can be independently replaced by more advanced algorithms to improve the performance or accuracy of the proposed method. However, in this initial version of the system, we chose very simple implementations for each of these modules. Specifically, we manually train the pixel classifier and we use a standard gradient-descent algorithm as the optimization procedure for exploring the space of plausible solutions. More advanced algorithms can be added in the future. A.
Model Generation using PSML (Procedural Shape Modeling Language)

The purpose of the shape grammar is to provide plausible 3D models for objects within the image. Shape grammars use values of the shape parameters to automatically generate shapes which would otherwise require a very large number of variables to represent accurately (see figure 2). This greatly reduces the number of unknown variables associated with possibly complex shapes and provides a computationally tractable way to estimate such shapes from sparse measurements, e.g., a single image.
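As an illustration of this dimensionality reduction, the following Python sketch (our own toy analogue, not the PSML parser; all function and parameter names are hypothetical) expands a handful of shape parameters into a set of labeled boxes via split operations, in the spirit of the grammar shown later in Algorithm 1.

```python
# Toy sketch (hypothetical, not PSML): a few shape parameters expand,
# through split rules, into a full set of labeled boxes -- the detailed
# geometry is derived from the parameters, not independently unknown.

def split(box, axis, at):
    """Split an axis-aligned box (dicts of mins and sizes) at offset
    `at` along `axis`, returning the two resulting boxes."""
    lo = {"min": dict(box["min"]), "size": dict(box["size"])}
    hi = {"min": dict(box["min"]), "size": dict(box["size"])}
    lo["size"][axis] = at
    hi["min"][axis] = box["min"][axis] + at
    hi["size"][axis] = box["size"][axis] - at
    return lo, hi

def building_mass(theta):
    """Six parameters -> labeled terminal boxes, loosely mirroring a
    building-mass grammar: split in x, then z, then y."""
    root = {"min": {"x": 0, "y": 0, "z": 0},
            "size": {"x": theta["width"], "y": theta["height"],
                     "z": theta["depth"]}}
    west, east = split(root, "x", theta["west_width"])
    south, north = split(west, "z", theta["south_depth"])
    south, _space = split(south, "y", theta["south_height"])
    return {"east": east, "south": south, "north": north}

shapes = building_mass({"width": 10, "height": 6, "depth": 4,
                        "west_width": 6, "south_depth": 2,
                        "south_height": 3})
```

Changing the six parameter values regenerates a consistent, fully labeled model, which is what makes the search over shape parameters tractable.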

The shape grammar parser generates an instance of a 3D model using a value for the shape parameters, θg, and a shape grammar program. Shape grammar programs are specified in PSML, a volumetric 3D modeling language we developed to create 3D models of real-world objects. The shape grammar program contains a collection of rules which describe how to construct the 3D model, and labels for each of the elements within the constructed 3D models. These labels are used as the unknown segmentation labels which we seek to assign to the image pixels. PSML derives its syntax from Java and adds syntax and functionality needed for specifying shape grammars. In particular, rules blocks may be declared within the source code which specify the methods used to automatically construct a shape from a collection of primitive shape components, e.g., pieces of spheres, cylinders, and 3D rectangles. Each rule within a rules block is provided an input shape, the predecessor, and replaces that shape with one or more possibly different shapes referred to as successors. Rules have the syntax shown below:

predecessor:condition:[modifiers]{successors};

where condition represents an expression that, if true, allows the rule to be executed, and [modifiers] denotes an optional list of modifiers which can modify the successor symbols, e.g., change their color.

Algorithm 1 is an example of a shape grammar specified in PSML to represent the shape in figure 3a. The program starts with the main() method at line 24, which creates a root object and selects a properties file. The properties file is what defines the grammar parameters which will be optimized by our system. The BuildingMassModel() method generates the shape using these parameters. The process starts by creating a box with the dimensions of the building (lines 4-5); it then splits the box horizontally into two parts having different widths (lines 6 and 8). The resulting parts are then split vertically into boxes having different heights (lines 11 and 14). The shape parameters of this grammar determine the specific locations of the height and width splits (shown as variables with the prefix @ in Algorithm 1). Different values for these parameters allow the shape grammar to represent simple rectilinear objects, i.e., objects whose surface is well approximated as a collection of three different 3D rectangles.

Algorithm 1 The PSML program used to create figure 3a
1  grammar BuildingMassModel {
2    method BuildingMassModel() {
3      rules {
4        parent::i("box",
5          ...){mainmass};
6        mainmass::split("x", {@westwidth,
7          scope.sx - westwidth}){westmass, east};
8        westmass::split("z", {@west.northdepth,
9          scope.sz - west.northdepth})
10         {westnorthmass, westsouthmass};
11       westsouthmass::split("y", {@west.southheight,
12         scope.sy - west.southheight})
13         {south, space};
14       westnorthmass::split("y", {@west.northheight,
15         scope.sy - west.northheight})
16         {north, space};
17       south::{j3d.terminal};
18       north::{j3d.terminal};
19       space::void(){j3d.terminal};
20       east::{j3d.terminal};
21     }
22   }
23
24   method main() {
25     rules {
26       Axiom::I("box", {1, 1, 1}){model};
27       model::useattributes("building.properties",
28         "building"){buildingmassmodel()};
29     }
30   }
31 }

B. Rendering Engine

The rendering engine takes 3D shapes generated by the shape grammar, S(θg), and values for the view, θv, and renders the synthesized object. The rendering process decomposes the image into a collection of mask images, one for each semantic label within the shape grammar. These mask images denote, for the provided shape and view parameters, where we expect to observe pixels from each semantic class, assuming that the values of both the shape and view parameters are correct. The system utilizes jmonkey 3.0 as the rendering engine to generate the mask images. jmonkey is a modern, open-source 3D engine written in Java. The engine facilitates
We used this functionality to write a mask scene processor which renders a specified list of terminals in place, occluded by the non-specified terminals, which are themselves invisible in the output. The result is a bitmap mask showing the silhouette of the visible parts of the selected terminals (see figure 3).

Figure 3: Example of bitmap mask rendering. (a) shows three adjacent boxes with different sizes and different labels. (b-d) show three different mask images rendered for the three different boxes. Notice how the white silhouettes take into account occlusions by other labels.

C. Optimization

The optimization process searches the grammar parameter space for values of the shape parameter vector, θg, that best fit the specified input image, i.e., it searches for parameter values that generate shapes that, when projected using the view and overlaid onto the input image, partition the input image data into collections of pixels which agree with the proposed labeling. As we mentioned earlier, our main contribution lies

in the introduction of such a framework, not in the specific algorithms used in its modules. We chose gradient descent as the search engine in the current implementation because it is simple to implement. On the other hand, this algorithm is not very robust, and it is prone to converging to local minima. We believe that replacing this algorithm with a more advanced one will greatly improve performance. Our current implementation involves the following steps:

1) Use a label classifier to compute likelihood distributions for each pixel. The distribution is represented as a collection of images which are referred to as likelihood images. One image must be computed for every label in the label set L.
2) Provide a PSML grammar representing the class of objects which are expected to be found in the input image data.
3) Apply the optimization engine to search for the best value of the shape grammar parameters:
   a) Using the current guess for the shape parameters, θg, compute image masks for each label and apply them to the corresponding likelihood images to identify those pixels lying in regions associated with a given label.
   b) Compute the joint likelihood of all the pixel data given the labels they have been assigned in (a).
   c) Compute the numerical gradient of the likelihood function as a function of the shape parameters and update the shape parameters to increase the joint likelihood.
   d) Stop when the likelihood cannot be increased; the corresponding shape parameters then provide a shape estimate for the objects in the image, and the image masks specify a segmentation of the image for each of the labels.

Collectively these steps allow us to estimate the best segmentation and shape model for the objects in the image.

Figure 4: A manually-labeled non-rectified image of a synthetic building: (a) Building, (b) Ceiling, (c) Walls, (d) Windows (image courtesy to ...).

D. Optimization Process Details

1) Pixel-Level Probabilities: The first step in the optimization process is to use a supervised-learning algorithm to generate initial approximate pixel probabilities. For each label, we input a gray-scale image of the same size as the original image, where the pixel value at position (x, y) represents the likelihood of the pixel belonging to the corresponding label Li, denoted as P(D(x, y) | Li). The gray-scale likelihood images can be generated using any standard image classifier, such as a Gaussian Mixture Model (GMM) or randomized forests. In our case, and for demonstration purposes, we simply used manually assigned probability images as input, where areas that belong to a specific label are painted white while the remaining areas remain black, as shown in figure 4.

2) Input PSML Grammar: Each shape grammar is written to represent a class of objects; e.g., one grammar might represent various realizations of a coffee mug. Hence, a shape grammar program must be selected as appropriate to the object(s) contained in the input image. The various shapes that can be realized by changing the shape parameters are referred to as the shape space of the grammar or, more simply, are thought of as the shapes that satisfy the grammar. As mentioned earlier, the labels of shapes listed within the shape grammar program provide the possible values for the label set L, and the pixel probability images state how likely each pixel is to have a specific label value; e.g., figure 4(c) shows the likelihood image for the label Wall.

3) Generate Mask Images: For each label Li, a mask is rendered using the current shape, S(θg), and the current view, θv. Each mask image, I_Li(x, y | θg), indicates the image pixels which are assumed to have label Li. This is represented mathematically using the indicator function as defined in equation (1).
I_Li(x, y | θg) = { 0 if the pixel at (x, y) does not have label Li
                    1 if the pixel at (x, y) has label Li            (1)

The indicator function is used by the search engine to select regions from the pixel probabilities for each label. The use of image masks allows selecting non-rectangular regions of pixels, which is the main difference from previous methods that used flat rectangles to select pixels. For a specific label Li, a region R_Li is the set of pixel positions (x, y) at which the indicator function is not zero, as defined in equation (2).

R_Li = {(x, y) | I_Li(x, y | θg) = 1}                                (2)

4) Generate an Energy Function: The probability of the region associated with the i-th label from the label set is given by equation (3).

p(R_Li | θg) = ∏_{(x,y) ∈ R_Li} p(x, y | Li, θg)                     (3)

Note that in this equation the labels of the (x, y) locations are presumed to be independent. Products of a large number of probabilities such as that in equation (3) tend to generate extremely small numbers, which can introduce difficulties for optimization algorithms due to numerical instabilities. For this reason we take the negative log of the probabilities, which converts equation (3) into an energy function as shown in equation (4).

E(R_Li | θg) = − Σ_{(x,y) ∈ R_Li} log(p(x, y | Li, θg))              (4)

The energy of the chosen shape parameters is the sum of the region energies over all of the label values, shown as equation (5).

E(θg) = Σ_{Li ∈ L} E(R_Li | θg)                                      (5)

To facilitate convergence of the gradient-based energy minimization process, we blur the likelihood images. This makes the energy function smoother and facilitates convergence of the minimization.

5) Gradient Descent Optimization: The gradient of a function E(θg), where θg = (θg1, θg2, ..., θgn), is a vector field that points in the direction of maximum increase in the value of that function. The components of the gradient are the partial derivatives of the function E(θg), as seen in equation (6).

∇E(θg) = ( ∂E/∂θg1, ∂E/∂θg2, ..., ∂E/∂θgn )                          (6)

Gradient descent is a simple optimization algorithm that finds a minimum of a function. Starting at an initial value of the input parameters, θ0g, the algorithm repeatedly computes the value of the gradient vector and changes the parameter vector according to the value of the gradient vector using equation (7).

θ(n+1)g = θ(n)g − γ ∇E(θ(n)g)                                        (7)

where γ is a small number. Theoretically, the value of E(θ(n+1)g) is less than or equal to E(θ(n)g). As a result, the value of the function will decrease until it reaches a local minimum. The process stops when |E(θ(n+1)g) − E(θ(n)g)| < ɛ. The value of θ(n)g at that point is the optimization result ˆθg. For our implementation, we numerically compute the gradient of the parameter vector by introducing small perturbations to the vector components one at a time and computing the difference in the energy function of the generated image. After the differences have been computed for all the parameters, we multiply the resulting gradient vector by γ and subtract the result from the current parameter vector, then repeat the process until convergence.
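The update loop just described can be sketched compactly. Since the real energy requires rendering mask images, the following Python example (our own illustration; the quadratic stand-in energy and all names are hypothetical) applies the forward-difference gradient of equation (6) and the update of equation (7) to a smooth stand-in energy with a known minimum.

```python
# Sketch of numerical-gradient descent, Eqs. (6)-(7).  A smooth
# stand-in energy with minimum at (3, 1) replaces the rendering-based
# energy of Eqs. (4)-(5).

def energy(theta):
    # Hypothetical smooth energy; in the paper this is Eq. (5).
    return (theta[0] - 3.0) ** 2 + 2.0 * (theta[1] - 1.0) ** 2

def num_gradient(f, theta, h=1e-5):
    """Eq. (6) by forward differences: perturb one component at a time
    and difference the energies."""
    base = f(theta)
    grad = []
    for i in range(len(theta)):
        bumped = list(theta)
        bumped[i] += h
        grad.append((f(bumped) - base) / h)
    return grad

def gradient_descent(f, theta0, gamma=0.1, eps=1e-10, max_iter=10000):
    theta = list(theta0)
    e_prev = f(theta)
    for _ in range(max_iter):
        g = num_gradient(f, theta)
        theta = [t - gamma * gi for t, gi in zip(theta, g)]  # Eq. (7)
        e = f(theta)
        if abs(e_prev - e) < eps:   # stop: |E(n+1) - E(n)| < epsilon
            break
        e_prev = e
    return theta

theta_hat = gradient_descent(energy, [0.0, 0.0])
```

The same loop applies when `energy` instead renders masks and sums negative log-likelihoods; only the cost of each evaluation changes.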
However, since the numerically computed derivative is not exact, the process does not always converge, and the value of E(θ(n+1)g) may actually increase in some cases.

From a performance point of view, it is worth noting that in order to compute the energy function, the number of render operations needed is equal to the number of available classes. Furthermore, to compute the gradient, the energy function needs to be computed once for the original parameter values and then once for every perturbation of each parameter. Let the number of available labels be N and the number of grammar parameters be M; then the total number of render operations needed per gradient-descent iteration is N(M + 1). This emphasizes the importance of using a fast rendering engine so that the rendering step does not become a bottleneck.

Figure 5: (a) BG, (b) East, (c) North, (d) South, (e) Input, (f) Diff, (g) Result. Sub-figures (a,b,c,d) collectively show the likelihood distribution for the image pixels, P(D(x, y) | Li), where for (a) Li = BG, for (b) Li = East, for (c) Li = North, and for (d) Li = South. Sub-figure (e) shows the input image, (f) shows the difference between the input image and the final segmentation result, and (g) shows the energy function values for each iteration of the minimization process; images along this curve depict intermediate results for the estimation.

IV. RESULTS

We ran our test cases on an Intel Core 2 Quad CPU Q9550 running at 2.83 GHz with 4 GB of RAM under 64-bit Windows 7. We used synthesized renderings as the input images to make sure that the imaged object can be represented by the specified grammar. For the pixel probabilities, we used manual labeling, where the user specifies the areas associated with each label by marking them with a higher gray-scale value than other areas. The user labeling does not have to be accurate, as the process is reasonably tolerant to noise. The user labeling is then blurred to make sure the generated energy function is smooth.
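The label-blurring step mentioned above can be sketched as follows. This Python example is our own illustration, assuming a simple 3x3 box filter rather than whichever low-pass filter the system actually uses; it smooths a hard 0/1 user labeling into a gradually varying likelihood image.

```python
# Sketch of label smoothing (hypothetical 3x3 box filter): a binary
# user-painted label image becomes a likelihood image whose values
# vary smoothly across the label boundary.

def box_blur(img):
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Average over the (clipped) 3x3 neighborhood.
            vals = [img[y2][x2]
                    for y2 in range(max(0, y - 1), min(h, y + 2))
                    for x2 in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out

# Hard 0/1 labeling for one label; after blurring, the boundary
# column takes intermediate likelihood values.
hard = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 0, 0]]
soft = box_blur(hard)
```

The intermediate values near the boundary are what give the energy function a usable gradient when a hypothesized split line is slightly misplaced.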
The first test case was a 3D shape consisting of three blocks that occlude each other. We used initial values for the grammar parameters that were different from the original values, θg, used to generate the input image. We recorded the energy differences at various times as gradient descent approached θg. The result is shown in figure 5. For the second test case we used a non-rectified facade image. The user input for this case is much noisier and less accurate than in the previous case, yet the process was able to find parameter values that are reasonably close to θg. The result is shown in figure 6. In the case of the facade, the gradient descent algorithm was less robust than in the previous case and took longer to converge. We can see a spike in the energy difference in figure 6f

where the algorithm diverged for a while before it started converging again. There are two reasons for this behavior: first, the provided pixel probabilities are very noisy and inaccurate; second, the facade grammar behaves in a more complicated way in combination with the numerical computation of the gradient. The windows and floors in this grammar are created by a periodic split operation in the x and y directions. As a result, even small changes to some parameters cause a significant change in the generated facade, and the energy function gradient may consequently become inaccurate. Addressing this problem requires the use of a more advanced search algorithm than gradient descent. Table I shows the initial, input, and final values after gradient descent has converged for the facade test case.

Figure 6: (a) BG, (b) Wall, (c) Window, (d) Input, (e) Final, (f) Result. Sub-figures (a,b,c) collectively show the likelihood distribution for the image pixels, P(D(x, y) | Li), where for (a) Li = BG, for (b) Li = Wall, and for (c) Li = Window. (d,e) show a side-by-side comparison of the input image, (d), and the final segmentation result, (e). (f) shows the energy function values for each iteration of the minimization process; images along this curve depict intermediate results for the estimation.

Table I: Comparison between initial values, θ0g, input values, θg, final optimization values, ˆθg, and percent error for the facade parameters floor.height, window.height, window.width, window.wallpane, facade.width, facade.depth, and facade.height.

V. CONCLUSIONS AND FUTURE WORK

We presented a system for segmenting non-rectified images using a 3D volumetric shape grammar. The proposed method is not limited to using 2D shapes or rectified images. It is also capable of segmenting finer details from a single image. The main contribution is the introduction of a modular framework that facilitates the application of newer algorithms to improve segmentation results. In the future, we plan to address performance issues to improve optimization speed. Since the current approach uses non-rectangular silhouettes of labels, it is not possible to apply optimization techniques such as integral images [5]. We plan to use the Graphics Processing Unit (GPU) to apply label masks in real time and then use multi-threaded summations to generate the energy function. We expect a significant increase in performance after applying those improvements. We also plan to apply similar principles to allow inferring parameters of occluded parts of 3D shapes without the need for multiple views from different directions.

REFERENCES

[1] J. Schlecht, K. Barnard, E. Spriggs, and B. Pryor, "Inferring grammar-based structure models from 3D microscopy data," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.
[2] A. Toshev, P. Mordohai, and B. Taskar, "Detecting and parsing architecture at city scale from range data," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[3] S. Lee and R. Nevatia, "Extraction and integration of window in a 3D building model from ground view images," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2004.
[4] P. Müller, G. Zeng, P. Wonka, and L. Van Gool, "Image-based procedural modeling of facades," in ACM SIGGRAPH 2007 Papers. New York, NY, USA: ACM, 2007.
[5] O. Teboul, L. Simon, P. Koutsourakis, and N. Paragios, "Segmentation of building facades using procedural shape priors," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[6] O. Teboul, I. Kokkinos, L. Simon, P. Koutsourakis, and N. Paragios, "Shape grammar parsing via reinforcement learning," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
[7] C. A. Vanegas, D. G. Aliaga, and B. Benes, "Building reconstruction using Manhattan-world grammars," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[8] P. Prusinkiewicz, A. Lindenmayer, and J. Hanan, The Algorithmic Beauty of Plants, ser. Virtual Laboratory. Springer-Verlag.
[9] R. Měch and P. Prusinkiewicz, "Visual models of plants interacting with their environment," in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques. ACM, 1996.
[10] Y. Parish and P. Müller, "Procedural modeling of cities," in Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques. ACM, 2001.
[11] J. Duarte, "Customizing mass housing: a discursive grammar for Siza's Malagueira houses," Ph.D. dissertation, Massachusetts Institute of Technology.
[12] G. Chen, G. Esch, P. Wonka, P. Müller, and E. Zhang, "Interactive procedural street modeling," in ACM SIGGRAPH 2008 Papers. New York, NY, USA: ACM, 2008, pp. 103:1-103:10.
[13] P. Müller, P. Wonka, S. Haegler, A. Ulmer, and L. Van Gool, "Procedural modeling of buildings," in ACM SIGGRAPH 2006 Papers. New York, NY, USA: ACM, 2006.


More information

Silhouette-based Multiple-View Camera Calibration

Silhouette-based Multiple-View Camera Calibration Silhouette-based Multiple-View Camera Calibration Prashant Ramanathan, Eckehard Steinbach, and Bernd Girod Information Systems Laboratory, Electrical Engineering Department, Stanford University Stanford,

More information

Graph-based Modeling of Building Roofs Judith Milde, Claus Brenner Institute of Cartography and Geoinformatics, Leibniz Universität Hannover

Graph-based Modeling of Building Roofs Judith Milde, Claus Brenner Institute of Cartography and Geoinformatics, Leibniz Universität Hannover 12th AGILE International Conference on Geographic Information Science 2009 page 1 of 5 Graph-based Modeling of Building Roofs Judith Milde, Claus Brenner Institute of Cartography and Geoinformatics, Leibniz

More information

DETERMINATION OF FACADE ATTRIBUTES FOR FACADE RECONSTRUCTION

DETERMINATION OF FACADE ATTRIBUTES FOR FACADE RECONSTRUCTION DETERMINATION OF FACADE ATTRIBUTES FOR FACADE RECONSTRUCTION Nora Ripperda Institute of Cartography and Geoinformatics Leibniz University of Hannover nora.ripperda@ikg.uni-hannover.de KEY WORDS: data analysis,

More information

Urban Scene Segmentation, Recognition and Remodeling. Part III. Jinglu Wang 11/24/2016 ACCV 2016 TUTORIAL

Urban Scene Segmentation, Recognition and Remodeling. Part III. Jinglu Wang 11/24/2016 ACCV 2016 TUTORIAL Part III Jinglu Wang Urban Scene Segmentation, Recognition and Remodeling 102 Outline Introduction Related work Approaches Conclusion and future work o o - - ) 11/7/16 103 Introduction Motivation Motivation

More information

Extracting Axially Symmetric Geometry From Limited 3D Range Data

Extracting Axially Symmetric Geometry From Limited 3D Range Data Extracting Axially Symmetric Geometry From Limited 3D Range Data Andrew Willis, Xavier Orriols, Senem Velipasalar, Xavier Binefa, David B. Cooper Brown University, Providence, RI 02912 Computer Vision

More information

Judging Whether Multiple Silhouettes Can Come from the Same Object

Judging Whether Multiple Silhouettes Can Come from the Same Object Judging Whether Multiple Silhouettes Can Come from the Same Object David Jacobs 1, eter Belhumeur 2, and Ian Jermyn 3 1 NEC Research Institute 2 Yale University 3 New York University Abstract. We consider

More information

Semi-Automatic Techniques for Generating BIM Façade Models of Historic Buildings

Semi-Automatic Techniques for Generating BIM Façade Models of Historic Buildings Semi-Automatic Techniques for Generating BIM Façade Models of Historic Buildings C. Dore, M. Murphy School of Surveying & Construction Management Dublin Institute of Technology Bolton Street Campus, Dublin

More information

Learning to Segment Document Images

Learning to Segment Document Images Learning to Segment Document Images K.S. Sesh Kumar, Anoop Namboodiri, and C.V. Jawahar Centre for Visual Information Technology, International Institute of Information Technology, Hyderabad, India Abstract.

More information

Semiautomatic Rule Assist Architecture Modeling

Semiautomatic Rule Assist Architecture Modeling Semiautomatic Rule Assist Architecture Modeling Hua Liu, Hongxin Zhang, and Hujun Bao State Key lab of CAD&CG, Zhejiang University, Hangzhou, China {sun day,zhx,bao}@cad.zju.edu.cn Abstract. This paper

More information

Edge and local feature detection - 2. Importance of edge detection in computer vision

Edge and local feature detection - 2. Importance of edge detection in computer vision Edge and local feature detection Gradient based edge detection Edge detection by function fitting Second derivative edge detectors Edge linking and the construction of the chain graph Edge and local feature

More information

IRIS SEGMENTATION OF NON-IDEAL IMAGES

IRIS SEGMENTATION OF NON-IDEAL IMAGES IRIS SEGMENTATION OF NON-IDEAL IMAGES William S. Weld St. Lawrence University Computer Science Department Canton, NY 13617 Xiaojun Qi, Ph.D Utah State University Computer Science Department Logan, UT 84322

More information

CREATION OF THE TREE MODEL ADAPTING TO ENVIRONMENT

CREATION OF THE TREE MODEL ADAPTING TO ENVIRONMENT CREATION OF THE TREE MODEL ADAPTING TO ENVIRONMENT Ryota Ueno Yoshio Ohno {ryota ohno}@on.ics.keio.ac.jp Graduate School of Science and Technology, Keio University 3-14-1 Hiyoshi, Kohoku-ku Yokohama 223-8522

More information

Interactive Design and Visualization of Urban Spaces using Geometrical and Behavioral Modeling

Interactive Design and Visualization of Urban Spaces using Geometrical and Behavioral Modeling Interactive Design and Visualization of Urban Spaces using Geometrical and Behavioral Modeling Carlos Vanegas 1,4,5 Daniel Aliaga 1 Bedřich Beneš 2 Paul Waddell 3 1 Computer Science, Purdue University,

More information

Real-Time Human Detection using Relational Depth Similarity Features

Real-Time Human Detection using Relational Depth Similarity Features Real-Time Human Detection using Relational Depth Similarity Features Sho Ikemura, Hironobu Fujiyoshi Dept. of Computer Science, Chubu University. Matsumoto 1200, Kasugai, Aichi, 487-8501 Japan. si@vision.cs.chubu.ac.jp,

More information

METRIC PLANE RECTIFICATION USING SYMMETRIC VANISHING POINTS

METRIC PLANE RECTIFICATION USING SYMMETRIC VANISHING POINTS METRIC PLANE RECTIFICATION USING SYMMETRIC VANISHING POINTS M. Lefler, H. Hel-Or Dept. of CS, University of Haifa, Israel Y. Hel-Or School of CS, IDC, Herzliya, Israel ABSTRACT Video analysis often requires

More information

Procedural Modeling. Last Time? Reading for Today. Reading for Today

Procedural Modeling. Last Time? Reading for Today. Reading for Today Last Time? Procedural Modeling Modern Graphics Hardware Cg Programming Language Gouraud Shading vs. Phong Normal Interpolation Bump, Displacement, & Environment Mapping G P R T F P D Reading for Today

More information

L1 - Introduction. Contents. Introduction of CAD/CAM system Components of CAD/CAM systems Basic concepts of graphics programming

L1 - Introduction. Contents. Introduction of CAD/CAM system Components of CAD/CAM systems Basic concepts of graphics programming L1 - Introduction Contents Introduction of CAD/CAM system Components of CAD/CAM systems Basic concepts of graphics programming 1 Definitions Computer-Aided Design (CAD) The technology concerned with the

More information

CS 534: Computer Vision Segmentation and Perceptual Grouping

CS 534: Computer Vision Segmentation and Perceptual Grouping CS 534: Computer Vision Segmentation and Perceptual Grouping Spring 2005 Ahmed Elgammal Dept of Computer Science CS 534 Segmentation - 1 Where are we? Image Formation Human vision Cameras Geometric Camera

More information

Structural and Syntactic Pattern Recognition

Structural and Syntactic Pattern Recognition Structural and Syntactic Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent

More information

Human Upper Body Pose Estimation in Static Images

Human Upper Body Pose Estimation in Static Images 1. Research Team Human Upper Body Pose Estimation in Static Images Project Leader: Graduate Students: Prof. Isaac Cohen, Computer Science Mun Wai Lee 2. Statement of Project Goals This goal of this project

More information

HOG-Based Person Following and Autonomous Returning Using Generated Map by Mobile Robot Equipped with Camera and Laser Range Finder

HOG-Based Person Following and Autonomous Returning Using Generated Map by Mobile Robot Equipped with Camera and Laser Range Finder HOG-Based Person Following and Autonomous Returning Using Generated Map by Mobile Robot Equipped with Camera and Laser Range Finder Masashi Awai, Takahito Shimizu and Toru Kaneko Department of Mechanical

More information

cse 252c Fall 2004 Project Report: A Model of Perpendicular Texture for Determining Surface Geometry

cse 252c Fall 2004 Project Report: A Model of Perpendicular Texture for Determining Surface Geometry cse 252c Fall 2004 Project Report: A Model of Perpendicular Texture for Determining Surface Geometry Steven Scher December 2, 2004 Steven Scher SteveScher@alumni.princeton.edu Abstract Three-dimensional

More information

A Statistical Consistency Check for the Space Carving Algorithm.

A Statistical Consistency Check for the Space Carving Algorithm. A Statistical Consistency Check for the Space Carving Algorithm. A. Broadhurst and R. Cipolla Dept. of Engineering, Univ. of Cambridge, Cambridge, CB2 1PZ aeb29 cipolla @eng.cam.ac.uk Abstract This paper

More information

A New Image Based Ligthing Method: Practical Shadow-Based Light Reconstruction

A New Image Based Ligthing Method: Practical Shadow-Based Light Reconstruction A New Image Based Ligthing Method: Practical Shadow-Based Light Reconstruction Jaemin Lee and Ergun Akleman Visualization Sciences Program Texas A&M University Abstract In this paper we present a practical

More information

Mixture Models and EM

Mixture Models and EM Mixture Models and EM Goal: Introduction to probabilistic mixture models and the expectationmaximization (EM) algorithm. Motivation: simultaneous fitting of multiple model instances unsupervised clustering

More information

HOUGH TRANSFORM CS 6350 C V

HOUGH TRANSFORM CS 6350 C V HOUGH TRANSFORM CS 6350 C V HOUGH TRANSFORM The problem: Given a set of points in 2-D, find if a sub-set of these points, fall on a LINE. Hough Transform One powerful global method for detecting edges

More information

Accurate 3D Face and Body Modeling from a Single Fixed Kinect

Accurate 3D Face and Body Modeling from a Single Fixed Kinect Accurate 3D Face and Body Modeling from a Single Fixed Kinect Ruizhe Wang*, Matthias Hernandez*, Jongmoo Choi, Gérard Medioni Computer Vision Lab, IRIS University of Southern California Abstract In this

More information

Module 1 Lecture Notes 2. Optimization Problem and Model Formulation

Module 1 Lecture Notes 2. Optimization Problem and Model Formulation Optimization Methods: Introduction and Basic concepts 1 Module 1 Lecture Notes 2 Optimization Problem and Model Formulation Introduction In the previous lecture we studied the evolution of optimization

More information

Computer Vision 6 Segmentation by Fitting

Computer Vision 6 Segmentation by Fitting Computer Vision 6 Segmentation by Fitting MAP-I Doctoral Programme Miguel Tavares Coimbra Outline The Hough Transform Fitting Lines Fitting Curves Fitting as a Probabilistic Inference Problem Acknowledgements:

More information

What have we leaned so far?

What have we leaned so far? What have we leaned so far? Camera structure Eye structure Project 1: High Dynamic Range Imaging What have we learned so far? Image Filtering Image Warping Camera Projection Model Project 2: Panoramic

More information

Computational Foundations of Cognitive Science

Computational Foundations of Cognitive Science Computational Foundations of Cognitive Science Lecture 16: Models of Object Recognition Frank Keller School of Informatics University of Edinburgh keller@inf.ed.ac.uk February 23, 2010 Frank Keller Computational

More information

EE795: Computer Vision and Intelligent Systems

EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 WRI C225 Lecture 02 130124 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Basics Image Formation Image Processing 3 Intelligent

More information

Overview. Related Work Tensor Voting in 2-D Tensor Voting in 3-D Tensor Voting in N-D Application to Vision Problems Stereo Visual Motion

Overview. Related Work Tensor Voting in 2-D Tensor Voting in 3-D Tensor Voting in N-D Application to Vision Problems Stereo Visual Motion Overview Related Work Tensor Voting in 2-D Tensor Voting in 3-D Tensor Voting in N-D Application to Vision Problems Stereo Visual Motion Binary-Space-Partitioned Images 3-D Surface Extraction from Medical

More information

Room Reconstruction from a Single Spherical Image by Higher-order Energy Minimization

Room Reconstruction from a Single Spherical Image by Higher-order Energy Minimization Room Reconstruction from a Single Spherical Image by Higher-order Energy Minimization Kosuke Fukano, Yoshihiko Mochizuki, Satoshi Iizuka, Edgar Simo-Serra, Akihiro Sugimoto, and Hiroshi Ishikawa Waseda

More information

ONE of the most fundamental tasks in computer graphics

ONE of the most fundamental tasks in computer graphics EDIC RESEARCH PROPOSAL 1 Structure-aware Procedural Editing Stefan Lienhard LGG, I&C, EPFL Abstract This research proposal summarizes and reviews three important papers on procedural modelling of buildings,

More information

AUTOMATIC EXTRACTION OF BUILDING ROOFS FROM PICTOMETRY S ORTHOGONAL AND OBLIQUE IMAGES

AUTOMATIC EXTRACTION OF BUILDING ROOFS FROM PICTOMETRY S ORTHOGONAL AND OBLIQUE IMAGES AUTOMATIC EXTRACTION OF BUILDING ROOFS FROM PICTOMETRY S ORTHOGONAL AND OBLIQUE IMAGES Yandong Wang Pictometry International Corp. Suite A, 100 Town Centre Dr., Rochester, NY14623, the United States yandong.wang@pictometry.com

More information

Object Recognition Using Pictorial Structures. Daniel Huttenlocher Computer Science Department. In This Talk. Object recognition in computer vision

Object Recognition Using Pictorial Structures. Daniel Huttenlocher Computer Science Department. In This Talk. Object recognition in computer vision Object Recognition Using Pictorial Structures Daniel Huttenlocher Computer Science Department Joint work with Pedro Felzenszwalb, MIT AI Lab In This Talk Object recognition in computer vision Brief definition

More information

Prof. Fanny Ficuciello Robotics for Bioengineering Visual Servoing

Prof. Fanny Ficuciello Robotics for Bioengineering Visual Servoing Visual servoing vision allows a robotic system to obtain geometrical and qualitative information on the surrounding environment high level control motion planning (look-and-move visual grasping) low level

More information

ROTATION INVARIANT SPARSE CODING AND PCA

ROTATION INVARIANT SPARSE CODING AND PCA ROTATION INVARIANT SPARSE CODING AND PCA NATHAN PFLUEGER, RYAN TIMMONS Abstract. We attempt to encode an image in a fashion that is only weakly dependent on rotation of objects within the image, as an

More information

7 Fractions. Number Sense and Numeration Measurement Geometry and Spatial Sense Patterning and Algebra Data Management and Probability

7 Fractions. Number Sense and Numeration Measurement Geometry and Spatial Sense Patterning and Algebra Data Management and Probability 7 Fractions GRADE 7 FRACTIONS continue to develop proficiency by using fractions in mental strategies and in selecting and justifying use; develop proficiency in adding and subtracting simple fractions;

More information

Simultaneous Appearance Modeling and Segmentation for Matching People under Occlusion

Simultaneous Appearance Modeling and Segmentation for Matching People under Occlusion Simultaneous Appearance Modeling and Segmentation for Matching People under Occlusion Zhe Lin, Larry S. Davis, David Doermann, and Daniel DeMenthon Institute for Advanced Computer Studies University of

More information

A 3D Pattern for Post Estimation for Object Capture

A 3D Pattern for Post Estimation for Object Capture A 3D Pattern for Post Estimation for Object Capture Lei Wang, Cindy Grimm, and Robert Pless Department of Computer Science and Engineering Washington University One Brookings Drive, St. Louis, MO, 63130

More information

CS443: Digital Imaging and Multimedia Perceptual Grouping Detecting Lines and Simple Curves

CS443: Digital Imaging and Multimedia Perceptual Grouping Detecting Lines and Simple Curves CS443: Digital Imaging and Multimedia Perceptual Grouping Detecting Lines and Simple Curves Spring 2008 Ahmed Elgammal Dept. of Computer Science Rutgers University Outlines Perceptual Grouping and Segmentation

More information

Supplemental Material: Detailed, accurate, human shape estimation from clothed 3D scan sequences

Supplemental Material: Detailed, accurate, human shape estimation from clothed 3D scan sequences Supplemental Material: Detailed, accurate, human shape estimation from clothed 3D scan sequences Chao Zhang 1,2, Sergi Pujades 1, Michael Black 1, and Gerard Pons-Moll 1 1 MPI for Intelligent Systems,

More information

Correcting User Guided Image Segmentation

Correcting User Guided Image Segmentation Correcting User Guided Image Segmentation Garrett Bernstein (gsb29) Karen Ho (ksh33) Advanced Machine Learning: CS 6780 Abstract We tackle the problem of segmenting an image into planes given user input.

More information

Chapter 3 Image Registration. Chapter 3 Image Registration

Chapter 3 Image Registration. Chapter 3 Image Registration Chapter 3 Image Registration Distributed Algorithms for Introduction (1) Definition: Image Registration Input: 2 images of the same scene but taken from different perspectives Goal: Identify transformation

More information

Extraction and Integration of Window in a 3D Building Model from Ground View images

Extraction and Integration of Window in a 3D Building Model from Ground View images Extraction and Integration of Window in a 3D Building Model from Ground View images Sung Chun Lee and Ram Nevatia Institute for Robotics and Intelligent Systems, University of Southern California Los Angeles,

More information

Supplemental Document for Deep Photo Style Transfer

Supplemental Document for Deep Photo Style Transfer Supplemental Document for Deep Photo Style Transfer Fujun Luan Cornell University Sylvain Paris Adobe Eli Shechtman Adobe Kavita Bala Cornell University fujun@cs.cornell.edu sparis@adobe.com elishe@adobe.com

More information

Analysis of ARES Data using ML-EM

Analysis of ARES Data using ML-EM Analysis of ARES Data using ML-EM Nicole Eikmeier Hosting Site: Lawrence Berkeley National Laboratory Mentor(s): Brian Quiter, Mark Bandstra Abstract. Imaging analysis of background data collected from

More information

Space Filling: A new algorithm for procedural creation of game assets

Space Filling: A new algorithm for procedural creation of game assets Space Filling: A new algorithm for procedural creation of game assets Paul Bourke ivec@uwa, The University of Western Australia, 35 Stirling Hwy, Crawley, Perth, West Australia 6009. Email: paul.bourke@uwa.edu.au

More information

IMA Preprint Series # 2016

IMA Preprint Series # 2016 VIDEO INPAINTING OF OCCLUDING AND OCCLUDED OBJECTS By Kedar A. Patwardhan Guillermo Sapiro and Marcelo Bertalmio IMA Preprint Series # 2016 ( January 2005 ) INSTITUTE FOR MATHEMATICS AND ITS APPLICATIONS

More information

CREATING 3D WRL OBJECT BY USING 2D DATA

CREATING 3D WRL OBJECT BY USING 2D DATA ISSN : 0973-7391 Vol. 3, No. 1, January-June 2012, pp. 139-142 CREATING 3D WRL OBJECT BY USING 2D DATA Isha 1 and Gianetan Singh Sekhon 2 1 Department of Computer Engineering Yadavindra College of Engineering,

More information

Automatic Generation of 3D Building Models for Sustainable Development

Automatic Generation of 3D Building Models for Sustainable Development International review for spatial planning and sustainable development, Vol.3 No.2 (2015), 68-78 ISSN: 2187-3666 (online) DOI: http://dx.doi.org/10.14246/irspsd.3.2_68 Copyright@SPSD Press from 2010, SPSD

More information

Robotics Programming Laboratory

Robotics Programming Laboratory Chair of Software Engineering Robotics Programming Laboratory Bertrand Meyer Jiwon Shin Lecture 8: Robot Perception Perception http://pascallin.ecs.soton.ac.uk/challenges/voc/databases.html#caltech car

More information

Specular 3D Object Tracking by View Generative Learning

Specular 3D Object Tracking by View Generative Learning Specular 3D Object Tracking by View Generative Learning Yukiko Shinozuka, Francois de Sorbier and Hideo Saito Keio University 3-14-1 Hiyoshi, Kohoku-ku 223-8522 Yokohama, Japan shinozuka@hvrl.ics.keio.ac.jp

More information

Unwrapping of Urban Surface Models

Unwrapping of Urban Surface Models Unwrapping of Urban Surface Models Generation of virtual city models using laser altimetry and 2D GIS Abstract In this paper we present an approach for the geometric reconstruction of urban areas. It is

More information

Project Report for EE7700

Project Report for EE7700 Project Report for EE7700 Name: Jing Chen, Shaoming Chen Student ID: 89-507-3494, 89-295-9668 Face Tracking 1. Objective of the study Given a video, this semester project aims at implementing algorithms

More information

Segmentation and Tracking of Partial Planar Templates

Segmentation and Tracking of Partial Planar Templates Segmentation and Tracking of Partial Planar Templates Abdelsalam Masoud William Hoff Colorado School of Mines Colorado School of Mines Golden, CO 800 Golden, CO 800 amasoud@mines.edu whoff@mines.edu Abstract

More information

Query-Sensitive Similarity Measure for Content-Based Image Retrieval

Query-Sensitive Similarity Measure for Content-Based Image Retrieval Query-Sensitive Similarity Measure for Content-Based Image Retrieval Zhi-Hua Zhou Hong-Bin Dai National Laboratory for Novel Software Technology Nanjing University, Nanjing 2193, China {zhouzh, daihb}@lamda.nju.edu.cn

More information

Department of Electrical Engineering, Keio University Hiyoshi Kouhoku-ku Yokohama 223, Japan

Department of Electrical Engineering, Keio University Hiyoshi Kouhoku-ku Yokohama 223, Japan Shape Modeling from Multiple View Images Using GAs Satoshi KIRIHARA and Hideo SAITO Department of Electrical Engineering, Keio University 3-14-1 Hiyoshi Kouhoku-ku Yokohama 223, Japan TEL +81-45-563-1141

More information

Shape from Silhouettes I CV book Szelisky

Shape from Silhouettes I CV book Szelisky Shape from Silhouettes I CV book Szelisky 11.6.2 Guido Gerig CS 6320, Spring 2012 (slides modified from Marc Pollefeys UNC Chapel Hill, some of the figures and slides are adapted from M. Pollefeys, J.S.

More information

CS 534: Computer Vision Segmentation and Perceptual Grouping

CS 534: Computer Vision Segmentation and Perceptual Grouping CS 534: Computer Vision Segmentation and Perceptual Grouping Ahmed Elgammal Dept of Computer Science CS 534 Segmentation - 1 Outlines Mid-level vision What is segmentation Perceptual Grouping Segmentation

More information

Façade Reconstruction An Interactive Image-Based Approach

Façade Reconstruction An Interactive Image-Based Approach Façade Reconstruction An Interactive Image-Based Approach Przemyslaw Musialski Institute of Computer Graphics and Algorithms Vienna University of Technology What is Façade Reconstruction? Part of Urban

More information

B. Tech. Project Second Stage Report on

B. Tech. Project Second Stage Report on B. Tech. Project Second Stage Report on GPU Based Active Contours Submitted by Sumit Shekhar (05007028) Under the guidance of Prof Subhasis Chaudhuri Table of Contents 1. Introduction... 1 1.1 Graphic

More information

S U N G - E U I YO O N, K A I S T R E N D E R I N G F R E E LY A VA I L A B L E O N T H E I N T E R N E T

S U N G - E U I YO O N, K A I S T R E N D E R I N G F R E E LY A VA I L A B L E O N T H E I N T E R N E T S U N G - E U I YO O N, K A I S T R E N D E R I N G F R E E LY A VA I L A B L E O N T H E I N T E R N E T Copyright 2018 Sung-eui Yoon, KAIST freely available on the internet http://sglab.kaist.ac.kr/~sungeui/render

More information

Edge Equalized Treemaps

Edge Equalized Treemaps Edge Equalized Treemaps Aimi Kobayashi Department of Computer Science University of Tsukuba Ibaraki, Japan kobayashi@iplab.cs.tsukuba.ac.jp Kazuo Misue Faculty of Engineering, Information and Systems University

More information

Nonrigid Surface Modelling. and Fast Recovery. Department of Computer Science and Engineering. Committee: Prof. Leo J. Jia and Prof. K. H.

Nonrigid Surface Modelling. and Fast Recovery. Department of Computer Science and Engineering. Committee: Prof. Leo J. Jia and Prof. K. H. Nonrigid Surface Modelling and Fast Recovery Zhu Jianke Supervisor: Prof. Michael R. Lyu Committee: Prof. Leo J. Jia and Prof. K. H. Wong Department of Computer Science and Engineering May 11, 2007 1 2

More information

Subpixel Corner Detection Using Spatial Moment 1)

Subpixel Corner Detection Using Spatial Moment 1) Vol.31, No.5 ACTA AUTOMATICA SINICA September, 25 Subpixel Corner Detection Using Spatial Moment 1) WANG She-Yang SONG Shen-Min QIANG Wen-Yi CHEN Xing-Lin (Department of Control Engineering, Harbin Institute

More information

Final Exam Assigned: 11/21/02 Due: 12/05/02 at 2:30pm

Final Exam Assigned: 11/21/02 Due: 12/05/02 at 2:30pm 6.801/6.866 Machine Vision Final Exam Assigned: 11/21/02 Due: 12/05/02 at 2:30pm Problem 1 Line Fitting through Segmentation (Matlab) a) Write a Matlab function to generate noisy line segment data with

More information

Optimal Segmentation and Understanding of Motion Capture Data

Optimal Segmentation and Understanding of Motion Capture Data Optimal Segmentation and Understanding of Motion Capture Data Xiang Huang, M.A.Sc Candidate Department of Electrical and Computer Engineering McMaster University Supervisor: Dr. Xiaolin Wu 7 Apr, 2005

More information

Estimating Gothic Facade Architecture from Imagery

Estimating Gothic Facade Architecture from Imagery Estimating Gothic Facade Architecture from Imagery Andrew Willis University of North Carlolina at Charlotte arwillis@uncc.edu Katharina Galor Brown University kgalor@brown.edu Yunfeng Sui University of

More information

Chaplin, Modern Times, 1936

Chaplin, Modern Times, 1936 Chaplin, Modern Times, 1936 [A Bucket of Water and a Glass Matte: Special Effects in Modern Times; bonus feature on The Criterion Collection set] Multi-view geometry problems Structure: Given projections

More information

CSE/EE-576, Final Project

CSE/EE-576, Final Project 1 CSE/EE-576, Final Project Torso tracking Ke-Yu Chen Introduction Human 3D modeling and reconstruction from 2D sequences has been researcher s interests for years. Torso is the main part of the human

More information

Multi-view stereo. Many slides adapted from S. Seitz

Multi-view stereo. Many slides adapted from S. Seitz Multi-view stereo Many slides adapted from S. Seitz Beyond two-view stereo The third eye can be used for verification Multiple-baseline stereo Pick a reference image, and slide the corresponding window

More information

On Skeletons Attached to Grey Scale Images. Institute for Studies in Theoretical Physics and Mathematics Tehran, Iran ABSTRACT

On Skeletons Attached to Grey Scale Images. Institute for Studies in Theoretical Physics and Mathematics Tehran, Iran ABSTRACT On Skeletons Attached to Grey Scale Images M. Karimi Behbahani, Arash Rafiey, 2 Mehrdad Shahshahani 3 Institute for Studies in Theoretical Physics and Mathematics Tehran, Iran ABSTRACT In [2], [3] and

More information

(Urban) Forward Procedural and Inverse Procedural Modeling

(Urban) Forward Procedural and Inverse Procedural Modeling (Urban) Forward Procedural and Inverse Procedural Modeling www.cs.purdue.edu/cgvlab/urban Daniel Aliaga, CGVLAB Department of Computer Science Purdue University What is the main challenge? Solving the

More information

Biometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong)

Biometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong) Biometrics Technology: Image Processing & Pattern Recognition (by Dr. Dickson Tong) References: [1] http://homepages.inf.ed.ac.uk/rbf/hipr2/index.htm [2] http://www.cs.wisc.edu/~dyer/cs540/notes/vision.html

More information

Bilevel Sparse Coding

Bilevel Sparse Coding Adobe Research 345 Park Ave, San Jose, CA Mar 15, 2013 Outline 1 2 The learning model The learning algorithm 3 4 Sparse Modeling Many types of sensory data, e.g., images and audio, are in high-dimensional

More information

Evaluation of Moving Object Tracking Techniques for Video Surveillance Applications

Evaluation of Moving Object Tracking Techniques for Video Surveillance Applications International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Evaluation

More information

Bayesian Methods in Vision: MAP Estimation, MRFs, Optimization

Bayesian Methods in Vision: MAP Estimation, MRFs, Optimization Bayesian Methods in Vision: MAP Estimation, MRFs, Optimization CS 650: Computer Vision Bryan S. Morse Optimization Approaches to Vision / Image Processing Recurring theme: Cast vision problem as an optimization

More information

Fast HDR Image-Based Lighting Using Summed-Area Tables

Fast HDR Image-Based Lighting Using Summed-Area Tables Fast HDR Image-Based Lighting Using Summed-Area Tables Justin Hensley 1, Thorsten Scheuermann 2, Montek Singh 1 and Anselmo Lastra 1 1 University of North Carolina, Chapel Hill, NC, USA {hensley, montek,

More information

Stereo imaging ideal geometry

Stereo imaging ideal geometry Stereo imaging ideal geometry (X,Y,Z) Z f (x L,y L ) f (x R,y R ) Optical axes are parallel Optical axes separated by baseline, b. Line connecting lens centers is perpendicular to the optical axis, and

More information

The SIFT (Scale Invariant Feature

The SIFT (Scale Invariant Feature The SIFT (Scale Invariant Feature Transform) Detector and Descriptor developed by David Lowe University of British Columbia Initial paper ICCV 1999 Newer journal paper IJCV 2004 Review: Matt Brown s Canonical

More information

Textureless Layers CMU-RI-TR Qifa Ke, Simon Baker, and Takeo Kanade

Textureless Layers CMU-RI-TR Qifa Ke, Simon Baker, and Takeo Kanade Textureless Layers CMU-RI-TR-04-17 Qifa Ke, Simon Baker, and Takeo Kanade The Robotics Institute Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA 15213 Abstract Layers are one of the most well

More information

Keywords:Synthetic Data, IBR, Data Generation Tool. Abstract

Keywords:Synthetic Data, IBR, Data Generation Tool. Abstract Data Generation Toolkit for Image Based Rendering Algorithms V Vamsi Krishna, P J Narayanan Center for Visual Information Technology International Institute of Information Technology, Hyderabad, India

More information

Segmentation with non-linear constraints on appearance, complexity, and geometry

Segmentation with non-linear constraints on appearance, complexity, and geometry IPAM February 2013 Western Univesity Segmentation with non-linear constraints on appearance, complexity, and geometry Yuri Boykov Andrew Delong Lena Gorelick Hossam Isack Anton Osokin Frank Schmidt Olga

More information