A Complete and Practical System for Interactive Walkthroughs of Arbitrarily Complex Scenes


A Complete and Practical System for Interactive Walkthroughs of Arbitrarily Complex Scenes

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Lining Yang, M.S.

*****

The Ohio State University, 2003

Dissertation Committee: Professor Roger A. Crawfis, Adviser; Professor Han-Wei Shen; Professor Raghu Machiraju

Approved by: Adviser, Computer and Information Science Graduate Program

ABSTRACT

Complex renderings of synthetic scenes or virtual environments, once deemed impossible for consumer rendering, are becoming available as tools for young artists. These renderings, due to their high-quality image synthesis, can take minutes to hours to render. With the increase in computing power, scientific simulations can produce datasets in the terabyte or even petabyte range. Rendering one frame of these massive datasets can take a long time to finish. Therefore, interactivity for these large datasets and complex scenes is not possible using traditional rendering techniques. Our work focuses on using Image-Based Rendering (IBR) techniques to manage and explore large and complex datasets and virtual scenes on a remote display across a high-speed network. The key idea of this research is to pre-process the scene and render key viewpoints on pre-selected paths inside the scene. We then save these pre-processed partial results into a database on the server. The user browses the dataset on the client side. Whenever he or she needs information, the client sends a request through the high-speed network to the server, and the server retrieves the information and sends it back through the network.

In this dissertation, we present new image-based models to reconstruct approximations to any view along the path, which allows the user to roam around inside the virtual environment at interactive frame rates. We partition the scenes or datasets into several depth slabs to avoid occlusion and dis-occlusion problems. We then use mesh simplification methods to simplify the geometry produced in the pre-rendering step so that the underlying geometry is manageable for our system. We compare two simplification methods, namely down-sampling and feature-preserving mesh simplification, and compare the errors produced by each method. We then present methods to manage our image-based database to achieve even more efficient renderings. These methods include empty texture tile removal, a texture binding reduction technique, a novel two-part caching and pre-fetching scheme, and a new data reduction method using a track-dependent occlusion culling algorithm. The system has been successfully tested on several scenes and on two different platforms, and satisfactory results have been obtained, which shows that our system is suitable for handling large datasets and complex scenes.

Dedicated to My Father

ACKNOWLEDGMENTS

I would like to first thank my advisor, Dr. Roger Crawfis. Without Roger's help and encouragement, I do not think I could have achieved this. Not only did Roger give direction to my research problems, he also gave advice on how to conduct research itself. I learned a great deal during my four years as his student.

I would also like to thank my committee members, Dr. Han-Wei Shen and Dr. Raghu Machiraju. Your patience in reading my thesis is greatly appreciated. Your advice on how to improve the research was also very helpful.

I would like to thank Naeem Shareef. Fruitful discussions with you on research problems, as well as problems in life, helped me a lot. I really enjoyed our several years of working together. I also thank Jian Huang, who introduced me to the Graphics and Visualization Lab here at OSU and led me into the wonderful research world of computer graphics. I would like to thank my other colleagues, such as Wulue Zhao and Daqing Xue; working with you taught me a lot.

Finally, I would like to thank my family for their support. My wife, Yani Gao: without her love and support, I do not think I could have come this far. My mother: her love is always there no matter what happens.

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:

1. Introduction
2. Related Work
   2.1 Image-Based Rendering
       2.1.1 Introduction to Image-Based Rendering
       2.1.2 Plenoptic Function
       2.1.3 Depth Function and Reconstruction
       2.1.4 Efficiency
       2.1.5 IBR-Accelerated Virtual Walkthroughs
   2.2 Remote Visualization
3. Image-Based Model and Geometry Simplifications
   3.1 Overview
   3.2 System Architecture
   3.3 Reconstruction for Novel Views
       3.3.1 Visibility Polyhedron
       3.3.2 Slab Representation
   3.4 Geometric Simplification
       3.4.1 Simple Down-Sampling
             Down-Sampling Technique
             Depth-Mesh Errors
             Visibility Errors
       3.4.2 Feature-Preserved Decimation
             Decimation Algorithms
             Error Analysis
   3.5 Summary
4. Data Management
   4.1 Empty Texture Tile Removal
   4.2 Caching and Pre-fetching
   4.3 Texture Removal Using a Conservative Track-Dependent Occlusion Culling Algorithm
       4.3.1 Review of Occlusion Culling Algorithms
       4.3.2 Our Occlusion Culling Algorithm
       4.3.3 Results and Discussions
   4.4 Summary
5. Results, Discussions and Future Work

Bibliography

VITA

August 4, 1973 .......... Born, Nanjing, P.R. China
1995 .................... B.S., Biomedical Engineering, Southeast University, Nanjing, China
1999 .................... M.S., Computer and Information Science, The Ohio State University, Columbus, OH, USA
......................... Graduate Research Associate, MRI Research, The Ohio State University
......................... Internship, Siemens Corporate Research
1999-present ............ Graduate Research and Teaching Associate, Graphics and Visualization Lab, The Ohio State University

PUBLICATIONS

1. Lining Yang, Roger Crawfis, "A Panoramic Walkthrough System with Occlusion Culling," 2003 Eurographics Workshop on Virtual Environments, Zurich, May 2003.
2. Lining Yang, Roger Crawfis, "Rail-Track Viewer, an Image Based Virtual Walkthrough System," 2002 Eurographics Workshop on Virtual Environments, 2002.
3. Robitaille P-M.L., Warner R., Jagadeesh J., Abduljalil A.M., Kangarlu A., Burgess R.E., Yu Y., Yang L., Zhu H., Jiang Z., Bailey R.E., Chung W., Somawiharja Y., Feynan P., and Rayner D. (1999) Design and Assembly of an 8 Tesla Whole Body MRI Scanner. J. Comput. Assist. Tomogr. 23.
4. Kangarlu A., Baertlein B.A., Lee R., Ibrahim T., Yang L., Abduljalil A.M., and Robitaille P-M.L. (1999) Dielectric Resonance Phenomena in Ultra High Field Magnetic Resonance Imaging. J. Comput. Assist. Tomogr. 23.
5. Robitaille P-M.L., Abduljalil A.M., Kangarlu A., Zhang X., Yu Y., Burgess R., Bair S., Noa P., Yang L., Zhu H., Palmer B., Jiang Z., Chakeres D.M., and Spigos D. (1998) Human Magnetic Resonance Imaging at Eight Tesla. NMR Biomed.

FIELDS OF STUDY

Major Field: Computer and Information Science

LIST OF TABLES

Table 3.1: The change in loading time, rendering speed and Mean Squared Error for a single panorama (1k x 4k) with depth and 3 slabs, using different tile sizes. The rendered image resolution was 1k x 1k. This is for the Nature dataset on our Sun Blade system. MSE is based on a maximum of 256 shades for each color.

Table 3.2: The change in loading time, rendering speed, Mean Squared Error and storage for a single panorama (1k x 4k) with depth and 3 slabs, using different tile sizes. The rendered image resolution was 1k x 1k. This is for the Nature dataset on our Dell Precision 530 system. MSE is based on a maximum of 256 shades for each color.

Table 4.1: Reduction rates for track-dependent occlusion culling using different α values for the Castle dataset.

Table 4.2: Reduction rates for track-dependent occlusion culling using different α values for the Nature dataset.

LIST OF FIGURES

Figure 2.1: The comparison and relationships between geometry-based and image-based rendering systems.
Figure 2.2: How two samples are used to warp and combine the pre-rendered image into the image for the current viewpoint.
Figure 3.1: One pre-selected path for our internal viewer through a model of the Nature scene.
Figure 3.2: System diagram; a two-part pipeline is used.
Figure 3.3: The definition of a kernel for polygons.
Figure 3.4: How to reconstruct new views from the two neighboring reference viewpoints.
Figure 3.5: Graphical plots of the linear and non-linear compositing (b = 3) equations.
Figure 3.6: The original depth image.
Figure 3.7: The depth image after down-sampling.
Figure 3.8: A single α-clipped depth slab.
Figure 3.9: The first slab depth image after α-clipping.
Figure 3.10: The second slab depth image after α-clipping.
Figure 3.11: The Mean Squared Error (MSE), loading time, frames per second (FPS) and storage requirement for different tile sizes for a typical sampled viewpoint of the POVRAY-rendered Nature dataset on our Sun Blade system.
Figure 3.12: The Mean Squared Error (MSE), loading time, frames per second (FPS) and storage requirement for different tile sizes for a typical sampled viewpoint of the POVRAY-rendered Nature dataset on our Dell Precision 530 system.
Figure 3.13: Mean Squared Error of interpolated views against actual rendered views.
Figure 3.14: A simplified triangular mesh using the feature-preserved decimation algorithm in VTK for one reference viewpoint.
Figure 3.15: A 16x16 quad-mesh used for the same viewpoint, which has the same reduction rate as the triangular mesh.
Figure 3.16: The errors using a simplified triangular mesh and a 16x16 fixed-size quad-mesh.
Figure 3.17: Comparison of an actual rendered view, an interpolated view using a quad-mesh, and an interpolated view using a triangular mesh.
Figure 4.1: An example of how the empty tile removal scheme works.
Figure 4.2: How to group individual texture tiles into a one-dimensional tile array.
Figure 4.3: How our two-part caching and pre-fetching works.
Figure 4.4: Unnecessary information produced by slab representations.
Figure 4.5: How our track-dependent occlusion culling works.
Figure 4.6: Pseudo-code for our track-dependent occlusion culling algorithm.
Figure 4.7: Rendered images with track-dependent occlusion culling for the first tested α value.
Figure 4.8: Rendered images with track-dependent occlusion culling for the second tested α value.
Figure 4.9: Rendered images with track-dependent occlusion culling for the third tested α value.
Figure 5.1: Rendering results for the LOX dataset.
Figure 5.2: Rendering results for the Castle dataset.
Figure 5.3: Rendering results for the Night dataset.
Figure 5.4: Rendering results for the Nature dataset (interpolated).
Figure 5.5: The POVRAY-rendered result for the Nature dataset.
Figure 5.6: The difference image between Figures 5.4 and 5.5.

CHAPTER 1. INTRODUCTION

High-quality renderings of synthetic scenes or virtual environments, due to their complex image synthesis, can take minutes to hours to render. Ray-tracing or global illumination tools (for example, Distributed Ray Tracing [Cook84] or Radiosity [Cohen85] [Cohen93]) such as POVRAY [POVRAY] and Radiance [Ward94] can take a long time to render. With very complex scenes and illumination effects, such as those appearing at POVRAY's [POVRAY] competition site, it can take hours or even days to finish rendering one frame. An interactive virtual walkthrough of these large and complex scenes is almost impossible on a low- to mid-end system using traditional rendering techniques.

With today's computational power, scientific datasets tend to become fairly large, often increasing from gigabytes to terabytes. The new massively parallel ASCI machines, for example, can easily generate terabytes of data from scientific simulations. These datasets can contain structured or unstructured points, vector, scalar or tensor fields, which require complicated rendering methods. For datasets of this size and complexity, even the most powerful existing machines can hardly manage them interactively. Feature detection [Banks94], simplification [Dey99] and parallelization [Huang] methods have been used to accelerate the renderings. However, even these methods cannot achieve interactivity. In fact, with traditional visualization techniques, it usually takes minutes, even hours, to render just one frame. To get a better understanding, users would like to interact with the data, get quick feedback, and build a mental model from the data. This requires interactive frame rates, at least several frames per second. Without interactivity, users have to wait and reorient themselves, which distracts their focus from understanding the data. Interactivity is thus very important, yet not possible for complex scenes or large datasets using traditional rendering techniques. This dissertation examines methods to solve this problem. Our goal is to allow the user to examine and walk through the scene interactively from an internal vantage point on a relatively high-resolution display. To achieve this goal, we deploy Image-Based Rendering (IBR) techniques as a post-processing tool for any traditional high-quality renderer.

IBR is a new research area in the computer graphics community and offers advantages over traditional rendering techniques. It can utilize real-life images and illumination for photo-realistic rendering. It requires a fixed or limited amount of work, regardless of the view or data context. It should be noted that this amount of work is proportional to the input image size. Many IBR techniques [McMillan95] [Mark97] [Rademac98] [Gortler96] [Levoy96] use the entire set of input images and therefore can only focus on accurate renderings of relatively low-resolution imagery. More information on IBR techniques will be presented in the next chapter. Here we explore techniques for large displays having a resolution from 1Kx1K to 8Kx3K, as in our new parabolic video wall. Our objectives are:

i. Accurate results at pre-selected viewpoints.
ii. Smooth movement from one accurate view to another with minimal rendering errors and no disruptive artifacts or popping.
iii. Support for extremely high-resolution imagery in the interactive IBR framework.
iv. A decoupling of the pre-computed imagery from the resulting viewing platform over high-speed networks.
v. Support for many different rendering tools.
vi. Guaranteed frame rates regardless of the data size, rendering complexity or display configuration.

Our work can be viewed as an extension to QuickTime VR [Chen95] or other panoramic representations [Szeliski97]. Panoramic imagery allows one to spin around at the current viewing position, but does not allow the user to move forward or backward. We developed a system to allow movement along a piecewise-linear path in three dimensions. At any position on this curve, the user can interact with the scene as in a panoramic viewer. We call this type of viewing a rail-track view, in which the user can move forward and backward along the track while viewing the outside scenery in any direction. Darsa et al. [Darsa97] investigated techniques to incorporate information about the depth of the image into the panoramic scene. Depth allows for proper re-projection of the pixels from different viewpoints and provides a sense of motion parallax to give a true three-dimensional sense to the imagery. Only with depth incorporated into the panoramic viewer can the user be allowed to move away from the sampled viewpoint with reasonable errors. We adopted their integration of depth information into our system. We want our system to be general enough that the user can browse the dataset or scene from any platform with different rendering packages. Therefore, we implement a server-client based system. To guarantee interactivity regardless of data size and scene complexity, we sample the pre-defined rail-track and render the partial results for these sampled (reference) viewpoints using any rendering package the user requires. We then save these partial results into a database on the server. The user browses the dataset on the client side.

Whenever data is needed, the client sends a request through the high-speed network, and the server retrieves the data and sends it down the network to the client. The client can then use the IBR techniques presented here to reconstruct the novel viewpoints. This will be discussed further in Chapter 3. The dissertation is organized as follows. First we discuss relevant background and previous work. We then present an overview of the system and the image-based model. We then present our geometric simplification schemes, followed by an error analysis. In Chapter 4 we discuss our data management methods, which reduce the database size for more efficient rendering performance. Finally, we conclude with test results, discussions and ideas for future work.

CHAPTER 2. RELATED WORK

Our goal is a complete and practical system for real-time panoramic walkthroughs of arbitrarily complex environments. In this chapter, we discuss background related to Image-Based Rendering and remote application systems using IBR techniques. First we discuss Image-Based Rendering and present some of the recent work done in this area.

2.1 Image-Based Rendering

2.1.1 Introduction to Image-Based Rendering

Image-Based Rendering (IBR) is a newly evolved rendering technique. Many researchers in graphics and visualization are interested in issues of IBR itself as well as in using IBR to accelerate renderings. The basic idea of IBR is to model or render a scene, or portions of a scene, with sets of images instead of pure geometry. These sets of images are rendered or taken at so-called reference viewpoints, which are used to reconstruct results for other viewpoints. This can reduce the cost of directly modeling or rendering the scene. Texture mapping [Blinn76] and environment mapping [Greene86] can be regarded as the earliest IBR techniques. A pure IBR representation has no model geometry, only images. The advantages of the IBR representation over traditional rendering techniques using pure geometry are as follows. First, it has the property of bounded computation according to the input and output image size. This is advantageous over traditional polygon-based rendering, where complex scenes or models can result in too many polygons within the viewing frustum. Second, it can use photography to give photo-realistic effects. Figure 2.1 is a slide from Mark Levoy's 1997 SIGGRAPH tutorial. This figure shows the comparison and relationships between traditional geometry-based rendering and image-based rendering. They have different pipelines. Geometry-based rendering techniques model the conceptual world and render these models for a real-time walkthrough. The IBR pipeline uses image acquisition methods to capture images from the real world and uses image-based models to reconstruct and render the scene interactively. From the figure we can also see that geometry-based and image-based rendering systems can easily be tied together.

Off-line pre-processing can be performed on the geometry modeled from the conceptual world to produce images, and these images can be used for image-based rendering. Conversely, image analysis and other computer vision techniques can be used on images taken from the real world to produce the geometry needed to perform geometry-based rendering.

Figure 2.1: Shows the comparison and relationships between geometry-based and image-based rendering systems (courtesy of Mark Levoy, SIGGRAPH 1997 Tutorial).

Using pure images to represent 3D geometric models often produces more errors than traditional geometry-based rendering techniques, because of the nature of reconstructing novel views from images of the reference viewpoints. Our goals for this dissertation are to guarantee accurate results at reference viewpoints and interactive frame rates for walking through complex scenes at high resolution. Therefore, IBR is appropriate for our purposes. Next we discuss some related work in the IBR area.

2.1.2 Plenoptic Function

IBR systems are based on the plenoptic function. The 7D plenoptic function [Adelson91] is defined as the intensity of light scattered or reflected at every location (V_x, V_y, V_z), at every possible viewing angle (θ, φ), for every wavelength λ, and at any time t:

    µ = Plenoptic(θ, φ, λ, V_x, V_y, V_z, t)    (2.1)

Even with faster CPUs and more memory, this function overwhelms modern architectures, making it impractical for interactive applications. In order to make practical use of image-based rendering concepts, this model has to be simplified. McMillan et al. [McMillan95] simplified the model to a 5D plenoptic function (Equation 2.2) by assuming static scenes and wavelength-independent reflections:

    µ_{λ,t} = µ(θ, φ, V)    (2.2)

Note that V is a vector representation of the position (V_x, V_y, V_z). McMillan's image-warping system is based on this 5D plenoptic function. The reference images are acquired on cylindrical manifolds and used as the plenoptic model. The epipolar geometries [Olsen92] [Zhang96] are then computed, and dense angular disparity maps [Yaki78] [Falken94] are derived using computer vision techniques. Using these, they compute the aggregate warp, which produces the images for new viewpoints from the information of the reference view. If the scene can be confined inside a bounding box, the model can be further simplified to a 4D function, as used in the Lumigraph [Gortler96] or Light Field [Levoy96] systems. In the Light Field or Lumigraph systems, a pair of planes is used and all the rays in space that intersect the pair of planes are captured. This is a parameterization of the 4D plenoptic function. Essentially, an image of the scene represents a 2D slice of the light field. In order to generate a new view, a 2D slice must be extracted, and resampling may be required. In a ray-space context, the image corresponding to a new view can be generated pixel by pixel from the database. Two steps are required: computing the coordinates of each required ray and re-sampling the radiance at that location. Therefore, in the Light Field or Lumigraph, the plenoptic function is essentially parameterized as a 2D array of 2D light-field images. These systems usually use a large number of reference views and form a huge collection of rays. Views of the scene from arbitrary viewpoints are computed by interpolating the necessary rays from nearby rays belonging to the collection. A coarse depth structure is used to optimize the non-uniform sampling of the reference views. Shum et al. [Shum99] introduced a 3D representation of the plenoptic function which they call Concentric Mosaics.

They reduce the plenoptic function to a 3D function by restricting the user's movement to lie within 2D circular regions. They capture the scene into a collection of concentric panoramas and allow the user to move his or her viewpoint freely in the circular region on a plane within the collection of concentric panoramas. Each column of the reconstructed imagery of the novel viewpoint comes from an appropriate panorama, using the tangent of the current viewpoint on the circle as the viewing direction. QuickTime VR systems [Chen93] [Chen95] further reduce the function to two independent variables, namely the two viewing angles. This is achieved by letting the user stand at a fixed point and look around. The user is not allowed to move away from the current viewpoint; otherwise gaps and holes will occur. To move from one viewpoint to another, the system only allows discrete jumps, which produce a severe popping effect that distracts the user's attention. Our work [Yang02] [Yang03] focuses on allowing the user to move smoothly on a pre-selected path while looking around. Hence it models a user-defined 3D slice through the plenoptic function. A continuous representation of the plenoptic function is usually not achievable, and hence we need to sample and discretize it. A discretized version of the 5D plenoptic function is represented by the following equation:

    µ_{λ,t}^i = µ(θ_i, φ_i, V_i)    (2.3)

Many choices remain for the sampling locations and viewing directions.

Some systems choose to sample the viewpoints outside of the scene [Shareef02] [Gortler96], while others put their sample viewpoints inside the scene: on a path [Cohen99] [Yang02], inside a circle [Shum99], or at a fixed point [Chen95]. Systems may choose the reference views a priori, pre-rendering the sampled plenoptic function. Other systems [Qu] choose view samples on the fly and reconstruct images of new views from these until the image quality degrades and new reference views are needed.

2.1.3 Depth Function and Reconstruction

For opaque scenes, the location or depth of the point reflecting the color is usually determined; it is the intersection of the viewing ray with the first object along the ray path. A separate 6D depth function (with no wavelength dependency), which we denote ξ, is of value for reconstructing the novel views (i.e., views not located at the sample points):

    ξ = ξ(θ, φ, V)    (2.4)

We need to sample and discretize this function for practical use:

    ξ_i = ξ(θ_i, φ_i, V_i)    (2.5)

Equations 2.3 and 2.5 provide a framework for IBR representations of a scene. IBR systems can also differ in how they represent this depth function (geometric information). QuickTime VR [Chen95] and spherical panoramic systems [Szeliski97] do not store any geometric information. The images for new viewpoints are reconstructed using implicit geometric relationships. This is adequate if the user decides not to move away from the current viewpoint.
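
As a rough illustration of Equations 2.3 and 2.5, the minimal Python sketch below stores one reference view's color and depth samples indexed by viewing angle. The class and its layout are hypothetical and are not the data format used by our system; they only show how a discretized plenoptic sample and its depth sample can be paired per reference viewpoint.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ReferenceView:
    """One sample of the discretized plenoptic and depth functions (Eqs. 2.3, 2.5)."""
    position: np.ndarray   # V_i: 3D location of the reference viewpoint
    color: np.ndarray      # mu_i(theta, phi): H x W x 3 panoramic color image
    depth: np.ndarray      # xi_i(theta, phi): H x W depth along each viewing ray

    def sample(self, theta: float, phi: float):
        """Nearest-neighbor lookup of color and depth for a viewing direction."""
        h, w = self.depth.shape
        row = int((phi / np.pi) * (h - 1))                            # phi in [0, pi]
        col = int(((theta % (2 * np.pi)) / (2 * np.pi)) * (w - 1))    # theta in [0, 2*pi)
        return self.color[row, col], self.depth[row, col]
```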

If the user moves away from the viewpoint, gaps and holes will occur. Other systems, such as McMillan's warping system [McMillan95] and the Multiple-Center-of-Projection system [Rademac98], store depth information for each sample point. They use the imagery and the stored depth/disparity information to warp and reconstruct the information for viewpoints that are not in the pre-selected viewpoint set. Users can move slightly away from the pre-selected viewpoints. However, if the user moves away from all of the pre-selected views, errors and holes can occur. Similarly, our system samples both the plenoptic and depth functions and stores these samples in the database for reconstruction of novel views.

Figure 2.2: Shows how two samples are used to warp and combine the pre-rendered image into the image for the current viewpoint (Mark et al. [Mark97]).

Holes will occur if the user moves farther away from the current viewpoint due to the occlusion and dis-occlusion problems that are not addressed by these 3D warping systems [McMillan95] [McMillan97]. This is primarily because only one layer of depth values is associated with each pixel. As the user moves off axis, previously occluded objects at a pixel can appear. The system does not store information for these previously occluded objects and therefore holes appear. Mark et al. [Mark97] [Mark99] use information from more than one reference viewpoint to fill these holes. Figure 2.2 [Mark97] shows how they use two reference viewpoints to warp and generate the image for the current viewpoint. From the figure we can see that for each reference viewpoint, if we warp the image to the current viewpoint, there are holes, which are indicated in green in the figure. However, if we combine the two warped images, we can fill the holes in one image using the information from the other image. Layered Depth Images (LDI) [Shade98] [Chang99] explicitly store several layers of depth and color information for each pixel. When rendering from an LDI, the user can move farther away and expose surfaces that were not visible in the first layer. The previously occluded objects can be rendered using information from the deeper layers. The model, however, is inefficient when we have complicated scenes that result in a large number of depth and color values per pixel. It is also inefficient when the output image has an extremely high resolution, since the system needs to process or render several depth values per pixel for a large number of pixels. Other systems partition the object space into several slabs along the viewing direction [Max95] [Choi98] [Decoret99] [Mueller99]. The data within a slab is projected towards the image plane and then texture-mapped onto a quad-mesh oriented parallel to the image plane and perturbed by the z-values of its corresponding slab.

When reconstructed for a novel view, this approximates an image warp. Our system [Yang02] [Yang03] follows this approach.

2.1.4 Efficiency

Most IBR systems concentrate on accurate rendering of relatively low-resolution imagery. Systems based on 3D warping [McMillan95] [Mark97] [Shade98] [Rademac98] use per-pixel operations and do not benefit much from hardware acceleration. They are not suitable for interactive walkthroughs on very high-resolution displays. The Lumigraph [Gortler96] and Light Field [Levoy96] renderings need very dense samplings, which are both time consuming and space inefficient. View-dependent texture mapping (VDTM) [Debevec98] is a typical example of utilizing texture hardware to accelerate rendering. However, for one reference viewpoint, the complete set of viewing directions is not adequately sampled, and therefore it cannot allow head rotation during the fly-through. Cohen-Or et al. [Cohen99] also looked into ways to pre-compute the geometry and textures on a path and stream the results across the network. Again, their work lacks the ability to let the user change the viewing direction during the walkthrough because of the incomplete sampling. Their method focuses on how to compress the resulting textures and efficiently stream them over the network. The closest implementation to ours is Darsa et al.'s [Darsa97], in which they used cubical environment maps with simplified triangular meshes. Three blending methods were explored for smooth walkthroughs between close-by viewpoints. Our system combines theirs and Cohen-Or's systems into a complete and robust system.

We have also adopted a multi-layer approach comprised of several depth meshes to better solve the occlusion and dis-occlusion problems. Furthermore, we investigated a texture removal algorithm using track-dependent occlusion culling and an intelligent two-part caching and pre-fetching scheme to improve the performance of our system. Darsa's system reports a frame rate of 3.53 FPS on an Infinite Reality Engine using position-weighted blending with a small 256x256 window, while our system can achieve around 2-3 FPS at 1kx1k resolution on a low-cost Dell Precision workstation with a 128MB Nvidia GeForce4 Ti 4600 graphics board.

2.1.5 IBR-Accelerated Virtual Walkthroughs

As described before, IBR is a very efficient way to accelerate the rendering of large and complex models. It has been used in the last few years to accelerate architectural walkthroughs. Systems using IBR-accelerated walkthroughs can be grouped into two categories. The first category comprises systems that use images from reference viewpoints to completely replace and represent the model of the scene. The second category combines the rendering of geometric models with IBR rendering, using images to replace the portals (for example, distant doorways or windows) and traditional geometric models to represent the rest of the scene. We already discussed the first category in previous sections. It is based on the idea of view interpolation, in which different views of a scene are rendered as a pre-processing step, and intermediate views are generated by performing image warping on the source images in real time.

Typical examples are QuickTime VR [Chen95], McMillan's warping system [McMillan95] and others. Amitabh's work [Darsa97] and our work [Yang02] [Yang03] also fall into this category. The advantage of this approach is that the system is independent of the complexity of the original model or scene. This approach, however, suffers from inaccuracy at non-reference viewpoints and also requires a significant amount of pre-processing. The second category of systems only uses IBR techniques to represent the geometry seen through a portal, such as a distant doorway. The rest of the model is still represented by geometric models. The simplest approach [Airey90] is to use one or more images as a 2D texture. This substantially reduces the rendering burden. The problem with this approach is that too many images are needed, since the portal texture is only correct from a given viewpoint. A severe popping effect can be noticed when the system switches from one texture to another. Image-based rendering techniques [Aliaga97] can be used to warp the portal textures to the current viewpoint, providing a smoother transition. However, similar problems such as gaps will appear when using a single-depth IBR model, for the reasons presented in the previous section. Therefore, IBR models interpolating between more than one reference view [Rafferty98] or layered depth images [Popescu98] are used to solve this problem. The disadvantage of systems using this approach is that the rendering complexity can still be a problem if the remaining part of the model is too complicated.

Our system uses IBR techniques coupled with efficient occlusion culling and pre-processing, and can achieve interactive frame rates regardless of the complexity of the model while still maintaining reasonable accuracy at non-reference viewpoints.

2.2 Remote Visualization

With today's computational power, scientific datasets tend to become fairly large, often falling into the gigabyte or even terabyte range. Interactivity is very important, yet not possible for large datasets using traditional rendering techniques. Much current research aims at systems that efficiently allow the user to examine a dataset interactively on a relatively high-resolution display. To achieve this goal, remote visualization should be considered. Although the past decade saw a vast revolution in computing technology available at an affordable cost, the gap between the computers used by scientists on a daily basis and the supercomputing capabilities dedicated to large-scale simulations is still expanding. It is not plausible for application scientists to effectively study their terabytes to petabytes of simulation data on their personal workstations or small-scale PC clusters. The need to study large-scale datasets remotely is more pressing than ever. Both infrastructure and remote visualization algorithms contribute to the current limitations. Here, we present our understanding of both issues. Many efforts have gone into implementing efficient remote visualization systems. In the first type, the back-end server renders the dataset and transfers only images to the front end. The front end does nothing but wait for the results and display them. This type of remote visualization system is easy to implement but not very efficient in dividing the workload, because it only fully utilizes the back end. The user at the front end, on the other hand, can only wait for the result from the back end.

If the dataset is very large and complex, the user can end up waiting at the front end for minutes or even hours before he can move to another viewpoint or time step. In a more complicated type of remote visualization system, the server renders the results and sends images and simplified geometry to the front end. The front-end client uses this information to reconstruct the result for the user. Typical examples of this type of system are Lawrence Berkeley National Lab's Visapult system [Bethel] [Bethel3], the University of Utah's Semotus Visum system [Luke02] and Huang et al.'s remote splatting system [Huang]. Our premise is that we pre-define, or let the user choose, a path along which they are interested in visualizing the dataset, and we sample the viewpoints on this path. We pre-compute the imagery and geometry for these sampled viewpoints and save the partial results into a database. At run time, whenever the front end needs information according to the user's movement, it sends a request through the network, and the server retrieves the information and sends it down the network to the front end. The front end can then use the information to reconstruct the imagery whenever the user moves to new viewpoints. The rendering performance of this type of remote visualization system does not depend on data size or data complexity. It only depends on the desired output image resolution. However, this approach does need a lengthy pre-processing time to build the database, especially if the scenes or datasets are large or if there are many pre-selected viewpoints.

As we discuss later, the pre-processing can take hours or even days to complete. However, the pre-processing is off-line and does not affect the run-time performance. The user can achieve interactive walkthroughs of large and complex datasets. In the next two chapters we discuss our IBR model as well as the geometry simplification, data reduction and caching/pre-fetching algorithms that achieve this interactivity for large and complex datasets.

CHAPTER 3. IMAGE-BASED MODEL AND GEOMETRY SIMPLIFICATION

3.1 Overview

The goal of this research is to interactively manage and render complex, time-consuming renderings of large scenes. We concentrate our efforts on allowing a user to roam interactively inside a scene, exploring interesting features with the ability to look around. We achieve this by restricting the user's movement to a pre-selected path and allowing him or her to view the scene from any point on the path in any direction. Figure 3.1 illustrates the notion of moving on a pre-selected path. It shows the Nature scene [NATURE], which we obtained from the POVRAY [POVRAY] quarterly competition site, with two possible tracks. The dark-arrowed curve represents the track and the red dots represent reference viewpoints that will be pre-rendered and saved into a database. The user is allowed to move back and forth along this track smoothly and change the viewing direction freely, providing three degrees of freedom.

Figure 3.1: The pre-selected paths for our internal viewer through a model of the Nature scene. The black curve represents our pre-selected path in the dataset. Red dots represent viewpoints on the path where the plenoptic function is sampled. Users can move along the curve and look around.

Although some software packages allow discrete jumps from one viewpoint to another, this disturbing teleportation requires a re-orientation of the user and invalidates any sense of a virtual environment. Our system, on the other hand, provides smooth movement for the user along the pre-selected track and lets them concentrate on their data instead of trying to determine their position inside the virtual environment. In this chapter, we first discuss our basic framework and then present our image-based model, which allows the user to move smoothly from one viewpoint to another, with minimal errors and the ability to look around.
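
For illustration, the short Python sketch below (the function name and data layout are hypothetical, not taken from our implementation) locates the two reference viewpoints that bracket a position on the track and computes the normalized distance that is later used as the blending weight between the two views (Section 3.3):

```python
import numpy as np

def neighboring_views(ref_params, t):
    """ref_params: sorted track parameters (in [0, 1]) of the pre-rendered reference views.
    Returns the indices of the two views bracketing parameter t, plus the weight w
    of the first view (w = 1 at the first view, w = 0 at the second)."""
    i = int(np.clip(np.searchsorted(ref_params, t) - 1, 0, len(ref_params) - 2))
    t0, t1 = ref_params[i], ref_params[i + 1]
    w = 1.0 - (t - t0) / (t1 - t0)          # normalized distance along the segment
    return i, i + 1, float(np.clip(w, 0.0, 1.0))
```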

We also exploit geometry simplification techniques for better rendering performance.

3.2 System Architecture

Figure 3.2 illustrates the basic framework of our IBR system. The system consists of a two-part pipeline. In the first part, the system extracts partial renderings, using any rendering package, for the selected reference views along the path through the scene. The resulting geometry and imagery are pre-processed into a more manageable format and stored on the disk of the server. In the second part, whenever a viewing client needs data, it sends a request across the network to the server, and the server retrieves the related data from the database and sends it down the network to the client. Both the server and the client maintain their own caches for fast data access. The client renders the novel views using the cached data from the server. In the following sections we discuss how to reconstruct the novel views efficiently and smoothly with minimal errors.

Figure 3.2: System diagram; a two-part pipeline is used. The diagram shows a pre-renderer and pre-processor (POVRAY, VTK, RADIANCE, AVS, ...) feeding an IBR database with server-side caching and pre-fetching, connected over the network to a client that performs reconstruction, interpolation, caching and pre-fetching using Java 3D and drives the display. The first step uses different rendering engines to pre-render the datasets. The resulting geometry and imagery are pre-processed into a more manageable format and stored on a server. Whenever the client requires data, it sends a request to the server; the server retrieves the pertinent data from the database and sends it down the network to the client. Both the server and the client maintain their own caches for fast data access.
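
As a minimal sketch of the client-side data path in Figure 3.2, a cache-then-fetch lookup might look like the following. The class and method names are hypothetical (the actual client is written with Java 3D); the sketch only illustrates the "check the local cache, otherwise request from the server" pattern:

```python
from collections import OrderedDict

class TileClient:
    """Illustrative client-side fetch path: local cache first, then the server."""
    def __init__(self, server, capacity=2048):
        self.server = server            # assumed to expose fetch_tile(view_id, slab, tile_id)
        self.cache = OrderedDict()      # LRU cache of recently used texture tiles
        self.capacity = capacity

    def get_tile(self, view_id, slab, tile_id):
        key = (view_id, slab, tile_id)
        if key in self.cache:           # cache hit: no network traffic
            self.cache.move_to_end(key)
            return self.cache[key]
        tile = self.server.fetch_tile(view_id, slab, tile_id)   # network round trip
        self.cache[key] = tile
        if len(self.cache) > self.capacity:                     # evict least recently used
            self.cache.popitem(last=False)
        return tile
```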

3.3 Reconstruction for Novel Views

3.3.1 Visibility Polyhedrons

Moving from one viewpoint in a complex scene to another, with the freedom of looking around, is a challenging problem. Let us assume that we want to move from a view V1 to a view V2 in a complex scene. We define the following terms [Preparata85].

Figure 3.3: Shows the definition of a kernel for polygons.

Definition 1: The visibility polyhedron of a viewpoint is the locus of points visible from the viewpoint.

Definition 2: A polyhedron P is said to be star-shaped if there exists a point z, not external to P, such that for all points p ∈ P, the line segment zp lies entirely within P.

Definition 3: The locus of the points z possessing the property described by Definition 2 is called the kernel of P.

Theorem 1: The visibility polyhedron of a viewpoint is a star-shaped polyhedron and has a kernel that contains at least the viewpoint.

Figure 3.3 shows the kernel of a polygon. Any two points inside the kernel of a polygon or polyhedron see exactly the same scene, and since all the points on the line connecting these two points are also in the kernel, they likewise see exactly the same scene. Let us assume that the kernel of V1's visibility polyhedron contains V2, and likewise for V2. The visibility polyhedrons of the two viewpoints are therefore identical. Any new view V* rendered along the line segment connecting the two points is also inside that kernel, due to the above definitions. Therefore, the visibility polyhedron of V* is the same as that of V1 and V2. If we define P_vi as the visibility polyhedron of the reference viewpoint V_i and P_v* as the visibility polyhedron of the new view V*, we can write the equivalence relationship as:

    P_v* = P_v1 = P_v2    (3.1)

We can use V1's or V2's visibility polyhedron, P_v1 or P_v2, to reconstruct the new view V*'s visibility polyhedron P_v*. Note that the polyhedrons P_vi assume a continuous representation of the visibility polyhedrons of the viewpoints, which is usually not achievable on current computer systems. In practice, we use a pre-rendering phase to generate the imagery and the associated depth information. The depth values are obtained either from the z-buffer or from the first intersection point with each viewing ray, depending on which software package is used. In either case, we fit an approximated polyhedron ZP_vi to the collected depth volume, which is a discrete representation of the visibility polyhedron P_vi.

We texture-map ZP_vi with the pre-rendered imagery and use this as the reconstructed scene for the reference viewpoint. Having these approximations of the visibility polyhedrons for all the reference viewpoints, we investigate techniques to reconstruct the information for the in-between novel viewpoints. From Equation (3.1) we see that we can use the two visibility polyhedrons of the proximate reference views to reconstruct the new view. One way to move from one reference view to another smoothly is to project the two reference views' polyhedrons ZP_v1 and ZP_v2 into the z-buffer and perform a simple z-buffer comparison. The z-buffer comparison chooses the closest z value seen in the novel view. This results in an approximation to the visibility polyhedron of V*, called ZP_v*. This is actually an intersection operation:

    ZP_v* = ZP_v1 ∩ ZP_v2    (3.2)

When the two viewpoints lie inside each other's kernel, the two visibility polyhedrons ZP_v1 and ZP_v2 are identical. The z-buffer comparison therefore will not introduce any artifacts. Actually, in this case the z-buffer comparison (the intersection operation) is equivalent to using the polyhedron of either reference view to represent that of the new view. This relationship is shown in Equation (3.3):

    ZP_v* = ZP_v1 ∩ ZP_v2 = ZP_v1 = ZP_v2    (3.3)

However, when the viewpoints are not within each other's kernel, the two visibility polyhedrons are not the same, and therefore a visibility discrepancy occurs between the two reference viewpoints.
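
The per-pixel z-buffer comparison in Equation 3.2 can be sketched as follows, assuming the two reference polyhedrons have already been rasterized into depth and color buffers from the novel viewpoint. This is an illustrative software version only; in practice the comparison is carried out by the graphics hardware:

```python
import numpy as np

def zbuffer_merge(depth1, color1, depth2, color2):
    """Keep, per pixel, whichever reference sample is closer to the novel viewpoint."""
    closer = depth1 <= depth2                            # True where view 1 wins
    depth = np.where(closer, depth1, depth2)
    color = np.where(closer[..., None], color1, color2)  # broadcast mask over RGB
    return depth, color
```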

If we still use the z-buffer comparison, errors will occur. Consider the example in Figure 3.4.

Figure 3.4: (a) Two viewpoints V1 and V2, two objects O1 and O2, and their visibility polyhedra. (b) V1 and V2 are not within each other's kernel and therefore their visibility polyhedrons are not the same; by z-buffer comparison p2 is used for V*, which is incorrect. (c) When the user moves from track segment s1 to s0, p2 suddenly changes to p1, which results in a popping effect. (d) By using two slabs to separate the two objects, p1 is rendered into the second slab of both viewpoints; therefore, a z-buffer comparison or blending will yield the correct result.

In Figure 3.4 (a) we have two viewpoints, V1 and V2, and two objects, O1 and O2. We can see in Figure 3.4 (b) that V1 and V2 are not within each other's kernel and therefore their visibility polyhedrons are not the same. For this example, if we still approximate the in-between view V*'s visibility polyhedron by a z-buffer comparison between V1's and V2's visibility polyhedrons, we have problems along some viewing rays. For example, for point p in the figure, p1 is the depth value in V1's polyhedron and p2 is the depth value in V2's polyhedron. If we use a z-buffer comparison, p2 is apparently closer than p1. Therefore the depth and color values of p2 are used for V*. This is obviously incorrect. A more serious problem of the z-buffer comparison is that even at a reference view we may get incorrect results. For the same point p, if a z-buffer comparison is used, even at reference viewpoint V1 we would use p2's value, since it is closer and wins the z-buffer comparison. Recall that one of the purposes of this research is to guarantee accurate results at reference viewpoints, so this is not acceptable. Another disadvantage of the z-buffer comparison is the popping effect when changing reference views. This is shown in Figure 3.4 (c). In this figure, we are moving from V2 to V1 and then on to V0, along the track segments s1 and s0. As discussed before, when the user moves from V2 toward V1 on s1, p2 represents p, which is incorrect. Immediately after the user moves out of s1 and into s0, p1 appears in the scene. Recall again that one of our goals is to allow the user to move along the rail-track smoothly. This popping effect can seriously distract the user's attention and is therefore not desired.

An alternative way to reconstruct the visibility polyhedron ZP_v* for V*, using the nearby reference viewpoints V1's and V2's polyhedrons ZP_v1 and ZP_v2, is to blend the two polyhedrons. This cannot completely eliminate the errors mentioned before; however, it mitigates the problem and makes the transition between track segments smoother. Again, consider the examples shown in Figure 3.4. For the problem shown in Figure 3.4 (b), instead of using a z-buffer comparison to determine whether we use p1 or p2, we blend p1's and p2's information. The blending weight is determined by the distance between V* and V1, V2. In this case, since V* is closer to V1, the blending weight for p1 is heavier. Although this does not give us the correct value, it mitigates the problem. Most importantly, when we are at V1, the blending weight for p2 is zero and therefore we only use p1. This ensures that at the reference viewpoints we do not have reconstruction (visibility) errors, and it meets our goal of a guaranteed lossless or near-lossless representation at the reference views. Moreover, it reduces the popping effects and lets the user move along the track smoothly. Consider the example in Figure 3.4 (c). At the moment we move out of track segment s1 into s0, the weight for V2 (and hence for p2) is zero, and therefore only p1 is used. This is consistent and allows smooth movement along the track. In summary, computing the correct visibility polyhedron is not feasible without the full geometric scene description. Therefore our choices for merging the views are limited. Taking correctness and smoothness into account, we choose to blend between the two reference views, instead of using a z-buffer comparison, to reconstruct the novel views along the rail-track.

3.3.2 Slab Representation

As described in Chapter 2, with only one depth value per pixel stored in the IBR database, previously occluded objects can become visible when the user moves away from the reference viewpoint. If no information is stored for these objects, holes and cracks can appear. As discussed in Chapter 2, in order to retain information occluded at a view, Mark et al.'s post-rendering 3D warping system [Mark97] used more than one reference view to fill the holes. Their heuristic for deciding which pixel from which reference view is used is a per-pixel operation in software. Layered Depth Images [Shade98], on the other hand, store more than one layer of depth for each pixel to avoid the occlusion and dis-occlusion problems. Again, theirs is a per-pixel based system. Although these systems can achieve relatively accurate results on a low-resolution display, they are not suitable for our purpose of rendering interactively at high resolutions. We want to utilize graphics hardware to accelerate our renderings. We therefore introduce our slab representation to reduce the occlusion and dis-occlusion problems. We divide the viewing frustum into several depth ranges by setting the near and far clipping planes. We render the part of the scene in each range and obtain the imagery and depth polyhedron in the same way as described in Section 3.3.1. We denote the polyhedron for a reference view V_i and the j-th slab as ZP_vi^j (the j-th polyhedron for viewpoint V_i). We call the corresponding imagery a slab image. A binary opacity map is assigned to the slab image so that the background (empty) pixels are transparent. We map the slab image as a texture onto the slab polyhedron and composite the slabs (image and polyhedron) in front-to-back order.

Therefore, if at a certain viewpoint a pixel from a front slab has a non-background value, it blocks the pixel from the later slab, which achieves the correct scene. The relationship of a polyhedron P_vi, its discretized version ZP_vi, and the slab polyhedrons ZP_vi^j is shown in Equation (3.4):

    discretize(P_vi) = ZP_vi = ZP_vi^1 + ZP_vi^2 + ZP_vi^3 + ... + ZP_vi^n    (3.4)

By using slabs, we reduce the occlusion and dis-occlusion problems. Compared to Mark et al.'s post-rendering 3D warping [Mark97] and the Layered Depth Images [Shade98] systems, our slab-based representation uses 2D texture mapping hardware to accelerate the renderings and is therefore much more efficient during rendering. Let us examine how a slab representation can further reduce the visibility errors mentioned in the previous section, that is, those that occur when two proximate viewpoints used for reconstruction are not within each other's kernel. Consider the same example in Figure 3.4 (d). Here we use two slabs to separate the two objects. Object O1 is rendered to slab 1, while object O2 is rendered to slab 2. In this case, for point p, the rendered values p1 and p2 for both viewpoints fall into the second slab. Essentially p1 and p2 have the same value, and therefore using a z-buffer comparison or blending them will not produce any errors. We can see from the figure that by using slabs, we can achieve more accurate results. Since we cannot use an infinite number of slabs in our system, for efficiency reasons, there are still possibilities of occlusion and dis-occlusion problems within one slab, and we still suffer errors, which will be discussed in a later section.
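
The following sketch illustrates, in software, the front-to-back compositing of one reference view's slabs with their binary opacity maps; in our system this is performed by the texture hardware, and the array names here are hypothetical:

```python
import numpy as np

def composite_slabs_front_to_back(slab_colors, slab_alphas):
    """slab_colors: list of H x W x 3 slab images, ordered near to far.
    slab_alphas: matching H x W binary opacity maps (1 = scene content, 0 = background),
    so nearer slabs correctly occlude farther ones."""
    out_color = np.zeros_like(slab_colors[0], dtype=float)
    out_alpha = np.zeros(slab_alphas[0].shape, dtype=float)
    for color, alpha in zip(slab_colors, slab_alphas):
        contrib = (1.0 - out_alpha) * alpha          # only where nothing is in front
        out_color += contrib[..., None] * color
        out_alpha += contrib
    return out_color, out_alpha
```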

With the introduction of a slab representation, how exactly do we blend the sets of slabs for the reference viewpoints? As introduced in Porter et al.'s [Porter84] paper, the blending here falls under the over operation of digital compositing. The correct way to do the blending is to use an over operation on corresponding slabs of the two views, and then to compute the result of slab 1 over the result of slab 2 and so on:

    Result = ((slab 1 for V1) over (slab 1 for V2)) over ((slab 2 for V1) over (slab 2 for V2)) over ... over ((slab n for V1) over (slab n for V2))    (3.5)

This could be implemented using a special p-buffer, which is normally not available in the current implementation of the Java 3D engine. Instead, we compute slab 1 for V1 over slab 1 for V2, and then compute this over slab 2 of V1, and so on:

    Result = (slab 1 for V1) over (slab 1 for V2) over (slab 2 for V1) over (slab 2 for V2) over ... over (slab n for V1) over (slab n for V2)    (3.6)

The blending requires a weight, w, which is again based on the distance (normalized to be between 0 and 1) between the reference views. The corresponding reference slab sets have their opacities modulated by the weight factor, and are then projected in an interleaved fashion in depth-sorted order from the new viewpoint. The equation for the resulting color after the slab sets are composited together is:

    C = C1 * α1 * w + (1 − α1 * w) * C2 * α2 * (1 − w)    (3.7)

Here C is the final computed color at the new viewpoint, C1 and C2 are the colors from the slab images of the two neighboring reference views, and α1 and α2 are the alpha values from the opacity maps of the slab images. Results show that with linear interpolation the computed image is very close to a correct rendering when the new viewpoint is close to a reference view.

However, when the new view is located midway between the reference views, the computed image is darker and transparent-looking. The reason is that when the new viewpoint is near the middle of the two references, C is less than the original C1 or C2, and thus the result appears more transparent. For example, assume C1 equals C2, both pixels have α values of 1, and our weight w equals 0.5. Then:

    C = 0.5 * C1 + (1 − 0.5) * C2 * (1 − 0.5) = 0.75 * C1    (3.8)

An alternative is to use a nonlinear weighting scheme. We define a nonlinear blending equation as:

    C = C1 * α1 * w + (1 − α1 * w) * C2 * α2 * (1 − pow(w, b))    (3.9)

Here b is a constant. Assume the same example as before, with b equal to 3. We have a resulting color of:

    C = 0.5 * C1 + (1 − 0.5) * C2 * (1 − 0.125) = 0.9375 * C1    (3.10)

The two equations (3.7 and 3.9) are plotted in Figure 3.5. From the figure we can see that the non-linear compositing gives a result that is more consistent and closer to 1. The minimum value is about 0.9, compared to 0.75 for linear compositing. Therefore, by using the non-linear compositing scheme we alleviate the dark and transparent-looking problems after compositing. In practice we choose a value of 3 for b.
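
The small script below (illustrative Python, not part of the renderer) reproduces the worked numbers in Equations 3.8 and 3.10, and the approximate minimum of 0.9 visible in Figure 3.5:

```python
def blend_linear(c1, a1, c2, a2, w):
    # Equation 3.7: linear weighting by the normalized distance w
    return c1 * a1 * w + (1.0 - a1 * w) * c2 * a2 * (1.0 - w)

def blend_nonlinear(c1, a1, c2, a2, w, b=3):
    # Equation 3.9: the second view's weight falls off as (1 - w**b)
    return c1 * a1 * w + (1.0 - a1 * w) * c2 * a2 * (1.0 - w ** b)

# Equal colors, fully opaque pixels, new view halfway between the references.
c = 1.0
print(blend_linear(c, 1.0, c, 1.0, 0.5))     # 0.75   -> darker, transparent-looking
print(blend_nonlinear(c, 1.0, c, 1.0, 0.5))  # 0.9375 -> Equation 3.10
print(min(blend_nonlinear(c, 1.0, c, 1.0, w / 100) for w in range(101)))
# ~0.894, the minimum of the non-linear curve, attained near w = 0.75
```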

In this section, we discussed our reconstruction techniques and introduced the slab representation. Next, we discuss how to pre-process and simplify the geometry information so that the system is efficient enough to handle interactive walkthroughs on large displays.

Figure 3.5: Graphical plots of the linear and non-linear compositing (b = 3) equations (output color versus weight w). The non-linear compositing gives a result that is more consistent and closer to 1. The minimum value is about 0.9, compared to 0.75 for linear compositing. Therefore, by using the non-linear compositing scheme we alleviate the dark and transparent-looking problems after compositing.

3.4 Geometric Simplification

Having a discrete slab representation of the visibility polyhedron at a viewpoint, we simplify ZP_vi^j for better rendering performance. Here, we consider two simplification schemes: down-sampling and feature-preserved triangular mesh simplification.

3.4.1 Simple Down-Sampling

Down-Sampling Technique

A slab polyhedron ZP_vi^j is obtained by connecting the depth value of each pixel. This is essentially a model in which each pixel is a polygon itself, and it is therefore far too complex for interactive rendering on large displays. We can reduce the geometric complexity by down-sampling the depth buffer of each slab into a lower-resolution grid, or quad-mesh. The vertices of the quad-mesh retain the calculated depth values resulting from the pre-rendering phase. For the interior points, a linear interpolation function is assumed by the graphics hardware. Figures 3.6 and 3.7 show the depth images rendered before and after down-sampling. We call this simplified slab mesh for the reference viewpoint V_i QZP_vi^j. Compositing all the simplified slab meshes together, we get the following formula for our complete scene:

    discretize(P_vi) = ZP_vi ≈ QZP_vi = QZP_vi^1 + QZP_vi^2 + QZP_vi^3 + ... + QZP_vi^n    (3.11)

As discussed before, a binary opacity mask is assigned to enable per-pixel occlusion for each slab, so that if both the front and back slabs have information at the current viewpoint, the front slab correctly occludes the back slab.

mask can also eliminate some blocky (blurring) effects resulting from the down-sampling. This is known as α-clipping or silhouette-clipping [Sander]. Figure 3.8 illustrates the depth image after α-clipping without using any slabs. Figures 3.9 and 3.10 show the depth images of the first and second slabs after α-clipping. All the images here are obtained from the Castle [CASTLE] dataset.
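As an illustration of the down-sampling step itself, the sketch below keeps only the depth values at the quad-mesh vertices of one slab. It is a minimal sketch with assumed data layouts (a row-major depth buffer and a uniform tile spacing), not the actual pre-processing code.

public class DepthMeshDownsampler {
    // depth:    per-pixel depth of one slab, row-major, length width*height
    // tileSize: spacing of the quad-mesh in pixels (e.g., 32)
    // returns the depth values retained at the coarse grid vertices, row-major
    static float[] downsample(float[] depth, int width, int height, int tileSize) {
        int cols = width / tileSize + 1;   // vertices per row of the coarse grid
        int rows = height / tileSize + 1;  // vertices per column of the coarse grid
        float[] vertices = new float[cols * rows];
        for (int r = 0; r < rows; r++) {
            int y = Math.min(r * tileSize, height - 1);
            for (int c = 0; c < cols; c++) {
                int x = Math.min(c * tileSize, width - 1);
                // Keep the pre-rendered depth at the vertex; the graphics hardware
                // linearly interpolates depth over the interior of each quad.
                vertices[r * cols + c] = depth[y * width + x];
            }
        }
        return vertices;
    }
}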

Figure 3.6. Shows the original depth image.

Figure 3.7. Shows the depth image after down-sampling.

Figure 3.8. Shows a single α-clipped depth slab.

Figure 3.9. Shows the first slab depth image after α-clipping.

Figure 3.10. Shows the second slab depth image after α-clipping.

Depth-Mesh Errors

We consider two sources of errors after down-sampling and merging two reference views for any novel view. The first source of errors results from the linear interpolation of the down-sampled depth-meshes. As we discussed before, after down-sampling, only the vertices of the quad-mesh have the actual depth values obtained from the pre-rendering (Z buffer or intersections). The depth values for the interior points of the quad-mesh are calculated using a bi-linear interpolation function. For regions having high curvature (in depth), the linear interpolation introduces errors. As will be discussed in Chapter 4, we decompose the pre-rendered image into fixed-size tiles and treat each tile as a texture map. Mapping the textures to a linearly interpolated quad is different from mapping them to the highly-curved surface. The errors appear when we reconstruct either the reference views or the novel views. The down-sampling error can be reduced by using a finer quad-mesh (smaller tile size). However, decreasing the tile size increases the rendering time, loading time and storage requirements. Tables 3.1 and 3.2 show the experiments that we performed using different sizes of a quad-mesh for one POVRAY [POVRAY] rendered view of the Nature [NATURE] dataset with 1k x 1k output resolution on our Sun Blade and Dell Precision 53 systems. Here we compare the change of Mean Squared Error (MSE), the loading time, the rendering time and the storage requirement with different tile sizes. These are also plotted graphically in Figures 3.11 and 3.12. From the tables and the figures we can see that for the Sun Blade system, the MSE decreases almost linearly when we decrease the tile size. However, the loading time

increases quite dramatically with smaller tile sizes. The rendering time also increases with decreased tile size. As the tile size increases, the storage first decreases, and then increases. The decrease (from right to left in Figure 3.11) at first is due to the fact that we need to store fewer depth values when we have fewer tiles. The latter increase occurs since there are fewer empty tiles to remove, that is, tiles which do not have any information. As we will discuss in the next chapter and illustrate in Figure 4.1, a large image can contain many empty tiles which can be removed to save storage and network transmission time. This is also why the rendering speed levels off when the tile size increases: we have more empty space to rasterize. We should choose a tile size that can achieve proper frame rates and good image quality. Please note that loading time is also a very important factor because it affects the pre-fetching performance, which will be addressed in Chapter 4. For our Dell Precision 53 system, which has a faster processor and a better graphics card, we notice that the rendering time and loading time are not affected much when the tile sizes are larger than 8x8. To make our system interactive on both systems, 16x16 or 32x32 tile sizes are good candidates. In practice, we have chosen a tile size of 32x32, since rendering performance is the most important issue for our system.
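The storage and rendering trade-off can be seen with a quick count of primitives per tile size. The sketch below simply tabulates, for an assumed 1k x 4k slab, how many tiles must be rasterized and how many quad-mesh depth vertices must be stored for each tile size; it ignores the empty-tile removal discussed in Chapter 4, which is why the measured storage eventually rises again for large tiles.

public class TileSizeTradeoff {
    public static void main(String[] args) {
        int width = 1024, height = 4096;            // one panoramic slab (assumed size)
        int[] tileSizes = {64, 32, 16, 8, 4, 2};
        for (int t : tileSizes) {
            long tiles = (long) (width / t) * (height / t);          // quads to rasterize / textures to manage
            long verts = (long) (width / t + 1) * (height / t + 1);  // depth values to store
            System.out.printf("tile %2dx%-2d : %8d tiles, %8d mesh vertices%n", t, t, tiles, verts);
        }
    }
}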

Table 3.1: Shows the change of loading time, rendering speed, Mean Squared Error and storage of a single panorama (1k x 4k) with depth and 3 slabs, using tile sizes of 64x64, 32x32, 16x16, 8x8, 4x4 and 2x2. The columns list the loading time (s), rendering time (s), MSE and storage (MB). The rendered image resolution was 1k x 1k. This is for the Nature dataset on our Sun Blade system. MSE is based on a maximum number of 256 shades for each color.

Table 3.2: Shows the change of loading time, rendering speed, Mean Squared Error and storage of a single panorama (1k x 4k) with depth and 3 slabs, using tile sizes of 64x64, 32x32, 16x16, 8x8, 4x4 and 2x2. The columns list the loading time (s), rendering time (s), MSE and storage (MB). The rendered image resolution was 1k x 1k. This is for the Nature dataset on our Dell Precision 53 system. MSE is based on a maximum number of 256 shades for each color.

Figure 3.11: Shows the Mean Squared Error (MSE), the loading time, frames per second (FPS) and the storage requirement with different tile sizes for a typical sampled viewpoint of the POVRAY-rendered Nature dataset on our Sun Blade system.

Figure 3.12: Shows the Mean Squared Error (MSE), the loading time, frames per second (FPS) and the storage requirement with different tile sizes for a typical sampled viewpoint of the POVRAY-rendered Nature dataset on our Dell Precision 53 system.

Visibility Errors

Another source of errors results from reconstructing the novel view when we blend two close-by reference views. As described previously, if the reference views are within each other's kernel of the visibility polyhedrons, the actual polyhedrons of the two reference views coincide with each other perfectly and therefore no visibility errors should occur. However, in real situations this is seldom true and these visibility errors are inevitable. Figure 3.13 shows the Mean Squared Errors (MSE) of the resulting images for interpolated views against POVRAY-rendered views at corresponding positions along our path. As we can easily discover, the closer the sampled viewpoints are, the smaller the visibility errors will be. We performed an experiment using three different sampling rates along the rail-track. To maintain the same down-sampling error for all the cases, we use the image resolution of 1k x 1k as before and keep the tile size constant at 32x32. We can see that the curve using only 2 reference viewpoints has the highest Mean Squared Errors (MSE). This is as expected. The peak errors occur somewhere close to the middle of the two sampled viewpoints for all three cases. The peak MSE is about 17 out of 256, or approximately 6.7%. The curve which uses 5 views has the finest sampling rate and peaks out with an MSE of about 4-5. From the figure, we can also see that the MSE drops dramatically at the reference viewpoints. This is due to the fact that at the reference viewpoints, there are no visibility errors and the errors solely come from down-sampling and the bi-linear interpolation that we discussed in the previous section.
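For reference, the error metric can be computed per color channel as in the generic sketch below; this is not the exact evaluation code used in this work, and the quoted errors appear to be reported on the 0-255 shade scale, so the square root of the value below (or a mean absolute difference) maps the result onto that scale.

public class ImageError {
    // Mean Squared Error between two images given as 8-bit color samples (0-255),
    // stored in the same interleaved order and of equal length. Taking
    // Math.sqrt of the result expresses the error back on the 0-255 shade scale.
    static double mse(int[] rendered, int[] interpolated) {
        double sum = 0.0;
        for (int i = 0; i < rendered.length; i++) {
            double d = rendered[i] - interpolated[i];
            sum += d * d;
        }
        return sum / rendered.length;
    }
}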

Figure 3.13: Mean Squared Error of interpolated views against actually rendered views (visibility errors for the Nature dataset), for sampling rates of 2, 3 and 5 reference views. Interpolated views have a tile size of 32x32.

3.4.2 Feature Preserved Decimation

Decimation Algorithm

Simplification by down-sampling the depth mesh can result in errors. The errors are more significant at places with large depth changes or high curvature, for example, silhouette edges. In order to reduce the errors, a feature-preserving mesh-simplification scheme is studied. Our original mesh, as described, is a slab polyhedron constructed by connecting the depth value of every pixel. We triangulate the mesh by casting each pixel as two triangles. A mesh simplification technique is then used to simplify the triangular mesh while preserving features such as silhouette edges. There are several existing feature-preserving decimation algorithms. We choose to use the Visualization Toolkit's (VTK) [VTK] DecimatePro class, since VTK is one of our pre-renderers and fits well for our purpose. VTK's decimation algorithm is based on the paper by Schroeder et al [Schroeder92]. The algorithm makes several passes over all the vertices in the mesh. During each pass, each vertex is classified either as a simple or a complex vertex. A simple vertex can be further classified as a boundary or an interior vertex. Simple interior vertices are selected to be deleted based on a distance-to-plane criterion, while simple boundary vertices are selected based on a distance-to-edge criterion. The resulting holes are patched by local triangulation. The algorithm results in a

topology/feature-preserved simplified triangular mesh. Figure 3.14 shows the decimated result for one view of our Nature scene [NATURE]. The resulting mesh contains 32,346 triangles, which is about 0.3% of the original mesh (1024x4096x2 triangles). As a comparison, Figure 3.15 shows a 16x16 quad-mesh of the same viewpoint, which has about the same number of geometric primitives and the same simplification rate as the triangular mesh (0.3%).
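As a rough illustration of how this decimation step can be driven, the sketch below pushes a triangulated slab mesh through VTK's DecimatePro filter. It assumes VTK's Java wrappers (package vtk) and a reasonably recent VTK build, so the input-setting call and the native-library loading may differ from the VTK version actually used for this work; it is a sketch of the idea, not the original pre-processing code.

import vtk.vtkDecimatePro;
import vtk.vtkPolyData;

public class SlabMeshDecimator {
    // Assumes the VTK native libraries have already been loaded.
    static vtkPolyData decimate(vtkPolyData slabMesh, double targetReduction) {
        vtkDecimatePro deci = new vtkDecimatePro();
        deci.SetInputData(slabMesh);              // triangulated depth mesh (2 triangles per pixel)
        deci.SetTargetReduction(targetReduction); // e.g., 0.997 to keep roughly 0.3% of the triangles
        deci.PreserveTopologyOn();                // preserve topology and feature edges
        deci.Update();
        return deci.GetOutput();
    }
}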

Figure 3.14: Shows a simplified triangular mesh using the feature-preserved decimation algorithm in VTK for one reference viewpoint.

Figure 3.15: Shows a 16x16 quad-mesh used for the same viewpoint, which has the same reduction rate as the triangular mesh.

Error Analysis

Using a feature-preserved triangular mesh simplification scheme will reduce the errors caused by down-sampling. In this section, we compare the errors of these two mesh simplification methods. Figure 3.16 shows the comparison of errors using a simplified triangular mesh and a 16x16 fixed-size quad mesh. The two configurations have a comparable number of geometric primitives. The errors are calculated with the resulting images of our IBR interpolated views against the resulting images of the actual rendered views at corresponding viewpoints along the track segment. The IBR interpolation is performed using two reference viewpoints, V0 and V1. We can see that at the reference viewpoints, the Mean Squared Errors are at their minimum for both the triangular mesh and the quad mesh. This is because at reference views, we only have the depth-mesh interpolation errors. The interpolation errors at the reference viewpoints are almost cut in half, from about 7.5 (out of 256, about 3%) to 4 (1.6%), by using a simplified triangular mesh versus a quad-mesh. The MSE for both meshes peaks at about the middle of the track segment.

Figure 3.16: Shows the errors using a simplified triangular mesh and a 16x16 fixed-size quad mesh (MSE comparison between the quad mesh and the triangular mesh for the Nature dataset). The Mean Squared Errors are at their minimum for both the triangular mesh and the quad mesh at the reference views. Using the triangular mesh cuts the MSE almost in half at these points. The MSEs for the in-between views are not affected that much by using different mesh simplification methods.

From the results we can see that the MSE for the in-between novel views does not improve substantially using the feature-preserving simplified triangular mesh against the quad-mesh. Most of the errors for the novel views occur due to the warping and visibility errors. Figure 3.17 (a), (b) and (c) show a POVRAY-rendered view, the IBR-rendered view using a triangular mesh and the IBR-rendered view using a quad-mesh, respectively. Figure 3.17 (f) shows the difference image between (a) and (b), while Figure 3.17 (g) shows the difference image between (a) and (c). Figure 3.17 (d) and (e) show enlarged images for the highlighted regions in Figure 3.17 (b) and (c). We can see that the triangular mesh has a better quality visually. It does not have the blocky effects resulting from using a regular quad mesh. However, the differences between these two methods are not significant. This can be seen from the difference images and the MSE comparison. Using a triangular mesh requires substantially more work when dealing with the overlaid texture maps. Since using a triangular mesh did not drastically improve the rendering quality, we will use a quad-mesh for our further studies.

Figure 3.17: (a) Actually rendered view, (b) interpolated view using a quad mesh and (c) interpolated view using a triangular mesh. (d) and (e): the enlarged images for the red rectangular regions in (b) and (c). (f): The difference image between the actually rendered view and the interpolated view using a quad mesh ((a) and (b)). (g): The difference image between the actually rendered view and the interpolated view using a triangular mesh ((a) and (c)).

3.5 Summary

In this chapter, we discussed our image-based model for walkthroughs of large synthetic scenes. We presented the theoretical foundations of our rail-track model and described how to combine close-by reference views to reconstruct the information for new viewpoints. We also discussed our novel approach of dividing the scene into several depth slabs to mitigate the occlusion and dis-occlusion problems. Furthermore, we discussed two different approaches to geometry simplification, namely down-sampling and a feature-preserved mesh simplification scheme, to achieve better performance. After comparing the errors from the two approaches and considering our current implementation, we chose to use down-sampling as our mesh simplification approach. With geometry simplification, we can achieve much better performance compared to using the original depth-per-pixel model. However, with large-resolution output and a higher density of sampling along the track, the resulting imagery becomes overwhelming. In the next chapter, we present several algorithms to reduce the data for efficient storage, network transmission and even better performance.

CHAPTER 4. DATA MANAGEMENT

In Chapter 3, we presented our technique to reconstruct a novel view from the neighboring two reference views along the track. We also discussed depth mesh simplification and slab representation schemes for the depth function ξ. In this chapter, we will concentrate on algorithms for managing the IBR data to further improve the storage and rendering efficiency of our system. The IBR data is pre-rendered as images and mapped to the approximated, simplified slab polyhedrons as textures. The texture information is much larger and more time-consuming to load and render. Consider the pre-rendered results for the Nature [NATURE] scene, with a 1k x 4k image for each slab and 3 slabs for one viewpoint, which is 48MB. For smooth navigation, we need to pre-fetch the textures and load them into the graphics hardware. Across the reference views on our track, we quickly amass data in the GB range. Without proper management and data reduction, our system would be overwhelmed with this texture data. In this chapter we will discuss data management techniques such as empty tile removal, texture binding reduction, caching and pre-fetching, and texture removal using a track-dependent occlusion culling algorithm. We will begin with empty tile removal and describe how to limit the number of texture bindings.

4.1 Empty Texture Tiles Removal

The images resulting from the pre-rendering can contain parts that don't have any useful information except the background color. Keeping these is a waste of resources. To get rid of them, we break the resulting image into fixed-size tiles according to the down-sampled quad-mesh. This is shown in Figure 4.2 (a). For those tiles (the white ones in Figure 4.2 (a)) that contain no information, we detect and label them as empty tiles and remove them from the database. Treating each image tile as a distinct texture map easily allows for removal of these empty tiles. Figure 4.1 shows an example of how the empty tile removal scheme works. In the figure, the white grid represents the quad-mesh, and the texture is broken into tiles and texture-mapped to the quads individually. Notice that the background tiles and tiles with no information are not stored in the database. These tiles are not rendered. Even with the empty tiles removed, we can still have lots of tiles, especially when the resulting image is very large for a high-resolution display. Also remember that for reconstructing a new viewpoint, we need the information from the neighboring two viewpoints, and each viewpoint may have several layers of images. This further increases the number of individual texture maps. Different systems can support only a limited number of texture binding units, for example, 2^16 = 64K units (please remember that the current Java3D engine supports even fewer). For a pre-rendered image with a resolution of 1K x 4K, 4 slabs for each viewpoint and a tile size of 16x16, two viewpoints give us 128K individual texture maps, which exceeds the

number of texture binding units available in the system. In this case, texture thrashing can occur, which reduces the performance significantly. To alleviate this problem we group individual image tiles into larger texture units for rendering. One example is shown in Figure 4.2. Consider a full image of size WxH as in Figure 4.2 (a); we divide the image into equally sized small tiles, w by h, according to the down-sampled quad-mesh. This results in each row having W/w tiles and each column having H/h tiles. We remove empty tiles and merge the remaining tiles into a larger texture map. We accomplish this by squeezing each column, removing the empty tiles (as in Figure 4.2 (b)), and linking the resulting columns of tiles into a one-dimensional tile array (as shown in Figure 4.2 (c)). This is used as a single texture unit. Since the OpenGL/Java3D engine can only handle textures with sizes that are powers of 2, we split this 1-D array into several arrays and pad the last one. The reason for splitting the array is to facilitate our novel pre-fetching scheme: we want to make the indexing and loading of part of the panorama faster and easier. This will be discussed in more detail in the next section. A header file is constructed which contains pointers to the beginning of each column in the 1-D array and the number of non-empty tiles in each column. During run-time, this header file is first loaded into memory, and it is fairly easy to keep track of which tile corresponds to which quad. Therefore, the corresponding texture coordinates can be generated for appropriate texture mapping. Through this mechanism, we deal with far fewer texture bindings and therefore can avoid potential texture thrashing problems.
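A minimal sketch of this packing step is given below (the data layout and names are assumptions, not the actual file format): non-empty tiles are detected from their opacity, squeezed per column into a 1-D array, and a header records where each column starts and how many non-empty tiles it holds.

import java.util.ArrayList;
import java.util.List;

public class TilePacker {
    static class PackedTiles {
        List<int[]> tiles = new ArrayList<>(); // each entry: the RGBA texels of one non-empty tile
        int[] columnStart;                     // index of each column's first tile in the 1-D array
        int[] columnCount;                     // number of non-empty tiles in each column
    }

    // tileGrid[col][row] holds the RGBA texels (alpha in the high byte) of one tile.
    static PackedTiles pack(int[][][] tileGrid) {
        PackedTiles out = new PackedTiles();
        out.columnStart = new int[tileGrid.length];
        out.columnCount = new int[tileGrid.length];
        for (int c = 0; c < tileGrid.length; c++) {
            out.columnStart[c] = out.tiles.size();
            for (int[] tile : tileGrid[c]) {
                if (!isEmpty(tile)) {          // keep only tiles that carry information
                    out.tiles.add(tile);
                    out.columnCount[c]++;
                }
            }
        }
        return out;
    }

    // A tile is empty if every texel is fully transparent (background only).
    static boolean isEmpty(int[] rgbaTexels) {
        for (int texel : rgbaTexels) {
            if (((texel >>> 24) & 0xFF) != 0) return false;
        }
        return true;
    }
}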

Figure 4.1: Shows an example of how the empty tile removal scheme works. The white grid represents the quad-mesh, and the texture is broken into tiles and texture-mapped to the quads individually. Notice that the background tiles and tiles with no information are not kept in the database. These tiles are not rendered here.

Figure 4.2: (a) The original tile organization. (b) To remove empty tiles and merge the remaining tiles into a larger texture map, we squeeze each column, removing the empty tiles. (c) We link the resulting columns of tiles into a one-dimensional tile array and use this as the texture unit.

4.2 Caching and Pre-fetching

To reconstruct new viewpoints on the track, the information from the two neighboring reference viewpoints is needed. The IBR rendering engine determines the closest two reference viewpoints on the path. When the user moves to a new path segment, requiring a new reference view, a severe latency occurs while the needed geometry and textures are read into memory. Therefore, instead of loading just two sampled views, the system needs to load several views into memory and store them in a cache. The pre-fetching engine, which is implemented as a separate thread, continually requests texture data as the user moves along the track and stores it in the cache. The maximum number of views allowed in the cache is determined by the available memory size and the texture or panoramic image size. Our first experiment treated the whole panorama as the caching unit. It alleviates, but does not eliminate, the latency problem. When the user moves too quickly along the track, noticeable latency still occurs as the server strives to push the data to the client, because loading the whole panoramic image is quite time-consuming. Keeping the whole panorama texture in memory allows the user to smoothly look around in all directions, but places a substantial burden on memory and the network. By examining the minimal information needed to reconstruct a novel view, we can reduce these demands and increase our pre-fetching length. We consider two scenarios. The first case is when the user moves along the track while keeping their orientation fixed. In this case, only the information within the user's viewing

direction is needed. The second case is when the user stops on the track while examining the virtual environment. This requires more of the panorama for the current two closest reference views. Therefore, we have the choice of pre-fetching more of each panorama or pre-fetching partial views of more viewpoints along the track. We handle this using an adaptive two-part pre-fetching scheme: one part along the track and the other part along the viewing direction. The algorithm works as follows. At the time the system starts up, we first load in the first and second viewpoints on the track. While the system is performing other tasks, like rendering and blending the scene, the pre-fetching thread tries to pre-fetch the information of the next few viewpoints along the track. We adaptively reduce the amount of the panorama we pre-fetch for views that are farther down the track. This means that as we pre-fetch the information of viewpoints farther away, we pre-fetch less and less of the information into the cache. If at this time the user stops and looks around, we have enough information (the whole panorama for the first two viewpoints) to reconstruct the information for him or her. Or, if the user decides to go along the track, we also have the information he or she needs. When the user moves out of the first track segment to the next one, the pre-fetching engine tries to load the information of the viewpoints even farther away. In the meantime, another thread tries to load in the remaining part of the panorama for the viewpoints that are already in the cache.
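The ordering of requests in this two-part scheme can be sketched as below; the cache and request types are hypothetical stand-ins for the actual Java3D client code, and the halving of the pre-fetched fraction per segment is only one possible choice of the adaptive falloff.

import java.util.ArrayList;
import java.util.List;

public class PrefetchPlanner {
    static class Request {
        final int viewIndex;            // which reference view on the track
        final double panoramaFraction;  // 1.0 = whole panorama, smaller = window around the view direction
        Request(int viewIndex, double panoramaFraction) {
            this.viewIndex = viewIndex;
            this.panoramaFraction = panoramaFraction;
        }
    }

    // Orders pre-fetch requests for the current track segment: full panoramas for the two
    // closest reference views, and an adaptively smaller part for each view farther away.
    static List<Request> plan(int currentSegment, int lookahead) {
        List<Request> requests = new ArrayList<>();
        requests.add(new Request(currentSegment, 1.0));
        requests.add(new Request(currentSegment + 1, 1.0));
        double fraction = 0.5;
        for (int i = 2; i <= lookahead; i++) {
            requests.add(new Request(currentSegment + i, fraction));
            fraction *= 0.5; // less and less of the panorama for farther views
        }
        return requests;
    }
}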

The whole process is illustrated in Figure 4.3. Figure 4.3 (a) defines a path with reference views V1 through V6. A track segment is defined between adjacent viewpoints. We have labeled these segments 1 through 5. The current novel viewpoint is on segment 1, which is between V1 and V2. Reference views V3, V4 and V5 are farther and farther away from the current view. Figure 4.3 (b) shows a possible pre-fetching in which the whole panorama for views V1 and V2 is loaded into the cache, while only parts of the panorama (tiles) are loaded in for views V3, V4 and V5. As the user moves to segment 2, the novel view lies between V2 and V3. We use another thread to fetch the remaining information for V3 and pre-fetch partial information for V6, which currently has no information in the cache. Partial texture tiles for V1 are discarded. This is shown in Figure 4.3 (c). This algorithm can balance the user's need for walking down the track against looking around. Which pre-fetching thread should have higher priority is determined by the preference of the user: whether he or she wants to move along the track or spin around fast.

Figure 4.3: (a) Defines a path with V1 through V6. A track segment is defined between two close-by viewpoints. The current novel viewpoint is on segment number 1, which is between V1 and V2. (b) The entire panoramas for V1 and V2 are loaded into the cache, while only parts of the panoramas (tiles) are loaded in for V3, V4 and V5, and less information is actually loaded for V5 than for V3. (c) The user moves into segment 2 and the novel view lies between V2 and V3; we use another thread to load in the remaining information for V3 and pre-fetch partial information for V6, which currently has no information in the cache.

4.3 Texture Removal Using a Conservative Track-Dependent Occlusion Culling Algorithm

The slab representation is used to better address the occlusion and dis-occlusion problems. As the user moves away from the reference viewpoint, previously occluded information can be rendered using later slabs. However, the problem with partitioning and pre-rendering scenes into several slabs is that it produces unnecessary information. Consider the example in Figure 4.4. In this example, we have three reference viewpoints on the track segment: V1, V2 and V3. Objects O2, O3 and O4 are occluded by object O1 for V1 but are rendered and stored in slab 2. O2 is visible for V2 and O4 is visible for V3. Hence, when the user moves away from V1 towards V2, the information stored in slab 2 of V1 is used to represent O2, and likewise for O4. However, in this example, O3 is never visible from any viewpoint along the track segment. Without occlusion culling we would still render O3 and store the result in slab 2 as part of the texture map. This unnecessary data affects the storage cost, the network transmission time and the rendering performance. A conservative track-dependent occlusion-culling scheme is adopted to remove these occluded textures.

Figure 4.4. Objects O2, O3 and O4 are occluded by object O1 for V1 and therefore are rendered and stored in slab 2. O2 is visible for V2 and O4 is visible for V3. However, O3 is not visible from any of the viewpoints along these track segments.

4.3.1 Review of Occlusion Culling Algorithms

It is beyond the scope of this dissertation to present all the previous literature on occlusion culling algorithms. We will concentrate on some of the most important work most closely related to our occlusion culling algorithm. Rendering of very complex geometric environments can be greatly accelerated by occlusion culling algorithms. Occlusion culling is a technique which attempts to identify the visible objects of the scene and therefore reduce the number of primitives to be rendered. Efficient occlusion culling algorithms can improve the rendering speed by orders of magnitude.

Currently existing occlusion culling algorithms can be categorized into two types of methods: on-the-fly methods and pre-processing methods. On-the-fly methods are those that perform the occlusion culling operations for each frame during run-time, while pre-processing methods perform the occlusion culling operations beforehand and store the visibility information. Most of the existing occlusion culling algorithms are conservative. These compute a conservative visibility set [John9] [Seth9] which includes at least all of the visible set, plus maybe some additional invisible objects. An occluded object can be classified as visible; however, a visible object can never be classified as occluded. A potentially visible set (PVS) can therefore be constructed that includes all the visible objects, plus a number of occluded objects. An efficient algorithm should perform the occlusion culling test fast enough and limit the number of non-visible objects in the PVS. Visibility culling algorithms date back to as early as Jones' [Jones7] and Clark's [Clark76] work. Greene et al [Greene93] create a 2D hierarchical z-buffer, used in conjunction with a 3D octree hierarchy, to accelerate the culling of hidden objects. Zhang et al [Zhang97] present an algorithm using a hierarchy of occlusion maps. The occlusion maps are easy to build and are used to perform the overlap tests and the depth tests. The former decide whether the projections of possible occludees lie within those of the occluders, and the latter determine whether the occludees are actually behind the occluders. They also approximate the visibility for regions that are nearly occluded, such as the coverage through a dense bush. These on-the-fly algorithms are very effective.

However, they have significant computational overhead during display and cannot be simply adapted for use with pre-fetching if the model cannot fit in memory. Instead of performing the occlusion culling on-the-fly, some algorithms pre-compute the visibility set and save the information before the actual rendering. Examples such as Funkhouser's algorithms [Fun95] [Fun96] address the problem of data size and the treatment of disk pre-fetching and network bandwidth. Most of the earlier work is good for indoor scenes or architectural walkthroughs. Algorithms for handling more general scenes, such as [Cof98] and [Coz98], have emerged in recent years. However, most of these algorithms concentrated on using a single convex occluder at a time. In Durand et al's work [Durand], they attempt to handle occluder fusion, where the occlusion culling test is based on the cumulative projection of occluders and occludees. The occlusion test is also performed against several viewpoints, which form a view cell. They introduce the concept of extended projections of occluders and occludees. The extended projection of an occluder underestimates its projection from any point in the view cell, while the extended projection of an occludee is an overestimation. Since we wish to perform the occlusion culling as a pre-processing step, and since we consider the close-by reference viewpoints in addition to the current viewpoint, Durand's work [Durand] is the most suitable for us. Therefore, we adapted their algorithm to develop our track-dependent occlusion culling technique. We will discuss this in more detail in the next section.

4.3.2 Occlusion Culling Algorithm Description

We call this algorithm track-dependent occlusion culling because we need to consider the current and neighboring viewpoints in the algorithm. How much information is occluded depends on the sampling rate of the reference views on the pre-selected track. Durand et al [Durand] introduced an algorithm that combines multiple viewpoints into one cell. Occlusion of objects is calculated for the whole cell. They introduced the extended projection operators: the extended projection for an occluder is the intersection of its projections from the views within the cell, while the extended projection for an occludee is the union of its projections from the views within the cell. To calculate the occlusion, they just need to compare the extended projections of the occluders and occludees. We want to determine whether texture tiles in the later slabs are occluded by those in the previous slabs for both the current and the neighboring two viewpoints. Therefore, in our algorithm, we consider the current and neighboring viewpoints as a cell. Texture tiles of earlier slabs are possible occluders for texture tiles of the later slabs. The algorithm works as follows. For each reference viewpoint, we first build an occlusion map and fill the map with the opacity values of the projected first-slab texture tiles. We treat the following slab textures in a front-to-back order, examining each tile in a slab to see whether it is occluded. The occlusion test is performed by comparing the extended projections of the occluders (texture tiles from the previous slab(s)) against the extended projections of the occludees (texture tiles from the later slab). Figure 4.5 (a) and (b) show the extended projections of occluder tiles and occludee tiles, with regard to the current viewpoint V1 and its neighboring viewpoints V2 and V3. If the extended projection of the occludee falls within that of the

occluder, the occludee is occluded. In practice, we first project all the occluder tiles to form an occlusion map, and then convolve (average) the window in the occlusion map that has the size of the extended projection of the occludee. If the result is greater than a pre-defined threshold, every pixel in the convolution window has an average opacity value greater than the threshold, which indicates that anything behind the window that projects into it is occluded. For easier computation, we make several conservative simplifications. Following [Durand], for non-flat tiles, the depth of the occluder is the maximum depth of the tile, while the depth of the occludee is the minimum depth of the tile. For all the occluder tiles, we choose the slab depth, which is larger than any maximum tile depth, as another conservative simplification. By taking the minimum depth of the occludee tile and the slab depth, we can consider them as flat tiles, and we therefore have the setup shown in Figure 4.5 (c).

Figure 4.5: (a) and (b): the extended projections for an occluder tile and an occludee tile, respectively, with regard to three viewpoints on the track. (c) The occlusion culling setup after the conservative simplification of depth; h1 is the distance from the viewpoint to slab 1 and h2 is the minimum depth of the occludee tile. (d) How d2 is calculated.

Each tile in our system has a width of w. Considering the viewing angles of V2 and V3, we now need to convolve (average) an extended area with a width of w + s1 + s2 in the opacity map to see if the result equals 1. Considering the 2D case, s1 can be calculated using the following equation:

s1 = d2 * (h2 - h1) / h2    (4.1)

where h1 is the distance from the viewpoint to slab 1 and h2 is the minimum depth of the occludee tile. To calculate d2, consider Figure 4.5 (d):

d2 = l - l'    (4.2)

where

l = (Q - V1)_x    (4.3)

and

l' = h1 / tan β    (4.4)

Thus

cos β = (Q - V2)_x / |Q - V2|    (4.5)

A similar equation can be used to calculate s2. If the averaged opacity value of the enlarged window is one, we mark the tile as an empty tile and do not store its geometry and color information. If the value is less than one, the tile is not occluded and we add the opacity values of this tile to the opacity map. We treat all the tiles in

one slab and continue to the next one until all the slabs are processed. The pseudocode for the algorithm is shown in Figure 4.6. As pointed out in [Mueller99_2][Zhang97], we can use a method similar to α-acceleration, which lowers the opacity threshold to less than one to cull the tiles more efficiently without degrading the quality of the rendering result too much.

occlusionCulling( )
  for (each slab S_i)
    if (S_i is the first slab)
      initialize the opacity map
    else
      calculate s1 and s2
      for (each tile T_i)
        calculate the average opacity value a
        if (a == 1)
          mark the tile as an empty tile
        else
          update the opacity map
        endif
      end for
    endif
  end for

Figure 4.6: Shows the pseudo-code for our track-dependent occlusion culling algorithm.

Results and Discussions

We tested our occlusion culling algorithm on the Castle [CASTLE] dataset. The scene is partitioned into three slabs. After culling, we can reduce the information in the second slab by 77%, from 6.69 MB to 1.54 MB. The storage requirement for the third slab is reduced by 72%, from 3.9 MB to 0.92 MB. This is without α-acceleration.
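A simplified version of the per-tile test in Figure 4.6 is sketched below. The array layouts and the symmetric enlargement of the tile footprint are assumptions made for brevity (in the 2D derivation above, the enlargement by s1 and s2 applies along the track direction); the sketch only illustrates the averaging test, not the original implementation.

public class TrackOcclusionCulling {
    // opacityMap: accumulated opacity of earlier slabs, row-major, values in [0,1]
    // threshold:  1.0 for fully conservative culling, < 1.0 for alpha-acceleration
    static boolean isOccluded(float[] opacityMap, int mapWidth, int mapHeight,
                              int tileX, int tileY, int tileW, int tileH,
                              int s1, int s2, double threshold) {
        int x0 = Math.max(0, tileX - s1), x1 = Math.min(mapWidth,  tileX + tileW + s2);
        int y0 = Math.max(0, tileY - s1), y1 = Math.min(mapHeight, tileY + tileH + s2);
        double sum = 0.0;
        for (int y = y0; y < y1; y++)
            for (int x = x0; x < x1; x++)
                sum += opacityMap[y * mapWidth + x];
        double average = sum / ((double) (x1 - x0) * (y1 - y0));
        return average >= threshold; // everything behind this window is treated as hidden
    }
}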

With α set to 0.9, the reduction rates are 80% and 77% for the second and the third slab. The reduction rates reach 82% and almost 90% when the α value is set to 0.8. This is shown in Table 4.1. The rendering results for different α values are shown in Figures 4.7, 4.8 and 4.9. From the figures we can see that the rendering quality doesn't degrade too much if we set a reasonable α value. The results show that the track-dependent occlusion culling is quite efficient for this dataset. It can reduce the storage requirement, decrease the network transmission time, increase the pre-fetching efficiency and improve the rendering performance. Another benefit is that it can reduce the rendering cost/overhead that is caused by increasing the number of slabs. More slabs can address the occlusion/dis-occlusion problem better. Using the occlusion culling technique, less information will be left in the later slabs after culling. Therefore, increasing the number of slabs does not affect the rendering speed too much. The efficiency of the algorithm is highly dataset-dependent. For the Nature dataset [NATURE] we tested, in which the scene is more open and our slab representation therefore does not contain much unnecessary information in the first place, we can only cull about 2 percent without α-acceleration and roughly 10 percent with α-acceleration. This is shown in Table 4.2.

Table 4.1. Reduction rates for track-dependent occlusion culling using different α values (1.0, 0.9 and 0.8) for the Castle dataset. For each α value and each of the three slabs, the table lists the number of tiles and the size (MB) before culling, after culling, and the resulting reduction rate.

Table 4.2. Reduction rates for track-dependent occlusion culling using different α values (1.0, 0.9 and 0.8) for the Nature dataset, in the same format as Table 4.1.

Figure 4.7. Shows the rendered image with track-dependent occlusion culling with the α value equal to 1.0.

Figure 4.8. Shows the rendered image with track-dependent occlusion culling with the α value equal to 0.9. Notice the artifacts indicated by the red pen.

Figure 4.9. Shows the rendered image with track-dependent occlusion culling with the α value equal to 0.8. Notice the artifacts indicated by the red pen.

4.4 Summary

In this chapter, we presented several ways to reduce the database as well as our novel caching and pre-fetching scheme to speed up the rendering. The database reduction algorithms include an empty-tile removal scheme, a texture-binding reduction algorithm and a texture removal algorithm using our new track-dependent occlusion culling method. After applying these simplification methods, we can reduce the database by up to an order of magnitude. Our novel caching and pre-fetching scheme considers the user's movement along the track as well as the user's panoramic viewing. This is implemented as a two-part pre-fetching algorithm. Using this approach, we mitigate the latency problems that originally occurred when the simple caching and pre-fetching scheme was used. In the next chapter, we will present the testing results of our system along with conclusions and future research directions.

CHAPTER 5. RESULTS, DISCUSSION AND FUTURE WORK

Our system was implemented in Java/Java3D, allowing the front-end to run on any platform (e.g., Microsoft Windows-based PCs, Linux-based PCs or Unix-based workstations). A user should have the ability to use our system on any of these platforms without modifying the code. Java3D [JAVA3D] is a 3D API for the Java programming language. It is an API built on top of OpenGL or DirectX and therefore can fully utilize the graphics hardware to accelerate the rendering.

We tested our system on 4 datasets. The first one is a virtual scene called Nature [NATURE], rendered using POVRAY [POVRAY]. The scene is moderately complex, containing trees, bushes, rocks and birds. It takes about 2 minutes to render one frame on our 2GHZ Dell Precision 53 workstation using POVRAY. One path was chosen for this scene with 2 sampled viewpoints along the track. Three panoramic layers with a resolution of 1024x4096 per layer were pre-computed for each view sample. The total size for the database after pre-processing is 22MB; without empty tile removal and track-dependent occlusion culling it would require over .2GB. The geometry and imagery was broken up into 32x32 quads. Our second dataset is a LOX post dataset which the Visualization Toolkit (VTK) [VTK] provides. This dataset simulates the flow of liquid oxygen across a flat plate with a cylindrical post perpendicular to the flow. It contains both scalar and vector fields in the data. A rendering was chosen with the post, a slice plane and several stream-polygons. One path was pre-selected going into the stream-polygon region with 23 view samples. Four panoramic layers with a resolution of 512x2048 per layer were pre-computed for each view sample. The reference image database was pre-rendered using VTK and the total size for the image database was reduced to 38MB. The geometry and imagery was broken up into 16x16 quads. This database required 9.2 hours to pre-render. We also tested our system on two other POVRAY scenes, namely the Castle [CASTLE] and the Night [NIGHT] datasets, obtained from the same source as the Nature dataset. The Castle dataset is quite complicated, with about 6,000 objects in the scene. It took our 2GHZ Dell workstation about one hour to render each frame.

For both datasets, we chose 5 viewpoints on a path, and each viewpoint has 3 layers with a resolution of 1024x4096 per layer. It took our Dell workstation 3 days to finish pre-rendering the whole database. The Night dataset demonstrated our ability to deal with datasets that have complicated lighting conditions. The projective texture map along with soft shadows took more of the rendering time. However, we should note that, since our system is based on IBR techniques, it is not suitable for handling view-dependent features such as reflection, refraction or specular highlights. We have tested our IBR viewer on two platforms. On the Sun Blade workstation with dual 750MHZ UltraSPARC III processors, 1GB of memory and an Elite3D graphics card, we achieve 5 frames per second for the 3 POVRAY scenes at a 1k x 1k rendering resolution and 2 frames per second for the LOX dataset at a 512x512 rendering resolution. On our Dell Precision 53 workstation with a 2GHZ Xeon processor, 2GB of memory and a 128MB Nvidia GeForce4 Titanium 4600 video card, we achieve above 30 frames per second for all four datasets. Figures 5.1, 5.2, 5.3 and 5.4 show a resulting image for each of the four datasets. Please note that these images are interpolated images generated at the middle of two reference viewpoints, which means that they are the images with the highest MSE.

Figure 5.1: Shows the rendering results for the LOX dataset.

Figure 5.2: Shows the rendering results for the Castle dataset.

Figure 5.3: Shows the rendering results for the Night dataset.

Figure 5.4: Shows the rendering results for the Nature dataset (interpolated).

Figure 5.5: Shows the POVRAY-rendered result for the Nature dataset.

Figure 5.6: Shows the difference image between Figures 5.4 and 5.5.


Optimizing an Inverse Warper Optimizing an Inverse Warper by Robert W. Marcato, Jr. Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degrees of Bachelor

More information

11/1/13. Visualization. Scientific Visualization. Types of Data. Height Field. Contour Curves. Meshes

11/1/13. Visualization. Scientific Visualization. Types of Data. Height Field. Contour Curves. Meshes CSCI 420 Computer Graphics Lecture 26 Visualization Height Fields and Contours Scalar Fields Volume Rendering Vector Fields [Angel Ch. 2.11] Jernej Barbic University of Southern California Scientific Visualization

More information

Visualization. CSCI 420 Computer Graphics Lecture 26

Visualization. CSCI 420 Computer Graphics Lecture 26 CSCI 420 Computer Graphics Lecture 26 Visualization Height Fields and Contours Scalar Fields Volume Rendering Vector Fields [Angel Ch. 11] Jernej Barbic University of Southern California 1 Scientific Visualization

More information

Image-Based Rendering. Image-Based Rendering

Image-Based Rendering. Image-Based Rendering Image-Based Rendering Image-Based Rendering What is it? Still a difficult question to answer Uses images (photometric info) as key component of model representation 1 What s Good about IBR Model acquisition

More information

Lecture 17: Recursive Ray Tracing. Where is the way where light dwelleth? Job 38:19

Lecture 17: Recursive Ray Tracing. Where is the way where light dwelleth? Job 38:19 Lecture 17: Recursive Ray Tracing Where is the way where light dwelleth? Job 38:19 1. Raster Graphics Typical graphics terminals today are raster displays. A raster display renders a picture scan line

More information

Lecture 15: Image-Based Rendering and the Light Field. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

Lecture 15: Image-Based Rendering and the Light Field. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011) Lecture 15: Image-Based Rendering and the Light Field Kayvon Fatahalian CMU 15-869: Graphics and Imaging Architectures (Fall 2011) Demo (movie) Royal Palace: Madrid, Spain Image-based rendering (IBR) So

More information

Image Based Lighting with Near Light Sources

Image Based Lighting with Near Light Sources Image Based Lighting with Near Light Sources Shiho Furuya, Takayuki Itoh Graduate School of Humanitics and Sciences, Ochanomizu University E-mail: {shiho, itot}@itolab.is.ocha.ac.jp Abstract Recent some

More information

Image Based Lighting with Near Light Sources

Image Based Lighting with Near Light Sources Image Based Lighting with Near Light Sources Shiho Furuya, Takayuki Itoh Graduate School of Humanitics and Sciences, Ochanomizu University E-mail: {shiho, itot}@itolab.is.ocha.ac.jp Abstract Recent some

More information

Implementation of a panoramic-based walkthrough system

Implementation of a panoramic-based walkthrough system Implementation of a panoramic-based walkthrough system Abstract A key component in most virtual reality systems is the ability to perform a walkthrough of a virtual environment from different viewing positions

More information

Visualization Computer Graphics I Lecture 20

Visualization Computer Graphics I Lecture 20 15-462 Computer Graphics I Lecture 20 Visualization Height Fields and Contours Scalar Fields Volume Rendering Vector Fields [Angel Ch. 12] November 20, 2003 Doug James Carnegie Mellon University http://www.cs.cmu.edu/~djames/15-462/fall03

More information

IMAGE-BASED RENDERING TECHNIQUES FOR APPLICATION IN VIRTUAL ENVIRONMENTS

IMAGE-BASED RENDERING TECHNIQUES FOR APPLICATION IN VIRTUAL ENVIRONMENTS IMAGE-BASED RENDERING TECHNIQUES FOR APPLICATION IN VIRTUAL ENVIRONMENTS Xiaoyong Sun A Thesis submitted to the Faculty of Graduate and Postdoctoral Studies in partial fulfillment of the requirements for

More information

Pipeline Operations. CS 4620 Lecture Steve Marschner. Cornell CS4620 Spring 2018 Lecture 11

Pipeline Operations. CS 4620 Lecture Steve Marschner. Cornell CS4620 Spring 2018 Lecture 11 Pipeline Operations CS 4620 Lecture 11 1 Pipeline you are here APPLICATION COMMAND STREAM 3D transformations; shading VERTEX PROCESSING TRANSFORMED GEOMETRY conversion of primitives to pixels RASTERIZATION

More information

Graphics for VEs. Ruth Aylett

Graphics for VEs. Ruth Aylett Graphics for VEs Ruth Aylett Overview VE Software Graphics for VEs The graphics pipeline Projections Lighting Shading VR software Two main types of software used: off-line authoring or modelling packages

More information

A Warping-based Refinement of Lumigraphs

A Warping-based Refinement of Lumigraphs A Warping-based Refinement of Lumigraphs Wolfgang Heidrich, Hartmut Schirmacher, Hendrik Kück, Hans-Peter Seidel Computer Graphics Group University of Erlangen heidrich,schirmacher,hkkueck,seidel@immd9.informatik.uni-erlangen.de

More information

Ray tracing based fast refraction method for an object seen through a cylindrical glass

Ray tracing based fast refraction method for an object seen through a cylindrical glass 20th International Congress on Modelling and Simulation, Adelaide, Australia, 1 6 December 2013 www.mssanz.org.au/modsim2013 Ray tracing based fast refraction method for an object seen through a cylindrical

More information

Processing 3D Surface Data

Processing 3D Surface Data Processing 3D Surface Data Computer Animation and Visualisation Lecture 12 Institute for Perception, Action & Behaviour School of Informatics 3D Surfaces 1 3D surface data... where from? Iso-surfacing

More information

Pipeline Operations. CS 4620 Lecture 14

Pipeline Operations. CS 4620 Lecture 14 Pipeline Operations CS 4620 Lecture 14 2014 Steve Marschner 1 Pipeline you are here APPLICATION COMMAND STREAM 3D transformations; shading VERTEX PROCESSING TRANSFORMED GEOMETRY conversion of primitives

More information

Image-Based Rendering using Image-Warping Motivation and Background

Image-Based Rendering using Image-Warping Motivation and Background Image-Based Rendering using Image-Warping Motivation and Background Leonard McMillan LCS Computer Graphics Group MIT The field of three-dimensional computer graphics has long focused on the problem of

More information

Applications of Explicit Early-Z Culling

Applications of Explicit Early-Z Culling Applications of Explicit Early-Z Culling Jason L. Mitchell ATI Research Pedro V. Sander ATI Research Introduction In past years, in the SIGGRAPH Real-Time Shading course, we have covered the details of

More information

CHAPTER 1 Graphics Systems and Models 3

CHAPTER 1 Graphics Systems and Models 3 ?????? 1 CHAPTER 1 Graphics Systems and Models 3 1.1 Applications of Computer Graphics 4 1.1.1 Display of Information............. 4 1.1.2 Design.................... 5 1.1.3 Simulation and Animation...........

More information

CS 354R: Computer Game Technology

CS 354R: Computer Game Technology CS 354R: Computer Game Technology Texture and Environment Maps Fall 2018 Texture Mapping Problem: colors, normals, etc. are only specified at vertices How do we add detail between vertices without incurring

More information

Volumetric Scene Reconstruction from Multiple Views

Volumetric Scene Reconstruction from Multiple Views Volumetric Scene Reconstruction from Multiple Views Chuck Dyer University of Wisconsin dyer@cs cs.wisc.edu www.cs cs.wisc.edu/~dyer Image-Based Scene Reconstruction Goal Automatic construction of photo-realistic

More information

Efficient View-Dependent Sampling of Visual Hulls

Efficient View-Dependent Sampling of Visual Hulls Efficient View-Dependent Sampling of Visual Hulls Wojciech Matusik Chris Buehler Leonard McMillan Computer Graphics Group MIT Laboratory for Computer Science Cambridge, MA 02141 Abstract In this paper

More information

Pipeline Operations. CS 4620 Lecture 10

Pipeline Operations. CS 4620 Lecture 10 Pipeline Operations CS 4620 Lecture 10 2008 Steve Marschner 1 Hidden surface elimination Goal is to figure out which color to make the pixels based on what s in front of what. Hidden surface elimination

More information

Rendering by Manifold Hopping

Rendering by Manifold Hopping International Journal of Computer Vision 50(2), 185 201, 2002 c 2002 Kluwer Academic Publishers. Manufactured in The Netherlands. Rendering by Manifold Hopping HEUNG-YEUNG SHUM, LIFENG WANG, JIN-XIANG

More information

Visualization Computer Graphics I Lecture 20

Visualization Computer Graphics I Lecture 20 15-462 Computer Graphics I Lecture 20 Visualization Height Fields and Contours Scalar Fields Volume Rendering Vector Fields [Angel Ch. 12] April 15, 2003 Frank Pfenning Carnegie Mellon University http://www.cs.cmu.edu/~fp/courses/graphics/

More information

Height Fields and Contours Scalar Fields Volume Rendering Vector Fields [Angel Ch. 12] April 23, 2002 Frank Pfenning Carnegie Mellon University

Height Fields and Contours Scalar Fields Volume Rendering Vector Fields [Angel Ch. 12] April 23, 2002 Frank Pfenning Carnegie Mellon University 15-462 Computer Graphics I Lecture 21 Visualization Height Fields and Contours Scalar Fields Volume Rendering Vector Fields [Angel Ch. 12] April 23, 2002 Frank Pfenning Carnegie Mellon University http://www.cs.cmu.edu/~fp/courses/graphics/

More information

A Survey of Light Source Detection Methods

A Survey of Light Source Detection Methods A Survey of Light Source Detection Methods Nathan Funk University of Alberta Mini-Project for CMPUT 603 November 30, 2003 Abstract This paper provides an overview of the most prominent techniques for light

More information

View-Dependent Texture Mapping

View-Dependent Texture Mapping 81 Chapter 6 View-Dependent Texture Mapping 6.1 Motivation Once a model of an architectural scene is recovered, the goal is to produce photorealistic renderings. A traditional approach that is consistent

More information

Screen Space Ambient Occlusion TSBK03: Advanced Game Programming

Screen Space Ambient Occlusion TSBK03: Advanced Game Programming Screen Space Ambient Occlusion TSBK03: Advanced Game Programming August Nam-Ki Ek, Oscar Johnson and Ramin Assadi March 5, 2015 This project report discusses our approach of implementing Screen Space Ambient

More information

CS 563 Advanced Topics in Computer Graphics QSplat. by Matt Maziarz

CS 563 Advanced Topics in Computer Graphics QSplat. by Matt Maziarz CS 563 Advanced Topics in Computer Graphics QSplat by Matt Maziarz Outline Previous work in area Background Overview In-depth look File structure Performance Future Point Rendering To save on setup and

More information

Lecture 25: Bezier Subdivision. And he took unto him all these, and divided them in the midst, and laid each piece one against another: Genesis 15:10

Lecture 25: Bezier Subdivision. And he took unto him all these, and divided them in the midst, and laid each piece one against another: Genesis 15:10 Lecture 25: Bezier Subdivision And he took unto him all these, and divided them in the midst, and laid each piece one against another: Genesis 15:10 1. Divide and Conquer If we are going to build useful

More information

3D Editing System for Captured Real Scenes

3D Editing System for Captured Real Scenes 3D Editing System for Captured Real Scenes Inwoo Ha, Yong Beom Lee and James D.K. Kim Samsung Advanced Institute of Technology, Youngin, South Korea E-mail: {iw.ha, leey, jamesdk.kim}@samsung.com Tel:

More information

Some Thoughts on Visibility

Some Thoughts on Visibility Some Thoughts on Visibility Frédo Durand MIT Lab for Computer Science Visibility is hot! 4 papers at Siggraph 4 papers at the EG rendering workshop A wonderful dedicated workshop in Corsica! A big industrial

More information

Hierarchical surface fragments *

Hierarchical surface fragments * Hierarchical surface fragments * HUA Wei**, BAO Hujun, PENG Qunsheng (State Key Laboratory of CAD & CG, Zhejiang University, Hangzhou 310027, China) Abstract A new compact level-of-detail representation,

More information

Modeling Light. Michal Havlik

Modeling Light. Michal Havlik Modeling Light Michal Havlik 15-463: Computational Photography Alexei Efros, CMU, Fall 2007 What is light? Electromagnetic radiation (EMR) moving along rays in space R(λ) is EMR, measured in units of power

More information

CS535 Fall Department of Computer Science Purdue University

CS535 Fall Department of Computer Science Purdue University Culling CS535 Fall 2010 Daniel G Aliaga Daniel G. Aliaga Department of Computer Science Purdue University Types of culling View frustum culling Visibility culling e.g., backface culling, portal tlculling,

More information

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters.

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters. 1 2 Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters. Crowd rendering in large environments presents a number of challenges,

More information

Image-Based Deformation of Objects in Real Scenes

Image-Based Deformation of Objects in Real Scenes Image-Based Deformation of Objects in Real Scenes Han-Vit Chung and In-Kwon Lee Dept. of Computer Science, Yonsei University sharpguy@cs.yonsei.ac.kr, iklee@yonsei.ac.kr Abstract. We present a new method

More information

Real Time Rendering. CS 563 Advanced Topics in Computer Graphics. Songxiang Gu Jan, 31, 2005

Real Time Rendering. CS 563 Advanced Topics in Computer Graphics. Songxiang Gu Jan, 31, 2005 Real Time Rendering CS 563 Advanced Topics in Computer Graphics Songxiang Gu Jan, 31, 2005 Introduction Polygon based rendering Phong modeling Texture mapping Opengl, Directx Point based rendering VTK

More information

First Steps in Hardware Two-Level Volume Rendering

First Steps in Hardware Two-Level Volume Rendering First Steps in Hardware Two-Level Volume Rendering Markus Hadwiger, Helwig Hauser Abstract We describe first steps toward implementing two-level volume rendering (abbreviated as 2lVR) on consumer PC graphics

More information

VIDEO FOR VIRTUAL REALITY LIGHT FIELD BASICS JAMES TOMPKIN

VIDEO FOR VIRTUAL REALITY LIGHT FIELD BASICS JAMES TOMPKIN VIDEO FOR VIRTUAL REALITY LIGHT FIELD BASICS JAMES TOMPKIN WHAT IS A LIGHT FIELD? Light field seems to have turned into a catch-all term for many advanced camera/display technologies. WHAT IS A LIGHT FIELD?

More information

An Algorithm for Seamless Image Stitching and Its Application

An Algorithm for Seamless Image Stitching and Its Application An Algorithm for Seamless Image Stitching and Its Application Jing Xing, Zhenjiang Miao, and Jing Chen Institute of Information Science, Beijing JiaoTong University, Beijing 100044, P.R. China Abstract.

More information

S U N G - E U I YO O N, K A I S T R E N D E R I N G F R E E LY A VA I L A B L E O N T H E I N T E R N E T

S U N G - E U I YO O N, K A I S T R E N D E R I N G F R E E LY A VA I L A B L E O N T H E I N T E R N E T S U N G - E U I YO O N, K A I S T R E N D E R I N G F R E E LY A VA I L A B L E O N T H E I N T E R N E T Copyright 2018 Sung-eui Yoon, KAIST freely available on the internet http://sglab.kaist.ac.kr/~sungeui/render

More information

Point based global illumination is now a standard tool for film quality renderers. Since it started out as a real time technique it is only natural

Point based global illumination is now a standard tool for film quality renderers. Since it started out as a real time technique it is only natural 1 Point based global illumination is now a standard tool for film quality renderers. Since it started out as a real time technique it is only natural to consider using it in video games too. 2 I hope that

More information

Fast Texture Based Form Factor Calculations for Radiosity using Graphics Hardware

Fast Texture Based Form Factor Calculations for Radiosity using Graphics Hardware Fast Texture Based Form Factor Calculations for Radiosity using Graphics Hardware Kasper Høy Nielsen Niels Jørgen Christensen Informatics and Mathematical Modelling The Technical University of Denmark

More information

Point Cloud Filtering using Ray Casting by Eric Jensen 2012 The Basic Methodology

Point Cloud Filtering using Ray Casting by Eric Jensen 2012 The Basic Methodology Point Cloud Filtering using Ray Casting by Eric Jensen 01 The Basic Methodology Ray tracing in standard graphics study is a method of following the path of a photon from the light source to the camera,

More information

CSE528 Computer Graphics: Theory, Algorithms, and Applications

CSE528 Computer Graphics: Theory, Algorithms, and Applications CSE528 Computer Graphics: Theory, Algorithms, and Applications Hong Qin State University of New York at Stony Brook (Stony Brook University) Stony Brook, New York 11794--4400 Tel: (631)632-8450; Fax: (631)632-8334

More information

Modeling Light. On Simulating the Visual Experience

Modeling Light. On Simulating the Visual Experience Modeling Light 15-463: Rendering and Image Processing Alexei Efros On Simulating the Visual Experience Just feed the eyes the right data No one will know the difference! Philosophy: Ancient question: Does

More information

Soft shadows. Steve Marschner Cornell University CS 569 Spring 2008, 21 February

Soft shadows. Steve Marschner Cornell University CS 569 Spring 2008, 21 February Soft shadows Steve Marschner Cornell University CS 569 Spring 2008, 21 February Soft shadows are what we normally see in the real world. If you are near a bare halogen bulb, a stage spotlight, or other

More information

Hidden surface removal. Computer Graphics

Hidden surface removal. Computer Graphics Lecture Hidden Surface Removal and Rasterization Taku Komura Hidden surface removal Drawing polygonal faces on screen consumes CPU cycles Illumination We cannot see every surface in scene We don t want

More information

Rendering Hair-Like Objects with Indirect Illumination

Rendering Hair-Like Objects with Indirect Illumination Rendering Hair-Like Objects with Indirect Illumination CEM YUKSEL and ERGUN AKLEMAN Visualization Sciences Program, Department of Architecture Texas A&M University TR0501 - January 30th 2005 Our method

More information