RDTC Optimized Streaming for Remote Browsing in Image-Based Scene Representations

TC Optimized Streaming for Remote Browsing in Image-Based Scene Representations Ingo Bauermann, Yang Peng and Eckehard Steinbach Institute of Communication Networks, Media Technology Group, Technische Universität München, Munich, Germany {ingo.bauermann,yang.peng,eckehard.steinbach}@tum.de Abstract Remote navigation in compressed image-based scene representations requires random access to arbitrary parts of the reference image data to recompose virtual views. The degree of inter-frame dependencies exploited during compression has an impact on the effort needed to access reference images and delimits the rate distortion () trade-off that can be achieved. If, additionally, a given receiver hardware and a maximum available channel bitrate is taken into account, the traditional rate-distortion optimization is extended to an TC trade-off between rate (R), distortion (D), transmission data rate (T), and decoding complexity (C). In this work we introduce our TC optimized compression framework. In addition, an experimental testbed for streaming of these TC optimally compressed image-based scene representations is described and the impact of client side caching is investigated. Our results show a significant reduction in user perceived delay for TC optimized streams in such a remote browsing environment compared to optimized or independently encoded scene representations. *. Introduction Remote interactive viewing of photorealistic 3D scenes has many applications in virtual reality, gaming, virtual museums and E-commerce. Image-based scene representations like light fields [,2], concentric mosaics [3], panoramas, and others (see [4] for an overview) allow for a fast and easy acquisition and rendering compared to the traditional tedious and time consuming geometric modeling process. Based on sampling of the plenoptic function [5], the downside of image-based rendering approaches is the large amount of reference image data that has to be stored. With recent advances in image and video compression, efficient compression schemes for image-based scenes have emerged also (again, see e.g. [4]). This work was supported, in part, by Siemens AG. Compression Server (Source) Navigation Internet Partial view DSL WLAN UMTS Clients (Decoder) Figure. Considered scenario: An image-based scene representation is acquired, compressed, and stored on a server. Clients with different network access and computational resources are used to navigate in the acquired virtual scene. Rate-distortion optimization has been well studied for video compression in the recent years. But, while for video sequences sequential play out is dominant and therefore temporal dependency structures are known a priori, free real-time navigation in image-based scenes requires random access to arbitrary single frames or even image parts, and therefore does not allow for exploiting all dependencies that are present in image sequences. Additionally, in a remote navigation scenario and with heterogeneous computational capabilities of user devices and different bitrate access (see Figure ) there are strict requirements on the decoding complexity and transmission data rate. A low channel throughput of some few kilobits per second and at the same time a high computational power available at the client side allow for more dependencies to be exploited than a handheld device with the same available bitrate. In both scenarios a highly adapted stream would be preferable as, for instance, too many dependencies would force the handheld device to download too much data that has to be decoded with too much computational effort. Independent coding of reference images that are needed to generate virtual views might be the best solution here. In order to solve the challenges arising with such

heterogeneous systems a technique can be used that has been adopted from video coding: real-time scheduling (e.g. [7]). A priori information is used by the server to decide which part of a compressed representation should be sent to meet the scenario and the clients requirements. Unfortunately, for browsing in compressed image-based scene representations only little a priori information can be exploited as the motion trajectory and therefore the dependency structure within the reconstructed virtual views presented to the user are not known. Online scheduling would be very complex and so the most intuitive solution is to provide different, precoded datasets that are chosen according to the momentary system parameters like throughput or available computational resources. The goal of this work is to investigate a compression scheme that allows for maximizing reconstruction quality and minimizing user-perceived delay for a single stream representation taking into account the available computational resources at the decoder and the available channel throughput as well as the server side representation file size. Additionally, the impact of a client side cache is investigated. A streaming testbed is implemented to measure the performance of streams optimized subject to constraints on several parameters. The remainder of this paper is structured as follows. In Chapter 2 we discuss related work. In Chapter 3 we introduce TC optimization and the proposed coding framework. Chapter 4 discusses the influence of client side caching on the whole system while Chapter 5 describes the streaming testbed implemented. Chapter 6 shows experimental results followed by Chapter 7 concluding this paper. 2. Related Work There has been much work done on compression and streaming of image-based scenes. Vector quantization [2,3], MPEG-2 like encoding [6], geometry aided motion compensation [8], and many others (see [4] for an overview). All of them were trying to trade off random access complexity and coding efficiency, where coding efficiency is almost always measured in terms of server side file size rather than the actual number of bits transmitted per. Additionally, most of these schemes maximize PSNR subject to a rate constraint or a fixed dependency structure. Rate-distortion optimization for streaming of image-based scene representations even over error prone networks has been investigated in [9]. A major drawback of this scheme is that the moving trajectory is restricted to be on a hemisphere. This simplifies dependency structure exploitation but restricts the applicability to browsing of small objects rather than rooms or even outdoor scenes. The investigations on multiple representation coding for light fields in [] share the same restrictions. Both schemes show a very good performance but do not consider the client s computational resources at all. For multimedia content delivery a rate-distortioncomplexity (C) optimization framework has been proposed in []. Investigations on the C trade-off for a block-based coder for image-based scenes providing random access and evaluating operational rate-distortion-complexity curves have been done in [2]. In this paper, we introduce TC optimization [3] with a client side caching model and measure the performance of the TC streaming system using a testbed implementation. 3. TC Optimization Traditionally, image and video compression and streaming of video sequences have been studied within the rate-distortion theory framework. However, when encoding or decoding has to be done in real-time, algorithms have to be investigated with respect to computational complexity. Additionally, in the context of remote navigation a desired client-server round trip time allowing immersive user interaction severely constrains the dependency structure of the compressed data. 3.. Measures The measures used in this work to parameterize the TC space are defined as follows: - Rate (R). R is the mean number of bits required to store a s RGB values on the server (file size). - Distortion (D). D is defined as the mean of squared differences (MSE) of RGB intensity values measured using the difference between original intensity values and intensity values reconstructed from the compressed bitstream. - Transmission data rate (T). Transmission data rate T is defined as the mean number of bits that have to be signaled to completely decode a. The transmission data rate T is a measure for user-perceived delay in remote navigation and can be significantly larger than R as dependencies might have to be resolved. - Decoding complexity (C). The decoding complexity C of a given is the mean number of independently encoded s that have to be decoded to reconstruct the current completely. Though our definition of the decoding complexity is rather rough, the number of inverse DCT-transformations gives a good measure of the true complexity that the stream 2

implements. Investigations on the mapping from a specific hardware to the decoding complexity C is part of other work, e.g. []. 3.2. The Coding Framework The basic building blocks of the codec investigated in this work are the discrete cosine transform (DCT) and motion compensated prediction (MCP) performed on 8x8 blocks. An example Group of Pictures (GOP) structure for a camera trajectory along a single path (e.g. an image sequence produced by a concentric mosaic setup) and some example block modes are shown in Figure 2. Although the presented concept applies to all image-based scene representations, we use concentric mosaics [3] in this paper as an example data representation. frame intra... Figure 2. D GOP structure with example block modes for an odd number (n) of images. The anchor frame is encoded independently. 3. Block Modes skip/inter intra anchor frame...... n /2 GOP of n images anchor-inter/ anchor-skip skip/inter The block modes used in our coding framework are adopted from [2] and based on common hybrid video codecs. TC measures for each block to be encoded are calculated as follows: R = [depends on content and q] [ D = [depends on content and q] T = R+ Dependencies + [signaling overhead] [ C = M + Dependencies [ ] The rate R and distortion D have to be determined by experiment and depend on the content of a specific block and on the quantization parameter q. The transmission data rate T differs from R as dependencies might have to be resolved (referenced blocks) and signaling overhead has to be taken into account. The decoding complexity C depends on the block mode decision (M= for, INTER, and ANCHOR-INTER, otherwise M=). Dependencies might be resolved which increases the decoding complexity of a specific block in a chosen mode. This recursive structure helps to control the amount of dependencies within a bitstream.... n 3.4. Lagrangian Optimization Once the rate, distortion, transmission data rate and decoding complexity of a given block in a given mode are known, the encoder can perform an TC optimization. While encoding, each block in each frame of the image data is encoded in all available modes and then the mode decision mechanism goes through each block and minimizes the Lagrangian cost function: J = D+λ R+ λ T +λ C () q 2 3 Here, λ, λ 2, λ 3 are parameters that are used to select the trade-off between rate R, distortion D, transmission data rate T, and decoding complexity C. The subscript q indicates that this cost is computed for a specific quantization parameter q. As illustrated in Figure 3, for a specific scenario this means that if the client s computational resources C max and the maximum available transmission data rate T max are not allowed to be exceeded then for a specific distortion D, a traditional optimized encoder would produce some uncontrolled TC trade-off minimizing the rate R (not shown in the Figure for easier presentation). Though the rate is minimal with respect to D the TC trade-off is not applicable because too many dependencies might have to be resolved for one random access to the bitstream. On the other hand independent encoding of all blocks () would not exploit all resources because available computational power is not used for decoding in this case. The lower bound for TC-trade-offs drawn in Figure 3 can be argued as follows: Exploiting redundancies between neighboring frames decreases the transmission data rate T but increases C as more and more dependencies are exploited. An experimentally determined curve for the TC lower bound will be shown in Chapter 6. Decoding Complexity C C max lower TC bound operational TC-points T max not applicable Transmission Data Rate T Not optimized (MSE = D ) Figure 3. Analysis of a common encoder in the TC space given a distortion D. T max and C max are maximum resources determined by the channel and the client hardware. Only uncontrolled TC trade-offs are produced by common coders. A lower TC bound can not be under-run for D. 3

Independent encoding of frames will lead to a feasible TC trade-off in a streaming scenario, though, wasting computational resources. Decoding Complexity C C max Figure 4. In order to optimally use the available resources, a coding scheme has to be adopted that allows for shifting the lower TC bound to intersect with both the maximum available transmission rate and the maximum allowed decoding complexity while achieving a minimum distortion D 2 < D of a reconstructed view. To optimally utilize the available resources the lower bound has to be moved so that it intersects with T max and C max while decreasing the distortion D as shown in Figure 4. For a minimum distortion D 2 < D the optimal working TC point can be reached. This is done by searching for a quadruple (q, λ, λ 2, λ 3 ) that - used as weights for a block-based minimization using () - produces a stream with the desired TC trade-off for a given scenario. 4. Caching For remote interactive navigation, caching parts of the bitstream or even of uncompressed reference blocks at the client side is very efficient. We distinguish three cases discussed in the following sections. 4.. No Cache If there is no cache present at the client side, TC optimization degrades to an T or C optimization as C and T are proportional in this case [3]. From the perspective of TC optimization there is no point in choosing something other than the -mode as otherwise dependencies have to be resolved. 4.2. Bitstream Caching T max Optimized (MSE = D 2 < D ) optimal TC-working-point lower TC bound not applicable Transmission Data Rate T If there is a cache present at the client side which holds the already transmitted and received bitstream, the TC optimization framework can be applied. If the bitstream of an infinite number of blocks can be cached and every block is requested exactly once as the user moves through the virtual environment, the mean transmission data rate is reduced compared to the case with no cache as no dependencies have to be considered (every block is transmitted exactly once). The same model applies if the mean recursion level for INTER- and SKIP-modes is less than or equal to the number of frames that can be cached. Figure 5 illustrates this for the case of maximum recursion level of 2. The TC measures in this case are updated as follows: R = [depends on content and q] [ D = [depends on content and q] T = R [ C = M + Dependencies [ ] Note that the decoding complexity is still computed as if there is no cache implemented because only the bitstream is cached. The resulting model is referred to as the B-Cache model in the remainder of this paper. time scene object dependencies S Figure 5. Caching of two frames and a mean recursion level of 2. A scene object moves through the blocks. No matter what access pattern is performed, if the cache can hold at least two frames, every block has to be transmitted only once. In the case of bitstream caching, the mean transmission data rate T does not depend on the access pattern (similar to video coding) while the momentary decoding complexity C and transmission data rate T for a single block access strongly depends on the access pattern. For domain caching both C and T are independent of the access pattern. 4. Pixel Domain Caching I previous frame (recursion level 2) previous frame (recursion level ) current frame If the transmitted and decoded images or blocks are cached at the client side, both the transmission data rate T and the decoding complexity C are reduced compared to the case with no cache or bitstream caching as fewer recursions in the computation of Lagrangian costs have to be considered. A simple example is given in Figure 5 to show how the domain caching influences the transmission data rate and decoding complexity for a specific bitstream. In our example the -block I is referenced exactly twice (in the previous frame with recursion level ). Its transmission data rate T I =R I and decoding complexity C I = are computed without any dependencies. Assuming S 2 S 3 S 4 4

SKIP-modes for all blocks except the -block in our example, we compute T S,S2,S3 =R I and C S,S2,S3 =C I =, T S4 =2 R I and C S4 =2 C I =2 (one for S and S 2 each). For a system with no cache the total decoding complexity and transmission data rate when decoding the two blocks in the current frame sum up to C sum =6 and T sum =6 R I. Coding all the blocks in -mode would be less expensive and therefore the optimal way of encoding for such a target system. When there is a cache at the client side with at least the capacity to store as many decoded RGB images as the mean number of recursions within the bitstream, then, whenever S 4 is requested by the client, S, S 2, and I are requested in turn. After transmission and decoding these blocks are available at the client side as long as they stay in cache. The same procedure would apply to all other dependent blocks. Now, assuming that each block is requested at least once during a decoding pass and each block is requested equally likely the mean decoding complexity and transmission data rate of each block reduce to C sum = and T sum = R I. The first block of this small group of blocks requested will have a decoding complexity of and a transmission data rate of R I. All succeeding requests do not force the server to transmit the -block anymore, vanishing the momentary decoding complexity and transmission data rate. An exact implementation of this model would mean to globally optimize the whole dataset using e.g. a markov random field model and is part of future work. In this work, to approximately reflect the behavior of the cache in an iterative frame-by-frame optimization, the TC measures are updated as follows: performs the TC optimized encoding of a dataset and provides a compressed single representation bitstream to the server. During operation of the testbed the user chooses a view using the keyboard for translational movement and the mouse for rotational movement. The chosen view is signaled by the client to the server which assembles the bitstream required to decode and display the requested view. The B-Index (for the client s bitstream-cache) and P-Index (for the client s -domain-cache tables) provide the information to the server that a block is already present in the client s cache or not. This prevents the server to transmit a compressed block twice if it is still present in the client s cache. A first-in-first-out replacement strategy is used in both caching systems. The assembled bitstream is transmitted to the client using the channel simulator. The client decodes the compressed data and its cache is updated. Finally the view is displayed to the user. Parameters that can be adjusted in the testbed include: Encoder: q, λ, λ 2, λ 3 ; Server: processing time (delay); Client: cache sizes; decoding capacity; Channel Simulator: maximum available bitrate, delay; Renderer: sample-rate, interpolation filter; The testbed can generate random views and rigid motion trajectories and also can capture motion performed by a user. These trajectories are then used to emulate user input to perform tests with varying system settings. R = [depends on content and q] [ D = [depends on content and q] T = R+ Dependencies [ KT C = M + Dependencies [ ] KC Here we weight the decoding complexity and transmission data rate dependencies using KT=KC=3 for a mean recursion level of approximately 2. In the example from Figure 5 C sum =2 and T sum =2 T I. In Chapter 6 we show results using this simplified model. This model will be referred to as P-Cache model in the remainder of this paper. 5. The Streaming Testbed To evaluate the performance of the proposed system a streaming testbed for concentric mosaics is implemented. The testbed consists of an encoder, a server and a client, a channel simulator, a renderer, and a caching system. Figure 6 illustrates the components of the system. The encoder Encoder Server B-Index P-Index compressed data blocks, columns, views,... Channel Simulator Renderer Client (decoder) B-Cache P-Cache Figure 6. Testbed for streaming of TC optimized streams. The encoder provides an TC optimized stream to the server. The client requests blocks, columns or views and decodes the assembled and transmitted compressed data. A caching system for RGB blocks (P-Cache) and the encoded bitstream (B-Cache) is implemented. The server uses index tables (B-Index and P-Index) to keep track of the clients cache state. A channel simulator allows to adjust the maximum available bitrate and a fixed delay. The client also 5

simulates the decoding capacity of a specific machine by limiting the decoding throughput. 6. Experimental Results In this section we evaluate the results of the TC optimization in the TC space using our caching model. We show results using an -only coded stream, an -optimized stream, and an TC optimized stream for remote navigation in a concentric mosaic dataset using our streaming testbed. The test dataset classroom used in our experiments consists of a normal concentric mosaic captured with a camera radius of meters. 525 frames at CIF resolution with a FOV of approximately 4 degrees are captured. Figure 7 shows an example frame of the dataset which is partitioned into GOPs of size 25 frames with an anchor frame (I-frame) in the middle (see Figure 2). Figure 7. One frame of the test sequence classroom. The concentric mosaic consists of 525 frames in CIF resolution. 6.. TC Analysis Actual TC trade-offs for different streams are shown in Figure 8. For a specific distortion D (PSNR=4dB), the resulting transmission data rate T and decoding complexity C, as well as rate R of different optimization strategies are illustrated in the TC space. 6.. TC Analysis Using the B-Cache Model While the traditional optimization gives the lowest rate and transmission data rate (assuming an infinitely large bitstream cache), its TC trade-off might not be applicable in some remote scenarios as the decoding complexity is too high. Same applies for -only encoded streams that might not be optimal when high computational power is available. Two TC trade-offs of TC-optimized streams are shown in Figure 8. TC 2 has a complexity of.ppp which is almost as low as for -only encoded streams but also shows rate savings of 3%. For TC there is a reduction of decoding complexity of a factor of 2 compared to optimized streams while the rate increases only about 2%. 6..2 TC Analysis Using the P-Cache Model Using the P-Cache model to analyze an TC optimized stream gives a decoding complexity of ppp for the optimized stream. Note that this model only holds when every block is decoded at least once and an infinitely large cache is used. For the TC stream a transmission data rate of.4bpp and a decoding complexity of.5ppp are determined by the P-Cache model. This working point will be used in Chapter 6.2. as we found it to reflect a feasible trade off for adaptation to random view requests (measured more accurately by the B-Cache Model with.8ppp) and dependent view requests like translational movement and rotation (modeled more accurately by the P-Cache Model) performed by a human operator navigating through a virtual environment. C [/] P-Cache Model.4. B-Cache Model 4 2.. Figure 8. Analysis in the TC space given a fixed distortion D (PSNR=4dB). For systems with no cache is the optimal choice. However, optimized streams produce the lowest rate R. The TC optimization proposed here allows for TC trade-offs between -only encoded bitstreams and an optimization for a better random access performance. 6.2. System Performance R=2bpp TC R=8bpp.5 TC 2 R=.7bpp R=.bpp..5 T [bit/] We investigate the performance of our streaming system with respect to different modes of movement and the available domain cache size. The bitstream cache is switched off. For our experiments with the streaming testbed the -only encoded and -optimized streams will be used as comparison to the TC optimized stream TC in Figure 8. A target system capable of decoding million s per second (Mpps) and a maximum channel bitrate of Mbps is chosen accordingly. This target system is approximately equivalent to a Pentium III 8MHz with DSLMbps Internet connection in the actual state of implementation. Four different scenarios are defined, and several motion trajectories for each of them are generated and used for the 6

performance evaluation. The four modes of motion are: Random Views, Rotation, Translation, and trajectories that would appear in a first person 3D game. The random views trajectory consists of 5 random positions and random viewing directions within the concentric mosaic testset. Five of these trajectories are generated. The mean delay introduced by interactively transmitting these 25 views is measured before we change the domain cache size and re-run the experiment. Figure 9 shows the result. Note that performs best when the domain cache is switched off. Already with a small cache TC optimized streaming outperforms encoding (with only a third of the overall file size). optimized encoding has always the worst performance. The same kind of experiments are performed for the remaining modes of motion and the results are shown in Figures -2. Delay [ms] 8 6 4 Figure 9. Mean user-perceived delay for random views as a function of the domain cache size at 4dB PSNR, a maximum bitrate of Mbps, and a maximum available decoding complexity of Mpps. With almost no cache available performs best as expected. Already with a small cache size the TC optimized stream outperforms coding. 6. Impact of the Caching System TC 2 4 5 6 7 [ b ytes ] 3 3 [ images ] With a cache size of about.2 times the image size a significant gain in terms of reduced delay can be observed for and TC. This is the cache size where the first blocks within the decoding process of a column can be reused. There is another significant gain when the cache size is larger than 3-4 images. This is when blocks between neighboring virtual views are shared via caching. Note that approximately the number of blocks equivalent to 3 images have to be decoded to display one virtual view. The reason for this is that in our concentric mosaic streaming system single columns are used for view interpolation but only blocks can be requested and transmitted. This overhead has a direct impact on the system performance. For a domain cache size of about MB or 3 images there is no decrease of the user-perceived delay anymore. A mean delay of less than 2ms for a gamer trajectory can be achieved. TC and share the same gains over (about 9ms). Looking at the worst case delay for the gamer trajectory (mainly due to an empty cache before streaming the first view of each trajectory) in Figure 3 we can see that TC optimized streams clearly outperform and encoding even for approximately the same mean delay for all three streams (compare to Figure 2). Finally Figure 4 shows the mean delay as a function of the maximum available channel bitrate and the maximum allowable decoding complexity of a system using a MB domain cache and a MB bitstream cache. For widely used machines (e.g. Pentium IV 2GHz) and bit rates of about.2mbps a delay less than ms introduced by the system can be achieved with our testset at a PSNR of 4dB (calculated in the RGB-space). 2 4 5 6 7 [ bytes ] 3 3 [ images] Figure. Mean user-perceived delay for pure rotation as a function of the domain cache size. Delay [ms] Delay [ ms ] 6 4 8 6 4 2 4 5 6 7 [ bytes ] 3 3 [ images ] Figure. Mean user-perceived delay for pure translation as a function of the domain cache size. Note the significant gains of and TC optimal streams compared to an coded stream for larger cache sizes. Delay [ ms ] 8 6 4 2 TC TC TC 4 5 6 7 [bytes] 3 3 [images] Figure 2. Mean user-perceived delay for a trajectory consisting of translation and rotation as it can be found in first person 3D games. 7

8. References Delay [ms] [] S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen. The Lumigraph. in Proc. SIGGRAPH 96, pp. 43 54, 996. [2] M. Levoy and P. Hanrahan. Light Field Rendering. in Proc. SIGGRAPH 96, pp. 3 42, 996. [3] H.-Y. Shum, L.-W. He. Rendering with Concentric Mosaics. ACM SIGGRAPH 99, Los Angeles, CA, Aug. 999, pp.299 36. [4] H.-Y. Shum, S.B. Kang, and S.-C. Chan. Survey of Image-Based Representations and Compression Techniques. IEEE Transactions on Circuits and Systems for Video Technology, pp. 2 37, Volume: 3, Issue:, Nov. 23. [5] E. H. Adelson and J. Bergen. The Plenoptic Function and the Elements of Early Vision. Computational Models of Visual Processing, pp. 3 2, MIT Press, Cambridge, MA, 99. [6] C. Zhang and J. Li. On the Compression and Streaming of Concentric Mosaic Data for Free Wondering in a Realistic Environment over the Internet. IEEE Trans. on Multimedia. pp. 7-82, Volume:7 Issue: 6, Dec. 25 [7] P. Ramanathan and B. Girod. Receiver-Driven Rate-Distortion Optimized Streaming of Light Fields. IEEE International Conference on Image Processing, ICIP 25, Genoa, Italy, September 25. [8] M. Magnor, P. Ramanathan, and B. Girod. Multiview Coding for Image-Based Rendering using 3-D Scene Geometry. IEEE Transactions on Circuits and Systems for Video Technology, vol. 3, no., pp. 92-6, Nov. 23. [9] P. Ramanathan, M. Kalman, and B. Girod. Rate-Distortion Optimized Interactive Light Field Streaming. To appear in IEEE Transactions on Multimedia. [] P. Ramanathan and B. Girod. Random Access for Compressed Light Fields Using Multiple Representations. Proc. IEEE International Workshop on Multimedia Signal Processing, MMSP 24, Siena, Italy, September 24. [] M. van der Schaar and Y. Andreopoulos. Rate-Distortion-Complexity Modeling for Network and Receiver Aware Adaptation. IEEE Transactions on Multimedia, vol. 7, no. 3, pp. 47 48, Jun. 25. [2] E. Vutukuri, I. Bauermann, and E. Steinbach. Decoding Complexity-Constrained Rate-Distortion Optimization for the Compression of Concentric Mosaics. Picture Coding Symposium, PCS 24, San Francisco, CA, December 24. [3] I. Bauermann, Y. Peng, and E. Steinbach. Receiver- and Channel-adaptive Compression for Remote Browsing of Image-Based Scene Representations. Picture Coding Symposium, PCS 26, Beijing, China, April 26 4 2 8 6 TC 4 2 4 5 3 6 7 [bytes ] 3 [ images] Figure 3. Maximum user-perceived delay for a trajectory consisting of translation and rotation as it can be found in first person 3D games. Delay [ms] 6 5 4 3 2 2 4 4 2 8 8 6 6 Max. Decoding Complexity [kpps] 32 32 Max. Available Bitrate [ kbps] Figure 4. Rate-Complexity-Delay performance of the proposed coding scheme using a MB cache in the and bitstream domain. 7. Conclusion In this paper we investigated a rate, distortion, transmission data rate, decoding complexity (TC) optimization for the encoding of image-based scene representations. A streaming testbed is implemented and used to measure the impact of a caching system on the overall performance. Results on TC optimized compression of concentric mosaics and domain caching for streaming of concentric mosaics show that an adaptation to both the client computational resources and the available transmission data rate can be used to significantly decrease the user perceived delay in a remote navigation scenario. 8