Predictive Point-Cloud Compression


S. Gumhold (SMT-CGV, Computer Science, TU Dresden, 01062 Dresden, Germany) sg30@mail.inf.tu-dresden.de
Z. Karni (Computer Graphics Group, Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany) karni@mpi-inf.mpg.de
M. Isenburg (Computer Science Division, UC Berkeley, Berkeley, CA 94704-1776) isenburg@cs.berkeley.edu
H.-P. Seidel (Computer Graphics Group, Max-Planck-Institut für Informatik, 66123 Saarbrücken, Germany) hpseidel@mpi-inf.mpg.de

1 Abstract

Point clouds have recently become a popular alternative to polygonal meshes for representing three-dimensional geometric models. With modern scanning technologies producing larger and larger amounts of point data, it is beneficial to have compact representations for the storage and transmission of such data. We present a novel predictive method for single-rate compression of 3D models represented as point clouds. Our method constructs a prediction tree that specifies which previously encoded points are used for predicting the position of the next point. We use a greedy point-by-point construction of the tree that tries to minimize the prediction error. The results show that our approach can compete with other schemes for compressing point positions. Because our method can be adapted for streaming, out-of-core operation, we can compress arbitrarily large point sets using only moderate memory resources.

2 Introduction

Three-dimensional graphics has become an integral part of everyday multimedia content. Three-dimensional data is used in the entertainment industry for interactive games and movie production, as well as in scientific applications for the visualization of simulation results or sensor-acquired data. The dominant representation for three-dimensional models is the polygon mesh, mostly due to its native support in modern graphics chips. Polygonal meshes are defined by geometric information that specifies a position for each vertex, together with connectivity information that specifies how to connect the vertices into polygons. Additional attributes such as colors and normals are used in rendering to increase the model's photorealistic appearance.

Figure 1: The chameleon point-cloud model together with its prediction tree.

Until recently, three-dimensional models were mostly generated using dedicated modeling software. This was a tedious task that often required mathematical and engineering expertise together with artistic talent. The development of three-dimensional photography techniques and scanning systems has introduced a simpler and more intuitive way to create digital representations of real-world objects. The scanning is done by sampling the surface of an object and results in a set of points embedded in three-dimensional space, a point cloud. Depending on the scanning method, other attributes such as colors and surface normals can also be acquired for every point. Usually the acquired point cloud needs further processing to obtain a representation that is suitable for rendering or some other task. A standard processing task is the reconstruction of the surface that the points were sampled from, usually in the form of a polygon mesh. For visualization purposes alone it is possible to avoid the complex task of reconstructing a surface by rendering directly with points. For non-surface-like models such as trees this is sometimes the only possible way to display the data.

Today's three-dimensional scanning technology can generate huge amounts of points, and the data often requires several pre-processing steps to remove noise originating in the sampling process, to align scans taken from different directions, or to reduce the complexity through re-sampling. A survey of recent developments in point acquisition, point processing, and surface reconstruction can be found in [1,2,8]. Since modern acquisition techniques produce larger and larger amounts of points, dedicated compression schemes have become necessary to compactly store and efficiently transmit the generated point data. In this paper we present a technique for efficiently compressing the positions of points in point-cloud models. Our method compresses point positions in a single-resolution sequential order that supports streaming encoding and decoding. It is based on the construction of a prediction tree that establishes neighborhood relations between the points, allowing us to predict the position of a new point from its already decoded neighbors using a simple prediction rule.

3 Previous Work

Compression of three-dimensional mesh models has been extensively studied in the past few years and many compression schemes have been developed (the reader is referred to the survey by Alliez and Gotsman [3] for more details). Most methods aim to compress the mesh's underlying structure, geometry and connectivity; a few also include attributes such as normals and colors. Among the variety of methods, [5,11,12] combine an efficient traversal order on the connectivity graph with good geometry prediction. The prediction rule (e.g., the parallelogram rule used in [12]) positions a new vertex based on its already positioned connectivity neighbors. These methods are efficient both in their running time and in the compression rates they achieve. However, they cannot be applied to point-based models because they require a connectivity structure.

Several methods for compressing point clouds have already been proposed. Devillers and Gandoin [4] quantize the point coordinates to Q bits each and build a kd-tree by recursively subdividing the tight bounding box around the points. In each subdivision step the number of points in one of the two subdivision cells is coded using log2(n+1) bits, where n is the total number of points in the subdivided box. They give an upper bound of 3Q - log2(n) + 2.402 bits per point for encoding the point positions (Q is the number of quantization bits per coordinate and n is the total number of points). They also report heuristics for predicting the number of points in a cell that in practice further reduce the code length. We found that this method tends to give the overall best compression rates for encoding only the positions of points. It is not clear whether a straightforward generalization of their scheme to include per-point normal and color information would achieve a similar overall efficiency.
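For concreteness, here is a minimal Python sketch of this cell-count coding: it recursively halves the bounding box and reports the ceil(log2(n+1))-bit cost of each transmitted count. The alternating split axis, the midpoint split, and the function names are illustrative assumptions, and the sketch omits the arithmetic coder and the predictive heuristics of [4].

    import math

    def kd_count_coding(points, box, depth=0, max_depth=36, emit=print):
        # Recursively halve the bounding box and report the bit cost of
        # transmitting the point count of the first half: ceil(log2(n + 1))
        # bits, where n is the number of points in the current box [4].
        n = len(points)
        if n <= 1 or depth == max_depth:
            return
        axis = depth % 3                      # illustrative: cycle x, y, z
        lo, hi = box[axis]
        mid = 0.5 * (lo + hi)
        left  = [p for p in points if p[axis] <  mid]
        right = [p for p in points if p[axis] >= mid]
        emit("depth %d: count %d coded in %d bits"
             % (depth, len(left), math.ceil(math.log2(n + 1))))
        for half, bounds in ((left, (lo, mid)), (right, (mid, hi))):
            child_box = list(box)
            child_box[axis] = bounds
            kd_count_coding(half, child_box, depth + 1, max_depth, emit)

    # usage: kd_count_coding(point_list, [(0.0, 1.0)] * 3)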
Peng and Kuo [10] suggest a similar method that encodes the occupancy of an adaptive octree structure. Each subdivision step creates eight new cells, and eight bits specify whether each cell is empty or not. Several more bits are used to optimize the cases in which a cell contains only one point and in which the bit-stream contains more 1's than 0's. Their method requires fewer bits than the kd-tree approach of Devillers and Gandoin [4] for decoding a point cloud at low resolution levels, but more bits for decoding the full-resolution model. In addition, their method has no obvious generalization to include normal and color information.

Waschbüsch et al. [13] propose a multi-resolution approach that exploits the fact that the points lie on a surface. Their method nicely generalizes to include point attributes such as normals and colors. Similar to the progressive-mesh approach [6], each point pair is replaced by its centroid to form the next coarser level. The code consists of the base level, which contains significantly fewer points than the original, plus the details needed to split each point back into the pair it replaced. For the point positions, the splitting details are given in a local reference plane that passes through the centroid: the radius from the centroid, the azimuth angle around the plane's normal, and the altitude along the normal. For the point normals, spherical coordinates specify the difference between the average normal at the centroid and the normals of its finer-level points. Colors are encoded as differences from the centroid's average color in the YUV color space. When considering only the positions of points, this method is less efficient than the kd-tree approach [4]. However, it is the only method known to us that includes attribute details and provides a true multi-resolution reconstruction.

4 Predictive Point-Cloud Coding

Our method compresses the points in a single-resolution, spatially sequential order. Starting from a seed point, the position of each new point is predicted from previously processed points. Unlike in mesh compression, where such prediction techniques are commonly used, we have no connectivity information to determine the neighboring points that best predict a new one. It is, in general, not possible to permute the points into a sequential order such that each point could simply use its immediate predecessors for prediction. Thus we augment the data with a prediction tree, a spanning tree over the points.

4.1 Encoding Order

Currently, the entire set of points is ordered prior to compression. Our method checks for the best result achieved by ordering along each of the spatial axes (i.e., sorting along the x, y, and z axes) or along approximated geodesic distances; a minimal sketch of this search is given below. Nonetheless, our method does not guarantee that the points will be encoded and decoded in that same order, as points are re-ordered locally during the greedy construction of the prediction tree.
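As a minimal sketch of this search, the axis-aligned candidate orders can be generated and compared as follows. Here compress_with_order is a hypothetical callback that runs the full encoder and returns the compressed size in bits; the paper does not spell out its geodesic approximation, so only the axis sorts are shown.

    import numpy as np

    def best_axis_order(points, compress_with_order):
        # Try sorting the points along each spatial axis (Sec. 4.1) and
        # keep the order whose compressed output is smallest.
        # points: (n, 3) float array.
        best_order, best_bits = None, float("inf")
        for axis in range(3):                       # x, y, z
            order = np.argsort(points[:, axis])
            bits = compress_with_order(points[order])
            if bits < best_bits:
                best_order, best_bits = order, bits
        return best_order, best_bits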

4.2 Prediction Tree

Inspired by Kronrod and Gotsman [9], we build a prediction tree that minimizes the residuals, i.e., the lengths of the corrective vectors. Instead of solving a global optimization problem we construct the tree in a sequential build order. After initializing the tree with the first point, each successive point is greedily attached to the node that predicts the new point with the smallest residual. Let T be the prediction tree and let p_i be the next point to be encoded (the tree already contains points p_1 to p_{i-1}). The prediction tree specifies which of the tree's vertices p_j, 1 <= j <= i-1, best predicts the position of the new point p_i. The best prediction minimizes the "distance" between the point p_i and the predicted point p̃_i (see Section 4.3). The point p_i is added to the prediction tree as a child of p_j.

4.3 Prediction Rule

The user can choose between two prediction rules: constant or linear prediction. In constant prediction the new point p_i is predicted from the parent point p_j alone, by placing the predicted point p̃_i in the same position as p_j (p̃_i = p_j). In linear prediction, the new point is predicted from its two-generation ancestors, its parent and grandparent: p̃_i = 2 p_j - p_k, where p_k is the parent of p_j.

4.4 Encoding

After the user has chosen a prediction rule and the points have potentially been reordered into a suitable order, the points are added one by one to the prediction tree at the node with the closest prediction. The insertion position is found efficiently with the help of a dynamically updated binary space partition. The constructed tree is encoded in breadth-first order. For each node the valence is encoded; as most of the valences are 0, 1, or 2, arithmetic coding compresses the valences to less than two bits per vertex. The vertex positions are encoded via the correction vectors r_i = p_i - p̃_i from the predicted position p̃_i to the original position p_i. The vectors r_i are represented in the global coordinate frame of the model. After quantization of the correction vectors, the integer-valued coordinates are split into packages of four bits each, and each package is sent to an adaptive arithmetic coder. In the case of the streaming compression described in Section 4.6, the construction of the prediction tree is interleaved with the breadth-first encoding traversal: every time the in-core buffer overflows, the next node in the breadth-first encoding order is encoded and removed from the prediction tree.

4.5 Decoding

During decoding the prediction tree is rebuilt in the same breadth-first order that was used during encoding. Every time a new node is encountered during the breadth-first traversal, a corrective vector is decoded and added to the position predicted from the parent node in order to recover the original position.

Figure 2: (a) The Stanford bunny model, rendered as a point-cloud. (b-d) The prediction trees of the Stanford bunny, the chameleon, and the maple01 models.
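The following Python sketch condenses Sections 4.2-4.4: greedy attachment of each point to the node with the smallest residual, the constant and linear prediction rules, and the splitting of quantized correction vectors into 4-bit packages. The linear scan over candidate nodes stands in for the paper's dynamically updated binary space partition (making this version O(n^2)), and the quantization step, sign handling, and helper names are illustrative assumptions.

    import numpy as np

    def build_prediction_tree(points, rule="linear"):
        # Greedily attach each point to the existing node whose prediction
        # gives the smallest residual (Sec. 4.2).
        # points: (n, 3) float array in build order.
        n = len(points)
        parent = np.full(n, -1, dtype=int)   # parent[0] = -1: root
        residuals = np.zeros_like(points)
        for i in range(1, n):
            best_j, best_pred, best_err = 0, points[0], np.inf
            for j in range(i):               # candidate predictor nodes
                k = parent[j]
                if rule == "linear" and k >= 0:
                    pred = 2.0 * points[j] - points[k]  # p̃_i = 2 p_j - p_k
                else:
                    pred = points[j]                    # constant: p̃_i = p_j
                err = np.linalg.norm(points[i] - pred)
                if err < best_err:
                    best_j, best_pred, best_err = j, pred, err
            parent[i] = best_j
            residuals[i] = points[i] - best_pred        # r_i = p_i - p̃_i
        return parent, residuals

    def nibble_packages(residuals, step=1.0 / 4096):
        # Quantize the correction vectors and split each integer coordinate
        # into 4-bit packages for the adaptive arithmetic coder (Sec. 4.4);
        # sign bits are omitted here for brevity.
        q = np.round(residuals / step).astype(np.int64)
        packages = []
        for value in np.abs(q).ravel():
            while True:
                packages.append(int(value & 0xF))       # low 4 bits
                value >>= 4
                if value == 0:
                    break
        return q, packages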

4.6 Streaming

Point-based models, especially those created by digital scanning, can easily reach a size that prohibits in-core processing. Recently a streaming approach to geometry processing has been proposed [7], which starts processing as soon as some of the data has been read into a small buffer. It is important to regularly stream out (delete) data to make room for streaming in new data while maintaining a small memory footprint throughout processing. Given a buffer size, our compression method stores the portion of the prediction tree that fits into the buffer. When a new point is streamed in, the oldest point or the point that was most rarely used is streamed out. During decoding, the buffer holds the points required for the prediction of new points and can be used for various types of processing.

If we first have to globally re-order the points (e.g., sorting them spatially along an axis), then our compressor is not truly streaming, as it requires an additional pass to re-order the data, so that compression cannot begin until all the point data is available. A truly streaming compressor can compress data in whatever order it happens to arrive in and can start compressing before all the data is available (e.g., gzip is a streaming general-purpose compressor). For predictive compression of point clouds in a streaming manner we do not need a globally ordered point set: as long as there is a sufficient neighborhood in memory around each point at the time that point is compressed, we can construct a reasonable prediction tree. Obviously we cannot compress points that stream in at random (unless we use a buffer large enough to hold all points). But really large point sets often exhibit strong local coherence in the spatial order of their points, especially if the process that originally created the point cloud was also subject to memory constraints. Compression performance is affected by the point order and by the amount of memory we allow the compressor to consume (i.e., the size of the buffer): when all good predictors for a point have already been streamed out by the time that point is reached, it will be predicted poorly. However, streaming compression is highly I/O-efficient and eliminates the need for pre-processing. Future research has to further explore the trade-off between resource use and bit-rate when compressing huge datasets.
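A minimal sketch of this buffer management, assuming a fixed node budget and a hypothetical encode_node callback: when the buffer would overflow, the oldest buffered node is encoded and evicted, approximating the "oldest or most rarely used" policy described above with simple FIFO eviction.

    from collections import deque

    class StreamingTreeBuffer:
        # Fixed-capacity in-core window over the prediction tree (Sec. 4.6).
        def __init__(self, capacity, encode_node):
            self.capacity = capacity
            self.encode_node = encode_node   # hypothetical: emits one node
            self.window = deque()            # nodes in breadth-first order

        def stream_in(self, node):
            # Encode and drop the oldest node once the buffer would
            # overflow, so memory stays bounded as new points stream in.
            if len(self.window) >= self.capacity:
                self.encode_node(self.window.popleft())  # stream-out
            self.window.append(node)

        def flush(self):
            # Encode whatever remains at the end of the stream.
            while self.window:
                self.encode_node(self.window.popleft())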
5 Results

We implemented the non-streaming variant of our method and compared our results to the bit-rates reported in [4] (KD) and [13] (PC); for the latter we used software from the authors. Table 1 compares the resulting compression rates in bits per point (bpp) after 12-bit quantization. The column n gives the number of points, pred the prediction rule used, tree the bpp needed to encode the tree, geod the total bpp for the geodesic build order (including the cost of the tree), and min x/y/z the lowest total bpp among the axis-aligned build orders, with the best axis in parentheses. Compression takes about 20 seconds per 100k points and decompression runs at 500k points per second on a 2 GHz PC. Point clouds from three-dimensional scans and re-sampling techniques (lower part of the table) have a fine-scale structure that our technique exploits, resulting in significantly better compression rates even without the local coordinate systems proposed in [13].

model       n     pred  tree   geod    min x/y/z    KD     PC
fandisk     6k    lin   1.24   13.04   19.94 (z)    12.1   20.69
horse       20k   lin   1.80   17.41   17.43 (y)    16.4   21.22
grass14     29k   con   1.54   19.88   19.74 (z)    18.8   21.76
maple01     45k   con   1.57   18.11   17.94 (x)    16.9   16.31
bunny       35k   lin   1.15   14.48   14.31 (x)    14.8   18.52
santa       75k   lin   1.23   12.23   13.17 (y)    -      18.28
chameleon   102k  lin   1.33   6.23    6.35 (z)     -      16.47
igea        134k  lin   1.16   10.83   10.90 (z)    -      14.28
male        148k  lin   0.89   7.29    7.26 (y)     -      13.59

Table 1: Comparison of compression rates in bits per point (bpp) after uniform quantization to 12 bits per coordinate, i.e., 36 bpp for the uncompressed data.

6 Summary and Future Work

We presented a single-rate compression scheme for the positions of points in a point-cloud model. Compression is achieved by exploiting the correlation captured in the corrective vectors, the differences between the predicted and the actual positions of the points. Since we do not have the connectivity structure available in polygonal meshes, our method greedily constructs a prediction tree that orders the points and determines which points to use for prediction. Depending on the model, the point order, and the prediction scheme, our method achieves bit-rates similar to those of state-of-the-art methods. Although other methods can perform slightly better, they globally reorder the points, which makes compressing larger models impractical. By limiting the number of points kept in memory and re-ordering the points only locally, we will be able to perform streaming compression and encode huge point clouds that cannot be processed in-core, potentially on the fly at creation time, without ever having to store the uncompressed data. Another goal for future work is to integrate common point attributes such as colors and surface normals into the encoding and to find good prediction rules for them.

REFERENCES

[1] M. Alexa, "Survey: Acquisition and Reconstruction", AIM@Shape State-of-the-Art Report, 2004.
[2] M. Alexa, M. Gross, M. Pauly, H. Pfister, M. Stamminger, M. Zwicker, "Point-Based Computer Graphics", SIGGRAPH 04 Course Notes, 2004.
[3] P. Alliez, C. Gotsman, "Recent Advances in Compression of 3D Meshes", Proceedings of the Symposium on Multiresolution in Geometric Modeling, 2003.

[4] O. Devillers, P.-M. Gandoin, "Geometric Compression for Interactive Transmission", Proceedings of IEEE Visualization, 2000, pp. 319-326.
[5] S. Gumhold, W. Straßer, "Real Time Compression of Triangle Mesh Connectivity", Proceedings of ACM SIGGRAPH, 1998, pp. 133-140.
[6] H. Hoppe, "Progressive Meshes", Proceedings of SIGGRAPH, 1996, pp. 99-108.
[7] M. Isenburg, P. Lindstrom, "Streaming Meshes", Proceedings of IEEE Visualization, 2005, pp. 231-238.
[8] L. Kobbelt, M. Botsch, "A Survey of Point-Based Techniques", Computers & Graphics, Vol. 28(4), 2004, pp. 801-814.
[9] B. Kronrod, C. Gotsman, "Optimized Compression of Triangle Mesh Geometry Using Prediction Trees", Proceedings of 3DPVT, 2002, pp. 602-608.
[10] J. Peng, C.-C. Kuo, "Octree-Based Progressive Geometry Encoder", Proceedings of SPIE, 2003, pp. 301-311.
[11] J. Rossignac, "EdgeBreaker: Connectivity Compression of Triangle Meshes", IEEE Transactions on Visualization and Computer Graphics, Vol. 5(1), 1999, pp. 47-61.
[12] C. Touma, C. Gotsman, "Triangle Mesh Compression", Proceedings of Graphics Interface, 1998, pp. 26-34.
[13] M. Waschbüsch, M. Gross, F. Eberhard, E. Lamboray, S. Würmlin, "Progressive Compression of Point-Sampled Models", Eurographics Symposium on Point-Based Graphics, 2004, pp. 95-102.