On Transform Coding Tools Under Development For VP10

Sarah Parker*, Yue Chen*, Jingning Han*, Zoe Liu*, Debargha Mukherjee*, Hui Su*, Yongzhe Wang*, Jim Bankoski*, Shunyao Li+
* Email: {sarahparker, yuec, jingning, zoeliu, debargha, huisu, yongzhe, jimbankoski}@google.com
+ lishunyaothu@gmail.com
* Google, Inc., 1600 Amphitheatre Parkway, Mountain View, CA, USA 94043
+ University of California, Santa Barbara, CA 93106

ABSTRACT

Google started the WebM Project in 2010 to develop open source, royalty free video codecs designed specifically for media on the Web. The second generation codec released by the WebM project, VP9, is currently served by YouTube and enjoys billions of views per day. Realizing the need for even greater compression efficiency to cope with the growing demand for video on the web, the WebM team embarked on an ambitious project to develop a next edition codec, VP10, that achieves at least a generational improvement in coding efficiency over VP9. Starting from VP9, a set of new experimental coding tools has already been added to VP10 to achieve decent coding gains. Subsequently, Google joined a consortium of major tech companies called the Alliance for Open Media to jointly develop a new codec, AV1. As a result, the VP10 effort is largely expected to merge with AV1. In this paper, we focus primarily on new tools in VP10 that improve coding of the prediction residue using transform coding techniques. Specifically, we describe tools that increase the flexibility of available transforms, allowing the codec to handle a more diverse range of residue structures. Results are presented on a standard test set.

Keywords: video coding, VP8, VP9, VP10, WebM, H.264, HEVC, prediction, motion, transform, DCT, DST, identity transform.

1. INTRODUCTION

Google embarked on the WebM project [1] to develop open source, royalty unencumbered video codecs for the Web. The first codec released as part of the project was called VP8 [2] and is still used extensively in Google Hangouts. The next edition of the codec, entitled VP9 [3][4], was released in mid 2013 and is the current generation codec from the WebM project. It achieves a coding efficiency similar to the latest video codec from MPEG, entitled HEVC [5]. VP9 has found huge success with adoption by YouTube, and has delivered big improvements to the YouTube service in terms of quality-of-experience metrics such as watch time and mean time to rebuffer over the primary format H.264/AVC [6]. Specifically, VP9 streams delivered by YouTube today are not only 30-40% more compact than corresponding H.264/AVC streams but are also somewhat higher in quality. Consequently, even with predominantly software decoding on compatible browsers (Chrome, Firefox, Opera) on potent devices, the number of VP9 videos viewed daily by YouTube users today is on the order of billions. As VP9 hardware decoders become more readily available on mobile devices, we expect the proliferation of VP9 to accelerate even more.

Even though the gains achieved with VP9 are tangible and significant, the continued growth in online video consumption has made the need for efficient video coding increasingly critical. The WebM project has been focusing on developing the next generation video codec VP10 [7] since 2014, and modest gains in coding efficiency have already been achieved. In 2015, Google joined a consortium of major tech companies called the Alliance for Open Media to jointly develop a new royalty free codec, to be named AV1.

The plan is to propose the experimental tools developed in VP10 to the AV1 process in due course. In this paper we primarily focus on the tools developed for VP10.

Though improvements in prediction modes can successfully decrease the prediction error, more than half of the bitrate in modern video codecs is still spent coding the residual. In this paper we discuss the new transform coding tools that have been added to VP10 to improve the coding of the residue. First, we discuss the super transform, which allows the application of one large transform to a predictor created by combining several prediction blocks using overlapped block motion compensation. Next, we discuss two extensions to our transform sizes: recursive transform units and rectangular transforms. Finally, we discuss an expanded bank of transform types available to Intra and Inter prediction blocks. Overall, we find that increasing the flexibility of available transforms allows VP10 to better handle a wide range of residue structures and leads to a significant reduction in BD-rate.

2. VP9 TRANSFORM CODING FRAMEWORK

VP9 has already made great strides towards developing effective transform coding tools. In the current implementation, a recursive block partitioning scheme is used to break up each 64x64 superblock into a partition tree of smaller prediction blocks. Each prediction block can be encoded using either an Intra or an Inter mode. Both Intra and Inter modes use square transforms of sizes less than or equal to that of the prediction block. Additionally, side information is provided for the transform size.

The structure of Intra and Inter prediction residues tends to differ in several respects, and thus requires different transform coding methods. First, VP9 aims to tailor the transform type towards the most likely energy distribution produced by Intra and Inter prediction modes. Intra prediction tends to become less accurate farther from the prediction border, producing residue with a higher energy concentration on one side. In these instances, an asymmetrical transform such as the ADST is most appropriate. VP9 offers a choice between ADST and DCT in both the horizontal and vertical directions for Intra modes, providing a total set of 4 mode-dependent transform types:

    {DCT, ADST}_horizontal x {DCT, ADST}_vertical

or explicitly, as horizontal-vertical pairs: DCT-DCT, DCT-ADST, ADST-DCT, ADST-ADST. The structure of Inter mode residue is less easily defined, and DCT-DCT is the only transform option for Inter prediction blocks in VP9.

VP9 also handles prediction differently for Intra and Inter modes. When the transform size is smaller than the prediction block size, Intra coded blocks use a recursive prediction and transform process to produce a full reconstruction, allowing the next transform block to use the reconstruction as a better predictor. This process is not necessary for Inter blocks, since they use regions of previously reconstructed frames as predictors.
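To make the separable hybrid transform concrete, the following minimal Python sketch builds orthonormal floating-point DCT-II and DST-IV (the "ADST" flavor used by VP9, see the EXTENDED TRANSFORM TYPES section below) kernels and applies the four VP9 Intra transform pairs to a residual block. This is a didactic sketch under our own naming conventions, not the codec's integerized butterfly implementation; the pair labels follow the horizontal-vertical order used in the text.

    import numpy as np

    def dct_matrix(n):
        """Orthonormal DCT-II kernel; rows are basis vectors."""
        k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
        m[0, :] /= np.sqrt(2.0)
        return m

    def adst_matrix(n):
        """Orthonormal DST-IV kernel, the 'ADST' used by VP9."""
        k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        return np.sin(np.pi * (2 * i + 1) * (2 * k + 1) / (4 * n)) * np.sqrt(2.0 / n)

    def transform_2d(residual, hor, ver):
        """Separable 2-D transform: vertical kernel on columns, horizontal kernel on rows."""
        return ver @ residual @ hor.T

    # The four VP9 Intra hybrid transform pairs for a 4x4 residual block,
    # keyed as horizontal_vertical per the convention in the text.
    n = 4
    kernels = {"DCT": dct_matrix(n), "ADST": adst_matrix(n)}
    residual = np.random.randn(n, n)
    coeffs = {f"{h}_{v}": transform_2d(residual, kernels[h], kernels[v])
              for h in kernels for v in kernels}   # DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST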
3. TRANSFORM TOOL ENHANCEMENTS IN VP10

VP10 seeks to build upon the previously mentioned transform tools in VP9 and introduce a richer, more flexible set of available transforms for both Intra and Inter prediction modes. This section provides an overview of all new transform coding tools currently under exploration.

SUPER TRANSFORMS

VP9 uses a recursive block partitioning scheme for the purpose of prediction; however, the transform used to code the prediction residue of a prediction block is restricted to be of a size no larger than the prediction block itself. VP10 attempts to remove this restriction for Inter modes by allowing transform blocks to span multiple prediction blocks. Specifically, at any level of the partition tree, the syntax can optionally indicate that a single large transform will be used at that level, irrespective of how fine the partition tree may be below that level. Fig. 1 shows an example of a partition tree with two super transform blocks, indicating that the prediction residue will be coded jointly with a large transform at those sizes.

Fig. 1. Partition tree with super transform blocks

Through our investigations, we found that a simple juxtaposition of the predictors from the individual prediction blocks to create one large final predictor is often non-ideal. Instead, the super transform creates a new predictor based on a recursive application of overlapped block motion compensation [8]. In particular, predictors from the smallest blocks within the super transform tree are aggregated with overlapped block motion compensation successively, in a recursive fashion, until the final predictor bubbles up to the super transform level. Note that predictors at each level need to be extended by a width equivalent to the width of the smoothing filter across prediction boundaries.
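The aggregation just described can be pictured with a small Python sketch over a toy quadtree: per-block predictors bubble up level by level, a crude seam blend stands in for the overlapped block motion compensation filters of [8] (the boundary extension by the filter width is omitted), and one large transform then codes the residual of the whole area. The blending rule and all names here are illustrative assumptions, not the codec's actual filters.

    import numpy as np

    def blend_seams(pred, half):
        """Crude stand-in for OBMC: average the pixels straddling each internal seam."""
        p = pred.copy()
        row = (p[half - 1, :] + p[half, :]) / 2.0
        p[half - 1, :] = row
        p[half, :] = row
        col = (p[:, half - 1] + p[:, half]) / 2.0
        p[:, half - 1] = col
        p[:, half] = col
        return p

    def aggregate_predictor(node, size):
        """Bottom-up predictor for the whole super transform area of a toy quadtree."""
        if node["leaf"]:
            return node["pred"].astype(np.float64)          # this block's inter predictor
        half = size // 2
        out = np.zeros((size, size))
        for idx, child in enumerate(node["children"]):      # raster order: TL, TR, BL, BR
            r, c = (idx // 2) * half, (idx % 2) * half
            out[r:r + half, c:c + half] = aggregate_predictor(child, half)
        return blend_seams(out, half)                       # smooth across internal seams

    # Usage on a toy 32x32 node split into four 16x16 inter-predicted leaves.
    leaf = lambda v: {"leaf": True, "pred": np.full((16, 16), float(v))}
    tree = {"leaf": False, "children": [leaf(10), leaf(20), leaf(30), leaf(40)]}
    predictor = aggregate_predictor(tree, 32)
    # With the super transform flag set at this node, residual = source - predictor is
    # then coded with a single 32x32 transform instead of one transform per leaf.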

RECURSIVE TRANSFORM UNITS

VP9 provides a wide range of available transform sizes, but each prediction block is limited to selecting only one of them. In VP10, we remove this constraint and allow any Inter prediction block to use several different transform sizes. Transforms within a single prediction block may now have recursive, tree-structured partitions. A simple 2-way partition quadtree with only square split types is used to produce these recursive units. We have found that this size flexibility allows finer targeting of high energy regions in the residual signal. Fig. 2 illustrates the available partition types in the 2-way quadtree, as well as an example of a final transform partition tree within a single prediction block.

Fig. 2. Prediction block residue with recursively partitioned transform units using a 2-way partition tree

RECTANGULAR TRANSFORMS

In VP9, we were restricted to a set of transform sizes that are always square. In VP10, we expanded our transform sizes for Inter modes to include rectangular transforms that can be 4x8, 8x4, 8x16, 16x8, 16x32, or 32x16. Rectangular transforms are currently only available to rectangular prediction blocks and are always the same size as the prediction block.
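As an illustration of the transform-size flexibility described in the two subsections above, the following sketch enumerates the transform units produced by a split/no-split quadtree over a toy residual. The split rule shown is a caller-supplied stand-in; a real encoder compares rate-distortion costs instead, and the rectangular, prediction-block-sized transforms are not modeled here. All names are ours.

    import numpy as np

    def tu_leaves(residual, top, left, size, split_fn, min_size=4):
        """Enumerate transform-unit rectangles produced by the 2-way (split / no-split)
        quadtree. 'split_fn(block)' is any caller-supplied rule; a real encoder would
        compare RD costs instead."""
        block = residual[top:top + size, left:left + size]
        if size > min_size and split_fn(block):
            h = size // 2
            leaves = []
            for r, c in ((0, 0), (0, h), (h, 0), (h, h)):
                leaves += tu_leaves(residual, top + r, left + c, h, split_fn, min_size)
            return leaves
        return [(top, left, size, size)]                 # one square TU of this size

    # Toy rule: keep splitting while one quadrant holds most of the residual energy.
    def energetic(block, frac=0.5):
        h = block.shape[0] // 2
        quads = [block[r:r + h, c:c + h] for r in (0, h) for c in (0, h)]
        total = (block ** 2).sum() + 1e-9
        return max((q ** 2).sum() for q in quads) > frac * total

    res = np.zeros((16, 16))
    res[:4, :4] = np.random.randn(4, 4)                  # energy concentrated in one corner
    units = tu_leaves(res, 0, 0, 16, energetic)
    # -> four 4x4 TUs covering the busy corner plus three 8x8 TUs elsewhere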

Fig. 3. Rectangular transform units within a superblock

EXTENDED TRANSFORM TYPES

To code Inter prediction residues, VP9 exclusively uses the DCT at different sizes, namely 4x4, 8x8, 16x16 and 32x32; however, for coding of Intra prediction residues, a richer set of transforms that includes hybrid combinations of DCTs and Asymmetric DSTs (ADST) is used [9][10][11]. Intra prediction residues are likely to be smaller near the boundaries from which they are predicted, and the asymmetric DST is therefore better suited to code them. Specifically, VP9 uses the DST-IV, which is an approximation to the original ADST [9] but has a faster butterfly implementation [11]. For ease of exposition, we still refer to this transform as the ADST. In VP9, for each Intra predicted block of size 4x4, 8x8 or 16x16, up to four different separable 2-D transforms may be used: DCT-DCT, DCT-ADST, ADST-DCT and ADST-ADST, where each transform pair listed denotes the horizontal and vertical transforms of a separable 2-D implementation, respectively.

For VP10, we are exploring a richer set of transforms for coding both Inter and Intra prediction residues. Inter prediction residues do not have as well defined a structure as in the Intra case, but we have found that using a bank of transforms, each adapted to a specific type of residue profile within the block, is generally helpful. In VP10, we use not only the ADST (DST-IV) but also a flipped version of the ADST (FlipADST) that applies the ADST in reverse order. Further, an identity transform (IDTX) is now available, which seems to be particularly useful for coding residue with sharp lines and edges. Previously, we experimented with a symmetric DST, namely the DST-II, but found the identity transform to be more beneficial for coding efficiency. Finally, both Inter and Intra modes continue to make use of the DCT. Thus, for each coded block, we can choose to use one of up to 16 different transforms:

    {DCT, ADST, FlipADST, IDTX}_horizontal x {DCT, ADST, FlipADST, IDTX}_vertical

or explicitly, as horizontal-vertical pairs: DCT-DCT, DCT-ADST, ADST-DCT, ADST-ADST, DCT-FlipADST, FlipADST-DCT, FlipADST-FlipADST, ADST-FlipADST, FlipADST-ADST, IDTX-DCT, DCT-IDTX, IDTX-ADST, ADST-IDTX, IDTX-FlipADST, FlipADST-IDTX, IDTX-IDTX.

As block sizes get larger, some of these transforms begin to act similarly. Thus, a reduced set of transforms is used for the 16x16, 32x32 and 64x64 block sizes. In the transform selection process for Inter and Intra modes, the encoder searches over the entire set of transforms and selects the one that produces the best RD cost. Once a transform is selected, a transform type symbol from the set of types available at that size is used to indicate the actual transform in the bitstream. Note that the one-dimensional transforms DCT-IDTX and IDTX-DCT in the list above are similar in spirit to directional transforms [12] or 1-D transforms [13] in the literature. However, we chose to use only two directions, horizontal and vertical, since these seem to be the minimal set that provides the best gains. Also note that IDTX-IDTX is equivalent to transform skip, which yields substantial benefit for screen content.
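A hedged sketch of the extended transform set and its selection follows: FlipADST is modeled here by applying the ADST to the samples in reverse order, IDTX is the identity, and the "cost" is a toy quantization-based proxy rather than the encoder's true rate-distortion cost. Kernel definitions repeat those of the earlier sketch; all function and variable names are illustrative assumptions.

    import numpy as np
    from itertools import product

    def dct_matrix(n):                                   # orthonormal DCT-II (as in the earlier sketch)
        k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
        m[0, :] /= np.sqrt(2.0)
        return m

    def adst_matrix(n):                                  # orthonormal DST-IV, the 'ADST'
        k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        return np.sin(np.pi * (2 * i + 1) * (2 * k + 1) / (4 * n)) * np.sqrt(2.0 / n)

    def kernels(n):
        return {"DCT": dct_matrix(n),
                "ADST": adst_matrix(n),
                "FLIPADST": adst_matrix(n)[:, ::-1],     # ADST applied to the reversed samples
                "IDTX": np.eye(n)}                       # identity transform

    def search_tx_type(residual, q=8.0, lam=10.0):
        """Toy search over the 16 horizontal/vertical kernel pairs.
        Distortion is the SSE after uniform quantization with step q; the rate proxy
        is the number of nonzero quantized coefficients. A real encoder would use
        its true rate-distortion cost and, at large sizes, a reduced candidate set."""
        ker = kernels(residual.shape[0])
        best = None
        for h, v in product(ker, ker):                   # the 16 combinations
            coeffs = ker[v] @ residual @ ker[h].T        # vertical kernel on columns, horizontal on rows
            qcoeffs = np.round(coeffs / q) * q
            recon = ker[v].T @ qcoeffs @ ker[h]          # all four kernels are orthonormal
            cost = ((residual - recon) ** 2).sum() + lam * np.count_nonzero(qcoeffs)
            if best is None or cost < best[1]:
                best = ((h, v), cost)
        return best[0]

    hor_type, ver_type = search_tx_type(np.random.randn(8, 8) * 16.0)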

While the multiple transforms do not add any decoding complexity, since all transform sizes and types are explicitly signaled, significant added complexity is needed on the encoder side to make the best RD-based decision by searching over the set of available transform types. We are currently experimenting with methods to mitigate this complexity. Specifically, we are exploring classification schemes based on simple features derived from the residue signal to prune out transform types from the RD search set. In particular, one classifier is trained to prune out either DCT or IDTX, and a second classifier is trained to prune out either ADST or FlipADST in each direction. The DCT vs. IDTX classifier relies on features comprising the horizontal and vertical neighboring pixel correlations in the residual, while the ADST vs. FlipADST classifier relies on features comprising the energy distribution in various regions of the residual signal. We continue to explore different methods to reduce the added encoder complexity burden produced by this expanded transform set.

4. CODING RESULTS

To evaluate our new tools, we performed a controlled bitrate test using 3 different video sets: lowres, which includes 40 videos of CIF resolution; midres, which includes 30 videos of 480p and 360p resolution; and hdres, which contains 38 videos at 720p and 1080p resolution. We code 150 frames of each video with a single keyframe. The coding results are shown in Tables 1-3 below.

For quality metrics we use average sequence PSNR and SSIM [14], computed as the arithmetic average of the combined PSNRs and SSIMs, respectively, over all frames. Combined PSNR for each frame is computed from the combined MSE of the Y, Cb and Cr components. In other words, assuming 4:2:0 sampling:

    MSE_combined  = (4 * MSE_Y + MSE_Cb + MSE_Cr) / 6
    PSNR_combined = min( 10 * log10( 255^2 / MSE_combined ), 100 )

SSIM for each component in each frame is computed by averaging the SSIM scores computed, without applying a windowing function, over 8x8 windows of that component. Combined SSIM for the frame is computed from the SSIMs of the Y, Cb and Cr components as follows:

    SSIM_combined = 0.8 * SSIM_Y + 0.1 * (SSIM_Cb + SSIM_Cr)

To compare the RD curves obtained by two codecs we use a modified BD-rate [15] metric that uses piecewise cubic Hermite polynomial interpolation (pchip) on the rate-distortion points before integrating the difference over a fine grid using the trapezoid method. The OVERALL number at the bottom of each table is the arithmetic average of the numbers over all the videos in the same column. The BD-rate is computed separately based on the average sequence PSNR and SSIM metrics as computed above. A small computational sketch of these metrics is given after the configuration list below.

For all the tables below, we use a slightly modified version of VP9 as the baseline, referred to as VP9+ for ease of exposition, which was also the starting point of the AV1 codec. VP9+ is better than VP9 by about 0.6% because it already incorporates multiple explicit transforms for INTER and INTRA modes with the set of four original VP9 transforms described in Section 2. Specifically, all the results below are generated on the nextgenv2 branch of the libvpx repository, where the configurations tested are as follows:

VP9+ baseline: --enable-av1 [very similar to the AV1 baseline codec]
Extended Transform Set: --enable-av1 --enable-experimental --enable-ext-tx
Extended Transform Set + Rectangular Transforms: --enable-av1 --enable-experimental --enable-ext-tx --enable-rect-tx
Super Transform: --enable-av1 --enable-experimental --enable-supertx
All New Transform Tools: --enable-av1 --enable-experimental --enable-supertx --enable-ext-tx --enable-rect-tx

At the time of writing of this paper we found some bugs in the Recursive Transform Units tool, so results for that tool are excluded.
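As a reference for how these aggregate metrics fit together, here is a minimal Python sketch, under stated assumptions, of the combined PSNR/SSIM computation and a pchip-plus-trapezoid BD-rate estimate. The exact modified BD-rate variant used to produce the tables is not fully specified in the text, so this should be read as an approximation in the same spirit; names are illustrative.

    import numpy as np
    from scipy.interpolate import PchipInterpolator

    def combined_psnr(mse_y, mse_cb, mse_cr):
        """Frame-level combined PSNR from per-component MSEs (4:2:0 weighting)."""
        mse = (4.0 * mse_y + mse_cb + mse_cr) / 6.0
        return min(10.0 * np.log10(255.0 ** 2 / mse), 100.0)

    def combined_ssim(ssim_y, ssim_cb, ssim_cr):
        """Frame-level combined SSIM from per-component SSIMs."""
        return 0.8 * ssim_y + 0.1 * (ssim_cb + ssim_cr)

    def bd_rate(rates_ref, qual_ref, rates_test, qual_test, n=1000):
        """Average bitrate difference (%) of the test codec vs. the reference.
        log-rate is interpolated as a function of quality with pchip, and the
        difference is integrated over the overlapping quality range with the
        trapezoid rule. Negative values indicate bitrate savings."""
        def log_rate_curve(rates, qual):
            order = np.argsort(qual)
            return PchipInterpolator(np.asarray(qual)[order], np.log(np.asarray(rates)[order]))
        f_ref = log_rate_curve(rates_ref, qual_ref)
        f_test = log_rate_curve(rates_test, qual_test)
        lo = max(np.min(qual_ref), np.min(qual_test))
        hi = min(np.max(qual_ref), np.max(qual_test))
        grid = np.linspace(lo, hi, n)
        avg_log_diff = np.trapz(f_test(grid) - f_ref(grid), grid) / (hi - lo)
        return (np.exp(avg_log_diff) - 1.0) * 100.0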

Table 1. VP10 results on the lowres set (VP9+ baseline). Each cell lists the BD-rate reduction in %, computed from average sequence PSNR and SSIM, respectively.

Video | Ext Tx (PSNR, SSIM) | Ext Tx + Rect Tx (PSNR, SSIM) | Super Tx (PSNR, SSIM) | All New Tools (PSNR, SSIM)
akiyo_cif.y4m | 2.656, 2.024 | 3.074, 1.833 | 1.099, 0.868 | 3.971, 3.424
basketballpass_240p.y4m | 2.982, 4.09 | 3.818, 4.622 | 0.664, 1.232 | 4.235, 5.248
blowingbubbles_240p.y4m | 2.078, 1.681 | 3.051, 3.028 | 1.433, 1.731 | 3.94, 4.205
blowing_cif.y4m | 1.889, 1.737 | 4.15, 4.523 | 0.703, 0.34 | 4.152, 4.588
bqsquare_240p.y4m | 3.409, 3.579 | 3.53, 3.619 | 0.979, 0.359 | 4.505, 4.409
bridge_close_cif.y4m | 4.016, 4.434 | 4.396, 5.059 | 0.354, 0.619 | 3.645, 4.052
bridge_far_cif.y4m | 3.286, 2.959 | 3.393, 3.276 | 0.807, 0.688 | 3.182, 3.044
bus_cif.y4m | 3.018, 2.339 | 3.854, 3.322 | 1.504, 1.597 | 4.842, 4.464
cheer_sif.y4m | 2.669, 3.209 | 3.225, 4.174 | 0.441, 0.605 | 3.409, 3.821
city_cif.y4m | 2.853, 2.31 | 3.393, 2.988 | 1.667, 1.326 | 4.664, 3.865
coastguard_cif.y4m | 2.344, 2.212 | 3.286, 3.414 | 0.801, 0.932 | 3.726, 4.074
container_cif.y4m | 2.497, 1.685 | 3.141, 2.351 | 0.639, 0.871 | 3.389, 2.844
crew_cif.y4m | 1.72, 1.545 | 3.158, 3.19 | 0.732, 0.547 | 3.274, 3.093
deadline_cif.y4m | 3.566, 3.382 | 4.713, 4.826 | 0.893, 0.78 | 5.18, 5.523
flower_cif.y4m | 2.836, 2.881 | 3.724, 4.66 | 2.071, 2.41 | 4.95, 6.117
flowervase_240p.y4m | 2.728, 2.314 | 4.07, 4.038 | 1.803, 1.369 | 5.007, 4.641
football_cif.y4m | 1.657, 1.19 | 2.44, 2.048 | 0.44, 0.38 | 2.605, 2.043
foreman_cif.y4m | 2.829, 2.707 | 3.679, 3.665 | 1.119, 1.182 | 4.404, 4.744
garden_sif.y4m | 2.659, 2.823 | 3.07, 3.632 | 1.394, 1.436 | 4.058, 4.993
hallmonitor_cif.y4m | 1.519, 0.442 | 2.389, 1.625 | 0.58, 0.58 | 2.014, 1.505
harbour_cif.y4m | 3.499, 3.048 | 4.776, 4.835 | 0.89, 0.931 | 5.092, 5.282
highway_cif.y4m | 1.653, 1.227 | 2.454, 1.945 | 0.612, 1.302 | 1.63, 1.486
husky_cif.y4m | 3.637, 3.366 | 4.189, 4.271 | 0.751, 0.576 | 4.704, 4.827
ice_cif.y4m | 3.303, 3.652 | 3.526, 4.109 | 0.611, 0.247 | 3.897, 4.31
keiba_240p.y4m | 2.737, 1.984 | 3.558, 3.034 | 0.433, 0.432 | 3.504, 2.702
mobile_cif.y4m | 3.149, 3.659 | 3.434, 4.44 | 1.341, 1.49 | 4.508, 5.435
mobisode2_240p.y4m | 3.136, 2.78 | 4.726, 4.661 | 0.884, 0.669 | 5.081, 5.285
motherdaughter_cif.y4m | 1.78, 2.056 | 2.57, 2.678 | 1.437, 1.753 | 3.429, 3.578
news_cif.y4m | 3.454, 3.627 | 4.236, 4.518 | 0.862, 0.907 | 4.859, 5.549
pamphlet_cif.y4m | 2.426, 2.228 | 3.353, 3.546 | 0.382, 0.155 | 3.374, 3.21
paris_cif.y4m | 3.838, 3.452 | 4.621, 4.644 | 0.786, 0.754 | 4.949, 4.751
racehorses_240p.y4m | 1.847, 1.645 | 2.458, 2.494 | 0.694, 0.529 | 2.691, 2.576
signirene_cif.y4m | 2.018, 1.884 | 2.725, 2.65 | 1.014, 0.728 | 3.656, 3.553
silent_cif.y4m | 2.473, 2.18 | 3.309, 2.983 | 1.186, 1.487 | 4.037, 4.039
soccer_cif.y4m | 3.419, 2.947 | 4.354, 3.953 | 0.696, 0.848 | 4.495, 4.354
stefan_sif.y4m | 4.658, 5.264 | 5.393, 6.513 | 1.079, 0.942 | 5.884, 6.671
students_cif.y4m | 2.815, 2.649 | 3.55, 3.485 | 1.217, 1.62 | 4.468, 4.629
tempete_cif.y4m | 3.923, 4.078 | 4.395, 4.705 | 1.197, 1.485 | 5.203, 5.757
tennis_sif.y4m | 3.285, 2.614 | 3.819, 3.181 | 1.063, 1.396 | 4.566, 4.056
waterfall_cif.y4m | 3.383, 2.391 | 4.465, 3.6 | 2.385, 3.362 | 6.235, 6.673
OVERALL | 2.841, 2.657 | 3.637, 3.653 | 0.991, 1.037 | 4.293, 4.235

Table 2. VP10 results on the midres set (VP9+ baseline). Columns as in Table 1: BD-rate reduction in %, computed from average sequence PSNR and SSIM, respectively.

Video | Ext Tx (PSNR, SSIM) | Ext Tx + Rect Tx (PSNR, SSIM) | Super Tx (PSNR, SSIM) | All New Tools (PSNR, SSIM)
BQMall_832x480_60.y4m | 2.631, 2.581 | 3.79, 3.797 | 1.9, 1.815 | 5.263, 5.309
BasketballDrillText_832x480_50.y4m | 2.106, 1.828 | 3.115, 2.516 | 0.793, 0.697 | 3.773, 3.168
BasketballDrill_832x480_50.y4m | 1.787, 1.809 | 2.732, 2.676 | 1.228, 1.412 | 3.224, 3.022
Flowervase_832x480_30.y4m | 4.08, 3.629 | 5.738, 6.134 | 0.812, 0.847 | 6.091, 6.49
Keiba_832x480_30.y4m | 5.077, 3.58 | 6.513, 5.205 | 0.096, 0.316 | 6.597, 5.54
Mobisode2_832x480_30.y4m | 2.364, 1.973 | 3.635, 3.322 | 0.228, 0.05 | 3.745, 3.611
PartyScene_832x480_50.y4m | 2.473, 2.29 | 3.064, 3.105 | 0.978, 0.75 | 3.812, 3.69
RaceHorses_832x480_30.y4m | 1.209, 1.177 | 1.61, 2.242 | 0.407, 1.024 | 2.735, 3.261
aspen_480p.y4m | 1.627, 1.647 | 1.881, 1.903 | 0.681, 0.497 | 2.349, 2.337
city_4cif_30fps.y4m | 1.504, 1.664 | 2.243, 2.507 | 1.638, 1.658 | 3.384, 3.348
controlled_burn_480p.y4m | 1.625, 1.534 | 2.166, 2.201 | 0.623, 0.489 | 2.898, 2.628
crew_4cif_30fps.y4m | 0.773, 0.871 | 2.173, 2.564 | 0.266, 0.184 | 2.073, 2.419
crowd_run_480p.y4m | 2.423, 2.121 | 2.458, 2.554 | 0.425, 0.318 | 2.696, 2.265
ducks_take_off_480p.y4m | 2.655, 2.235 | 4.625, 4.179 | 0.499, 0.646 | 4.819, 4.51
harbour_4cif_30fps.y4m | 1.381, 1.262 | 3.252, 3.489 | 0.274, 0.191 | 3.25, 3.216
ice_4cif_30fps.y4m | 2.697, 3.609 | 3.146, 4.447 | 0.222, 0.253 | 3.446, 4.501
into_tree_480p.y4m | 1.938, 1.551 | 2.71, 2.494 | 0.349, 0.215 | 2.896, 2.507
old_town_cross_480p.y4m | 1.825, 1.899 | 2.171, 2.174 | 1.171, 0.963 | 3.032, 2.644
park_joy_480p.y4m | 2.555, 1.725 | 2.916, 2.393 | 0.673, 0.723 | 3.43, 3.417
red_kayak_480p.y4m | 1.393, 1.504 | 2.049, 2.463 | 0.124, 0.24 | 2.062, 2.383
rush_field_cuts_480p.y4m | 2.318, 1.993 | 2.506, 2.438 | 0.469, 0.086 | 2.672, 2.34
sintel_trailer_2k_480p24.y4m | 7.238, 2.588 | 9.368, 3.717 | 0.6, 0.107 | 9.06, 3.706
snow_mnt_480p.y4m | 2.729, 2.074 | 3.298, 2.695 | 0.456, 0.034 | 3.484, 2.858
soccer_4cif_30fps.y4m | 2.835, 2.399 | 4.698, 4.169 | 0.442, 0.327 | 4.557, 4.136
speed_bag_480p.y4m | 1.366, 1.396 | 2.6, 3.238 | 0.546, 0.23 | 2.793, 3.39
station2_480p25.y4m | 2.082, 1.704 | 2.666, 2.01 | 3.886, 4.536 | 5.778, 6.066
tears_of_steel1_480p.y4m | 1.113, 0.923 | 2.699, 2.349 | 0.999, 0.621 | 3.293, 2.952
tears_of_steel2_480p.y4m | 2.336, 2.412 | 3.698, 3.882 | 0.732, 0.974 | 3.754, 3.378
touchdown_pass_480p.y4m | 1.242, 0.739 | 1.82, 1.011 | 0.553, 0.651 | 2.447, 1.836
west_wind_easy_480p.y4m | 2.013, 1.64 | 2.524, 2.439 | 0.013, 0.044 | 2.428, 2.248
OVERALL | 2.313, 1.945 | 3.262, 3.011 | 0.735, 0.659 | 3.728, 3.439

Table 3. VP10 results on the hdres set (VP9+ baseline). Columns as in Table 1: BD-rate reduction in %, computed from average sequence PSNR and SSIM, respectively.

Video | Ext Tx (PSNR, SSIM) | Ext Tx + Rect Tx (PSNR, SSIM) | Super Tx (PSNR, SSIM) | All New Tools (PSNR, SSIM)
basketballdrive_1080p50.y4m | 2.675, 2.993 | 4.386, 5.034 | 0.693, 0.947 | 4.611, 5.362
blue_sky_1080p30.y4m | 1.625, 1.72 | 2.048, 1.999 | 1.219, 1.224 | 2.757, 3.323
bqterrace_1080p60.y4m | 2.506, 2.366 | 3.308, 3.732 | 0.628, 0.942 | 3.695, 4.261
cactus_1080p50.y4m | 2.054, 2.256 | 3.087, 3.432 | 1.426, 1.286 | 3.989, 4.163
chinaspeed_xga.y4m | 11.902, 7.894 | 12.627, 9.447 | 0.453, 0.123 | 12.864, 9.146
city_720p30.y4m | 1.449, 1.3 | 2.422, 2.318 | 0.723, 0.548 | 3.127, 3.157
crew_720p30.y4m | 0.766, 0.751 | 2.3, 2.381 | 0.087, 0.014 | 2.595, 2.782
crowd_run_1080p50.y4m | 1.516, 1.49 | 1.97, 2.109 | 0.542, 0.465 | 2.19, 2.32
cyclists_720p30.y4m | 1.573, 1.778 | 2.433, 2.169 | 0.335, 0.573 | 2.405, 2.066
dinner_1080p30.y4m | 1.98, 1.69 | 3.378, 3.858 | 0.541, 0.534 | 3.555, 4.044
ducks_take_off_1080p50.y4m | 1.12, 1.077 | 2.72, 2.857 | 0.336, 0.546 | 2.697, 2.955
factory_1080p30.y4m | 1.407, 1.145 | 2.485, 2.031 | 1.716, 1.687 | 3.483, 2.874
fourpeople_720p60.y4m | 3.416, 3.625 | 4.459, 4.744 | 0.669, 0.823 | 5.048, 5.264
in_to_tree_1080p50.y4m | 1.925, 2.117 | 2.335, 2.403 | 0.596, 0.555 | 2.682, 2.807
jets_720p30.y4m | 4.027, 4.82 | 5.712, 7.18 | 1.175, 1.358 | 5.212, 6.813
johnny_720p60.y4m | 4.414, 4.674 | 6.109, 6.446 | 0.915, 1.209 | 6.466, 6.883
kimono1_1080p24.y4m | 0.92, 0.951 | 1.237, 1.147 | 0.192, 0.111 | 1.316, 1.254
kristenandsara_720p60.y4m | 3.952, 3.549 | 5.656, 5.426 | 0.644, 0.644 | 5.769, 5.543
life_1080p30.y4m | 4.289, 2.928 | 5.631, 4.05 | 0.759, 0.57 | 6.111, 4.685
mobcal_720p50.y4m | 1.478, 1.055 | 2.383, 1.963 | 1.274, 1.029 | 2.927, 2.54
night_720p30.y4m | 2.09, 2.009 | 3.14, 3.207 | 0.334, 0.236 | 3.412, 3.411
old_town_cross_720p50.y4m | 2.195, 2.041 | 2.802, 2.956 | 0.778, 0.941 | 3.245, 3.603
parkjoy_1080p50.y4m | 1.783, 1.379 | 2.102, 1.974 | 0.573, 0.654 | 2.536, 2.851
parkrun_720p50.y4m | 2.376, 1.953 | 3.11, 3.2 | 0.659, 0.859 | 3.4, 3.553
parkscene_1080p24.y4m | 2.069, 2.004 | 2.797, 2.929 | 0.591, 0.718 | 2.933, 3.024
ped_1080p25.y4m | 1.642, 2.045 | 3.001, 3.533 | 0.003, 0.089 | 2.857, 3.467
riverbed_1080p25.y4m | 0.399, 0.338 | 0.922, 0.784 | 0.062, 0.069 | 0.85, 0.693
rush_hour_1080p25.y4m | 1.052, 1.438 | 1.77, 2.111 | 0.042, 0.203 | 1.677, 2.19
sheriff_720p30.y4m | 1.907, 1.799 | 3.257, 3.375 | 0.279, 0.149 | 3.41, 3.355
shields_720p50.y4m | 2.022, 2.01 | 3.029, 3.453 | 0.97, 0.93 | 3.46, 3.909
station2_1080p25.y4m | 1.599, 1.55 | 2.293, 2.127 | 2.099, 2.78 | 3.94, 4.459
stockholm_ter_720p60.y4m | 2.983, 3.151 | 3.8, 4.285 | 0.869, 0.999 | 4.007, 4.51
sunflower_720p25.y4m | 1.415, 1.885 | 2.191, 2.785 | 0.068, 0.227 | 1.669, 1.766
tennis_1080p24.y4m | 1.192, 1.152 | 3.397, 3.381 | 1.081, 1.363 | 3.926, 4.214
tractor_1080p25.y4m | 1.539, 1.601 | 2.845, 3.052 | 0.149, 0.269 | 2.625, 2.684
vidyo1_720p60.y4m | 3.748, 3.984 | 4.735, 5.321 | 0.682, 0.948 | 4.851, 5.657
vidyo3_720p60.y4m | 3.794, 4.491 | 5.565, 6.523 | 1.79, 1.662 | 6.535, 7.302
vidyo4_720p60.y4m | 2.791, 2.116 | 3.946, 3.381 | 1.076, 1.242 | 4.433, 4.094
OVERALL | 2.41, 2.293 | 3.458, 3.503 | 0.708, 0.768 | 3.77, 3.868

We observe at least a 3% BD-rate reduction in terms of both PSNR and SSIM on all 3 video sets when all new transform tools are enabled, confirming the advantage of a rich variety of transforms. Note that for a comparison against VP9 (as opposed to the VP9+ baseline), these results could be expected to be 0.5-0.6% better; however, that is hard to verify given the current structure of our codebase.

5. CONCLUSION

In this paper we have presented a brief overview of the new transform coding tools that are being explored as part of VP10 development. Preliminary results indicate that increasing transform flexibility can achieve at least a 3% decrease in BD-rate for both average PSNR and SSIM. Although this is an encouraging improvement, we are left with several avenues to explore within the space of transform flexibility, and still have a ways to go before we reach a viable next generation codec. VP10 development is an open source project, and we invite the rest of the video coding community to join the effort to create tomorrow's royalty free codec.

REFERENCES

[1] The WebM Project, http://www.webmproject.org/
[2] J. Bankoski, J. Koleszar, L. Quillio, J. Salonen, P. Wilkins, Y. Xu, "VP8 Data Format and Decoding Guide," RFC 6386, http://datatracker.ietf.org/doc/rfc6386/
[3] D. Mukherjee, J. Bankoski, R. S. Bultje, A. Grange, J. Han, J. Koleszar, P. Wilkins, Y. Xu, "The latest open source video codec VP9: an overview and preliminary results," Proc. IEEE Picture Coding Symp., pp. 390-393, San Jose, Dec. 2013.
[4] D. Mukherjee, J. Bankoski, R. S. Bultje, A. Grange, J. Han, J. Koleszar, P. Wilkins, Y. Xu, "A Technical Overview of VP9, the latest open source video codec," SMPTE Motion Imaging Journal, Jan/Feb 2015.
[5] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) Standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, Dec. 2012.
[6] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, July 2003.
[7] D. Mukherjee, H. Su, J. Bankoski, A. Converse, J. Han, Z. Liu, Y. Xu, "An overview of video coding tools under consideration for VP10: the successor to VP9," Proc. SPIE, Applications of Digital Image Processing XXXVIII, vol. 9599, Sep. 2015.
[8] Y. Chen, K. Rose, J. Han, and D. Mukherjee, "A Pre-filtering Approach to Exploit Decoupled Prediction and Transform Block Structures in Video Coding," Proc. IEEE International Conference on Image Processing (ICIP), Oct. 2014.
[9] J. Han, A. Saxena, and K. Rose, "Towards jointly optimal spatial prediction and adaptive transform in video/image coding," Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 726-729, March 2010.
[10] J. Han, A. Saxena, V. Melkote, and K. Rose, "Jointly optimized spatial prediction and block transform for video and image coding," IEEE Transactions on Image Processing, vol. 21, pp. 1874-1884, April 2012.
[11] J. Han, Y. Xu, D. Mukherjee, "A butterfly structured design of the hybrid coding scheme," Proc. IEEE Picture Coding Symp., pp. 1-4, San Jose, Dec. 2013.
[12] C.-L. Chang, M. Makar, S. S. Tsai, and B. Girod, "Direction-adaptive partitioned block transform for color image coding," IEEE Transactions on Image Processing, vol. 19, no. 7, July 2010.
[13] F. Kamisli and J. S. Lim, "1-D transforms for the motion compensated residual," IEEE Transactions on Image Processing, vol. 20, no. 4, April 2011.
[14] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, April 2004.
[15] G. Bjøntegaard, "Calculation of average PSNR differences between RD-curves," VCEG-M33, 13th VCEG Meeting, Austin, Texas, March 2001.