Technische Universität Berlin, Institut für Fernmeldetechnik Three-Dimensional Subband Coding with Motion Compensation


INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND ASSOCIATED AUDIO INFORMATION

ISO/IEC JTC1/SC29/WG11 M0333
MPEG95/ Nov 1995

Source: Jens-Rainer Ohm, Technische Universität Berlin, Institut für Fernmeldetechnik
Title: Three-Dimensional Subband Coding with Motion Compensation
Status: Proposal

1 Functionalities of the Coder

Improved compression. The motion-compensated 3D subband coder has good compression capability over a wide range of data rates. Its main advantage is the lack of feedback structures: there is no prediction from frame to frame, and no error feedback from badly decoded frames as in hybrid coders. Overlapping motion compensation and the use of a lapped subband transform further improve the efficiency.

Scalability. Spatial and temporal scalability are natural to the 3D subband approach. The coder is also quality-scalable without loss in efficiency, due to the use of universal variable length entropy coding (UVLC). It is important to note that the motion-compensated temporal-axis subband system works independently of the subsequent parts of the encoding process. This means that scaling can be applied freely, almost as in intraframe coders. Scalability also includes temporal scalability of the motion parameters, while spatial scalability of the motion parameters must be investigated further. Object scalability may be applicable as well, if separate temporal-axis subband transforms are run over the background and the moving objects. (The scalability functionalities were not provided for the subjective tests; see the note on the last page.)

Robustness in error-prone environments. Due to its highly hierarchical and scalable data structure, the encoded data stream can undergo efficient error protection. Experiments have been performed with a 2-layer version (1/3 of the information in the base layer, 2/3 in the enhancement layer).
This can be realized with only a marginal increase in data rate, caused by the need to provide resynchronization information for both layers independently. Graceful degradation was observed even with severely corrupted enhancement information. (The error-robustness functionality was not provided for the subjective tests; see the note on the last page.)

This proposal contains the description of the coding algorithm used for the subjective test submissions. Two elements of the coder are herein proposed as TOOLS:
1. Motion-compensated interframe subband coding (see the description in section 2.1)
2. Motion grid interpolation with contour adaptation (see the description in section 2.2)

2 Technical Description

The core technologies of the coder, which are also provided as separate tools in this proposal, are
- motion-compensated interframe subband coding (along the temporal axis);
- motion compensation based on grid interpolation, with adaptation to object borders.

2.1 Motion-compensated Interframe Subband Coding

This element of the coder is proposed as a TOOL.

Fig.1. Octave-band decomposition of an FDG (length 16 frames) into the temporal-axis frequency components (top); resulting frequency bands (bottom).

We use a motion-compensated temporal subband filter approach. In this proposal, 2-tap Haar filter bases were used, but longer temporal filters are applicable as well [1]. The basic difference compared to conventional coders (hybrid MC prediction types) lies in what is encoded. Instead of single "intra-coded" frames, we have a temporal lowpass band, which is a transformed representation containing as much information as possible from all frames within a frame decomposition group (called FDG here to mark the difference from MPEG1/2's GOF). Instead of predictive-coded frames, we have different temporal higher-frequency bands, which represent the speed of changes within the FDG. Figure 1 shows what is happening; this is an example for progressive video formats, e.g. SIF resolution. With the Haar filter base, the temporal lowpass (L) information is extracted from two frames A and B by motion-compensated averaging (++), while the temporal highpass (H) information is obtained from a difference operation (-+). Both averages and differences are normalized by factors of 0.5, such that the value range of the lowpass frames is always equal to that of the original frames. When motion-compensated averages and differences are again extracted from the H and L frames (which have the same sizes as A and B), the FDG length is enlarged and we obtain a finer resolution of the temporal-axis frequency decomposition. Figure 1 shows an example with an octave-band subband tree, which is the best choice for progressively-sampled sequences. The FDG length is 16 frames. Only the lowpass bands are further decomposed in this case; if the motion compensation is exact enough, they look very much like original image frames. The resulting temporal-axis frequency resolution is shown in the lower part of figure 1. During synthesis, the motion compensation is reversed, and the FDG is reconstructed starting from the root (LLLL/LLLH) of the subband tree. It is straightforward to perform frame rate scalings by power-of-2 factors. Figure 1 indicates which information would be omitted if a 15 Hz reconstruction is required from an original 30 Hz sequence.
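The pairwise Haar step with 0.5 normalization can be sketched as follows. This is a minimal motion-free illustration (the actual coder applies motion compensation before averaging and differencing), and the function names are mine:

```python
import numpy as np

def haar_analyze(A, B):
    """Temporal Haar step with 0.5 normalization, so the lowpass frame L
    keeps the value range of the original frames A and B."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    L = 0.5 * (A + B)   # temporal lowpass (motion-compensated average in the coder)
    H = 0.5 * (B - A)   # temporal highpass (motion-compensated difference)
    return L, H

def haar_synthesize(L, H):
    """Inverse step: perfect reconstruction of the frame pair A, B."""
    return L - H, L + H
```

Applying `haar_analyze` again to two successive L frames yields the LL/LH level of the octave-band tree, and so on up to LLLL/LLLH for an FDG of 16 frames.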
Note that neither the "A" nor the "B" frames are actually reconstructed in this case. Instead, a sequence of motion-compensated averages, the first lowpass band L, is replayed. Very fast changing parts of the scene may become gradually smoothed; this acts as an alias suppression and reduces jerky movements compared to the usual technique of simply skipping the "A" or "B" type frames. No frame recursion loops are present in either the coder or the decoder, which allows temporal subsampling (replay of lowpass sequences) at any level of the subband decomposition tree. Figure 2 shows an appropriate decomposition structure for interlaced video, as it was actually used in the MPEG4 tests for the class C sequences. In this case, "A" and "B" are the even and odd fields, respectively. Due to the aliasing effects inherent in interlaced sampling, a relatively high amount of information is present in the first highpass band "H". This can be counteracted by again decomposing the "H" frames in an octave-band tree throughout the FDG. Due to the frequency reversals occurring during the subsampling of high-frequency components in subband systems, we obtain the decomposition shown in the lower part of figure 2, which we designate as "mirrored octave-band". The frequency resolution is narrow at low frequencies and near the frequency of the field rate, where most aliasing occurs. The information about the differences between adjacent fields throughout the FDG is now concentrated in the HLLL band. The subband decomposition frames in this case have the sizes of the fields. If a progressive SIF reconstruction is required, only the L frames of the first subband decomposition level are encoded, after they have been subsampled in the horizontal direction by a factor of 2 (see section 2.3). All temporal frequencies above 25/30 Hz are discarded.
Again, neither the even nor the odd fields, but motion-compensated averages of both are replayed as the SIF reconstruction. This is exactly what has been done in the video examples provided for the tests: while the analysis was performed on ITU-R-601 sequences, only the SIF resolution information was written to the bit streams. To produce the 601 output format again, the decoder replaces both even and odd fields by the sequence of L-frames. Fig.3 illustrates the procedure of motion-compensated subband analysis at the first analysis level. The problem is to guarantee the invertibility of the motion compensation even in the case of inhomogeneous motion. This can be solved as follows (a more detailed description including symbolic program code is given in [1]):
1. Perform decomposition into subbands L and H at positions with a unique motion path between A and B. The values in L are placed at the position referring to B, while the values in H get their positions from A.

2. When no unique motion path exists: insert original values from B at their proper positions into subband frame L, which happens in the case of uncovered areas; insert a motion-compensated prediction from the previous B into subband frame H, which happens in areas covered from A to B.

Fig.2. Mirrored octave-band decomposition of an FDG (length 8 frames / 16 fields) into the temporal-axis frequency components (top); resulting frequency bands (bottom).

Fig.3. a) Motion paths, covered and uncovered areas in frames A/B; b) positions of subband decomposition results and of inserted values in H and L.

This way, each position in both frames A and B can be recovered uniquely, either by inverse motion-compensated subband synthesis, or by simple replacement of the inserted original/predicted values. The same technique is applied over all levels of the subband tree; e.g., at the second level, two subsequent L frames take over the roles of A and B to produce LL and LH. The temporal-axis subband decomposition as described can be combined with any motion compensation technique. Nevertheless, block matching was found less appropriate due to its inexact description of object borders. If the subband frames at the higher levels of the subband decomposition tree do not provide enough similarity to the original scene contents, artifacts appear in the reconstructed sequence in the case of low-rate encoding. The scheme exhibits enough universality to apply object-oriented techniques as well. The best solution in this case would be a separate subband decomposition of particular objects and background (see section 5). One crucial point in the system is the spatial interpolation necessary when sub-pel accuracy of motion compensation is used. If bilinear interpolation is applied, the signal gets blurred during analysis over several levels of the subband tree, and it gets even more blurred during the subsequent synthesis, when the inverse motion compensation is performed. In the tests, the frames were upsampled during processing by a factor of 2 horizontally and vertically with a block-overlapping DCT/IDCT process, and pel values were then taken from these upsampled images by bilinear interpolation. Any other less-blurring interpolation technique, e.g. spline or higher-order linear, might be used as well. It is not mandatory to use the same type of spatial interpolation during analysis and synthesis (see section 4.2).
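The invertible decomposition with inserted values can be sketched in one dimension as follows. This is a simplified illustration under stated assumptions: integer-accuracy motion, and a simple uniqueness test standing in for the coder's real covered/uncovered classification; the covered-area handling via prediction from the previous B (step 2 above) is omitted, and all names are mine:

```python
def mc_haar_analyze(A, B, motion):
    """1-D sketch of the invertible MC Haar step. motion[x] is the position
    in frame A referenced from position x in B. Positions with a unique
    motion path yield a subband pair: L is stored at the B position, H at
    the A position. Where the path is not unique, the original B value is
    inserted into L (the uncovered case); covered H positions stay None."""
    n = len(B)
    L, H = [0.0] * n, [None] * n
    taken = set()
    for x in range(n):
        s = motion[x]
        if s not in taken:                 # unique path B[x] <-> A[s]
            taken.add(s)
            L[x] = 0.5 * (A[s] + B[x])
            H[s] = 0.5 * (B[x] - A[s])
        else:
            L[x] = B[x]                    # uncovered: insert original value
    return L, H

def mc_haar_synthesize(L, H, motion):
    """Exactly recovers every B value and all connected A values."""
    n = len(L)
    B, A = [0.0] * n, [None] * n
    taken = set()
    for x in range(n):
        s = motion[x]
        if s not in taken:
            taken.add(s)
            B[x] = L[x] + H[s]
            A[s] = L[x] - H[s]
        else:
            B[x] = L[x]                    # replacement of the inserted value
    return A, B
```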
To exploit the redundancy between adjacent FDGs, motion-compensated predictive encoding of the temporal subband frames LLLL is an appropriate solution, though this redundancy can be low in the case of fast motion. Predictive encoding of the LLLL information also has to be weighed against the demand for error-robust transmission (see section 3). The motion-compensated temporal subband decomposition leaves the frame or field sizes unchanged. The video sequence is decomposed into several temporal subband sequences with different frame rates; e.g., the sequence of LLLL-frames in fig.1 has 1/16th of the original sequence's frame rate. If any intraframe (2D) subband coder is applied to the frames resulting from the interframe subband analysis, the coder is a 3D subband device. To emphasize the good properties of the coder for spatio-temporally scalable applications, we performed this full 3D subband decomposition; the properties of spatial (2D) subband coding are further described in section 2.3. It is important to note that the motion-compensated interframe subband coder may freely be combined with other spatial encoding techniques. An eminent advantage, as compared to conventional hybrid coders, is the complete independence of the temporal-axis subband decomposition, including motion compensation, from the subsequent encoding process applied to the subband frames, which can then almost be regarded as single-frame coding operations. This allows great freedom to use spatially-scalable and quality-scalable techniques.

What should be provided by MPEG4 syntax to define the system:
1. Type of temporal filter (e.g. Haar or other)
2. Decomposition structure (e.g. octave-band, mirrored octave-band, full-band)
3. FDG length
4. Predictive encoding / frequency of frame refresh in the LLLL subband
5. Motion vector field (the best interface to arbitrary motion compensation would be a pelwise description of the motion vector field, including a description of occlusions, and possibly a description of objects)
6. Spatial interpolation technique with low blur effect

2.2 Motion Grid Interpolation with Contour Adaptation

This element of the coder is proposed as a TOOL.

The basic scheme of motion compensation used to encode the test sequences is the control grid interpolation (CGI) approach from [2]. Fig.4 illustrates how this scheme was applied in the 3D subband coder. The motion vectors are defined at control grid points in the "B" frames and point to references in the corresponding "A" frames. The control grid points in B are spaced in a regular fashion (in the experiments, a distance of GX=GY=16 pel was used in both the horizontal and vertical directions). If the motion is heterogeneous, the reference points in A will be irregularly distributed. The motion vector field in between the control grid points is interpolated from each 4 adjacent values. During motion estimation, the horizontal and vertical shifts are optimized separately for each grid point. Fig.4 shows the reference region influenced by the central grid point; the displaced frame differences within this region must be taken into account for the optimization. In its intention (the ability to capture non-translational motion with the same number of motion parameters as conventional block matching), the scheme is indeed very similar to the overlapped block MC (OBMC) of the H.263 standard. However, CGI is more universal in that the type of motion parameter interpolation is not defined a priori. In our experiments, we used a bilinear interpolation mapping, but e.g. a perspective warp would be applicable as well and might give still more natural motion vector fields.

Fig.4. Control grid interpolation motion compensation between A and B frames.

Fig.5. Switching off interpolation. Control grid & reference points (top) and related motion vector fields (bottom) in the presence of a) a covered area, b) an uncovered area.
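The bilinear motion parameter interpolation described above can be sketched for one grid cell as follows (a minimal sketch; the function name and argument layout are mine):

```python
def cgi_interpolate(v00, v10, v01, v11, fx, fy):
    """Bilinear interpolation of the motion vector field between four
    control grid points. v00..v11 are the (dx, dy) shift vectors at the
    cell corners; (fx, fy) in [0, 1] is the fractional position inside
    the grid cell (a cell of GX x GY pel in the coder)."""
    top = [a + fx * (b - a) for a, b in zip(v00, v10)]       # upper edge
    bottom = [a + fx * (b - a) for a, b in zip(v01, v11)]    # lower edge
    return tuple(a + fy * (b - a) for a, b in zip(top, bottom))
```

At the corners the interpolated vector equals the corner vector, so the grid point motion is always reproduced exactly.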

One eminent disadvantage of CGI and OBMC is their inability to cope with fast motion and occlusions. Interpolation between adjacent grid values or adjacent blocks is always performed in the original schemes. This problem can be solved by switching off the interpolation in the region between adjacent points whenever a discontinuity in the motion vector field is found. A good indicator for the presence of a discontinuity is the distance between the reference points in the first image (A). We switched off the interpolation whenever this distance fell below 0.75 G or exceeded 1.25 G (where G denotes the distance GX or GY between the corresponding control grid points). The effect of this procedure is outlined in fig.5. Fig.5a is an example where an area is covered (present in frame A, not present in frame B), while fig.5b shows the case of an uncovered area (not present in frame A, present in frame B). Remark that interpolation is still performed where the distances between the reference points (o) are within the prescribed limits. Switching off the interpolation, though already rendering a better description of the motion vector field and allowing faster motion, does not yet describe the real position of the covered or uncovered areas. In fig.5, the discontinuity in the motion vector field was assumed to be centered between the particular grid points. To give a more exact description, the scheme in fig.6 was employed, which was able to reduce artifacts in the decoded sequences at low rates. The necessary information is the position of the shape of the motion discontinuity (in the case of an area covered in frame B) or the shape of an uncovered area itself. For a raw approximation, it is sufficient for the decoder to know the intersection of this shape with the straight line between two diverging control grid points. In fig.5, the top left point's motion indicates a separation from the others.
Hence, it is necessary to encode the intersection positions between the top left and bottom left point, and between the top left and top right point. If GX and GY are the grid spacings in the horizontal and vertical directions, the intersection can be at one out of GX-1 or GY-1 positions, respectively. Between those intersections, a straight-line (polygon) approximation of the contour was used in our experiments, but other approximations, e.g. spline, might be used as well. For the regions uncovered in B, the contour approximation marks the center of the area.

Fig.6. Representation of the discontinuities within the motion vector field.

What should be provided by MPEG4 syntax to define the system:
1. Grid spacing (e.g. uniform: GX, GY; nonuniform)
2. Type of interpolation (e.g. bilinear, affine, perspective, quadratic warps)
3. Upper and lower limits of motion discontinuity to switch off interpolation
4. Interface to a contour descriptor (number & position of contour points, type of contour interpolation: e.g. polygon, spline)
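The 0.75 G / 1.25 G switch-off rule for two adjacent grid points can be sketched as follows. A Euclidean distance between the reference points in frame A is assumed here, and the function name is mine:

```python
def interpolation_active(ref_p, ref_q, g):
    """Continuity test between two adjacent control grid points with
    nominal spacing g (= GX or GY). Interpolation is switched off when
    the distance between their reference points in frame A falls below
    0.75*g or exceeds 1.25*g, the limits used in the experiments."""
    d = ((ref_p[0] - ref_q[0]) ** 2 + (ref_p[1] - ref_q[1]) ** 2) ** 0.5
    return 0.75 * g <= d <= 1.25 * g
```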

2.3 Spatial (2D) Subband Coding of the Temporal Subband Frames

In the case of interlaced video encoding, the first step of spatial coding is a preprocessing of the frames resulting from the temporal subband decomposition. These have the format of ITU-R-601 fields (720x288 or 720x240) and are decomposed into SIF-compatible sizes. This requires a downsampling by a factor of 2 in the horizontal direction for the luminance Y, and a downsampling by factors of 2 in both the horizontal and vertical directions for the chrominance components U and V. Accordingly, a quadrature mirror filter (QMF) based decomposition into 2 horizontal frequency bands is applied to Y, while 4 horizontal/vertical frequency bands are generated from U and V (see fig.7). An odd-length QMF (the 9-tap filter from [3]) was employed for this purpose.

Fig.7. Preprocessing/decomposition of ITU-R-601 into SIF.

While the vertical high bands of the chrominance are always discarded in order to encode a 4:1:1 representation, the upper horizontal bands of both luminance and chrominance may be discarded in addition, if only SIF resolution is required after decoding. This is exactly what was done in the sequences provided for the tests. The same scheme is also applicable for full compatibility, possibly from HDTV down to QCIF formats; in this case, it would be useful to apply an octave-band (wavelet) decomposition. The resulting SIF frames (352x288 or 352x240) with 4:1:1 components are then processed by a type of lapped transform, a 2D version of the subband approach called time domain aliasing cancellation (TDAC) [4]. The number of frequency bands is 64 in both the horizontal and vertical directions, giving the same frequency resolution as a DCT of size 8x8. Any other spatial transform or subband decomposition might be applied as well, but TDAC provides better results than e.g. the DCT.
The TDAC coefficients remain organized along with those from the same analysis windows (similar to the coefficient block ordering of the DCT) for subsequent processing.

What should be provided by MPEG4 syntax to define the system:
1. Type of transform (e.g. DCT, LOT, TDAC, wavelet, including filter type / basis functions)
2. Number and arrangement of frequency bands (e.g. full-band, octave-band, wavelet-packet, 1D/2D, separable/non-separable)

2.4 Quantization and Entropy Coding

The pure quantization of the 3D subband information is more or less conventional. The spatial subbands of the temporal LLLL band were encoded using the intra_quantizer_matrix from MPEG1/2, while the higher frequency bands were processed with a 3/2 deadzone quantizer. Of higher importance is the global quantizer step size applied to the particular temporal bands. The temporal subband decomposition employs nonorthonormal Haar filters (normalization factors 0.5 instead of √2/2). This is necessary to embed the uncovered areas into the lowpass frames without visible brightness changes. The consequence is that the global quantizer step size has to be lowered by a factor of √2/2 with each level of the subband decomposition tree. Additionally, within one lowpass frame, the inserted uncovered areas can be quantized coarser by a factor of √2 than the surrounding "true" lowpass information. Conversely, within one highpass frame, the inserted covered areas, which are predicted from the previous frame, must be quantized finer by the same factor than their neighbors (for a detailed explanation, see [1]). This problem is solved by adjusting the quantizer step sizes locally, according to the number of covered/uncovered pixels under the analysis window of the TDAC transform. Remark that no additional information needs to be transmitted for this purpose; the necessary quantizer step sizes can be derived exactly from the motion information alone.

Fig.8. Superblock arrangement (64x64) used for 30 Hz sequences (240 lines).

The entropy coder used to encode the quantizer output is the universal variable length coder (UVLC) described in [5]. This coder is a runlength coder working on the bitplanes of the quantized frequency coefficients. A desirable advantage of this coder is the capability of quantizer scaling without any data overhead. A layered representation of the encoded information is obtained if the code for the higher bit planes is transmitted as base information, and the lower bit planes as enhancement information. For efficient UVLC, it is necessary to reorganize the coefficients before processing. Instead of "one block at a time" (as in MPEG1/2), they are encoded "one frequency at a time". In the original proposal of [5], this is performed in slices containing 90 DCT blocks of size 8x8.
First the 90 DC coefficients are processed, then the 90 first AC coefficients, and so on. This scheme was modified by using superblocks of size 64x64 pel instead of slices. Each superblock contains the coefficients from 64 TDAC analysis windows for the Y component, and from 2x16 analysis windows for the U&V components. Since the SIF image sizes are not divisible by 64, the superblock arrangements of figs.8&9 were used for the 30 Hz and 25 Hz formats, respectively.
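The "one frequency at a time" reorganization can be sketched as follows (the function name is mine; the per-block scan order is assumed to be given):

```python
def frequency_ordered_scan(blocks):
    """Reorders quantized coefficients "one frequency at a time" instead
    of "one block at a time", as needed for efficient UVLC coding.
    `blocks` is a list of equally sized coefficient lists, e.g. the TDAC
    analysis windows of one superblock, each already in a 1-D scan order."""
    return [block[k] for k in range(len(blocks[0])) for block in blocks]
```

For two blocks [DC, AC1, AC2] each, the output interleaves all DC values first, then all first AC values, and so on.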

Fig.9. Superblock arrangement (64x64) used for 25 Hz sequences (288 lines).

The overhead information related to the superblocks is as described in [5]. Coefficients are ordered into eight classes, the quantizer range (necessary bit number) for each class is transmitted, and the VLC codes are adapted according to the number of "1"s in each bit plane. In addition, it is possible in our realization to set complete superblocks to zero; this is frequently done in the higher-frequency temporal subbands at low rates. The encoded bitstreams provided for the MPEG4 tests originated from an experimental encoder and do not yet provide resynchronization information. Presently, a scheme is being investigated which allows a resync of the decoder once at the beginning of the frame for the first level of the temporal subband decomposition, at each seventh superblock for the second level, at each third superblock for the third, and at each superblock for the fourth level (e.g. the lowest-frequency subband LLLL) of the subband tree, with a negligible amount of information overhead.

What should be provided by MPEG4 syntax to define the system:
1. Type of quantizer (e.g. deadzone, quantizer matrix)
2. Arrangement of the quantized information (e.g. block structure, superblock structure, slice structure)
3. Type of entropy coding (e.g. UVLC, Huffman VLC, arithmetic)
4. Specific adaptation of the entropy coding (e.g. coefficient classes, VLC codes)

2.5 Encoding of Motion Parameters

The number of motion parameters to be transmitted depends on the decomposition structure (octave-band or mirrored octave-band), but is generally lower than e.g. in MPEG1/2 with extensive use of B-frame structures. There are three cases. First: motion information is directly related to image information. This is the case whenever a highpass frame of the subband decomposition forms the end of the subband tree and is encoded. In fig.1, all marked frames except LLLL are of this type; in fig.2, all marked frames except LLLL and HLLL.
Second: there is image information to transmit, but no motion information. This is the case for LLLL (unless MC prediction from the last LLLL is performed) and for HLLL. Third: there is motion information to transmit, but no image information. For example, this is the case at the first level of the subband decomposition in fig.2 when ITU-R-601 decoding is required.
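For the octave-band tree of fig.1, the number of motion parameter sets per FDG can be counted directly from the tree structure: one set per pairwise analysis step. A minimal sketch (the function name is mine):

```python
def motion_sets_octave(levels):
    """Motion parameter sets per FDG for an octave-band tree with the
    given number of temporal decomposition levels: 2**(levels-1) pairs
    at the first level, half as many at each further level, i.e.
    2**levels - 1 sets in total for an FDG of 2**levels frames."""
    return sum(2 ** l for l in range(levels))
```

A half-rate replay omits the first decomposition level, which reduces the count to that of a tree with one level fewer.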

If the information about a particular highpass frame at any level of the temporal subband system is not available at the decoder, the corresponding motion information is unnecessary as well. Return to fig.2: if SIF reconstruction is required, none of the motion parameters related to frames beginning with "H.." need to be known at the decoder, and for the whole FDG, only 7 sets of frame motion parameters are needed instead of the 22 for the full 601 reconstruction. The same is true for the example in fig.1: here, only 7 sets of frame motion parameters need to be transmitted for the 15 Hz reconstruction instead of 15 for the 30 Hz case. The temporal subsampling procedure inherently includes the subsampling of the motion parameters. Lossless encoding of the motion parameters (horizontal and vertical shifts of the control grid points) is performed within the same 64x64 superblock structure described in the previous section. When the source sequence has ITU-R-601 format and the motion grid spacing is GX=GY=16, as applied in the MPEG4 tests, each superblock in the SIF-downsampled representation contains the vectors of 32 motion grid points (during the horizontal downsampling, the virtual horizontal grid spacing is downsampled as well, such that 4 rows of 8 grid points each are contained within each superblock). We have not yet implemented spatial subsampling of the motion parameters. Encoding is performed separately for the horizontal and vertical shift components, with a spatial prediction and the VLC table from MPEG1. The spatial prediction starts anew at the leftmost grid point of the topmost row of each superblock; the points of the topmost row are predicted from their left neighbors, the points of the left-hand column from their top neighbors. All other values are predicted with factors 0.5/0.5 from both their left and top neighbors. The accuracy of the motion parameters is half pel.
Hence, with the VLC used, it is necessary to transmit additional bits if the value range of the shifts exceeds 7.5, as indicated in the following table:

value range      <15  <30  <60  <120  <240
additional bits    1    2    3    4     5

The maximum value range is tested within each superblock, and a 3-bit code is used independently for the horizontal and vertical shifts to inform the decoder about the number of additional bits required in this superblock. The presence of shape parameters is determined by a divergence test on the motion parameters, as described in section 2.2 (fig.6). Because a discontinuity in the motion vector field may be situated between two grid points belonging to different superblocks, the shape parameters are encoded at once for the whole frame. In our case with GX=GY=16, 15 different shape positions are possible, for which a 4-bit binary code is used. The rate necessary to encode the additional shape information is negligible (typically 1/4 of the rate for the motion vectors). For those areas within the highpass frames which are predicted from the previous "B" frame (see fig.3), an additional 1-bit code indicates which side of the motion discontinuity exhibits the best displacement vector.

What should be provided by MPEG4 syntax to define the system:
1. Arrangement of the motion information in the bit stream (e.g. blockwise, superblock-wise, at once for each frame)
2. Differential encoding (type of prediction, 1D/2D), resync points (e.g. at each superblock)
3. Type of entropy coding (e.g. Huffman VLC, arithmetic)
4. Specific adaptation of the entropy coding (VLC codes)
5. Relation between motion and shape information

2.6 Type of Subsampling Filter

The subsampling filter (ITU-R-601 to SIF) differs from the one suggested in the MPEG4 PPD. A motion-compensated averaging filter is used for the vertical subsampling instead of field skipping, because this operation is a natural part of the temporal-axis subband analysis (section 2.1).
For the horizontal subsampling, the 9-tap QMF from [3] was employed, which is also part of the spatial (2D) subband coder (section 2.3).
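The spatial prediction of the motion shift components described in section 2.5 can be sketched as follows; the residuals returned here are what would then be fed to the MPEG1 VLC table. The function name and array layout are mine:

```python
def predict_motion_grid(grid):
    """Spatial prediction of one shift component (horizontal or vertical)
    inside one superblock: prediction starts anew at the top-left grid
    point, the topmost row is predicted from the left neighbor, the
    left-hand column from the top neighbor, and all other points with
    weights 0.5/0.5 from both left and top neighbors.
    Returns the prediction residuals."""
    rows, cols = len(grid), len(grid[0])
    res = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if y == 0 and x == 0:
                pred = 0.0                       # prediction starts anew here
            elif y == 0:
                pred = grid[0][x - 1]            # topmost row: left neighbor
            elif x == 0:
                pred = grid[y - 1][0]            # left column: top neighbor
            else:
                pred = 0.5 * (grid[y][x - 1] + grid[y - 1][x])
            res[y][x] = grid[y][x] - pred
    return res
```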

3 Flexibility of the Coder

The 3D subband coder exhibits a wide range of flexibility. Most parts are freely exchangeable with other tools: one might replace the motion compensation by block matching or even object-oriented techniques; instead of the TDAC, a LOT, any subband transform, or even the DCT might be used, as well as any other spatial (intraframe) coder; the UVLC can be replaced by an ordinary Huffman VLC or arithmetic coding. The coder is attractive for combination with layered coding techniques. What is really new, then, is the first proposed tool, the motion-compensated temporal-axis subband processing. Even this may be combined in a flexible way with conventional hybrid coders working on the lowest temporal frequency frames (LLLL in our example), or with object-oriented techniques. Remark that, with an FDG length of 1 and MC prediction applied, the scheme reduces to hybrid coding. Just as it is possible in MPEG1/2 to define the number of B frames and the GOF length, the FDG length for the 3D subband coder should be freely definable, depending on the specific requirements for encoding delay and on the scene contents. It is even mandatory to allow switching the length individually from one FDG to the next: a shorter FDG length is necessary before a scene change (this was indeed applied in the simulations for TABLE TENNIS); in the case of very fast changes, it can be suboptimal to use a large FDG length. For the last part of STEFAN (frames after #230), better results would be possible with a shorter length. A utility to perform this action is available, but was not yet built into the simulation program. For the second tool proposed, the CGI motion compensation with border adaptation, I am sure that similar ideas will come up in other proposals. The following points appear mandatory to me for definition in the MPEG4 syntax, to allow a flexible implementation of motion parameter interpolation: allow the definition of several warping techniques, e.g.
bilinear, affine, perspective, maybe even including quadratic terms for nonplanar surfaces; and allow interdependence between the motion and shape representations. Shape parameters are needed to know where the discontinuities in the motion vector field are; on the other hand, the motion information can carry information about the presence of shapes, which may be utilized to reduce the data rate.

4 Complexity

4.1 Storage requirements

A full storage of the whole FDG is usually necessary either at the coder or at the decoder. If in-place memory usage is applied (the output of subband analysis or synthesis is written to the same memory), L+1 frame stores (where L is the number of subband levels) are needed at that end (coder or decoder) which does not provide full FDG storage. Return to fig.1 to illustrate this point (L=4 in this case). If the coder does not provide full FDG storage, the subband frames are transmitted in the following order: H0, H1, LH0, H2, H3, LH1, LLH0, H4, H5, LH2, H6, H7, LH3, LLH1, LLLH, LLLL. Two frame memories are needed for processing. The following frames must be stored intermediately, where "../.." indicates that these frames use one memory subsequently: L0/2/4/6 (until L1/3/5/7 are processed), LL0/2 (until LL1/3 are processed), LLL0 (until LLL1 is processed). The decoder must start the synthesis at LLLL/LLLH, which are transmitted last, and hence would have to store all frames transmitted up to this point. An alternative would be to store the bit stream, reorder it as needed (the order is given under the next point), and delay decoding until LLLL/LLLH are present; in this case, the decoder would also need only L+1 frame memories, but an additional decoding delay would be introduced. If the coder does provide full FDG storage, the whole FDG is first processed and then transmitted in the sequence in which the frames are needed at the decoder: LLLL, LLLH, LLH0, LH0, H0, H1, LH1, H2, H3, LLH1, LH2, H4, H5, LH3, H6, H7.
Intermediate storage would be necessary for LLL1 (until B3 is reconstructed), LL1/LL3 (until B1/B5 are reconstructed), and L1/3/5/7 (until B0/2/4/6 are reconstructed).
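The low-delay transmission order given above can be derived mechanically: each highpass frame is emitted as soon as its input pair is complete, and the resulting lowpass frame is passed on to the next temporal level. The following sketch (my own illustration in Python, using symbolic frame labels in place of actual subband analysis) reproduces the order for a 16-frame FDG with L = 4:

```python
# Sketch (not from the proposal): reproduce the low-delay transmission
# order of temporal subband frames for an FDG of 16 frames and L = 4
# levels, assuming each highpass frame is emitted as soon as its input
# pair is complete (in-place processing with L+1 frame stores).

def transmission_order(num_frames=16, levels=4):
    labels = {0: "H", 1: "LH", 2: "LLH", 3: "LLLH"}
    pending = [None] * levels   # one stored lowpass frame per level
    count = [0] * levels        # highpass index per level
    order = []

    def feed(level, frame):
        if level == levels:           # final lowpass (LLLL), sent last
            return
        if pending[level] is None:
            pending[level] = frame    # wait for the pair partner
            return
        # pair complete: emit the highpass frame immediately,
        # pass the lowpass on to the next temporal level
        suffix = str(count[level]) if level < levels - 1 else ""
        order.append(labels[level] + suffix)
        count[level] += 1
        pending[level] = None
        feed(level + 1, "L" + frame)  # symbolic lowpass frame

    for i in range(num_frames):
        feed(0, f"B{i}")
    order.append("LLLL")
    return order

print(", ".join(transmission_order()))
```

Running this prints exactly the sequence H0, H1, LH0, ..., LLLH, LLLL given above; the decoder-side order for full FDG storage follows analogously from the synthesis dependencies.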

Additional frame memories are recommended at both coder and decoder to store the decoded motion shift parameters (horizontal/vertical) and, along with these, the covered/uncovered information for each pel. This simplifies programming, and may be omitted only with block-based motion compensation. One memory of double frame size is needed for spatial interpolation, at least with the technique applied in the simulations.

4.2 Processing complexity

At the coder, the most demanding task is motion estimation. For the MPEG-4 tests, a procedure consisting of three steps was employed: 1. Hierarchical block matching (to obtain a smooth vector field) for initial estimation of CGI parameters; 2. Refinement of the CGI parameters over the reference region in fig. 4; 3. Determination of optimum shape intersections according to fig. 6. The second step was performed only if adjacent grid points showed different motions, the third only if adjacent grid points violated the continuity conditions given in section 2.2. Though a modified telescopic search was used in steps 1 and 2, the processing time is relatively long due to the large search ranges necessary at the higher levels of the temporal subband tree. Former investigations based on block matching have shown that the search range can be greatly reduced if the motion information from the next-lower level is utilized [1], but this was not yet exploited by the CGI-based coder used in the simulations.

The following complexity considerations apply equally to the coder and decoder sides. Our implementation of the TDAC algorithm takes about three times the processing power needed for a DCT with the same number of coefficients. A more crucial point is the high-quality spatial interpolation necessary to avoid a blurred reconstruction. For this purpose, we use a blockwise DCT of size 32x32 with 8-pel overlap at each side, blow each block up to 64x64 by attaching zeros, and perform an IDCT of this size.
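A hedged sketch of this DCT-domain interpolation (my own Python illustration; the 32x32 blocking with 8-pel overlap used in the simulations is omitted, and the scaling convention is my assumption for orthonormal transforms):

```python
# Illustration (not the simulation code) of 2x spatial interpolation
# via DCT zero-padding: forward DCT of an NxN block, attach zeros to
# reach 2Nx2N, inverse DCT of the larger size.

import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

def dct_upsample2(block):
    """Interpolate a square block to twice its size by zero-padding."""
    n = block.shape[0]
    d_n, d_2n = dct_matrix(n), dct_matrix(2 * n)
    coeff = d_n @ block @ d_n.T            # forward 2-D DCT
    padded = np.zeros((2 * n, 2 * n))
    padded[:n, :n] = coeff                 # attach zeros (high bands)
    # factor 2 = sqrt(2)*sqrt(2) restores amplitude when switching
    # between orthonormal transforms of size N and 2N
    return 2.0 * (d_2n.T @ padded @ d_2n)
```

A flat block comes out flat and the block mean is preserved exactly; the interpolated samples lie on a twice-as-dense grid of the same cosine components, which is what avoids the blurring of plain bilinear filtering.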
The whole procedure, including the bilinear interpolation that is still necessary unless the motion vector points exactly to a half-pel site, costs approximately 40 multiplications per pel. Other interpolation techniques, like splines or higher-order linear filters, should be investigated at this point. Nevertheless, it is not mandatory to use the same type of spatial interpolation at the coder and decoder. With plain bilinear interpolation, decoding becomes faster by approximately a factor of 5, and storage requirements are reduced; the reconstruction admittedly gets more blurred in this case, but no annoying artifacts are introduced. The remaining tasks, e.g. interpolation of the motion vector field from the CGI parameters, determination of covered and uncovered areas etc., have marginal influence on processing time.

5 Possible Improvements

5.1 Known Bugs in the Simulation

The decoder provided for the MPEG-4 tests produces the full ITU-R 601 format (e.g. 720x480 at 30 Hz frame rate). The first step of encoding, the temporal subband analysis, was performed on the full-size (720-pel rows) images, while the spatial encoding, as shown in fig. 7, cuts off 8 columns at each side. After decoding and temporal subband synthesis, we found that this may cause serious effects at the left and right borders of the image, because the inverse motion compensation sometimes expects information from the omitted columns. The effect was found to be within an acceptable limit for the MOBILE and TABLE TENNIS sequences (it should still be possible to judge the performance of the coder). With STEFAN, artifacts were detected in some cases even far from the border; here, we changed the spatial coder to rows of width 720 in an ad-hoc action (not documented), providing compatibility by using a spare bit in the bitstream headers. The solution for further experiments is to cut off the columns before temporal subband processing.

5.2 Possible Improvements to the Present Coder

The quantization criteria outlined in section 2.4 are derived from rate-distortion theory, i.e. a temporal subband signal is quantized more exactly the more reconstructed frames its values affect. Visual examination suggests that fast-moving areas, as well as areas being covered or uncovered, are sometimes not handled in an optimum way. This becomes especially visible around the ball in MOBILE, or near STEFAN himself. The reason is the invalidity of rate-distortion based quantizer assignments at low data rates: if high-frequency components have very low energy, they are set to zero and are in fact quantized with less distortion than other ones. Exactly this is the case here. In areas which cannot be exactly motion-compensated, the temporal high-frequency component usually plays an important role and is not set to zero during quantization; hence, the quantization error is actually larger than in areas which can be perfectly reconstructed from lowpass information alone. Unlike in intraframe coding, where this effect is desirable due to noise masking in detailed areas, artifacts become visible here because the viewer's attention is drawn to those moving objects. A modified quantization, according either to a constant-SNR or to psychovisual criteria, appears necessary at this point.

To enhance quality, especially at the lowest rates, it seems important to exploit the temporal redundancy of the motion information as well. In the coder used for the MPEG-4 tests, motion parameters were encoded independently for all levels of the subband system. We have already performed experiments with a temporal-axis "differential pyramid" encoding of motion parameters, starting at the highest level. At the same time, such a technique is suitable for subdividing the motion parameters into several relevance levels for transmission in error-prone environments [6].
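The idea behind these experiments can be sketched as follows (a hypothetical Python illustration of mine, not the exact scheme of [6]; the half-magnitude prediction reflects that a vector at the next-higher level spans twice the frame distance):

```python
# Hypothetical sketch of temporal-axis "differential pyramid" coding
# of motion parameters: the deepest (coarsest) level is sent verbatim;
# each finer level is predicted at half magnitude from the co-located
# vector of the level above, and only the residuals are encoded.

def encode_pyramid(vectors):
    """vectors[k]: motion vectors at temporal level k (k = 0 finest)."""
    deepest = len(vectors) - 1
    residuals = [list(vectors[deepest])]          # coarsest: verbatim
    for k in range(deepest - 1, -1, -1):
        coarse, fine = vectors[k + 1], vectors[k]
        residuals.append([fine[i] - coarse[i // 2] / 2.0
                          for i in range(len(fine))])
    return residuals                              # coarsest level first

def decode_pyramid(residuals):
    """Inverse of encode_pyramid; returns vectors, finest level first."""
    levels = [list(residuals[0])]                 # coarsest level
    for res in residuals[1:]:
        coarse = levels[0]
        levels.insert(0, [res[i] + coarse[i // 2] / 2.0
                          for i in range(len(res))])
    return levels
```

Transmitting the coarsest level first also yields the relevance layering mentioned above: a decoder that receives only the upper levels can still reconstruct a coarse motion field.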
This, however, requires modifications to the bit stream, possibly including the transmission of all motion parameters for the whole FDG as one package.

In order to provide a wider range of spatial scalability, maybe from HDTV down to QCIF formats, it would be appropriate to replace the subband decomposition described in section 2.3 (fig. 7) by a wavelet pyramid. Further investigations are necessary at this point to realize a spatial scaling of the motion parameters along with the image information.

5.3 Organization as an Object-Oriented Coder

It is straightforward to combine the temporal-axis subband system with object-oriented encoding techniques. Each lowpass frame has exactly the same coordinate positions as a particular frame of the original sequence (refer to figs. 1 and 3; the following frames show basically the same images: L0 = B0, L1 = B1, L2 = B2, ..., LL0 = B1, LL1 = B3, ..., LLL0 = B3, LLL1 = B7, LLLL = B7). Hence, if a technique is available that tracks objects from frame to frame, separate temporal subband analyses can be performed without any problem on objects and background. Positions of covered and uncovered areas are exactly known in this case, and the LLLL image of the background would contain information about all areas that are visible in any frame within the FDG (note that with our technique the background can be moving as well, or may even consist of several parts with different movements). For an object with arbitrary shape, it is recommended to use some region-oriented spatial encoder in combination with the temporal-axis subband system. Full object scalability is guaranteed: even approaches may be realizable where the background is encoded with the 3D subband system and particular objects with any other technique.

References

[1] J.-R. Ohm: "Three-dimensional subband coding with motion compensation," IEEE Trans. Image Proc. 3 (1994).
[2] G. J. Sullivan and R. L. Baker: "Motion compensation for video compression using control grid interpolation," Proc. IEEE ICASSP (1991).
[3] E. P. Simoncelli and E. H. Adelson: "Subband transforms," in Subband Image Coding, J. W. Woods (ed.), Boston: Kluwer 1991.

[4] J. P. Princen and A. B. Bradley: "Analysis/synthesis filter bank design based on time domain aliasing cancellation," IEEE Trans. Acoust., Speech, Signal Proc. 34 (1986).
[5] P. Delogne and B. Macq: "Universal variable length coding for an integrated approach to image coding," Annales des Télécommunications 46 (1991).
[6] J.-R. Ohm: "Motion-compensated 3-D subband coding with multiresolution representation of motion parameters," Proc. IEEE ICIP (1994), vol. III.
[7] J.-R. Ohm: "Advanced packet-video coding based on layered VQ and SBC techniques," IEEE Trans. Circ. Syst. Video Tech. 3 (1993).


More information

Lecture 5: Compression I. This Week s Schedule

Lecture 5: Compression I. This Week s Schedule Lecture 5: Compression I Reading: book chapter 6, section 3 &5 chapter 7, section 1, 2, 3, 4, 8 Today: This Week s Schedule The concept behind compression Rate distortion theory Image compression via DCT

More information

JPEG Descrizione ed applicazioni. Arcangelo Bruna. Advanced System Technology

JPEG Descrizione ed applicazioni. Arcangelo Bruna. Advanced System Technology JPEG 2000 Descrizione ed applicazioni Arcangelo Bruna Market s requirements for still compression standard Application s dependent Digital Still Cameras (High / mid / low bit rate) Mobile multimedia (Low

More information

Fully scalable texture coding of arbitrarily shaped video objects

Fully scalable texture coding of arbitrarily shaped video objects University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2003 Fully scalable texture coding of arbitrarily shaped video objects

More information

Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform

Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform Torsten Palfner, Alexander Mali and Erika Müller Institute of Telecommunications and Information Technology, University of

More information

CHAPTER 3 DIFFERENT DOMAINS OF WATERMARKING. domain. In spatial domain the watermark bits directly added to the pixels of the cover

CHAPTER 3 DIFFERENT DOMAINS OF WATERMARKING. domain. In spatial domain the watermark bits directly added to the pixels of the cover 38 CHAPTER 3 DIFFERENT DOMAINS OF WATERMARKING Digital image watermarking can be done in both spatial domain and transform domain. In spatial domain the watermark bits directly added to the pixels of the

More information

Lecture 12 Video Coding Cascade Transforms H264, Wavelets

Lecture 12 Video Coding Cascade Transforms H264, Wavelets Lecture 12 Video Coding Cascade Transforms H264, Wavelets H.264 features different block sizes, including a so-called macro block, which can be seen in following picture: (Aus: Al Bovik, Ed., "The Essential

More information

Wireless Communication

Wireless Communication Wireless Communication Systems @CS.NCTU Lecture 6: Image Instructor: Kate Ching-Ju Lin ( 林靖茹 ) Chap. 9 of Fundamentals of Multimedia Some reference from http://media.ee.ntu.edu.tw/courses/dvt/15f/ 1 Outline

More information

Three-Dimensional Subband Coding with Motion Compensation

Three-Dimensional Subband Coding with Motion Compensation Three-Dimensional Subband Coding with Motion Compensation Jens-Rainer Ohm, MEMBER, IEEE 1 IP EDICS category : 1.1 Abstract Three-dimensional (3-D) frequency coding is an alternative approach to hybrid

More information

Image Compression. CS 6640 School of Computing University of Utah

Image Compression. CS 6640 School of Computing University of Utah Image Compression CS 6640 School of Computing University of Utah Compression What Reduce the amount of information (bits) needed to represent image Why Transmission Storage Preprocessing Redundant & Irrelevant

More information

5LSE0 - Mod 10 Part 1. MPEG Motion Compensation and Video Coding. MPEG Video / Temporal Prediction (1)

5LSE0 - Mod 10 Part 1. MPEG Motion Compensation and Video Coding. MPEG Video / Temporal Prediction (1) 1 Multimedia Video Coding & Architectures (5LSE), Module 1 MPEG-1/ Standards: Motioncompensated video coding 5LSE - Mod 1 Part 1 MPEG Motion Compensation and Video Coding Peter H.N. de With (p.h.n.de.with@tue.nl

More information

MULTIDIMENSIONAL SIGNAL, IMAGE, AND VIDEO PROCESSING AND CODING

MULTIDIMENSIONAL SIGNAL, IMAGE, AND VIDEO PROCESSING AND CODING MULTIDIMENSIONAL SIGNAL, IMAGE, AND VIDEO PROCESSING AND CODING JOHN W. WOODS Rensselaer Polytechnic Institute Troy, New York»iBllfllfiii.. i. ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD

More information

Lecture 10 Video Coding Cascade Transforms H264, Wavelets

Lecture 10 Video Coding Cascade Transforms H264, Wavelets Lecture 10 Video Coding Cascade Transforms H264, Wavelets H.264 features different block sizes, including a so-called macro block, which can be seen in following picture: (Aus: Al Bovik, Ed., "The Essential

More information

OPTIMIZATION OF LOW DELAY WAVELET VIDEO CODECS

OPTIMIZATION OF LOW DELAY WAVELET VIDEO CODECS OPTIMIZATION OF LOW DELAY WAVELET VIDEO CODECS Andrzej Popławski, Marek Domański 2 Uniwersity of Zielona Góra, Institute of Computer Engineering and Electronics, Poland 2 Poznań University of Technology,

More information

ADVANCES IN VIDEO COMPRESSION

ADVANCES IN VIDEO COMPRESSION ADVANCES IN VIDEO COMPRESSION Jens-Rainer Ohm Chair and Institute of Communications Engineering, RWTH Aachen University Melatener Str. 23, 52074 Aachen, Germany phone: + (49) 2-80-27671, fax: + (49) 2-80-22196,

More information

A Image Comparative Study using DCT, Fast Fourier, Wavelet Transforms and Huffman Algorithm

A Image Comparative Study using DCT, Fast Fourier, Wavelet Transforms and Huffman Algorithm International Journal of Engineering Research and General Science Volume 3, Issue 4, July-August, 15 ISSN 91-2730 A Image Comparative Study using DCT, Fast Fourier, Wavelet Transforms and Huffman Algorithm

More information

Motion-Compensated Wavelet Video Coding Using Adaptive Mode Selection. Fan Zhai Thrasyvoulos N. Pappas

Motion-Compensated Wavelet Video Coding Using Adaptive Mode Selection. Fan Zhai Thrasyvoulos N. Pappas Visual Communications and Image Processing, 2004 Motion-Compensated Wavelet Video Coding Using Adaptive Mode Selection Fan Zhai Thrasyvoulos N. Pappas Dept. Electrical & Computer Engineering, USA Wavelet-Based

More information

Network Image Coding for Multicast

Network Image Coding for Multicast Network Image Coding for Multicast David Varodayan, David Chen and Bernd Girod Information Systems Laboratory, Stanford University Stanford, California, USA {varodayan, dmchen, bgirod}@stanford.edu Abstract

More information

Multiresolution Image Processing

Multiresolution Image Processing Multiresolution Image Processing 2 Processing and Analysis of Images at Multiple Scales What is Multiscale Decompostion? Why use Multiscale Processing? How to use Multiscale Processing? Related Concepts:

More information

Video Compression MPEG-4. Market s requirements for Video compression standard

Video Compression MPEG-4. Market s requirements for Video compression standard Video Compression MPEG-4 Catania 10/04/2008 Arcangelo Bruna Market s requirements for Video compression standard Application s dependent Set Top Boxes (High bit rate) Digital Still Cameras (High / mid

More information

MRT based Adaptive Transform Coder with Classified Vector Quantization (MATC-CVQ)

MRT based Adaptive Transform Coder with Classified Vector Quantization (MATC-CVQ) 5 MRT based Adaptive Transform Coder with Classified Vector Quantization (MATC-CVQ) Contents 5.1 Introduction.128 5.2 Vector Quantization in MRT Domain Using Isometric Transformations and Scaling.130 5.2.1

More information

Lecture 5: Video Compression Standards (Part2) Tutorial 3 : Introduction to Histogram

Lecture 5: Video Compression Standards (Part2) Tutorial 3 : Introduction to Histogram Lecture 5: Video Compression Standards (Part) Tutorial 3 : Dr. Jian Zhang Conjoint Associate Professor NICTA & CSE UNSW COMP9519 Multimedia Systems S 006 jzhang@cse.unsw.edu.au Introduction to Histogram

More information

Video Coding in H.26L

Video Coding in H.26L Royal Institute of Technology MASTER OF SCIENCE THESIS Video Coding in H.26L by Kristofer Dovstam April 2000 Work done at Ericsson Radio Systems AB, Kista, Sweden, Ericsson Research, Department of Audio

More information

EFFICIENT METHODS FOR ENCODING REGIONS OF INTEREST IN THE UPCOMING JPEG2000 STILL IMAGE CODING STANDARD

EFFICIENT METHODS FOR ENCODING REGIONS OF INTEREST IN THE UPCOMING JPEG2000 STILL IMAGE CODING STANDARD EFFICIENT METHODS FOR ENCODING REGIONS OF INTEREST IN THE UPCOMING JPEG2000 STILL IMAGE CODING STANDARD Charilaos Christopoulos, Joel Askelöf and Mathias Larsson Ericsson Research Corporate Unit Ericsson

More information

Thanks for slides preparation of Dr. Shawmin Lei, Sharp Labs of America And, Mei-Yun Hsu February Material Sources

Thanks for slides preparation of Dr. Shawmin Lei, Sharp Labs of America And, Mei-Yun Hsu February Material Sources An Overview of MPEG4 Thanks for slides preparation of Dr. Shawmin Lei, Sharp Labs of America And, Mei-Yun Hsu February 1999 1 Material Sources The MPEG-4 Tutuorial, San Jose, March 1998 MPEG-4: Context

More information

ECE 533 Digital Image Processing- Fall Group Project Embedded Image coding using zero-trees of Wavelet Transform

ECE 533 Digital Image Processing- Fall Group Project Embedded Image coding using zero-trees of Wavelet Transform ECE 533 Digital Image Processing- Fall 2003 Group Project Embedded Image coding using zero-trees of Wavelet Transform Harish Rajagopal Brett Buehl 12/11/03 Contributions Tasks Harish Rajagopal (%) Brett

More information

Lecture 13 Video Coding H.264 / MPEG4 AVC

Lecture 13 Video Coding H.264 / MPEG4 AVC Lecture 13 Video Coding H.264 / MPEG4 AVC Last time we saw the macro block partition of H.264, the integer DCT transform, and the cascade using the DC coefficients with the WHT. H.264 has more interesting

More information

MPEG-2. ISO/IEC (or ITU-T H.262)

MPEG-2. ISO/IEC (or ITU-T H.262) MPEG-2 1 MPEG-2 ISO/IEC 13818-2 (or ITU-T H.262) High quality encoding of interlaced video at 4-15 Mbps for digital video broadcast TV and digital storage media Applications Broadcast TV, Satellite TV,

More information

Overview. Videos are everywhere. But can take up large amounts of resources. Exploit redundancy to reduce file size

Overview. Videos are everywhere. But can take up large amounts of resources. Exploit redundancy to reduce file size Overview Videos are everywhere But can take up large amounts of resources Disk space Memory Network bandwidth Exploit redundancy to reduce file size Spatial Temporal General lossless compression Huffman

More information

The Standardization process

The Standardization process JPEG2000 The Standardization process International Organization for Standardization (ISO) 75 Member Nations 150+ Technical Committees 600+ Subcommittees 1500+ Working Groups International Electrotechnical

More information