An Efficient Use of MPEG-4 FAP Interpolation for Facial Animation at 70 bits/frame


IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 11, NO. 10, OCTOBER 2001

An Efficient Use of MPEG-4 FAP Interpolation for Facial Animation at 70 bits/frame
Fabio Lavagetto and Roberto Pockaj

Abstract--An efficient algorithm is proposed to exploit the facial animation parameter (FAP) interpolation modality specified by the MPEG-4 standard in order to allow very low bit-rate transmission of the animation parameters. The proposed algorithm is based on a comprehensive analysis of the cross-correlation properties that characterize FAPs, which is reported and discussed extensively here. Based on this analysis, a subset of ten almost independent FAPs has been selected from the full set of 66 low-level FAPs to be transmitted and used at the decoder to interpolate the remaining ones. The performance achievable through the proposed algorithm has been evaluated objectively by means of conventional PSNR measures and compared to an alternative solution based on increasing the quantization scale factor used for FAP encoding. The subjective evaluation and comparison of the results has also been made possible by uploading mpg movies on a freely accessible web site (referenced in the bibliography). Experimental results demonstrate that the proposed FAP interpolation algorithm allows efficient parameter encoding at around 70 bits/frame or, in other words, at less than 2 kbits/s for smooth synthetic video at 25 frames/s.

Index Terms--Avatars, facial animation, MPEG-4.

I. INTRODUCTION

THE OBJECTIVE of the present paper is to define appropriate criteria for allowing MPEG-4 facial animation at very low bit rate and to provide, at the same time, a detailed explanation of a particular part of the facial animation specification. As in any conventional approach to lossy video compression, in facial animation the straightforward solution to reduce the bit rate due to the transmission of facial animation parameters (FAPs) is that of increasing the quantization scaling factor used for parameter encoding. The evident disadvantage of this approach is that of progressively degrading the quality of the animation with the introduction of jerky facial movements that are usually very annoying. In synthetic facial animation, differently from natural video coding, the coarse quantization of parameters does not affect the rendering quality of each individual frame, but rather the smoothness of the rendered facial movements.

However, another possibility exists to reduce the bit rate of the stream encoding the FAPs according to MPEG-4 specifications. This method is based on exploiting the a priori knowledge about the object that is animated, namely a human face. Through the analysis of the time correlation characterizing the facial movements of a person while talking, it is possible to reduce much of the FAP redundancy. Instead of coding and transmitting the complete set of 66 FAPs at each frame, it is possible to encode only a significant subset of them and let the decoder generate the missing parameters, thus achieving very low bit-rate animation.

Manuscript received June 1, 2000; revised July 18. This work was supported in part by the European Union under the ACTS Research Project VIDAS and by the IST Research Project Interface, both coordinated by DIST. This paper was recommended by Associate Editor E. Petajan. The authors are with DIST, University of Genova, Genova, Italy (e-mail: fabio@dist.unige.it; pok@dist.unige.it).
This procedure, named FAP interpolation, represents the core issue discussed in this paper. In Section II, we introduce the specifications of MPEG-4 concerned with FAP interpolation, and explain how the techniques we propose are compliant with the standard and oriented to fully exploit its potential. In Section III, we present the methodology and the setup used for high-precision data acquisition of real facial movements. In Section IV, a description is given of the techniques adopted to post-process the acquired data and to define an appropriate subset of FAPs assumed as a minimum basis capable of approximating all the possible human facial movements. The mechanism used by the decoder to interpolate the missing FAPs is explained in Section V, while preliminary objective and subjective results are reported in Section VI.

II. FAP STATUS AND INTERPOLATION

MPEG-4 specifications concerning facial animation [1] adopt the term FAP interpolation to indicate the procedure used by a generic facial animation decoder to autonomously define the value of the FAPs that have not been encoded and transmitted. Based on the knowledge of only a limited number of FAPs, the objective of the interpolation procedure is to estimate a variable number of the missing ones. In this context, since both the known and the estimated FAPs belong to the same frame, the estimation procedure is performed intra-frame, without taking into account any inter-frame FAP prediction. In this respect, therefore, rather than an interpolation, we should more properly call the procedure an actual FAP extrapolation. After having stressed this terminology ambiguity, let us now examine the related MPEG-4 specifications and focus the attention on some issues that are not of clear and immediate interpretation.

FAPs have been subdivided into a few groups, depending on the region of the face they are applied to, with the objective of optimizing the compression efficiency. These groups are listed in Table I. Together with I-frames, two different hierarchical masks are transmitted, fap_mask_type and fap_group_mask, with the objective of selecting the subset of the complete set of 68 defined FAPs that will be transmitted in the present I-frame and in the following P-frames. The fap_mask_type is a 2-bit mask whose meaning is described in Table II.

TABLE I. FAP groups and number of FAPs per group.
TABLE II. fap_mask_type.
TABLE III. Length in bits of fap_group_mask versus the group number.

The fap_group_mask, which is encoded only in case the value associated to fap_mask_type is 01 or 10, specifies which FAPs among those in the group are actually transmitted. The size of this mask is variable (see Table III), depending on the specific group it refers to. The value of this mask can be interpreted as a composition of 1-bit fields: if the bit value is 1, the corresponding FAP is transmitted; otherwise, it is not.

As described before, these masks are used to specify which FAPs are transmitted. Moreover, the masks are also used to encode the so-called FAP status. At each transmission of frame parameters to the decoder, the FAP status can be one of the following.

SET: the FAP was transmitted by the encoder.
LOCK: the FAP was not transmitted and maintains the value of the previous frame.
INTERP: the FAP was not transmitted and the decoder may define its new value (i.e., it can interpolate its value).

SET, LOCK, and INTERP are terms not explicitly mentioned in the standard, but have been introduced by the authors and will be used in the sequel to define the FAP status. The status of non-transmitted FAPs is determined by the decoder according to the value of the fap_mask_type and of the fap_group_mask. In fact, if the fap_mask_type has value 01, non-transmitted FAPs (represented by a 0 bit value in the corresponding fields of the fap_group_mask) must maintain the same value of the previous frame (FAPs are in LOCK status); if the fap_mask_type has value 10, non-transmitted FAPs can be interpolated by the decoder (FAPs are in INTERP status). The only exception is when the encoder has never transmitted a FAP. In this case, since no past reference value is available for this FAP, the only solution is to force its initial status to INTERP.

It is worth pointing out some problems that could originate from this specification in the case of broadcast transmission, at least as far as the authors' interpretation of the standard is concerned. When different decoders start to decode the broadcasted bitstream at different instants, an unresolved ambiguity is generated about the correct handling of non-transmitted FAPs. Let us consider the example of a FAP being transmitted for some time at the beginning of the communication and, after a while, no longer transmitted. In case the decoder is activated from the beginning (i.e., when the FAP is still transmitted), as soon as its transmission is stopped, the FAP enters a LOCK status. Conversely, in case the decoder is activated after the FAP transmission has stopped, the FAP enters an INTERP status.

It must also be noticed that when a FAP is in INTERP status, the decoder is not always free to fix its value. The standard, in fact, defines two default criteria for interpolating FAPs, called left-right interpolation and inner-outer lip contour interpolation, respectively. These criteria have been defined to exploit two evident characteristics of facial motion: the vertical symmetry of facial movements and the strong correlation between the movements of inner and outer lip contours. As a practical consequence, in case only the FAPs of the right part of the face are transmitted while those of the left part are in INTERP status, the decoder is forced to reproduce those received also on the left half of the face, and vice-versa.
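As an illustration of the status rules just described, the following sketch (in Python, with names such as FapStatus and fap_status that are ours, not identifiers taken from the MPEG-4 text) derives the status of a single low-level FAP from its fap_group_mask bit, the fap_mask_type of its group, and the knowledge of whether a value for that FAP was ever received.

    from enum import Enum

    class FapStatus(Enum):
        SET = 1      # value transmitted in the current frame
        LOCK = 2     # not transmitted; keep the value of the previous frame
        INTERP = 3   # not transmitted; the decoder may interpolate it

    def fap_status(group_mask_bit, fap_mask_type, ever_received):
        """Status of one FAP of a group whose fap_mask_type is 0b01 or 0b10."""
        if group_mask_bit == 1:
            return FapStatus.SET                  # FAP present in the bitstream
        if not ever_received:
            return FapStatus.INTERP               # no past value: forced to INTERP
        return FapStatus.LOCK if fap_mask_type == 0b01 else FapStatus.INTERP

The last branch is exactly the source of the ambiguity discussed above: two decoders that joined a broadcast at different times may disagree on ever_received and, therefore, on whether a no-longer-transmitted FAP is locked or interpolated. On top of this status logic, the default left-right rule still applies: when only one side of a symmetric pair is in SET status and the other side is in INTERP status, the received value must be mirrored.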
The same process is applied with respect to the lip contours: if only the FAPs related to the inner contour are transmitted while those of the outer contour are in INTERP status, the decoder is forced to reproduce those received also on the outer contour, and vice-versa.

Another FAP interpolation method is included in the MPEG-4 standard, applicable to all the profiles including facial animation except for the simplest one, Simple FA. This method makes use of the FAP Interpolation Table (FIT) [2] to allow the encoder to define inter-FAP dependence criteria in polynomial form. After downloading the FIT parameters, the encoder activity can be limited to the transmission of a subset of FAPs, leaving to the decoder the task of interpolating the missing ones on the basis of the inter-FAP relations specified by the FIT. For each FAP to be interpolated, FIT allows the definition of relations such as

    FAP_i = I_i(FAP_{j_1}, FAP_{j_2}, ..., FAP_{j_n})

Each interpolation function I(.) is in a rational polynomial form (this is not true for FAP 1 and FAP 2; please refer to [2] for more details)

    I(f_1, ..., f_n) = ( \sum_k c_k \prod_l f_l^{m_{k,l}} ) / ( \sum_k d_k \prod_l f_l^{n_{k,l}} )    (1)

An example of FIT can, therefore, be composed of simple proportional relations such as

    FAP_i = c_1 FAP_j    (2)
    FAP_k = c_2 FAP_j    (3)

where FAP_j is transmitted and FAP_i and FAP_k are interpolated from it.

In the opinion of the authors, however, the use of FIT is mainly oriented to guarantee the predictability of the animation when used together with the Facial Animation Table (FAT). In this way, the encoder will be able to guarantee a minimum level of quality, since the results of the animation would be fully predictable, and to achieve in the meantime a significant saving of bandwidth. It is reasonable to imagine that, for most applications, the inter-FAP dependence criteria should be very simple. Thanks to the strong symmetries characterizing a human face, it should be possible to express the majority of these dependencies in terms of simple direct proportions, without the need of adopting complex polynomial functions. Based on these considerations, it turns out that almost no FIT information is, in general, needed at the decoder, provided that, besides implementing the default interpolation functions left-right and inner-outer lip, it also includes this kind of simple proportional relations.

III. ACQUISITION SETUP

The Test Data Set has been acquired with the Elite system [3], which is composed of four synchronized cameras (one video camera and three IR cameras) together with a microphone. The 3-D position of small IR-reflecting markers distributed on the speaker's face is estimated and tracked by the system at 100 Hz, with sub-millimeter precision, by means of suitable triangulation algorithms applied to the trinocular IR images. Once the 3-D trajectories of each marker have been automatically computed by the system, suitable post-processing is applied to convert them into MPEG-4 compliant FAPs. After having estimated the neutral position of the speaker (as defined by the standard) and the Facial Animation Parameter Units (FAPUs), the rigid motion of the speaker's head is computed for each frame by analyzing the position of three reference markers located on almost non-deformable face structures, like the tip of the nose (see Fig. 1). The 3-D coordinates of each marker are normalized with respect to the neutral position (by compensating the rigid motion of the head), and the FAPs associated to the present frame are estimated by comparing the actual compensated positions of the markers with their coordinates in the neutral position.

Fig. 1. Location of feature points defined by MPEG-4 (left) and location of markers in the data acquisition phase (right).

Twenty markers have been distributed on the speaker's face, as shown in Fig. 1. Thanks to this marker configuration, 30 FAPs have been estimated, specifically: group 4 (eyebrow), group 5 (cheek), group 7 (head), group 8 (outer lip), and FAPs 3, 14, 15, 16, and 17 of group 2 (inner lip, jaw). The acquisition procedure described above has been applied to record a few sequences with three different speakers, two males and one female. The acquired data have been subdivided into two parts: the Training Data Set, obtained by selecting the first 2/3 of each sequence, and the Test Data Set, composed of the remaining 1/3 of each sequence. The former has been used to analyze the FAP correlation, while the latter has been used for the performance evaluation.
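The post-processing just described can be summarized by the following sketch. It is only an illustration of the procedure, not the Elite post-processing software: the rigid head motion is estimated from the three reference markers with a standard least-squares (Kabsch) fit, which is an assumption on our part, and each FAP is then obtained as the head-motion-compensated marker displacement along its axis, normalized by the relevant FAPU.

    import numpy as np

    def rigid_motion(ref_neutral, ref_current):
        """Rotation R and translation t mapping the neutral reference markers
        (3x3 array, one row per marker) onto their current positions."""
        cn, cc = ref_neutral.mean(0), ref_current.mean(0)
        H = (ref_neutral - cn).T @ (ref_current - cc)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:          # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        return R, cc - R @ cn

    def marker_to_fap(marker_xyz, neutral_xyz, R, t, axis, fapu):
        """One FAP value: head-motion-compensated displacement of a feature
        marker along `axis` (0=x, 1=y, 2=z), normalized by its FAPU."""
        compensated = R.T @ (marker_xyz - t)   # undo the rigid head motion
        return (compensated - neutral_xyz)[axis] / fapu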
IV. ANALYSIS OF FAP CORRELATION

For the analysis of FAP correlation, only a few of the 10 FAP groups defined in MPEG-4 have been considered (see Table I). The analysis we have carried out was oriented to model the FAP trajectories related to the human facial movements associated to normal speech production. The objective is to reduce the number of FAPs to encode and transmit and to provide the decoder with a suitable FAP interpolation mechanism. Should particular nonhuman facial movements be rendered for specific applications (like the animation of a cartoon-like character), it will be enough to modify the FAP mask associated to the transmission of an I-frame and to transmit all the FAPs describing that particular animation for the time needed.

As far as the correlation analysis is concerned, FAPs in group 1 (visemes and expressions) have not been considered since they do not simply encode the scalar value of a specific facial feature like the other FAPs but, on the contrary, they encode high-level information such as the global posture of the mouth and the whole facial expression. No measurement of FAPs in group 6 (tongue) was possible because of the limitations of the data acquisition system that has been used, and the same happened for some FAPs of group 2 (inner lip contour). FAPs in groups 9 and 10 have been excluded from the analysis since human facial movements associated to the nose and ears are usually negligible and of limited interest.

FAPs in group 7 (head) have been considered, even if the three rotations of the head have been assumed to be decorrelated. In conclusion, for the correlation analysis, three sets of FAPs have been considered: 1) some of those in group 2, together with those in group 5 and in group 8; 2) those in group 3; and 3) those in group 4. The reason for analyzing together, in the first set, FAPs coming from different groups (while the other two sets include only homogeneous FAPs) is the a priori knowledge about the anatomy of human faces. The bone and muscle structures of a human head, in fact, indicate that some regions of the face are affected by correlated motion, while others can reasonably be assumed to be independent. As an example, movements of cheeks and lips are strongly correlated, while movements of the eyebrows can be considered rather independent from those of the jaw. This consideration is very evident if facial movements are analyzed at a low level and instant by instant, while it becomes less and less valid if the analysis is shifted to the semantic level and applied on longer time intervals. In this last case, for instance, it might be possible to identify long-term correlations between the emphasis of pronunciation (with relation to lip movements) and movements of the head and eyebrows [4]. Anyway, these considerations go far beyond the scope of the paper, and our investigation here will be limited only to low-level FAP analysis even if, in the opinion of the authors, this represents a promising and fertile field of research for future applications.

A. Computation of FAP Correlations

Our analysis has been based on the FAP correlation matrix [5], computed as

    R = [ r_{ij} ],    i, j = 1, ..., N    (4)

where N is the number of FAPs under investigation and r_{ij} represents the correlation coefficient between the i-th and the j-th FAPs, defined as

    r_{ij} = c_{ij} / (\sigma_i \sigma_j)    (5)

The term \sigma_i^2 = c_{ii} represents the variance of the i-th FAP and c_{ij} the (i, j) coefficient of the covariance matrix, defined as

    c_{ij} = (1/K) \sum_{k=1}^{K} (F_i^{(k)} - \mu_i)(F_j^{(k)} - \mu_j)    (6)

where F_i^{(k)} is the value of the i-th FAP at the k-th of the K frames of the Training Data Set and \mu_i is its mean value. The matrix R is symmetric by definition, that is, r_{ij} = r_{ji}, with r_{ii} = 1. Though in rigorous mathematical terms r_{ij} is a signed real number, we have chosen to consider its absolute value to simplify the graphical interpretation of the results: the closer the value is to 1, the more the i-th and the j-th FAPs are correlated.

Fig. 2. Graphical representation of matrix R for group 4 (white squares indicate high correlation).

It is worth mentioning that other methods, different from the approach based on the correlation matrix described above, have been considered by the authors for the purpose of identifying a set of uncorrelated FAPs among the 66 defined by MPEG-4. The most suitable among them might seem to be Principal Component Analysis (PCA), of widespread use in this class of problems. However, such methods are of no use in this application and, therefore, have been discarded, since the m-vector basis they provide for n given vectors (m < n) almost never coincides with a subset of the given vectors. On the contrary, this is an obvious constraint for an MPEG-4 compliant codec, where only the 66 FAPs, or possibly a subset of them, are allowed to be transmitted.

After defining the criterion for the correlation estimation, let us examine the various FAP groups. In the following, it will be discussed how to choose the minimum set of FAPs that must be transmitted; in Section V, on the contrary, criteria will be defined to interpolate the non-transmitted parameters.
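In terms of code, (4)-(6) amount to the following sketch, where fap_traj is assumed to be a (frames x FAPs) array of trajectories taken from the Training Data Set:

    import numpy as np

    def fap_correlation_matrix(fap_traj):
        """Matrix of absolute correlation coefficients |r_ij| between FAPs."""
        X = fap_traj - fap_traj.mean(axis=0)      # remove the mean of each FAP
        C = (X.T @ X) / X.shape[0]                # covariance coefficients c_ij
        sigma = np.sqrt(np.diag(C))               # standard deviations
        return np.abs(C / np.outer(sigma, sigma)) # |r_ij| in [0, 1]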
B. Group 4 (Eyebrows)

Matrix R is represented with gray levels in Fig. 2. The graphical representation of R allows a faster assessment of the correlations: bright blocks indicate a high value of |r_{ij}|, while dark blocks identify totally uncorrelated FAPs. The visualization of R also allows a first data validation, by comparing the correlation of the FAPs on the right side of the face with that of the FAPs on the left side. By inspection, it turns out that FAP 38 has an unexpectedly low correlation with all the other FAPs. By analyzing the temporal trajectory of FAP 38 (squeeze_r_eyebrow), it was discovered that its intrinsically low dynamics, typically in the order of a few tens of units of the Eye Separation FAPU (ES), has been completely obscured by the acquisition noise, whose standard deviation was comparable with the signal. Because of this systematic error in the measurement process, all the FAPs affected by this kind of acquisition noise have been excluded from further analysis, and a second matrix R' has been computed in which the columns corresponding to symmetric FAPs have been merged.

Fig. 3. Graphical representation of matrix R' for group 4 (white squares indicate high correlation).
TABLE IV. Matrix R' for group 4 (left) and values of the s coefficient (right).
Fig. 4. Graphical representation of matrix R for groups 2 (partial), 5, and 8 (white squares indicate high correlation).

Since MPEG-4 forces identical values for symmetric FAPs in case of interpolation, it is convenient to consider symmetric FAPs as a single parameter (in our case, we have chosen to represent only the left FAPs). On the basis of the above considerations, measures related to FAP 38 have been replaced with those of the symmetric FAP 37, whose acquisition was significantly less noisy. Matrix R' is visualized in Fig. 3, while the values of its coefficients are reported in Table IV.

The information associated to Table IV provides good indications for selecting the specific FAP that will be used to interpolate the other ones. The criterion we have adopted is that of selecting the i-th FAP having the highest overall correlation with respect to the other FAPs, i.e., the one maximizing

    s_i = \sum_{j \neq i} |r_{ij}|    (7)

The selected i-th FAP can then be used to interpolate any other j-th FAP with r_{ij} > 0.75. The threshold value of 0.75 has been determined experimentally, seeking a reasonable tradeoff between rate and distortion. Once this operation has been completed, the rows and columns corresponding to the selected i-th and j-th FAPs are removed from the matrix and the coefficients s_i are recomputed. This process is then iterated until all the examined FAPs have been analyzed. As far as FAPs in group 4 are concerned, the values of r_{ij} and s_i allow an easy choice of FAP 33 (or of its symmetric FAP 34) as the best and unique FAP to transmit, since its correlation with each of the remaining FAPs of the group is above the threshold. In Section V, the interpolation criteria will be discussed and defined.

C. Groups 2, 5, and 8 (Jaw, Chin, Lip Protrusion, Cheeks, Outer Lip, Cornerlip)

The corresponding matrix R is visualized in Fig. 4. By inspecting this matrix, a few significant considerations can be drawn. Also in this case, an unexpectedly low correlation is found between FAPs 53 and 54 (stretch outer corner lips). If we then consider the FAPs corresponding to the upper lip, it is evident that they are substantially uncorrelated from all the other FAPs. Also, FAP 15 (shift_jaw) seems to be totally uncorrelated from the other FAPs. This conclusion is confirmed by a deeper analysis of its time trajectory, revealing small random variations due, very likely, to the acquisition noise, since during speech this movement is substantially absent.

Let us now consider the matrix R', obtained from R by removing the noisy FAPs and by merging left and right pairs. When inspecting the coefficients of matrix R' (see Fig. 5 and Table V), choosing the FAPs to transmit seems to be more complex than in group 4. The first FAP selected for transmission is FAP 52, used then to interpolate FAPs 3, 14, and 57 (whose correlation with FAP 52 is above the threshold). Therefore, four rows and four columns can be removed from R' and the new matrix R^(1) is obtained by recomputing the coefficients (the superscript 1 indicates that the first FAP to transmit has been selected or, in other terms, that the first simplification step has been completed). As evidenced in Table VI and Fig. 6, the FAPs with the highest s values are FAP 41 (lift_l_cheek) and FAP 59 (raise_l_cornerlip_o). The decision of selecting FAP 59 as the second parameter to transmit, though s_41 was slightly greater than s_59, is due to evident technical reasons: experiments show that the automatic extraction of lip corners from real video sequences is far easier than that of cheek coordinates.
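The iterative selection applied in this section and in the previous one can be sketched as follows. The definition of s_i as the sum of the |r_ij| of the remaining FAPs is our reading of (7), and the occasional manual overrides made for trackability reasons (e.g., FAP 59 instead of FAP 41) are not reproduced:

    import numpy as np

    def select_faps(R, threshold=0.75):
        """R: symmetric matrix of |r_ij|. Returns the FAPs to transmit and, for
        every interpolated FAP, the transmitted FAP it is estimated from."""
        remaining = list(range(R.shape[0]))
        transmitted, interpolated_from = [], {}
        while remaining:
            sub = R[np.ix_(remaining, remaining)]
            s = sub.sum(axis=1) - 1.0             # exclude r_ii = 1
            k = remaining[int(np.argmax(s))]      # FAP most correlated with the rest
            transmitted.append(k)
            drop = [k]
            for j in remaining:
                if j != k and R[k, j] > threshold:
                    interpolated_from[j] = k      # j can be interpolated from k
                    drop.append(j)
            remaining = [j for j in remaining if j not in drop]
        return transmitted, interpolated_from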

TABLE V. Matrix R' for groups 2 (partial), 5, and 8 (left), and values of the s coefficient (right).
TABLE VI. Matrix R^(1) for groups 2 (partial), 5, and 8 (left), and values of the s coefficient (right).
Fig. 5. Graphical representation of matrix R' for groups 2 (partial), 5, and 8 (white squares indicate high correlation).
Fig. 6. Graphical representation of matrix R^(1) for groups 2 (partial), 5, and 8 (white squares indicate high correlation).

The comparable s values of FAP 41 and FAP 59, together with their high cross correlation, should guarantee good interpolation anyway. Since the only over-threshold correlation of FAP 41 is the one with FAP 59, it can be reasonably assumed that FAP 41 can be interpolated effectively from FAP 59.

Let us now erase these two FAPs and recompute the matrix R^(2), as shown in Table VII and Fig. 7. Now the highest s value is reached by FAP 53, even if it is only slightly higher than those of the other FAPs. Also in this case, only FAP 39 has an over-threshold correlation with FAP 53 and results as the only parameter that can be interpolated from FAP 53.

TABLE VII. Matrix R^(2) for groups 2 (partial), 5, and 8 (left), and values of the s coefficient (right).
TABLE VIII. Matrix R^(3) for groups 2 (partial), 5, and 8 (left), and values of the s coefficient (right).
Fig. 7. Graphical representation of matrix R^(2) for groups 2 (partial), 5, and 8 (white squares indicate high correlation).
Fig. 8. Graphical representation of matrix R^(3) for groups 2 (partial), 5, and 8 (white squares indicate high correlation).

The next step is the computation of R^(3), as described in Table VIII and Fig. 8. The next parameter to transmit would be FAP 55, though with an s value only slightly greater than that of FAP 51. Also in this case, as in the previous step, we have preferred to transmit FAP 51 (lower_top_midlip_o), since it is highly correlated to FAP 55 and is more easily trackable from real video analysis. FAPs 16 and 17 are the last ones to be transmitted, being those maximally uncorrelated from the others.

Some comments are worth drawing about lip protrusion. Since these FAPs describe variations along the z axis (with positive orientation out of the screen) of the mid point of the upper and lower lip, their values are partially correlated to the lip openings and to the movements of the mouth corners (see the complete matrix R). Anyway, experimental results prove that these relations cannot be modeled easily and, in particular, that no linear dependence can be formalized between FAPs 16 and 17, on one side, and the other FAPs of groups 2, 5, and 8, on the other side.

D. Group 3 (Eyeballs, Pupils, Eyelids)

There is no need for experimental evidence to state that movements of the eyeballs and pupils are maximally correlated, since they are affected by the same rigid motion. Lower eyelids are static most of the time and only rarely affected by an almost imperceptible motion. Movements of the upper eyelids, on the other hand, are substantially uncorrelated from eyeballs and pupils. Moreover, as defined in the MPEG-4 specifications through the left-right interpolation criterion, all the right FAPs in this group can be interpolated from the corresponding left ones (or vice versa). Based on the previous considerations, we can conclude that only three FAPs (19, 23, and 25) are necessary for animating the entire group; the remaining FAPs are not necessary or can be interpolated as indicated in Table IX. Even the transmission of these three FAPs, if needed, can be omitted by simulating eye blinking at the decoder and by synthesizing the movements of the eyeballs based on the parameters encoding the head rotation. In Section V-D, some criteria are explained to synthetically generate FAPs 19, 23, and 25 and, therefore, to completely avoid the transmission of any FAP in group 3.

TABLE IX. FAP interpolation coefficients (values of a) for group 3 (TX means transmitted FAP; empty rows indicate FAPs not necessary for typical animations; 1) indicates that those FAPs can possibly be totally synthesized and 2) indicates that those FAPs can be interpolated from the head movements).
TABLE X. FAP interpolation coefficients (values of a) for group 4 (TX means transmitted FAP).

V. FAP INTERPOLATION CRITERIA

After having selected the FAPs that optimize the estimation of the non-transmitted parameters (see Section IV), their specific mutual dependencies must be formalized and implemented.

A. Computation of FAP Interpolation Criteria

For the sake of simplicity, let us assume that each of the FAPs to be interpolated is linearly dependent on a single FAP among those actually transmitted. As will be evidenced by the experimental results reported in the following, this hypothesis is very close to reality. By comparing the trajectories of the estimated and measured FAPs, it turns out that they are quite similar, except in correspondence of intervals where the FAP amplitude is very high. It is reasonable to suppose that this effect is due to a nonlinear saturation distortion affecting the estimates, which is difficult to counteract even by modeling the FAP dependencies through more complex relations. As an example, let us consider FAP 3 (open_jaw) and FAP 52 (raise_b_midlip_o): when the jaw is completely closed or open, the lower lip still has some residual possibility to move independently from the jaw itself. Despite this annoying but, fortunately, rare and scarcely perceivable distortion, let us formalize the linear inter-FAP dependence as follows:

    \hat{F}_i^{(k)} = a_{ij} F_j^{(k)},    k = 1, ..., K    (8)

where F^{(k)} represents the value of a FAP at the k-th frame of a sequence of K frames, F_i is the parameter to be interpolated, and F_j the parameter actually transmitted. The problem consists of determining the interpolation coefficient a_{ij} that minimizes the mean square error (MSE), defined as

    MSE = (1/K) \sum_{k=1}^{K} ( F_i^{(k)} - a_{ij} F_j^{(k)} )^2    (9)

The optimal coefficient results as

    a_{ij} = \sum_{k=1}^{K} F_i^{(k)} F_j^{(k)} / \sum_{k=1}^{K} ( F_j^{(k)} )^2    (10)
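In code, the one-parameter fit of (10) is immediate; the sketch below (our own, with the trajectories taken from the Training Data Set) returns the coefficient a_ij used to estimate FAP i from the transmitted FAP j:

    import numpy as np

    def interpolation_coefficient(f_i, f_j):
        """a_ij minimizing the MSE of f_i ~ a_ij * f_j over all K frames."""
        f_i, f_j = np.asarray(f_i, dtype=float), np.asarray(f_j, dtype=float)
        return float(np.dot(f_i, f_j) / np.dot(f_j, f_j))

    def interpolate(f_j, a_ij):
        """Estimated trajectory of the interpolated FAP i."""
        return a_ij * np.asarray(f_j, dtype=float)

Applying this fit to every (interpolated, transmitted) pair produced by the selection of Section IV yields tables of coefficients such as Tables X and XI below.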

Fig. 9. Trajectories of FAP 31 (raised left inner eyebrow, solid line) estimated from FAP 33 (raised left middle eyebrow) and its actual value (dashed line).
Fig. 10. Trajectories of FAP 37 (squeezed left eyebrow, solid line) estimated from FAP 33 (raised left middle eyebrow) and its actual value (dashed line).
TABLE XI. FAP interpolation coefficients (values of a) for groups 2, 5, and 8 (TX means transmitted FAP; empty rows indicate FAPs not necessary for typical animations).

B. Group 4 (Eyebrows)

Table X summarizes the values of a computed for the FAPs of group 4. In Figs. 9 and 10, the trajectories of the estimated FAPs (solid line) are compared to the actual measured values (dashed line). Besides the substantially correct reproduction of the trajectories, it must be noticed how, in the case of FAP 37, affected by significant acquisition noise, the estimates are somehow a low-pass filtered replica of the original parameters. Instead of degrading the quality of the synthesis, this low-pass filtering has the positive effect of gracefully smoothing the animation, thus making it more natural and realistic.

C. Groups 2, 5, and 8 (Jaw, Chin, Lip Protrusion, Cheeks, Outer Lip, Cornerlip)

Table XI indicates the values of a computed for the FAPs of groups 2, 5, and 8. In Figs. 11-14, the trajectories of the estimated FAPs (solid line) are compared with the actual FAP values (dashed line).

Fig. 11. Trajectory of FAP 3 (open jaw, solid line) estimated from FAP 52 (raised bottom midlip outer) and its actual value (dashed line).

D. Group 3 (Eyeballs, Pupils, Eyelids)

Various studies, both in medicine and in psychology, have computed typical values for the frequency and duration of eye blinking; in [6], as an example, typical figures are reported for the blink frequency and for the average duration of eye closure. Based on these experimental evidences, it is easy to simulate the eye blinking at the decoder by means of FAP 19. Some experiments and subjective evaluations carried out by the authors have proven that, for many applications, it is acceptable to interpolate FAPs 23 and 25 based only on the head rotation, in such a way as to maintain the gaze of the virtual character as frontal as possible, so as to meet the gaze of the interacting human, who is supposed to be seated frontally to the monitor. The typical values that we have used to interpolate the movements of the eyes from the movements of the head have been obtained experimentally.
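A possible realization of this synthesis is sketched below; the blink timing, the closure amplitude, and the gaze gain are placeholders, not the experimentally determined values used by the authors:

    import numpy as np

    def synthesize_eye_faps(head_pitch, head_yaw, fps=25.0,
                            blink_period_s=4.0, blink_dur_s=0.15,
                            blink_amplitude=1024, k_gaze=1.0):
        """Generate FAP 19 (eye blink) and FAPs 23 and 25 (gaze) trajectories
        from the head-rotation FAPs; all constants are illustrative."""
        head_pitch = np.asarray(head_pitch, dtype=float)
        head_yaw = np.asarray(head_yaw, dtype=float)
        n = len(head_pitch)
        fap19 = np.zeros(n)                          # close_t_l_eyelid
        period = max(1, int(round(blink_period_s * fps)))
        dur = max(1, int(round(blink_dur_s * fps)))
        for start in range(0, n, period):            # one blink per period
            fap19[start:start + dur] = blink_amplitude
        fap23 = -k_gaze * head_yaw                   # eyeball yaw counter-rotates
        fap25 = -k_gaze * head_pitch                 # eyeball pitch counter-rotates
        return fap19, fap23, fap25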

Fig. 12. Trajectory of FAP 57 (raised bottom left-middle lip outer, solid line) estimated from FAP 52 (raised bottom midlip outer) and its actual value (dashed line).
Fig. 13. Trajectory of FAP 39 (puffed left cheek, solid line) estimated from FAP 53 (stretched left cornerlip outer) and its actual value (dashed line).
Fig. 14. Trajectory of FAP 41 (lifted left cheek, solid line) estimated from FAP 59 (raised left cornerlip outer) and its actual value (dashed line).

VI. EXPERIMENTAL RESULTS

In conclusion, from what has been described in the previous sections, it turns out that a subset of only 10 FAPs, suitably chosen from the complete set of 66 FAPs, can be used to guarantee the efficient encoding of MPEG-4 facial animation sequences: one FAP for the eyebrow movements (FAP 33), six FAPs for mouth and cheek movements (FAPs 16, 17, 51, 52, 53, and 59), and three FAPs for the head rotation (FAPs 48, 49, and 50). As is proven experimentally, the remaining FAPs can easily be interpolated from those actually transmitted, or turn out to be superfluous in typical sequences with continuous speech produced by natural faces. In the opinion of the authors, at least one more FAP of group 6 should be transmitted for controlling the movements of the tongue. However, the evident difficulties in tracking the tongue movements have so far prevented its analysis and modeling.

In the remainder of this section, we provide a quality evaluation, both objective and subjective, of the animation obtained by using only this subset of 10 FAPs, compared to what is achievable by exploiting the full set of 46 FAPs captured with the acquisition system described in Section III. For running the experiments, three different sets of FAPs have been generated. The first, set A, has been generated by interpolating the FAP Test Data Set (see the description in Section III) starting from only 10 FAPs encoded with a quantization scaling factor of 1 and then decoded. The second, set B, has been obtained by encoding, and then decoding, the entire Test Data Set with a quantization scaling factor of 16, so as to maintain the same bit rate (around 1.4 kbits/s) associated with set A. The third, set C, has been obtained by encoding, and then decoding, the entire Test Data Set with a quantization scaling factor of 1 and, therefore, with the same quality as set A, but at a higher bit rate. Each of the three sets A, B, and C achieves a frame rate of 25 frames/s. Fig. 15 provides a graphical description of the three sets. The results have been compared to the original Test Data Set before parameter encoding.

In addition to the 30 FAPs actually captured through the acquisition system, the Test Data Set includes some FAPs whose value has been synthesized artificially, like the ten missing FAPs in group 2, those controlling the upper eyelids, and those responsible for the eyeball rotation. This has the purpose of simulating a more realistic situation, obtainable in case the complete set of FAPs could be captured through a more sophisticated acquisition system, where the maximum available information for facial animation is used, and of allowing a better comparison of FAP encoding with and without FAP interpolation. In Table XII the bit rate achieved for the three sets A, B, and C is reported.

Fig. 15. The three test data sets used in the experiments.
TABLE XII. Bit rates for the three test data sets.
TABLE XIII. PSNR for some FAPs obtained with the three different coding schemes, with (Y) or without (N) interpolation.
Fig. 16. FAP 35 (raise_l_o_eyebrow) encoded with qsf = 1, encoded with qsf = 16, and interpolated; note that, though FAP 35 encoded with qsf = 16 has a better PSNR, its step-wise behavior results in a subjectively worse animation than the interpolated FAP.
Fig. 17. FAP 48 (head_pitch) encoded with qsf = 1 (solid line) and encoded with qsf = 16 (dotted line); note that FAP 48 encoded with qsf = 16 has both a worse PSNR and a step-wise behavior compared with FAP 48 encoded with qsf = 1.

The bit rate for set A is significantly lower than for set C, while maintaining high quality in FAP reproduction (as evidenced in Table XIII). Table XIII reports the PSNR values computed for a few FAPs, part of them interpolated and part of them transmitted, as specified in the column Interp.

The results reported in Table XIII suggest some important considerations. In the case of set A, the interpolated FAPs are obviously characterized by a PSNR lower than in the other two cases. Nevertheless, the temporal trajectory of these FAPs, unlike for set B, is not substantially affected by quantization distortion. Fig. 16 evidences this phenomenon clearly: when FAP 35 (raise_l_o_eyebrow) is interpolated, its value differs from the actual measure more than in the case of qsf equal to 16. However, the step-wise characteristics of the quantization noise turn out to be subjectively more annoying during the animation.
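For reference, per-FAP figures of the kind reported in Table XIII can be reproduced with a sketch like the following, which also shows the step-wise effect of a coarse quantization scaling factor; the PSNR convention (peak taken as the dynamic range of the original trajectory) is an assumption on our part, since the paper does not spell it out:

    import numpy as np

    def quantize(fap, qsf):
        """Uniform quantization of a FAP trajectory with scaling factor qsf;
        a coarse qsf produces the step-wise trajectories of Figs. 16 and 17."""
        fap = np.asarray(fap, dtype=float)
        return np.round(fap / qsf) * qsf

    def fap_psnr(original, decoded):
        """Per-FAP PSNR; the peak is assumed to be the range of the original."""
        original = np.asarray(original, dtype=float)
        mse = np.mean((original - np.asarray(decoded, dtype=float)) ** 2)
        peak = np.ptp(original)
        return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)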

TABLE XIV. Encoding parameters of the "wow" sequence.

A second consideration concerns the 10 FAPs transmitted in the case of set A, whose quality results far higher than in the case of set B, where all the FAPs are subject to coarse quantization. In particular, the movements of the head are among those most sensitive to quantization noise, as shown in Fig. 17: the dark line represents the trajectory of FAP 48 (head_pitch) both in the case of set A and of set C, while the gray line refers to case B. Based on these experimental evidences, it is important to notice that the reproduction of head movements must be sufficiently smooth to avoid severe subjective artifacts, like annoying jerky head motion. Differently from many other movements of the face, the FAPs controlling the head motion must be quantized with very small values of qsf.

The third and last consideration suggested by the analysis of Table XIII concerns the negligible information conveyed by the average PSNR when mutually comparing sets A, B, and C. The above considerations on FAP quantization, and the fact that, in set A, the PSNR computed over the interpolated FAPs differs significantly from the PSNR associated with the transmitted FAPs, make the use of the average PSNR as an objective evaluation criterion almost meaningless.

The achieved results put in evidence how, toward the objective of reducing as much as possible the bit rate needed to transmit a FAP stream, it is preferable to employ FAP interpolation rather than to increase the quantization scaling factor. In order to allow a more reliable subjective evaluation of the quality improvements that can be achieved by exploiting FAP interpolation, two movies are available on the authors' web site, based on the Facial Animation Engine (FAE) [7] developed at the DSP Lab of DIST. In the first movie, the stream wow.fap (donated by DIST to the MPEG Face and Body Animation Ad Hoc Group) encoded with qsf = 1 (set C) is compared to the same stream encoded at very low bit rate with qsf = 16 (set B). The second movie compares set C with the same stream encoded at very low bit rate by using FAP interpolation (set A). The subjective evaluation is left to the readers. In Table XIV, the characteristics of the sequences are summarized.

As a final consideration, it is important to notice how the conventional objective evaluation based on the PSNR computed on each static frame has almost no meaning here, since the quality of the synthetic images is good all the time. On the contrary, the increase of the quantization factor significantly affects the smoothness of the movements which, as evidenced by the movie WowQ.mpg, can only be appreciated and evaluated by playing the video.

VII. CONCLUSION

The cross-correlation analysis between MPEG-4 FAPs reported in this paper, together with the proposed algorithm for FAP interpolation, represents a key reference for any study oriented to exploit this FAP encoding modality for achieving efficient transmission of FAP streams. The innovative contribution of this study consists both of the specific technical solution that is proposed and of the experimental evidences that are produced. Up to now, to the knowledge of the authors, no investigation has been reported in the scientific literature on the exploitation of the FAP interpolation modality, nor any concrete proposal on procedural solutions.
The experimental results reported here provide a clear indication of the performance level that FAP interpolation can guarantee and they suggest, therefore, a large variety of possible applications of MPEG-4 facial animation technologies within Internet-based services, mobile interpersonal communications, etc.

REFERENCES

[1] Text for ISO/IEC FDIS 14496-2 Visual, ISO/IEC JTC1/SC29/WG11 N2502, Nov. 1998.
[2] Text for ISO/IEC FDIS 14496-1 Systems, ISO/IEC JTC1/SC29/WG11 N2501, Nov. 1998.
[3] G. Ferrigno and A. Pedrotti, "ELITE: A digital dedicated hardware system for movement analysis via real-time TV signal processing," IEEE Trans. Biomed. Eng., vol. BME-32, 1985.
[4] C. Pelachaud, N. Badler, and M. Steedman, "Generating facial expressions for speech," Cognitive Science, vol. 20, no. 1, pp. 1-46, 1996.
[5] A. Papoulis, Probability, Random Variables, and Stochastic Processes. New York: McGraw-Hill.
[6] J. A. Stern, D. Boyer, D. J. Schroeder, R. M. Touchstone, and N. Stoliarov, "Blinks, saccades, and fixation pauses during vigilance task performance: II. Gender and time of day," FAA Office of Aviation Medicine, Civil Aeromedical Institute, Aviation Medicine Reports.
[7] F. Lavagetto and R. Pockaj, "The facial animation engine: Toward a high-level interface for the design of MPEG-4 compliant animated faces," IEEE Trans. Circuits Syst. Video Technol., vol. 9, pp. 277-289, Mar. 1999.

Fabio Lavagetto was born in Genoa, Italy. He received the Laurea degree in electrical engineering from the University of Genoa, Genoa, Italy, in March 1987, and the Ph.D. degree from the Department of Communication, Computer and System Sciences (DIST), University of Genoa. From 1987 to 1988, he was with the Marconi Group, Genova, Italy, working on real-time image processing. He was a visiting researcher with AT&T Bell Laboratories, Holmdel, NJ, during 1990, and a Contract Professor in digital signal processing at the University of Parma, Parma, Italy. Presently, he is an Associate Professor with DIST, where he teaches a course on radio communication systems and is responsible for many national and international research projects. He coordinated the European ACTS project VIDAS, concerned with the application of MPEG-4 technologies in multimedia telecommunication products. Since January 2000, he has been coordinating the IST European project INTERFACE, which is oriented to speech/image emotional analysis/synthesis. He is the author of more than 70 scientific papers in the area of multimedia data management and coding.

Roberto Pockaj was born in Genova, Italy. He received the Master's degree in electronic engineering in 1993 from the University of Genova, Genova, Italy, and the Ph.D. degree in computer engineering and computer science from the Department of Communication, Computer and System Sciences (DIST), University of Genova. From June 1992 to June 1996, he was a software designer with the Marconi Group, Genova, Italy, working in the field of real-time image and signal processing for optoelectronic applications (active and passive laser sensors). Between 1996 and 2001, he collaborated on the management of the European projects ACTS-VIDAS and IST-INTERFACE, and participated in the definition of the new MPEG-4 standard for the coding of multimedia contents within the Ad Hoc Group on Face and Body Animation. He is currently a Contract Researcher at DIST. He has authored many papers on image processing and multimedia management.


User Level QoS Assessment of a Multipoint to Multipoint TV Conferencing Application over IP Networks User Level QoS Assessment of a Multipoint to Multipoint TV Conferencing Application over IP Networks Yoshihiro Ito and Shuji Tasaka Department of Computer Science and Engineering, Graduate School of Engineering

More information

CONTENT ADAPTIVE COMPLEXITY REDUCTION SCHEME FOR QUALITY/FIDELITY SCALABLE HEVC

CONTENT ADAPTIVE COMPLEXITY REDUCTION SCHEME FOR QUALITY/FIDELITY SCALABLE HEVC CONTENT ADAPTIVE COMPLEXITY REDUCTION SCHEME FOR QUALITY/FIDELITY SCALABLE HEVC Hamid Reza Tohidypour, Mahsa T. Pourazad 1,2, and Panos Nasiopoulos 1 1 Department of Electrical & Computer Engineering,

More information

MPEG-4 AUTHORING TOOL FOR THE COMPOSITION OF 3D AUDIOVISUAL SCENES

MPEG-4 AUTHORING TOOL FOR THE COMPOSITION OF 3D AUDIOVISUAL SCENES MPEG-4 AUTHORING TOOL FOR THE COMPOSITION OF 3D AUDIOVISUAL SCENES P. Daras I. Kompatsiaris T. Raptis M. G. Strintzis Informatics and Telematics Institute 1,Kyvernidou str. 546 39 Thessaloniki, GREECE

More information

Adaptive Waveform Inversion: Theory Mike Warner*, Imperial College London, and Lluís Guasch, Sub Salt Solutions Limited

Adaptive Waveform Inversion: Theory Mike Warner*, Imperial College London, and Lluís Guasch, Sub Salt Solutions Limited Adaptive Waveform Inversion: Theory Mike Warner*, Imperial College London, and Lluís Guasch, Sub Salt Solutions Limited Summary We present a new method for performing full-waveform inversion that appears

More information

Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform

Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform Torsten Palfner, Alexander Mali and Erika Müller Institute of Telecommunications and Information Technology, University of

More information

CS 231. Deformation simulation (and faces)

CS 231. Deformation simulation (and faces) CS 231 Deformation simulation (and faces) Deformation BODY Simulation Discretization Spring-mass models difficult to model continuum properties Simple & fast to implement and understand Finite Element

More information

DIGITAL TELEVISION 1. DIGITAL VIDEO FUNDAMENTALS

DIGITAL TELEVISION 1. DIGITAL VIDEO FUNDAMENTALS DIGITAL TELEVISION 1. DIGITAL VIDEO FUNDAMENTALS Television services in Europe currently broadcast video at a frame rate of 25 Hz. Each frame consists of two interlaced fields, giving a field rate of 50

More information

White Paper: WiseStream II Technology. hanwhasecurity.com

White Paper: WiseStream II Technology. hanwhasecurity.com White Paper: WiseStream II Technology hanwhasecurity.com Contents 1. Introduction & Background p. 2 2. WiseStream II Technology p. 3 3. WiseStream II Setup p. 5 4. Conclusion p.7 1. Introduction & Background

More information

System Modeling and Implementation of MPEG-4. Encoder under Fine-Granular-Scalability Framework

System Modeling and Implementation of MPEG-4. Encoder under Fine-Granular-Scalability Framework System Modeling and Implementation of MPEG-4 Encoder under Fine-Granular-Scalability Framework Final Report Embedded Software Systems Prof. B. L. Evans by Wei Li and Zhenxun Xiao May 8, 2002 Abstract Stream

More information

One-pass bitrate control for MPEG-4 Scalable Video Coding using ρ-domain

One-pass bitrate control for MPEG-4 Scalable Video Coding using ρ-domain Author manuscript, published in "International Symposium on Broadband Multimedia Systems and Broadcasting, Bilbao : Spain (2009)" One-pass bitrate control for MPEG-4 Scalable Video Coding using ρ-domain

More information

Motion Estimation for Video Coding Standards

Motion Estimation for Video Coding Standards Motion Estimation for Video Coding Standards Prof. Ja-Ling Wu Department of Computer Science and Information Engineering National Taiwan University Introduction of Motion Estimation The goal of video compression

More information

Lip Tracking for MPEG-4 Facial Animation

Lip Tracking for MPEG-4 Facial Animation Lip Tracking for MPEG-4 Facial Animation Zhilin Wu, Petar S. Aleksic, and Aggelos. atsaggelos Department of Electrical and Computer Engineering Northwestern University 45 North Sheridan Road, Evanston,

More information

Coding of Coefficients of two-dimensional non-separable Adaptive Wiener Interpolation Filter

Coding of Coefficients of two-dimensional non-separable Adaptive Wiener Interpolation Filter Coding of Coefficients of two-dimensional non-separable Adaptive Wiener Interpolation Filter Y. Vatis, B. Edler, I. Wassermann, D. T. Nguyen and J. Ostermann ABSTRACT Standard video compression techniques

More information

A Low Bit-Rate Video Codec Based on Two-Dimensional Mesh Motion Compensation with Adaptive Interpolation

A Low Bit-Rate Video Codec Based on Two-Dimensional Mesh Motion Compensation with Adaptive Interpolation IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 11, NO. 1, JANUARY 2001 111 A Low Bit-Rate Video Codec Based on Two-Dimensional Mesh Motion Compensation with Adaptive Interpolation

More information

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects.

Perceptual coding. A psychoacoustic model is used to identify those signals that are influenced by both these effects. Perceptual coding Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual encoders, however, have been designed for the compression of general

More information

SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC

SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC Randa Atta, Rehab F. Abdel-Kader, and Amera Abd-AlRahem Electrical Engineering Department, Faculty of Engineering, Port

More information

(Refer Slide Time 00:17) Welcome to the course on Digital Image Processing. (Refer Slide Time 00:22)

(Refer Slide Time 00:17) Welcome to the course on Digital Image Processing. (Refer Slide Time 00:22) Digital Image Processing Prof. P. K. Biswas Department of Electronics and Electrical Communications Engineering Indian Institute of Technology, Kharagpur Module Number 01 Lecture Number 02 Application

More information

An Adaptable Neural-Network Model for Recursive Nonlinear Traffic Prediction and Modeling of MPEG Video Sources

An Adaptable Neural-Network Model for Recursive Nonlinear Traffic Prediction and Modeling of MPEG Video Sources 150 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL 14, NO 1, JANUARY 2003 An Adaptable Neural-Network Model for Recursive Nonlinear Traffic Prediction and Modeling of MPEG Video Sources Anastasios D Doulamis,

More information

ADAPTIVE PICTURE SLICING FOR DISTORTION-BASED CLASSIFICATION OF VIDEO PACKETS

ADAPTIVE PICTURE SLICING FOR DISTORTION-BASED CLASSIFICATION OF VIDEO PACKETS ADAPTIVE PICTURE SLICING FOR DISTORTION-BASED CLASSIFICATION OF VIDEO PACKETS E. Masala, D. Quaglia, J.C. De Martin Λ Dipartimento di Automatica e Informatica/ Λ IRITI-CNR Politecnico di Torino, Italy

More information

Image and Video Coding I: Fundamentals

Image and Video Coding I: Fundamentals Image and Video Coding I: Fundamentals Thomas Wiegand Technische Universität Berlin T. Wiegand (TU Berlin) Image and Video Coding Organization Vorlesung: Donnerstag 10:15-11:45 Raum EN-368 Material: http://www.ic.tu-berlin.de/menue/studium_und_lehre/

More information

MAXIMIZING BANDWIDTH EFFICIENCY

MAXIMIZING BANDWIDTH EFFICIENCY MAXIMIZING BANDWIDTH EFFICIENCY Benefits of Mezzanine Encoding Rev PA1 Ericsson AB 2016 1 (19) 1 Motivation 1.1 Consumption of Available Bandwidth Pressure on available fiber bandwidth continues to outpace

More information

CS 231. Deformation simulation (and faces)

CS 231. Deformation simulation (and faces) CS 231 Deformation simulation (and faces) 1 Cloth Simulation deformable surface model Represent cloth model as a triangular or rectangular grid Points of finite mass as vertices Forces or energies of points

More information

SURVEILLANCE VIDEO FOR MOBILE DEVICES

SURVEILLANCE VIDEO FOR MOBILE DEVICES SURVEILLANCE VIDEO FOR MOBILE DEVICES Olivier Steiger, Touradj Ebrahimi Signal Processing Institute Ecole Polytechnique Fédérale de Lausanne (EPFL) CH-1015 Lausanne, Switzerland {olivier.steiger,touradj.ebrahimi}@epfl.ch

More information

On the Adoption of Multiview Video Coding in Wireless Multimedia Sensor Networks

On the Adoption of Multiview Video Coding in Wireless Multimedia Sensor Networks 2011 Wireless Advanced On the Adoption of Multiview Video Coding in Wireless Multimedia Sensor Networks S. Colonnese, F. Cuomo, O. Damiano, V. De Pascalis and T. Melodia University of Rome, Sapienza, DIET,

More information

Fast Wavelet-based Macro-block Selection Algorithm for H.264 Video Codec

Fast Wavelet-based Macro-block Selection Algorithm for H.264 Video Codec Proceedings of the International MultiConference of Engineers and Computer Scientists 8 Vol I IMECS 8, 19-1 March, 8, Hong Kong Fast Wavelet-based Macro-block Selection Algorithm for H.64 Video Codec Shi-Huang

More information

IST MPEG-4 Video Compliant Framework

IST MPEG-4 Video Compliant Framework IST MPEG-4 Video Compliant Framework João Valentim, Paulo Nunes, Fernando Pereira Instituto de Telecomunicações, Instituto Superior Técnico, Av. Rovisco Pais, 1049-001 Lisboa, Portugal Abstract This paper

More information

Topics for thesis. Automatic Speech-based Emotion Recognition

Topics for thesis. Automatic Speech-based Emotion Recognition Topics for thesis Bachelor: Automatic Speech-based Emotion Recognition Emotion recognition is an important part of Human-Computer Interaction (HCI). It has various applications in industrial and commercial

More information

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal.

Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual coding Both LPC and CELP are used primarily for telephony applications and hence the compression of a speech signal. Perceptual encoders, however, have been designed for the compression of general

More information

IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, VOL. 4, NO. 1, MARCH

IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, VOL. 4, NO. 1, MARCH IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, VOL. 4, NO. 1, MARCH 2014 43 Content-Aware Modeling and Enhancing User Experience in Cloud Mobile Rendering and Streaming Yao Liu,

More information

Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation

Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation. Range Imaging Through Triangulation Obviously, this is a very slow process and not suitable for dynamic scenes. To speed things up, we can use a laser that projects a vertical line of light onto the scene. This laser rotates around its vertical

More information

Facial Animation System Based on Image Warping Algorithm

Facial Animation System Based on Image Warping Algorithm Facial Animation System Based on Image Warping Algorithm Lanfang Dong 1, Yatao Wang 2, Kui Ni 3, Kuikui Lu 4 Vision Computing and Visualization Laboratory, School of Computer Science and Technology, University

More information

Accurate 3D Face and Body Modeling from a Single Fixed Kinect

Accurate 3D Face and Body Modeling from a Single Fixed Kinect Accurate 3D Face and Body Modeling from a Single Fixed Kinect Ruizhe Wang*, Matthias Hernandez*, Jongmoo Choi, Gérard Medioni Computer Vision Lab, IRIS University of Southern California Abstract In this

More information

coding of various parts showing different features, the possibility of rotation or of hiding covering parts of the object's surface to gain an insight

coding of various parts showing different features, the possibility of rotation or of hiding covering parts of the object's surface to gain an insight Three-Dimensional Object Reconstruction from Layered Spatial Data Michael Dangl and Robert Sablatnig Vienna University of Technology, Institute of Computer Aided Automation, Pattern Recognition and Image

More information

5LSH0 Advanced Topics Video & Analysis

5LSH0 Advanced Topics Video & Analysis 1 Multiview 3D video / Outline 2 Advanced Topics Multimedia Video (5LSH0), Module 02 3D Geometry, 3D Multiview Video Coding & Rendering Peter H.N. de With, Sveta Zinger & Y. Morvan ( p.h.n.de.with@tue.nl

More information

A METHOD TO MODELIZE THE OVERALL STIFFNESS OF A BUILDING IN A STICK MODEL FITTED TO A 3D MODEL

A METHOD TO MODELIZE THE OVERALL STIFFNESS OF A BUILDING IN A STICK MODEL FITTED TO A 3D MODEL A METHOD TO MODELIE THE OVERALL STIFFNESS OF A BUILDING IN A STICK MODEL FITTED TO A 3D MODEL Marc LEBELLE 1 SUMMARY The aseismic design of a building using the spectral analysis of a stick model presents

More information

VIDEO DENOISING BASED ON ADAPTIVE TEMPORAL AVERAGING

VIDEO DENOISING BASED ON ADAPTIVE TEMPORAL AVERAGING Engineering Review Vol. 32, Issue 2, 64-69, 2012. 64 VIDEO DENOISING BASED ON ADAPTIVE TEMPORAL AVERAGING David BARTOVČAK Miroslav VRANKIĆ Abstract: This paper proposes a video denoising algorithm based

More information

Context-Adaptive Binary Arithmetic Coding with Precise Probability Estimation and Complexity Scalability for High- Efficiency Video Coding*

Context-Adaptive Binary Arithmetic Coding with Precise Probability Estimation and Complexity Scalability for High- Efficiency Video Coding* Context-Adaptive Binary Arithmetic Coding with Precise Probability Estimation and Complexity Scalability for High- Efficiency Video Coding* Damian Karwowski a, Marek Domański a a Poznan University of Technology,

More information

Animated Talking Head With Personalized 3D Head Model

Animated Talking Head With Personalized 3D Head Model Animated Talking Head With Personalized 3D Head Model L.S.Chen, T.S.Huang - Beckman Institute & CSL University of Illinois, Urbana, IL 61801, USA; lchen@ifp.uiuc.edu Jörn Ostermann, AT&T Labs-Research,

More information

Video Compression An Introduction

Video Compression An Introduction Video Compression An Introduction The increasing demand to incorporate video data into telecommunications services, the corporate environment, the entertainment industry, and even at home has made digital

More information

Investigation of the GoP Structure for H.26L Video Streams

Investigation of the GoP Structure for H.26L Video Streams Investigation of the GoP Structure for H.26L Video Streams F. Fitzek P. Seeling M. Reisslein M. Rossi M. Zorzi acticom GmbH mobile networks R & D Group Germany [fitzek seeling]@acticom.de Arizona State

More information

FACE RECOGNITION USING INDEPENDENT COMPONENT

FACE RECOGNITION USING INDEPENDENT COMPONENT Chapter 5 FACE RECOGNITION USING INDEPENDENT COMPONENT ANALYSIS OF GABORJET (GABORJET-ICA) 5.1 INTRODUCTION PCA is probably the most widely used subspace projection technique for face recognition. A major

More information

New Results in Low Bit Rate Speech Coding and Bandwidth Extension

New Results in Low Bit Rate Speech Coding and Bandwidth Extension Audio Engineering Society Convention Paper Presented at the 121st Convention 2006 October 5 8 San Francisco, CA, USA This convention paper has been reproduced from the author's advance manuscript, without

More information

FACIAL MOVEMENT BASED PERSON AUTHENTICATION

FACIAL MOVEMENT BASED PERSON AUTHENTICATION FACIAL MOVEMENT BASED PERSON AUTHENTICATION Pengqing Xie Yang Liu (Presenter) Yong Guan Iowa State University Department of Electrical and Computer Engineering OUTLINE Introduction Literature Review Methodology

More information

Compression of RADARSAT Data with Block Adaptive Wavelets Abstract: 1. Introduction

Compression of RADARSAT Data with Block Adaptive Wavelets Abstract: 1. Introduction Compression of RADARSAT Data with Block Adaptive Wavelets Ian Cumming and Jing Wang Department of Electrical and Computer Engineering The University of British Columbia 2356 Main Mall, Vancouver, BC, Canada

More information

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO

INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO ISO/IEC JTC1/SC29/WG11 N15071 February 2015, Geneva,

More information

An Efficient Saliency Based Lossless Video Compression Based On Block-By-Block Basis Method

An Efficient Saliency Based Lossless Video Compression Based On Block-By-Block Basis Method An Efficient Saliency Based Lossless Video Compression Based On Block-By-Block Basis Method Ms. P.MUTHUSELVI, M.E(CSE), V.P.M.M Engineering College for Women, Krishnankoil, Virudhungar(dt),Tamil Nadu Sukirthanagarajan@gmail.com

More information

A High Quality/Low Computational Cost Technique for Block Matching Motion Estimation

A High Quality/Low Computational Cost Technique for Block Matching Motion Estimation A High Quality/Low Computational Cost Technique for Block Matching Motion Estimation S. López, G.M. Callicó, J.F. López and R. Sarmiento Research Institute for Applied Microelectronics (IUMA) Department

More information

VIDEO streaming applications over the Internet are gaining. Brief Papers

VIDEO streaming applications over the Internet are gaining. Brief Papers 412 IEEE TRANSACTIONS ON BROADCASTING, VOL. 54, NO. 3, SEPTEMBER 2008 Brief Papers Redundancy Reduction Technique for Dual-Bitstream MPEG Video Streaming With VCR Functionalities Tak-Piu Ip, Yui-Lam Chan,

More information

3G Services Present New Challenges For Network Performance Evaluation

3G Services Present New Challenges For Network Performance Evaluation 3G Services Present New Challenges For Network Performance Evaluation 2004-29-09 1 Outline Synopsis of speech, audio, and video quality evaluation metrics Performance evaluation challenges related to 3G

More information

JPEG compression of monochrome 2D-barcode images using DCT coefficient distributions

JPEG compression of monochrome 2D-barcode images using DCT coefficient distributions Edith Cowan University Research Online ECU Publications Pre. JPEG compression of monochrome D-barcode images using DCT coefficient distributions Keng Teong Tan Hong Kong Baptist University Douglas Chai

More information

Performance Comparison between DWT-based and DCT-based Encoders

Performance Comparison between DWT-based and DCT-based Encoders , pp.83-87 http://dx.doi.org/10.14257/astl.2014.75.19 Performance Comparison between DWT-based and DCT-based Encoders Xin Lu 1 and Xuesong Jin 2 * 1 School of Electronics and Information Engineering, Harbin

More information

Differential Compression and Optimal Caching Methods for Content-Based Image Search Systems

Differential Compression and Optimal Caching Methods for Content-Based Image Search Systems Differential Compression and Optimal Caching Methods for Content-Based Image Search Systems Di Zhong a, Shih-Fu Chang a, John R. Smith b a Department of Electrical Engineering, Columbia University, NY,

More information

Efficient support for interactive operations in multi-resolution video servers

Efficient support for interactive operations in multi-resolution video servers Multimedia Systems 7: 241 253 (1999) Multimedia Systems c Springer-Verlag 1999 Efficient support for interactive operations in multi-resolution video servers Prashant J. Shenoy, Harrick M. Vin Distributed

More information

EXPLORING ON STEGANOGRAPHY FOR LOW BIT RATE WAVELET BASED CODER IN IMAGE RETRIEVAL SYSTEM

EXPLORING ON STEGANOGRAPHY FOR LOW BIT RATE WAVELET BASED CODER IN IMAGE RETRIEVAL SYSTEM TENCON 2000 explore2 Page:1/6 11/08/00 EXPLORING ON STEGANOGRAPHY FOR LOW BIT RATE WAVELET BASED CODER IN IMAGE RETRIEVAL SYSTEM S. Areepongsa, N. Kaewkamnerd, Y. F. Syed, and K. R. Rao The University

More information

Compression of Light Field Images using Projective 2-D Warping method and Block matching

Compression of Light Field Images using Projective 2-D Warping method and Block matching Compression of Light Field Images using Projective 2-D Warping method and Block matching A project Report for EE 398A Anand Kamat Tarcar Electrical Engineering Stanford University, CA (anandkt@stanford.edu)

More information

A COST-EFFICIENT RESIDUAL PREDICTION VLSI ARCHITECTURE FOR H.264/AVC SCALABLE EXTENSION

A COST-EFFICIENT RESIDUAL PREDICTION VLSI ARCHITECTURE FOR H.264/AVC SCALABLE EXTENSION A COST-EFFICIENT RESIDUAL PREDICTION VLSI ARCHITECTURE FOR H.264/AVC SCALABLE EXTENSION Yi-Hau Chen, Tzu-Der Chuang, Chuan-Yung Tsai, Yu-Jen Chen, and Liang-Gee Chen DSP/IC Design Lab., Graduate Institute

More information