Adaptive Power Management of On-Chip Video Memory for Multiview Video Coding

Muhammad Shafique (1), Bruno Zatt (1,2), Fabio Leandro Walter (2), Sergio Bampi (2), Jörg Henkel (1)
(1) Karlsruhe Institute of Technology (KIT), Chair for Embedded Systems, Karlsruhe, Germany
(2) Federal University of Rio Grande do Sul (UFRGS), Informatics Institute/PGMICRO, Porto Alegre, Brazil
{muhammad.shafique, bruno.zatt, henkel}@kit.edu; {bzatt, bampi}@inf.ufrgs.br

Abstract
An adaptive power management scheme for the on-chip video memory of Multiview Video Coding is presented. It leverages the texture, motion, and disparity properties of objects and their correlations in the 3D-neighborhood. It groups the Macroblocks of a frame and predicts the highly-probable motion/disparity search direction in order to power-gate idle memory regions. The statistical properties of Macroblock groups are exploited to predict idle sectors. Averaged over various video sequences, our approach achieves 32% and 61% energy reduction compared to the state-of-the-art DSW [7] and Level-C [12], respectively. The Motion/Disparity Estimation architecture with the video memory and the power management scheme is implemented using an ASIC flow (IBM 65nm Low-Power technology) and processes 4-view HD1080p@33fps.

Categories and Subject Descriptors: C.3 [Special-Purpose and Application-Based Systems]: Real-time and embedded systems; B.3.2 [Design Styles]: Cache Memories; I.4.2 [Compression (Coding)]: Approximate Methods
General Terms: Algorithms, Design, Management
Keywords: MVC, Video Coding, Motion Estimation, Disparity Estimation, Low-Power, Power-Management, On-Chip Memory, Video Memory, Adaptivity, Power-Gating

I. INTRODUCTION AND RELATED WORK
The Multiview Video Coding (MVC) standard [2] compresses multiview video sequences (captured using multiple cameras) to realize emerging 3D-multimedia applications (like 3D-video recording/playback) on mobile devices [3][4].
MVC provides 20%-50% better compression than simulcast H.264 (i.e., independent encoding of each view) by employing Motion and Disparity Estimation (ME, DE) with multiple block sizes, which exploit temporal and inter-view correlations at the cost of significantly higher complexity and energy consumption [3]. Typically, ME/DE accounts for more than 90% of the total MVC energy consumption, and the major energy-consuming part of ME/DE is the (on-chip and off-chip) memory [7]. Therefore, memory is the key target for energy reduction in ME and DE when implementing MVC on battery-powered devices. The high memory energy consumption is primarily due to the frequent accesses to reference pixel data used in SAD (Sum of Absolute Differences) computations during the block-matching process [7]. ME and DE search for the best match of a Macroblock (MB, 16x16 pixel block) in different search directions (i.e., neighboring reference frames in the left, right, top, and down directions). For a given search direction, the search is performed in a predefined search window, so that the reference pixels of the search window can be reused for multiple SAD computations (see Section S1 for an ME/DE overview). The search direction that contains the best match of an MB is denoted as the best search direction.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DAC 2012, June 3-7, 2012, San Francisco, California, USA. Copyright 2012 ACM /12/06...$10.00.
State-of-the-art techniques employ an on-chip memory¹ to incorporate search window prefetching and data reuse [12]-[15], search window followers [11], or asymmetric search windows [16] for reducing the off-chip memory power. These on-chip memories suffer from non-negligible leakage power due to their large footprint (about 2 Mbit of memory is required for a search range of ±128 and 4 search directions). Furthermore, not all parts of the search window stored in the on-chip memory are accessed, because of the adaptive nature of fast ME/DE algorithms (like TZ Search [5]) and the diverse texture/motion properties of MBs (see the memory usage analysis in Section II.A). To address this issue, the work of [14] employs adaptive window sizing. However, this work targets a fixed Four-Step Search and accounts neither for DE nor for leakage power, which is a crucial power component. To reduce leakage power, advanced power-gating techniques switch off parts of the on-chip memory using sleep transistors with multiple sleep modes [17][18][27]. Some sleep modes are data-retentive, i.e., they avoid data loss in the memory while providing relatively small leakage savings. For power management, state-of-the-art techniques employ predictions based either on hardware monitoring [24] or on limited application knowledge at the frame level []. As a result, these techniques are power-inefficient due to severe mispredictions under high variations of memory usage. The works [29][30] illustrate the feasibility of application-aware power management for power-gating idle ASIP cores in a multimedia pipelined processor. The work in [7][8][9] presents an MVC ME/DE architecture with an on-chip video memory and a dynamic search window formation algorithm. Its power-gating scheme evaluates the predicted memory usage of consecutive MBs to make power-gating decisions for idle memory sectors, but it does not employ methods (like computation reordering) to increase sleep durations.
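The 2 Mbit figure for a ±128 search range can be reproduced with a back-of-the-envelope calculation. The sketch below is only a check under stated assumptions, not the sizing method of the cited works: it assumes a square window covering ±search_range around a 16x16 MB, 8-bit luma samples only, and one window per search direction; `search_window_bits` is a hypothetical helper name.

```python
def search_window_bits(search_range: int, mb_size: int = 16,
                       bits_per_pixel: int = 8, directions: int = 4) -> int:
    """Bits needed to buffer one square search window per search direction.

    Assumption: the window covers +/-search_range around an mb_size x mb_size
    block, i.e. (2*search_range + mb_size)^2 luma samples of bits_per_pixel bits.
    """
    side = 2 * search_range + mb_size  # window width/height in pixels
    return side * side * bits_per_pixel * directions


if __name__ == "__main__":
    bits = search_window_bits(128)                  # +/-128, 4 search directions
    print(f"{bits} bits = {bits / 1e6:.2f} Mbit")   # 2367488 bits = 2.37 Mbit
```

About 2.37 Mbit results, consistent with the order of magnitude quoted above; buffering chroma or extra prefetch rows would only increase it.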
The above-discussed techniques provide limited leakage savings, as they do not exploit (i) the relationship between MB properties (texture, motion, and disparity) and the distribution of ME and DE as the best search direction, and (ii) the correlation of the best search direction and of memory usage in the 3D-neighborhood (i.e., spatial, temporal, and view domains). These multiview video content characteristics may provide a higher potential for leakage savings. Summarizing: to realize the ME/DE of real-time full-HD MVC with low power consumption, an adaptive power management scheme for the on-chip memory is required that leverages the multiview video content characteristics at various levels (search direction, frame, MB, etc.) to predict the memory requirements (number and duration of idle memory sectors) and to power-gate the idle sectors in an appropriate sleep mode. Before presenting our novel contribution, we analyze the best search direction and the memory usage during ME/DE, which provides the motivation and foundation for this work.

¹ An on-chip memory in this paper denotes an on-chip video memory.

II. MOTIVATION AND NOVEL CONTRIBUTION

A. Motivational Analysis of Motion and Disparity Estimation
Our experiments on the Rena test video sequence in Fig. 1 and Fig. 2 illustrate the distribution of ME and DE as the best search direction (see Section S2 for a detailed analysis of other test video sequences). Fig. 1 shows that the majority of Macroblocks (MBs; 70%-90% of cases) are encoded using ME as the best search direction. Note that for the first view V0, all MBs are encoded using ME, because no neighboring views are available for prediction. Similarly, for the other views (V1-V2), only DE is performed for the first frame of each GOP (Group of Pictures; i.e., T0 and T8 in Fig. 2). A detailed analysis of various frames in view V1 at QP=27 (Fig. 2) shows that background objects (low-texture, low-motion, static blocks) are mostly encoded using ME. In contrast, foreground objects (medium-high texture, high motion) are encoded using DE. Moreover, some background objects with medium-high texture may also be coded using DE (see the curtains in Fig. 2). The choice between ME and DE also depends on the correlation available in the temporal domain. For instance, the number of DE-coded MBs is higher in … than in ….

Hint-1: If the best search direction (ME or DE) can be predicted correctly, significant energy savings can be obtained by skipping ME/DE in the unused search directions and power-gating the sectors that store the search windows of these directions. This also reduces external memory transfers and computations. The key is to use the texture and motion/disparity properties of the MBs in the 3D-neighborhood to predict the best search direction correctly.

[Fig. 1: ME and DE distribution (mode distribution in %) over QP for three views (V0-V2) of the Rena and Ballroom test multiview video sequences]
[Fig. 2: ME/DE distribution (Disparity Estimation vs. Motion Estimation) in view V1 of the Rena sequence, frames T0, T3, T6, T7, and T8]

In state-of-the-art schemes [7]-[16], ME/DE of MBs is processed in raster scan order. However, our experiments (Fig. 2; see also Section S2) show that objects often consist of MBs that do not lie along the raster scan order. Therefore, these schemes suffer from severe variations in memory usage, as MBs of different objects typically exhibit diverse memory requirements for ME/DE. This leads to short sleep durations and frequent wakeups of memory sectors, and thus to low leakage savings. Our analysis in Sections IV.A and S2 shows that MBs sharing similar texture and motion/disparity properties have similar memory requirements.

Hint-2: Longer sleep durations (thus higher leakage savings) can be achieved if the ME/DE processing of MBs with similar properties is performed together, i.e., in a non-raster scan order. The key challenges are MB grouping and ME/DE computation reordering.

Summarizing the analysis, the key research challenges for reducing the power of the on-chip video memory of ME/DE are:
a) Grouping the MBs of a frame w.r.t. their texture, motion, and disparity properties,
b) Adaptively predicting the best search direction for MBs in different groups to power-gate the on-chip memory sectors of unused search directions,
c) Reordering the ME/DE computations to increase the sleep durations of on-chip memory sectors,
d) Leveraging the multiview video content characteristics to enable content-driven power management at various granularities (group-level, MB-level).

B. Overview of Our Concept and Novel Contributions
To address these challenges, a novel adaptive power management scheme is proposed for the on-chip video memory that incorporates:
1. An MB-Group Formation Scheme (Section IV.A) that performs texture and activity (i.e., motion and disparity) classification of MBs considering the correlated neighboring MBs in their 3D-neighborhood (i.e., spatial, temporal, and view domains).
This classification is used to form groups of MBs that share similar texture and activity properties.
2. An Adaptive ME/DE Search Direction Prediction Algorithm (Section IV.A) that adaptively predicts the highly-probable ME/DE search direction for MBs in different groups based on the correlation of the best search direction in the 3D-neighborhood and their respective texture differences. For each MB in a group, our scheme power-gates the memory sectors of the unused search direction.
3. A Content-Driven Power Management Scheme (Section IV.B) that leverages the multiview video content characteristics to manage power at multiple levels (i.e., search direction, MB-group, MB).

III. MEMORY AND POWER MODELS
We now describe the model of our multibank on-chip video memory [7][8] and the power-gate model of [17], which is used in this work to enable fine-grained power-gating of the multibank memory.

Memory Model: The on-chip memory consists of $N_B$ banks. Each bank $B_k$, $k = 1, \dots, N_B$, contains $N_S$ sectors, each having $S_L$ 128-bit memory lines (see Fig. 5 for an abstract view). The size of a sector is thus $S_{Sector} = S_L \times 128$ bits. To provide parallel data access for the SAD hardware accelerators, different rows of an MB are stored in different banks. The leakage energy is $E_{Leak} = T_{MEDE} \cdot P_{Leak}$, where $T_{MEDE}$ denotes the time for processing motion and disparity estimation. The miss energy is $E_{Miss} = \sum_{i=1}^{N_{Miss}} E_{Miss,i}$, where $N_{Miss}$ is the number of misses. Such a memory model can also be realized with multiple SRAM blocks, each having multiple sub-arrays [27], or with the SRAM model of [26].

Power-Gate Model: We assume a power-gate model with three power modes: P_ON, P_DR, and P_OFF. P_ON is the Power-ON mode. P_DR is the Data-Retentive (DR) low-leakage mode that preserves the data in the SRAM cells. P_OFF is the Power-OFF mode with data loss; it requires re-fetching the data from the external memory. Fig. 3 shows the power state machine with the leakage energy savings and the wakeup latency/energy overhead [17]. The wakeup latency of P_DR is much shorter than that of P_OFF; therefore, P_DR is beneficial for short sleep durations (see values in Table I; Section S3). Conversely, P_OFF is beneficial for long sleep durations. Multiple sleep modes offer different tradeoffs between wakeup overhead and leakage savings. Since collocated sectors in different banks store the data of the same MB, the same sleep control is issued to these sectors. Power-gating at the sector level enables fine-grained power management. A similar style of power-gating is found in sub-array-level power-gating [27], or at an even finer granularity in wordline-level power-gating [28].

[Fig. 3: Power state machine with multiple sleep modes [17]: states P_ON, P_DR, and P_OFF, annotated with $E_{ON} = \sum_i V_{dd} \cdot i_i \cdot t_i$, $E_{DR} = E_{ON} \cdot \Phi_{S1}$, and the wakeup energy $E_{wakeup} = \frac{1}{2} C_{circuit} V_{dd}^2$]
[Fig. 4: Rena test video sequence encoded at QP=22: (a) distribution of ME and DE; (b) Macroblock grouping (groups G1, G2, G3) with computation reordering; (c) distribution of memory usage for ME; (d) memory requirement prediction (M1, M2) using PDFs for ON, OFF, and DR modes]

IV. OUR ADAPTIVE POWER MANAGEMENT OF ON-CHIP VIDEO MEMORY FOR ME/DE IN MVC
Fig. 5 shows an overview of our adaptive power management scheme (novel contribution in green boxes) for an on-chip multibank video memory integrated with an ME/DE architecture.

[Fig. 5: ME/DE architecture with an on-chip memory and our power management scheme (novel contribution in green boxes). Blocks: offline statistical analysis of multiview videos (search direction and memory usage analysis; Section II.A); Macroblock grouping (texture and motion classification; Section IV.A); adaptive ME/DE search direction prediction (Section IV.A); MB-group memory usage prediction (Section IV.B); content-driven power management (MB-group level, MB level; Section IV.B); monitoring (memory usage, etc.); core processor executing an ME/DE algorithm; SAD accelerators; multibank memory (Bank 1 … Bank n) with sector-level sleep transistors and control, inside the MVC video encoder]

Our scheme works in five phases:
i) Macroblock (MB) grouping: first, the texture and activity classification of MBs is performed, and MBs with similar texture and activity properties are grouped together;
ii) Predicting the highly-probable best search direction based on the correlation in the 3D-neighborhood;
iii) Predicting the memory usage of MB-groups using a statistical analysis of the memory usage of different groups and the memory usage correlation of the same groups in the 3D-neighborhood;
iv) Power-gating the unused memory sectors in appropriate sleep modes based on the predicted memory requirements;
v) Computation reordering and fine-tuning of the power modes of different sectors at the MB level: since all MBs in a given group exhibit similar memory requirements for ME and DE, the ME/DE computations of MBs are reordered to increase the sleep durations of on-chip memory sectors. Computation reordering is performed within a group, where the next MB for ME/DE processing is selected by evaluating its texture difference w.r.t. the currently processed MB.

A. Macroblock Grouping and Search Direction Prediction
As discussed in Section II.A, in conventional raster scan coding order, ME/DE is performed for all MBs in a row-wise fashion. Each row typically contains MBs from different objects, and objects typically span many MBs both horizontally and vertically (see the dancing girl of the Rena sequence in Fig. 2 and Fig. 4). Since MBs from different objects exhibit distinct memory usage properties, raster order causes memory usage variations that lead to short sleep durations and frequent ON/OFF switching of the unused memory sectors. To avoid this, our scheme aggregates MBs that share similar texture and activity (i.e., motion and disparity) properties into so-called MB-groups (see an example in Fig. 4a). Fig. 6 shows the algorithm for MB grouping.
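The argument above (and Hint-2 in Section II.A) is that raster order causes frequent sector switching. A toy model makes the effect concrete: assume the number of powered-on sectors follows each MB's demand exactly, and count how many sectors change state between consecutive MBs. The demands and group labels below are hypothetical, and `count_switches`/`grouped_order` are illustrative names, not part of the described scheme.

```python
from typing import List, Sequence


def count_switches(sector_demand: Sequence[int]) -> int:
    """Sector power-mode transitions if the ON sectors track each MB's demand.

    Toy model: going from a demand of a to b sectors wakes up or puts to
    sleep |b - a| sectors, and each of those counts as one transition.
    """
    return sum(abs(cur - prev) for prev, cur in zip(sector_demand, sector_demand[1:]))


def grouped_order(demand: Sequence[int], groups: Sequence[int]) -> List[int]:
    """Reorder per-MB demands so that MBs of the same group run back to back.

    sorted() is stable, so MBs keep their raster order inside a group.
    """
    order = sorted(range(len(demand)), key=lambda i: groups[i])
    return [demand[i] for i in order]


if __name__ == "__main__":
    # Hypothetical raster row alternating background (group 1, 2 sectors)
    # and foreground (group 3, 6 sectors) MBs.
    demand = [2, 6, 2, 6, 2, 6, 2, 6]
    groups = [1, 3, 1, 3, 1, 3, 1, 3]
    print(count_switches(demand))                          # 28
    print(count_switches(grouped_order(demand, groups)))   # 4
```

In this toy row, group-by-group processing cuts the transitions from 28 to 4; actual savings depend on the real per-MB memory usage and on the wakeup costs of the sleep modes.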
The input is the frame F(T,V), where T denotes the temporal location of the frame in view V. The other inputs are the variance of each MB (σ, Eq. 2) as a lightweight texture approximation and the texture difference (ξ, Eq. 3) w.r.t. the neighboring MBs in the 3D-neighborhood (i.e., spatial, temporal, and view domains). There are 4 spatial, 18 temporal, and 18 view neighboring MBs (see Fig. 15 in Section S1). First, the current MB is classified as low-texture (L), medium-texture (M), or high-texture (H) (line 5). Afterwards, the matching neighbors (i.e., MBs in the 3D-neighborhood with texture properties similar to those of the current MB) are found (lines 6-7). Since MBs with similar texture properties most probably belong to the same object, they also share motion/disparity properties, the so-called activity. Therefore, the activity of the current MB is predicted as low-motion (L), medium-motion (M), or high-motion (H) from the average activity of the matching neighbors (lines 8-9). Based on the texture and activity classification, the MB is assigned to a group, such that all MBs in a group exhibit similar texture and activity properties (lines 10-11). The output is the composition of all three groups and the set of matching neighbors (line 13).

    1.  groupMBs(input: Frame F(T,V), Variance σ, Texture Difference ξ; output: MB-Group G, Matching Neighbors N_match)
    2.  G ← ∅; N_match ← ∅;
    3.  ∀ mb ∈ F(T,V) {
    4.    N_MBMatch ← ∅;
    5.    T := (σ_mb ≤ τ_σ1) ? L : (σ_mb > τ_σ2) ? H : M;
    6.    N ← mb.getNeighbors();  // see Fig. 15 in Section S1
    7.    ∀ n ∈ N: if (ξ_n < τ_ξ) N_MBMatch ← N_MBMatch ∪ {n};
    8.    M_mb := Σ_{n ∈ N_MBMatch} (|v_x(n)| + |v_y(n)|) / size(N_MBMatch);
    9.    β := (M_mb ≤ τ_m1) ? L : (M_mb > τ_m2) ? H : M;
    10.   G_mb := (T = L & β = L) ? G1 : (T = H & β = H) ? G3 : G2;
    11.   G.store(mb, G_mb);
    12.   N_match.store(mb, N_MBMatch); }
    13. return (G, N_match);
Fig. 6 Pseudo-code for macroblock grouping

The thresholds (τ_σ1, τ_σ2, τ_ξ, τ_m1, τ_m2) are obtained from a statistical analysis of the distribution of the texture and activity properties of MBs of numerous background and foreground objects in various test video sequences (like Rena, Ballroom, Vassar, etc.) [1]. The highly-probable values of these thresholds are obtained as µ+3σ (µ denotes the mean and σ the standard deviation) using probability density functions (PDFs) that follow a Gaussian distribution, where F in Eq. (1) denotes the Gaussian cumulative distribution function:

$$F(\mu_k + 3\sigma_k;\, \mu_k, \sigma_k^2) - F(0;\, \mu_k, \sigma_k^2) \geq 0.99, \quad k \in \{\text{variance},\ \text{motion/disparity vectors}\} \qquad (1)$$

$$\sigma_{MB} = \frac{1}{256} \sum_{i=1}^{16} \sum_{j=1}^{16} \left(\rho(i,j) - \rho_{avg}\right)^2 \qquad (2)$$

$$\xi_n = \frac{\left|\sigma_{CurrMB} - \sigma_n\right|}{\sigma_{CurrMB}} \qquad (3)$$

Here ρ(i,j) is the pixel value at position (i,j) of the 16x16 MB, ρ_avg is the average pixel value of the MB, and σ_n is the variance of neighbor n.

MBs in a group share the best prediction direction due to their correlation, as they most probably belong to the same object. Fig. 4 illustrates an example scenario, where DE is selected as the best search direction for the MBs of the dancing girl (group G3). In contrast, ME is selected as the best search direction for the MBs of the background curtains (group G1). Therefore, grouping MBs also enables search direction prediction for a complete group (as discussed using Fig. 7). Note that for group G2 the decision is challenging: for medium-texture MBs with slow-medium motion, the best match can be found using either ME or DE. Therefore, for group G2, our scheme adaptively selects the highly-probable search direction depending on the 3D-neighborhood.

Adaptive Search Direction Prediction: Fig. 7 shows the algorithm for adaptively predicting the highly-probable best search direction for the three groups of Fig. 6. As discussed in Section II.A and shown in Fig. 4, background/low-textured MBs with low motion (i.e., MBs in group G1) are typically encoded using ME, and MBs with high texture and high motion (i.e., MBs in group G3) are encoded using DE. Therefore, our algorithm predicts ME and DE as the best search directions for groups G1 and G3, respectively (lines 2-3). The decision for MBs in group G2 is made adaptively by considering the best search directions of the matching neighboring MBs (lines 4-12). If there is a sufficient number of matching neighbors (for a high prediction confidence), the prediction considers the texture differences of the matching neighbors (lines 6-9). A cost cost_ME is computed by accumulating the inverse texture differences of all neighbors with ME as the best search direction (line 8).
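A minimal Python sketch of the grouping step of Fig. 6, using Eqs. (2) and (3), is shown below. It is an illustration under assumptions rather than the authors' implementation: the `MB` class, the threshold values in `TH`, and the fallback to zero activity when an MB has no matching neighbor are hypothetical, and the plain neighbor list stands in for the 3D-neighborhood of Fig. 15. `threshold_from_samples` shows how a µ+3σ threshold could be derived from offline samples.

```python
import statistics
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class MB:
    pixels: List[List[int]]                              # 16x16 luma samples
    mv: Tuple[int, int] = (0, 0)                         # (v_x, v_y) of this MB
    neighbors: List["MB"] = field(default_factory=list)  # stands in for the 3D-neighborhood


def variance(block: Sequence[Sequence[int]]) -> float:
    """Eq. (2): mean squared deviation of the pixels from their average."""
    return statistics.pvariance([p for row in block for p in row])


def texture_difference(s_cur: float, s_n: float) -> float:
    """Eq. (3): |sigma_cur - sigma_n| / sigma_cur, guarded for flat MBs."""
    if s_cur == 0:
        return 0.0 if s_n == 0 else float("inf")
    return abs(s_cur - s_n) / s_cur


def classify(value: float, lo: float, hi: float) -> str:
    """Three-level classification used for texture (line 5) and activity (line 9)."""
    return "L" if value <= lo else ("H" if value > hi else "M")


def group_mbs(mbs: Sequence[MB], th: Dict[str, float]):
    """Sketch of Fig. 6: group -> MB indices, plus matching neighbors per MB."""
    groups: Dict[str, List[int]] = {"G1": [], "G2": [], "G3": []}
    matches: List[List[MB]] = []
    for idx, mb in enumerate(mbs):
        s = variance(mb.pixels)
        texture = classify(s, th["sigma1"], th["sigma2"])
        matched = [n for n in mb.neighbors
                   if texture_difference(s, variance(n.pixels)) < th["xi"]]
        # Assumption: with no matching neighbor, activity defaults to 0 (low motion).
        activity = (sum(abs(n.mv[0]) + abs(n.mv[1]) for n in matched) / len(matched)
                    if matched else 0.0)
        beta = classify(activity, th["m1"], th["m2"])
        if texture == "L" and beta == "L":
            groups["G1"].append(idx)
        elif texture == "H" and beta == "H":
            groups["G3"].append(idx)
        else:
            groups["G2"].append(idx)
        matches.append(matched)
    return groups, matches


def threshold_from_samples(samples: Sequence[float]) -> float:
    """Offline threshold mu + 3*sigma of a Gaussian fitted to training samples."""
    return statistics.mean(samples) + 3 * statistics.pstdev(samples)


if __name__ == "__main__":
    TH = {"sigma1": 50, "sigma2": 2000, "xi": 0.2, "m1": 2, "m2": 16}  # hypothetical
    flat = [[10] * 16 for _ in range(16)]
    checker = [[0 if (i + j) % 2 == 0 else 200 for j in range(16)] for i in range(16)]
    curtain = MB(flat, neighbors=[MB(flat) for _ in range(4)])
    dancer = MB(checker, neighbors=[MB(checker, mv=(20, 20)) for _ in range(4)])
    groups, _ = group_mbs([curtain, dancer], TH)
    print(groups)  # {'G1': [0], 'G2': [], 'G3': [1]}
```

The flat "curtain" MB lands in G1 and the textured, fast-moving "dancer" MB in G3; anything in between falls into G2, where Fig. 7 decides the search direction adaptively.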
Similarly, cost_DE is computed (line 9). If cost_ME is greater than or equal to cost_DE, ME is predicted as the best search direction; otherwise, DE is selected (lines 10-11). In case of insufficient correlation in the 3D-neighborhood, ME is predicted as the best search direction (line 12). Finally, the best search directions D_Best are returned (line 13).

    1.  searchDirectionPrediction(input: MB-Group G, Matching Neighbors N_match, Texture Difference ξ; output: Best Search Directions D_Best)
    2.  if (G1) ∀ mb ∈ G1: mb.D_Best := ME;
    3.  if (G3) ∀ mb ∈ G3: mb.D_Best := DE;
    4.  if (G2) {  // adaptively select ME or DE for MBs in group G2
    5.    ∀ mb ∈ G2 {
    6.      if (size(mb.N_match) > τ_match) { cost_ME := 0; cost_DE := 0;
    7.        ∀ n ∈ mb.N_match {
    8.          if (n.D_Best = ME) cost_ME := cost_ME + 1/ξ_n;
    9.          else cost_DE := cost_DE + 1/ξ_n; }
    10.       predDir := (cost_ME ≥ cost_DE) ? ME : DE;
    11.       mb.D_Best := predDir; }
    12.     else mb.D_Best := ME; } }
    13. return D_Best;
Fig. 7 Pseudo-code for adaptive search direction prediction

For each MB, only one search (motion or disparity) is performed, in the selected search direction. This yields significant energy savings by avoiding external memory transfers and excessive computations. Furthermore, the sectors storing the search windows of the unused search directions are power-gated to reduce leakage, which provides further energy savings (see Fig. 8). Note that Fig. 4a shows a few MBs in group G1 that have DE as the best search direction. Since our scheme predicts ME as the best search direction for the whole group G1, this may incur some video quality loss. A misprediction may also result in quality loss. Experiments in Sections V and S3 show that this loss is visually imperceptible.

B. Video Content-Driven Power Management
Once the MB-groups are formed, the challenge is to accurately predict the memory usage requirements of each MB-group.
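The per-MB decision of Fig. 7 can be sketched as follows. The function receives, for one MB, its group label and the best search directions and texture differences ξ_n of its matching neighbors. The value of `tau_match` and the small `eps` guard against ξ_n = 0 are assumptions for this sketch, not values from the paper.

```python
from typing import List, Tuple


def predict_direction(group: str, matches: List[Tuple[str, float]],
                      tau_match: int = 2, eps: float = 1e-6) -> str:
    """Sketch of Fig. 7 for one MB.

    matches holds (best_direction, xi_n) for the MB's matching neighbors,
    where best_direction is "ME" or "DE" and xi_n is the Eq. (3) texture
    difference. tau_match and eps are hypothetical parameters.
    """
    if group == "G1":
        return "ME"                       # line 2: low texture, low motion
    if group == "G3":
        return "DE"                       # line 3: high texture, high motion
    if len(matches) <= tau_match:
        return "ME"                       # line 12: too little correlation
    cost_me = cost_de = 0.0
    for best_dir, xi in matches:          # lines 7-9
        weight = 1.0 / max(xi, eps)       # closer texture gives a larger vote
        if best_dir == "ME":
            cost_me += weight
        else:
            cost_de += weight
    return "ME" if cost_me >= cost_de else "DE"   # lines 10-11


if __name__ == "__main__":
    # Two texture-similar DE neighbors outvote one less similar ME neighbor.
    print(predict_direction("G2", [("DE", 0.05), ("DE", 0.1), ("ME", 0.5)]))  # DE
```

Weighting by 1/ξ_n lets a few very similar neighbors outweigh many dissimilar ones, which is the intent of accumulating inverse texture differences.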
The key is to leverage the multiview video content properties and the offline statistical analysis of the memory usage of different groups.

Step-1: Memory Usage Prediction of MB-Groups: Fig. 4c shows the memory usage of different groups, where the memory usage of G1 is much lower than that of the other groups. Our scheme computes two highly-probable memory requirement predictions (M1 and M2) from the probability density function (PDF, obtained through an offline analysis over various test video sequences; see details in Section S2). The M1 amount of memory is kept in P_ON mode, as the probability of using these memory sectors is high. The memory amount M2 − M1 is kept in P_DR mode, as other MBs of the same group may use this data, and the short wakeup latency of P_DR avoids processing delays. Fig. 4d shows an abstract representation of obtaining these predictions. The memory requirements [M1, M2] of an MB-group can also be predicted with high accuracy from the memory usage of the same MB-group in neighboring frames or even views (see experimental evidence in Section S2). These predicted memory requirements are then forwarded to the power management scheme to determine the number and mode of the gated sectors.

Step-2: MB-Group-Level Power Management: Fig. 8 presents the algorithm of our content-driven power management. First, MB grouping is performed (line 2). Afterwards, the groups are processed sequentially, i.e., ME/DE of the MBs of group G1 is processed first, followed by the MBs of groups G2 and G3, respectively. This constitutes the first reordering of the ME/DE computations, as MBs are now processed in a non-raster scan order (lines 3-28). The second reordering occurs when processing MBs within a group (lines 23-27). First, for each group, the best search direction is predicted, and the memory sectors of the unused search directions are power-gated in power state P_OFF (lines 4-5), as they will not be used during the complete ME/DE of this group.
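The text does not state exactly how M1 and M2 are read off the PDF. The sketch below assumes, purely for illustration, a Gaussian fit with M1 = µ + σ and M2 = µ + 2σ (the factors `k1` and `k2` are hypothetical), and then converts the predictions into PDF-based sector counts in the spirit of Fig. 8, lines 7-8, with memory usage measured in memory lines.

```python
import math
import statistics
from typing import Sequence, Tuple


def predict_m1_m2(samples: Sequence[float], k1: float = 1.0,
                  k2: float = 2.0) -> Tuple[float, float]:
    """Fit a Gaussian to offline memory-usage samples of one MB-group and
    return (M1, M2) = (mu + k1*sigma, mu + k2*sigma); k1, k2 are assumptions."""
    mu = statistics.mean(samples)
    sigma = statistics.pstdev(samples)
    return mu + k1 * sigma, mu + k2 * sigma


def pdf_sector_split(m1: float, m2: float, total_sectors: int,
                     sector_size: float) -> Tuple[int, int, int]:
    """PDF-based (ON, DR, OFF) sector counts, cf. Fig. 8 lines 7-8.

    M1 is kept ON, M2 - M1 in data-retentive sleep, and the rest OFF.
    Rounding up keeps slightly more memory available (a conservative choice).
    """
    on = min(total_sectors, math.ceil(m1 / sector_size))
    dr = min(total_sectors - on, max(0, math.ceil(m2 / sector_size) - on))
    off = total_sectors - on - dr
    return on, dr, off


if __name__ == "__main__":
    # Hypothetical usage samples (in memory lines); 8 sectors of 16 lines each.
    m1, m2 = predict_m1_m2([40, 50, 60])
    print(round(m1, 1), round(m2, 1))          # 58.2 66.3
    print(pdf_sector_split(m1, m2, 8, 16))     # (4, 1, 3)
```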
Afterwards, the highly-probable memory usage is predicted from the PDF obtained by the offline statistical analysis (line 6), as also shown in Fig. 4. Based on this predicted memory usage, the numbers of sectors that are candidates for the different power modes (P_ON, P_OFF, P_DR) are computed (lines 7-8). To cope with potential mispredictions, the correlation with the monitored memory usage of the same MB-group in the 3D-neighborhood is exploited (line 9). For G1 and G3, the average memory usage of the same group in the temporal neighbors (Frame_Left, Frame_Right) and in the disparity neighbors (Frame_Top, Frame_Down) is considered, respectively. For G2, the average over all four neighbors is computed. The candidate sectors for the P_ON and P_DR power modes are determined considering this correlated memory usage (line 10). The PDF-based and neighborhood-based predictions are averaged to obtain the number of sectors that are candidates for the P_ON, P_OFF, and P_DR power modes (lines 11-12). To amortize the wakeup energy overhead, our scheme predicts the leakage energy benefit of gating sectors in the different power modes. For this, the sleep duration is first predicted as the predicted ME/DE processing time of all MBs in the group (line 13). The ME/DE processing time of an MB is predicted as the average ME/DE processing time of all its matching neighbors in the 3D-neighborhood. Afterwards, the leakage savings are compared with the wakeup energy overhead, and the sectors are set to their respective power modes (lines 14-17). For P_OFF, E_MissGroup is additionally considered, since P_OFF loses the data in the memory sectors, which then require re-fetching (line 16).
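The group-level averaging and mode decision of Fig. 8 (lines 11-18), together with the MB-level adjustment of Fig. 9, can be sketched as below. All names are hypothetical. The leakage model (each sector leaks `p_leak_sector_w` when ON and a fraction `residual` of that when asleep), the upward rounding, and the choice to place OFF candidates in P_DR when P_OFF does not pay off are assumptions made for this sketch.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def combine_splits(pdf_on: int, pdf_dr: int, nbs_on: int, nbs_dr: int,
                   total: int) -> Tuple[int, int, int]:
    """Average PDF- and neighborhood-based counts (Fig. 8, lines 11-12).
    Rounding up is an assumption; results are clamped to the memory size."""
    on = min(total, math.ceil((pdf_on + nbs_on) / 2))
    dr = min(total - on, math.ceil((pdf_dr + nbs_dr) / 2))
    return on, dr, total - on - dr


def worth_gating(n_sectors: int, t_sleep_s: float, p_leak_sector_w: float,
                 residual: float, e_wakeup_j: float, e_extra_j: float = 0.0) -> bool:
    """Leakage saved over the predicted sleep must exceed the wakeup cost
    (Fig. 8, lines 14 and 16); residual is the leakage fraction left asleep."""
    saved = n_sectors * p_leak_sector_w * (1.0 - residual) * t_sleep_s
    return saved > e_wakeup_j + e_extra_j


def group_modes(on: int, dr: int, off: int, t_pred_per_mb: Sequence[float],
                p_leak_sector_w: float, dr_residual: float, e_wake_dr: float,
                e_wake_off: float, e_miss_group: float) -> Tuple[str, str]:
    """Mode chosen for the DR candidates and for the OFF candidates of a group."""
    t_sleep = sum(t_pred_per_mb)          # line 13: predicted group processing time
    dr_mode = "DR" if worth_gating(dr, t_sleep, p_leak_sector_w, dr_residual,
                                   e_wake_dr) else "ON"
    # P_OFF also pays for re-fetching lost data (E_MissGroup); assumed fallback: P_DR.
    off_mode = "OFF" if worth_gating(off, t_sleep, p_leak_sector_w, 0.0,
                                     e_wake_off, e_miss_group) else "DR"
    return dr_mode, off_mode


def mb_level_pm(s_on: int, s_dr: int, neighbor_usage: Sequence[float],
                sector_size: float) -> Tuple[int, int]:
    """Sketch of Fig. 9: move sectors between P_ON and P_DR for one MB."""
    s_mb = math.ceil(mean(neighbor_usage) / sector_size)   # lines 2-3
    delta = s_on - s_mb                                    # line 4
    if delta == 0:
        return s_on, s_dr                                  # line 5: no change
    if delta > 0:
        return s_on - delta, s_dr + delta                  # line 6: sleep the surplus
    wake = min(-delta, s_dr)   # assumption: only DR sectors can be woken here
    return s_on + wake, s_dr - wake                        # line 7: wake from DR
```

For example, `mb_level_pm(5, 2, [40, 50], 16)` predicts 3 sectors for the MB and returns `(3, 4)`, moving the two surplus ON sectors to data-retentive sleep.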

    1.  ContentDrivenPM(input: Frame F(T,V))
    2.  (G, N_match) ← groupMBs(F(T,V), σ, ξ);  // Fig. 6
    3.  ∀ g ∈ G {
    4.    D_Best ← searchDirectionPrediction(g, N_match, ξ);  // Fig. 7
    5.    if (G1 or G3) PowerGate(D \ D_Best, P_OFF, S(D \ D_Best));
    6.    [M1, M2] ← MemUsagePDF(g);  // Fig. 4d
    7.    S_OFF(PDF) := (S − M2) / S_Sector;  S_DR(PDF) := (M2 − M1) / S_Sector;  // S: total on-chip memory size
    8.    S_ON(PDF) := S / S_Sector − (S_OFF(PDF) + S_DR(PDF));
    9.    M_Nbs ← AVG_{d ∈ N, N = [Left, Right, Top, Down]} d.group(g).getMemUsage();
    10.   S_ON(Nbs) := M_Nbs / S_Sector;  S_DR(Nbs) := AVG_{mb ∈ g} mb.S_DR;
    11.   S_ON := (S_ON(Nbs) + S_ON(PDF)) / 2;  S_DR := (S_DR(Nbs) + S_DR(PDF)) / 2;
    12.   S_OFF := S / S_Sector − (S_ON + S_DR);
    13.   ∀ mb ∈ g: mb.t_pred := AVG_{n ∈ N_match} n.t_MEDE;
    14.   if ((Σ_{mb ∈ g} mb.t_pred) · P_Leak(S_DR) > E_wakeup(DR→ON))
    15.     PowerGate(g, P_DR, S_DR);
    16.   if ((Σ_{mb ∈ g} mb.t_pred) · P_Leak(S_OFF) > E_wakeup(OFF→ON) + E_MissGroup)
    17.     PowerGate(g, P_OFF, S_OFF);
    18.   else PowerGate(g, P_DR, S_DR);
    19.   PowerON(g, S_ON);
    20.   g' ← g; mb' ← ∅; mb ← g.getFirstMB();
    21.   while (g' ≠ ∅) {
    22.     if (G2) PowerGate(D \ mb.D_Best, P_OFF, S(D \ mb.D_Best));
    23.     mb' ← g'.getCorrelatedMB(mb, mb_n ∈ N, N = [Left, Right, Top, Down]);
    24.     if (mb' = ∅) mb' ← g'.getNextMB(mb);
    25.     MBLevelPM(mb', S_ON, S_OFF, S_DR);  // Fig. 9
    26.     [E_MEDE, E_Miss, E_Leak, M_mb] ← performSearch(mb', D_Best);
    27.     g' ← g' \ {mb'}; mb ← mb'; }
    28. }
Fig. 8 Pseudo-code for our content-driven power management

Step-3: Computation Reordering: In the next step, the MBs of the group are processed one by one (lines 21-27, Fig. 8). As discussed in Section II.A, processing the ME/DE of MBs in raster scan order results in frequent sleep/wakeup fluctuations, as MBs in a row may belong to different objects. Since an object typically spans MBs of different rows (see Fig. 4 and Section S2), the sleep durations of the unused sectors can be lengthened (thus increasing the potential to put them in P_OFF mode) by processing MBs group by group, as MBs of the same group exhibit similar memory requirements. This reduces sleep/wakeup fluctuations and leads to relatively higher leakage savings. Fig. 4 shows that MB-groups can be of non-rectangular shape and that the ME/DE processing order of MBs in different groups is a non-raster scan order; see Fig. 4b for a possible ME/DE processing order of MBs in group G1. To avoid sleep/wakeup fluctuations at a fine granularity, even the computations inside the MB-groups are reordered. Inside a group, the next MB for ME/DE processing is selected by evaluating its texture difference w.r.t. the current MB, such that consecutively processed MBs exhibit similar memory requirements. The algorithm in Fig. 8 first determines a correlated MB in the spatial neighborhood (Left, Right, Top, Down; line 23). Then, MB-level fine-grained power management (see Fig. 9) is performed (line 25). Afterwards, ME/DE is performed in the predicted best search direction, and E_MEDE, E_Miss, E_Leak, and M_mb are monitored as the ME/DE processing energy, miss energy, leakage energy, and actual memory usage, respectively (line 26).

Step-4: Macroblock-Level Power Management: Fig. 9 shows the algorithm for MB-level power management. First, the memory requirement of the MB is predicted from its matching neighbors (line 2), and the number of required memory sectors is computed (line 3). If the number of required sectors equals the number of ON sectors, the power modes of the sectors are not changed (line 5). If the required memory is less than the P_ON memory, the difference is put into the data-retentive sleep mode P_DR (lines 6, 8). Otherwise, additional sectors are switched from P_DR mode to P_ON mode (lines 7, 8).

    1.  MBLevelPM(Macroblock mb, group-level numbers of memory sectors in the different power modes S_ON, S_OFF, S_DR)
    2.  M_pred := AVG_{n ∈ mb.N_match} n.getMemUsage();
    3.  S_MB := M_pred / S_Sector;
    4.  ΔS := S_ON − S_MB;
    5.  if (ΔS == 0) return;
    6.  else if (ΔS > 0) { S_ONmb := S_ON − ΔS; S_DRmb := S_DR + ΔS; }
    7.  else { S_ONmb := S_ON + |ΔS|; S_DRmb := S_DR − |ΔS|; }
    8.  PowerGate(g, P_DR, S_DRmb); PowerON(g, S_ONmb);
Fig. 9 Pseudo-code for macroblock-level power management

V. RESULTS AND EVALUATION
For energy and quality comparison, several multiview video sequences with different resolutions are used: VGA (640x480; Ballroom, Exit, Flamenco2, and Vassar) and XGA (1024x768; Breakdancers and Ballet) [1]. Rena is part of the training set, so we do not use it for the evaluation, to avoid biasing effects. Further test conditions are: TZ Search ME/DE algorithm, 193x193 search window, QP = {22, 27, 32, 37}. Note that the energy results include the overhead of our scheme.

A. Comparison to State-of-the-Art
We compare the on-chip memory energy savings of our adaptive power management scheme with state-of-the-art memory energy reduction techniques, namely Level-C and Level-C+ [12] (search window data reuse), and with a memory power management scheme with dynamic search windows (DSW) [7]. For a fair comparison, the same test videos and QP set are used for all schemes. Fig. 10 shows the on-chip energy consumption normalized to Level-C+, which has the highest energy consumption among all schemes. Level-C+ and Level-C incur significant on-chip memory energy due to their large search windows that are active all the time, i.e., they do not exploit the idle periods of the memory to save power. Compared to Level-C and Level-C+, our scheme provides on average 61% and 67% on-chip memory energy reduction, respectively. Compared to the DSW scheme [7], which power-gates memory sectors based on the memory requirements of consecutive MBs, our scheme provides on average 32% higher energy savings. The energy reduction is achieved by (i) increasing sleep durations through computation reordering, (ii) power-gating memory sectors based on best search direction prediction, and (iii) leveraging video content knowledge for multi-level power management. On average, 51% of the sectors are in P_OFF mode, while 9.5% are in P_DR mode (see further details in Section S3).
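As a rough illustration of what the reported sector occupancy means for leakage, the sketch below computes the leakage of the sector array relative to an always-ON memory. The residual leakage factor of P_DR is a hypothetical input (the actual values are in Table I, Section S3, which is not reproduced here), P_OFF is assumed to leak nothing, and wakeup and miss energy are ignored, so the result is not comparable to the measured savings above.

```python
def relative_leakage(frac_off: float, frac_dr: float,
                     dr_residual: float, off_residual: float = 0.0) -> float:
    """Leakage of the sector array relative to keeping every sector ON.

    frac_off/frac_dr: average fractions of sectors in P_OFF/P_DR.
    dr_residual/off_residual: fraction of ON leakage each sleep mode still
    draws (hypothetical inputs; wakeup and miss energy are ignored).
    """
    frac_on = 1.0 - frac_off - frac_dr
    if frac_on < 0:
        raise ValueError("mode fractions exceed 1")
    return frac_on + frac_dr * dr_residual + frac_off * off_residual


if __name__ == "__main__":
    # Average occupancy reported above; 0.3 is an illustrative P_DR residual.
    print(f"{relative_leakage(0.51, 0.095, dr_residual=0.3):.2f}")  # 0.42
```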
Our experiments show that high motion/texture allows relatively lower energy savings, because more data from the search area is accessed and fewer sectors are gated.
Fig. 10 On-chip memory energy savings comparison (normalized on-chip energy of Level-C+ [12], Level-C [12], DSW'11 [7], and our scheme for Ballroom, Exit, Flamenco2, Vassar, Breakdancers, and Ballet)

B. Overhead: Mispredictions and Memory Misses
Fig. 11a shows that our scheme predicts the best search direction with an accuracy of 87% for high-activity sequences and 94% for low-activity sequences. This incurs an average video quality loss of .54 dB BD-PSNR (Bjøntegaard Delta PSNR) with an average increase of 1.86% BD-BR (Bjøntegaard Delta Bitrate), compared to the exhaustive search of JMVC 6.0 [2]. However, this loss is visually imperceptible (see Fig. 21, Section S3). Due to its predictive nature, our scheme incurs on average 8.5% on-chip memory misses compared to storing the complete search window (Fig. 11b). However, including all latency overhead due to mispredictions and the power-management decision logic, our scheme still provides a minimum throughput of 33fps (see Fig. 12).
Fig. 11 Our ME/DE mispredictions and on-chip memory misses ((a) ME/DE prediction hits and misses [%]; (b) on-chip memory misses [%]; Ballroom, Exit, Flamenco2, Vassar, Breakdancers, Ballet)

C. Hardware Implementation
The hardware prototype is implemented with an ASIC flow, using the Cadence tool chain for standard-cell synthesis with an IBM 65nm Low-Power technology. The IC layout and the comparison table are shown in Fig. 12. The designed architecture employs 64x4-sample SAD operators and 21 SAD trees fed by the 16 on-chip memory banks. Compared to the state of the art, our architecture reduces the on-chip energy by 76% and % compared to [13] and [7], respectively. Note that the work of [13] is implemented in 90nm technology. Considering a 30% power reduction (for SRAMs) when moving from the 90nm to the 65nm technology node [6], our proposed scheme still provides a >60% reduction in on-chip energy. The provided throughput supports real-time HD1080p ME and DE at 33fps. The performance increase relative to [7] is mainly due to the complexity reduction resulting from the search direction prediction. Note that the 8x increase in the number of on-chip bits compared to [13] is due to the different search window sizes: our scheme supports 193x193 search windows (which are mandatory for DE to provide good video quality), while the architecture of [13] supports 33x33 search windows.
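The technology-node normalization is a ratio of ratios. A minimal sketch, assuming that the quoted SRAM power reduction scales the entire on-chip energy of the reference design [13] multiplicatively (the function name `normalized_reduction` is hypothetical):

```python
def normalized_reduction(reduction, tech_scaling):
    """Energy reduction after crediting the reference design with a node-scaling gain.

    reduction: measured fractional energy reduction versus the reference (e.g. 0.76).
    tech_scaling: fractional energy reduction the reference would gain by moving
        to the newer node (assumed to apply multiplicatively to its whole energy).
    """
    if not 0.0 <= tech_scaling < 1.0:
        raise ValueError("tech_scaling must be in [0, 1)")
    remaining = 1.0 - reduction              # our energy relative to the reference
    scaled_reference = 1.0 - tech_scaling    # reference energy after node scaling
    return 1.0 - remaining / scaled_reference
```

With a raw reduction of 0.76 and a scaling factor of 0.30, the sketch returns 1 − 0.24/0.70 ≈ 0.657; with no scaling it returns the raw reduction unchanged.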
Fig. 12 (a) Chip layout (16 memory banks, Memory Bank 0 to Memory Bank 15; SAD units; AGU; ME/DE Ctrl; APM; others), (b) Hardware results comparison:
Parameter | Tsung'09 [13] | DSW'11 [7] | Our
Technology | TSMC 90nm Low Power LowK Cu | ST 65nm LP 7 metal layer | ST 65nm LP 7 metal layer
Gate count | 23k | 12k | 14k
SRAM | 64 Kbits | 512 Kbits | 512 Kbits
Frequency | 3 MHz | 3 MHz | 3 MHz
Power | 265mW, 1.2V | 74mW, 1.0V | 63mW, 1.0V
Throughput (resolution, frame rate) | 4-views | 4-views | 4-views
SAD Units: Sum of Absolute Differences operators; ME/DE Ctrl: Motion/Disparity Estimation control; AGU: Address Generation Unit; APM: Adaptive Power Management.

VI. CONCLUSIONS
We propose a novel adaptive power management scheme for on-chip video memory targeting MVC. It leverages multiview video content knowledge and computation reordering to achieve high energy savings with imperceptible video quality loss. Key enabling attributes are MB-grouping based on texture and activity classification, best search direction prediction, and a video-content-driven multi-level power management policy. Our scheme achieves on average 32%-61% on-chip energy reduction compared to the state of the art [7][12]. We demonstrate the potential of leveraging multiview video properties for low-power MVC realization on battery-powered devices.

REFERENCES
[1] Y. Su, A. Vetro, A. Smolic, "Common Test Conditions for Multiview Video Coding", ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, Doc. JVT-7, July 2006.
[2] JMVC 6.0, garcon.ient.rwth-aachen.de, Sep. 2009; Joint Draft 8.0 on Multiview Video Coding, JVT-AB24, 2008.
[3] P. Merkle et al., "Efficient Prediction Structures for Multiview Video Coding", IEEE TCSVT, vol. 17, no. 11, 2007.
[4] Lynx:
[5] J. Yang et al., "Multiview video coding based on rectified epipolar lines", International CICSP, pp. 1-5, 2009.
[6] Cypress Semiconductor Corp., "Advantages of 65 nm Technology over 90 nm Technology QDR Family of SRAMs", 2010.
[7] B. Zatt, M. Shafique, F. Sampaio, L. Agostini, S. Bampi, J. Henkel, "Run-time adaptive energy-aware motion and disparity estimation in multiview video coding", IEEE DAC, 2011.
[8] B. Zatt, M. Shafique, S. Bampi, J. Henkel, "A Low-Power Memory Architecture with Application-Aware Power Management for Motion & Disparity Estimation in Multiview Video Coding", IEEE ICCAD, pp. 4-47, 2011.
[9] B. Zatt, M. Shafique, S. Bampi, J. Henkel, "Multi-Level Pipelined Parallel Hardware Architecture for High Throughput Motion and Disparity Estimation in Multiview Video Coding", IEEE DATE, 2011.
[10] M. Shafique, B. Zatt, J. Henkel, "A Complexity Reduction Scheme with Adaptive Search Direction and Mode Elimination for Multiview Video Coding", Picture Coding Symposium, 2012.
[11] S. Saponara, L. Fanucci, "Data-adaptive motion estimation algorithm and VLSI architecture design for low-power video systems", IEE Comp. & Digital Tech., vol. 151, no. 1, 2004.
[12] C.-Y. Chen et al., "Level C+ data reuse scheme for motion estimation with corresponding coding orders", IEEE TCSVT, vol. 16, no. 4, 2006.
[13] P.-K. Tsung et al., "Cache-based integer motion/disparity estimation for quad-HD H.264/AVC and HD multiview video coding", IEEE ICASSP, 2009.
[14] C.-Y. Tsai et al., "Low power cache algorithm and architecture design for fast motion estimation in H.264/AVC encoder system", IEEE ICASSP, vol. 2, pp. II-97-II-1, 2007.
[15] H. Shim, C.-M. Kyung, "Selective search area reuse algorithm for low external memory access motion estimation", IEEE TCSVT, vol. 19, no. 7, 2009.
[16] X. Xu, Y. He, "Fast disparity motion estimation in MVC based on range prediction", IEEE ICIP, pp. 2-23, 2008.
[17] H. Singh et al., "Enhanced leakage reduction techniques using intermediate strength power gating", IEEE TVLSI, vol. 15, no. 11, 2007.
[18] S. Roy, N. Ranganathan, S. Katkoori, "State-retentive power gating of register files in multi-core processors featuring multithreaded in-order cores", IEEE Transactions on Computers, 2010.
[19] L. Shen et al., "View-adaptive motion estimation and disparity estimation for low complexity multiview video coding", IEEE TCSVT, vol. 20, no. 6, pp. 9-93, 2010.
[20] H.-C. Chang et al., "A dynamic quality-adjustable H.264 video encoder for power-aware video applications", IEEE TCSVT, vol. 19, no. 12, pp. 17-14, Dec. 2009.
[21] S.-H. Wang, S.-H. Tai, T. Chiang, "A low-power and bandwidth-efficient motion estimation IP core design using binary search", IEEE TCSVT, vol. 19, no. 5, 2009.
[22] T. Tuan et al., "A 90nm low-power FPGA for battery-powered applications", ACM/SIGDA FPL, pp. 3-11, 2006.
[23] X. Xu, Y. He, "Fast disparity motion estimation in MVC based on range prediction", IEEE ICIP, pp. 2-23, 2008.
[24] S. Mondal, S. O. Memik, "Fine-grain leakage optimization in SRAM based FPGAs", IEEE GLSVLSI.
[25] X. Liu, P. J. Shenoy, M. D. Corner, "Chameleon: application-level power management", IEEE TMC, vol. 7, no. 8, 2008.
[26] G. Fukano et al., "A 65nm 1Mb SRAM Macro with Dynamic Voltage Scaling in Dual Power Supply Scheme for Low Power SoCs", NVSMW/ICMTD, pp. 97-98, 2008.
[27] M. Khellah et al., "A 4.2GHz 0.3mm2 6kb Dual-Vcc SRAM Building Block in 65nm CMOS", IEEE ISSCC, pp. 72-81, 2006.
[28] G. Gerosa et al., "A Sub-2 W Low Power IA Processor for Mobile Internet Devices in 45 nm High-k Metal Gate CMOS", IEEE ISSCC, pp. 73-82, 2009.
[29] H. Javaid, M. Shafique, S. Parameswaran, J. Henkel, "Low-power adaptive pipelined MPSoCs for multimedia: an H.264 video encoder case study", IEEE DAC, 2011.
[30] H. Javaid, M. Shafique, J. Henkel, S. Parameswaran, "System-Level Application-Aware Dynamic Power Management in Adaptive Pipelined MPSoCs for Multimedia", IEEE ICCAD.

[31]

Supplementary Material
S1. Motion and Disparity Estimation in MVC
MVC exploits the redundancies available in the temporal and inter-view domains using variable block-size Motion Estimation (ME) and Disparity Estimation (DE), respectively. The ME/DE search is performed in previously encoded frames (i.e., reference frames) to find the block that best matches the currently encoded Macroblock (MB) under a similarity criterion (such as the Sum of Absolute Differences, SAD). ME searches in temporally neighboring reference frames, while DE searches in frames of the neighboring views (see Fig. 13). Note that a search direction refers to the relative position of a reference frame with respect to the current frame. According to the MVC standard, multiple reference frames may be used to further improve coding efficiency. However, in this work we consider one reference frame per search direction, i.e., one forward and one backward reference in the temporal domain plus one forward and one backward reference in the view domain (if available). The search compares a set of candidate blocks (selected according to the given search patterns) inside a predefined search window (see Fig. 13) to find the best matching block. Once the best matching block is found, a Motion or Disparity Vector (MV, DV) is determined to represent the displacement between the current MB position and the best matching block position.
Fig. 13 Overview of motion and disparity estimation (current MB in the current frame; best match and Motion Vector (MV) in the temporal reference frame; best match and Disparity Vector (DV) in the disparity reference frame)
Fig. 14 MVC hierarchical prediction structure (views V0-V3 over time instants T0-T8; anchor and non-anchor frames)
Fig. 15 Neighboring MBs in the 3D-neighborhood
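The SAD-based search described above can be illustrated with an exhaustive (full-search) block matcher. This is a minimal sketch, not the TZ Search used in this work: frames are assumed to be lists of rows of luma samples, a single block size n is used instead of the variable block sizes of MVC, and the function names `sad` and `full_search` are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of Absolute Differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by the candidate vector (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, radius):
    """Exhaustive block matching inside a (2*radius+1)^2 search window.

    Returns ((dx, dy), cost) of the best candidate; the vector is an MV when
    `ref` is a temporal reference and a DV when it is an inter-view reference.
    """
    height, width = len(ref), len(ref[0])
    best_vec, best_cost = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            top, left = by + dy, bx + dx
            # Skip candidates that fall outside the reference frame.
            if top < 0 or left < 0 or top + n > height or left + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_vec, best_cost = (dx, dy), cost
    return best_vec, best_cost
```

The same routine yields an MV or a DV; only the reference frame differs. Fast algorithms such as TZ Search evaluate far fewer candidates than this exhaustive loop, and the number they evaluate varies per MB, which is why their memory usage varies from MB to MB.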
Note that, although ME and DE are conceptually similar, their search behavior and, consequently, their computational requirements, memory access patterns, and vector properties are distinct (see the discussion in Section II.A). Fig. 14 illustrates the MVC prediction structure and coding sequence. Fig. 15 shows the neighboring MBs in the spatial, temporal, and view domains (i.e., the 3D-neighborhood).

S2. Detailed Analysis of Multiview Videos
A fast ME/DE algorithm, TZ Search [5], is deployed for this analysis in order to represent a real-world scenario. Fast ME/DE algorithms are based on multiple search stages and patterns. These algorithms evaluate a different number of search candidates for different MBs and thus exhibit a highly varying memory usage profile.

A. Motion and Disparity Estimation Distribution
This section reinforces our analysis of the ME/DE search direction distribution (presented in Section II.A) by evaluating it for different video sequences with diverse motion/disparity and texture properties. The distributions in Fig. 16 and Fig. 17 illustrate that most MBs (typically from background objects with low texture and low motion, i.e., static blocks) are encoded using ME, while the MBs of foreground objects and object borders (with medium to high texture and high motion) are encoded using DE. It is noteworthy in Fig. 16 that view V1 exhibits a higher number of DE-encoded MBs than the other views. This is because V1 has two reference views available, which increases the possibility of finding a good match. View V0 has no reference view available and, consequently, all of its MBs are encoded using ME.

The decision between ME and DE (as the best search direction) also depends on the correlation available in the temporal domain. For instance, the number of DE-coded MBs is higher in compared to since is farther from the temporal references. Our memory usage analysis in Fig. 18 shows that the memory usage pattern in ME is less scattered than that in DE, especially for low-motion sequences with smaller objects like Ballroom. The probability density functions (PDFs) in Fig. 18 show that the distribution patterns of the three groups are quite diverse. The PDF of group G1 is concentrated in a low range (8-15 Kpixels), while the PDF of group G3 is dispersed over a wide range (1-35 Kpixels). Moreover, there is minimal overlap between the PDFs of different groups, which suggests that distinct memory predictions can be derived from the PDFs. Therefore, based on this PDF analysis, our scheme computes two highly probable memory requirement predictions (M1 and M2) from the PDF (obtained through an offline analysis over various test video sequences), assuming a Gaussian distribution. M1 is obtained with a probability of 0.84, i.e., $F(\mu+\sigma;\mu,\sigma^2) - F(0;\mu,\sigma^2)$, and M2 is obtained with a probability of 0.9, i.e., $F(\mu+2\sigma;\mu,\sigma^2) - F(0;\mu,\sigma^2)$, where $F(x;\mu,\sigma^2)$ is the Gaussian cumulative distribution function and µ and σ are the mean and standard deviation, respectively. The M1 amount of memory is kept in P_ON mode, as the probability of using these memory sectors is high. The remaining memory requirement M2 − M1 is kept in P_DR mode, as other MBs of the same group may use this data and its wakeup overhead is minimal, which avoids delay.

Furthermore, neighborhood correlation of memory usage can also be exploited to predict the memory usage of a given MB-group, because the MBs of a group typically contain the same object and thus exhibit similar memory requirements for ME and DE. Fig. 19 shows that there is extensive correlation between the neighboring frames (… → … → T3). Therefore, the memory requirements [M1, M2] of an MB-group can also be predicted with high accuracy from the memory usage of the same MB-group in neighboring frames or even views. A similar observation can be made from the memory requirement correlation shown in Fig. 20: the regions that require more memory are located in the same place at different time instants. This shows that it is possible to infer the memory behavior from neighborhood knowledge. A similar observation can be made for view neighbors. These predicted memory requirement values are then forwarded to the power-management scheme to determine the number and mode of gated sectors (see details in Section IV.B).
Fig. 16 (a) ME and DE distribution for four views of Rena, Ballroom, Exit and Vassar test video sequences (mode distribution [%] of ME and DE per view V0-V3 and QP)
Fig. 17 ME/DE distribution for different frames in the view V1 of the Rena test multiview video sequence
Fig. 18 (a) Probability density function (PDF) of the memory usage requirements of different groups for various test video sequences; (b) Histograms of memory usage during ME and DE for the Rena and Ballroom sequences (accessed pixels [Kpixel])
Fig. 19 MB-group correlation in different neighboring frames
Fig. 20 3D plots showing the correlation in the memory usage of MBs in the same frame and its temporal neighbors (memory usage [samples])
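The PDF-based prediction of M1 and M2 can be sketched as follows. This assumes M1 = µ + σ and M2 = µ + 2σ (the upper integration limits of the probabilities quoted above), a fixed sector size, and a simple mapping of M1 to P_ON, M2 − M1 to P_DR, and the rest to P_OFF; the function names are illustrative and not the authors' implementation.

```python
import math
from statistics import mean, pstdev


def gaussian_cdf(x, mu, sigma):
    """CDF F(x; mu, sigma^2) of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def memory_levels(samples):
    """Fit a Gaussian to offline memory-usage samples of one MB-group and
    return (M1, M2, P(0 <= X <= M1), P(0 <= X <= M2)).

    Assumption: M1 = mu + sigma and M2 = mu + 2*sigma.
    """
    mu, sigma = mean(samples), pstdev(samples)
    if sigma == 0:
        raise ValueError("samples must not all be equal")
    m1, m2 = mu + sigma, mu + 2.0 * sigma
    f0 = gaussian_cdf(0.0, mu, sigma)  # probability mass below zero memory
    return m1, m2, gaussian_cdf(m1, mu, sigma) - f0, gaussian_cdf(m2, mu, sigma) - f0


def split_sectors(m1, m2, sector_size, total_sectors):
    """Hypothetical mapping: M1 -> P_ON, M2 - M1 -> P_DR, remainder -> P_OFF."""
    on = min(math.ceil(m1 / sector_size), total_sectors)
    dr = min(max(math.ceil(m2 / sector_size) - on, 0), total_sectors - on)
    return on, dr, total_sectors - on - dr
```

For offline samples of 8, 10, 12, and 14 Kpixels, the sketch gives M1 ≈ 13.2 and M2 ≈ 15.5 Kpixels with P(X ≤ M1) ≈ 0.84; with 2-Kpixel sectors out of 16, this maps to 7 sectors ON, 1 in DR, and 8 OFF.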

Fig. 21 Comparing the objective video quality (rate-distortion curves, PSNR [dB]) and subjective video quality (pictures) of our scheme with the exhaustive ME/DE search of JMVC 6.0 [2] for Ballroom, Exit, Breakdancers, and Ballet

S3. Additional Detailed Results
The on-chip memory power reduction is achieved by applying computation reordering (which increases the number and sleep duration of idle memory sectors) and power management at different levels (MB-groups, MBs, etc.). The power state machine parameters are provided in Table I, based on the model of [17] (see Section III for power model details). Fig. 22 shows that on average 51% of the sectors are in P_OFF mode (up to 63%), while 9.5% are in P_DR mode (up to 15%). These results strongly depend on the accuracy of the MB-level memory requirement prediction. Fig. 24 compares our application-driven memory requirement predictor with a traditional history-based median predictor. Note that our proposed predictor reacts better and faster to sudden variations in memory requirements. The high prediction accuracy is achieved by taking into account the correlation in the 3D-neighborhood along with the texture and activity properties of different MBs, frames, and views.
Fig. 22 Power modes distribution (ON, DR, OFF) of the on-chip video memory blocks for Ballroom, Exit, Flamenco, Vassar, Breakdancers, and Ballet
Compared to search-window-based schemes (as in [12]), our approach requires far fewer external memory accesses, since only part of the search window is prefetched. Fig. 23 shows that our approach reduces the off-chip energy by 89% and 95% (on average) compared to Level-C and Level-C+ [12], respectively. Due to the computation reordering, our scheme reduces external memory accesses by 15% on average compared to our previous work [7].
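The difference in reaction speed between a history-based predictor and a neighborhood-based one can be illustrated with two deliberately simple predictors. This is a sketch only: the window length of 5 is a hypothetical choice, and the neighbor rule mirrors just the averaging step of Fig. 9 (line 2), omitting the texture and activity classification that the proposed predictor also uses.

```python
from statistics import mean, median


def median_predictor(history, window=5):
    """History-based baseline: median memory usage of the last `window` MBs."""
    recent = history[-window:]
    return median(recent) if recent else 0


def neighbor_predictor(neighbor_usage):
    """Neighborhood rule in the spirit of Fig. 9, line 2: average memory usage
    of the matching MBs in the spatial/temporal/view neighborhood."""
    return mean(neighbor_usage) if neighbor_usage else 0
```

When an MB enters a high-texture object after a run of low-usage MBs, e.g., a history of [10, 10, 10, 10, 10] K samples while its co-located neighbors used 30 and 32, the median predictor still returns 10, whereas the neighbor average returns 31; the median only catches up once more than half of its window holds the new level.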
Table I: Power state machine parameters (sleep modes P_ON, P_DR, P_OFF; columns: leakage energy, wakeup energy, wakeup latency)
Fig. 23 Off-chip memory energy savings compared to state-of-the-art search window prefetching techniques (off-chip energy [%] of Level-C+ [12], Level-C [12], DSW [7], and our scheme for Ballroom, Exit, Flamenco2, Vassar, Breakdancers, and Ballet)
Fig. 24 Comparing the accuracy of our application-driven memory requirement predictor with the history-based median predictor at MB level (memory requirement [K samples] per MB: actual, history-based (median), ours)
The detailed video quality results are shown in Fig. 21 and Fig.. The objective video quality (rate-distortion curves) and subjective video quality (decoded frames) results in Fig. 21 illustrate that our


More information

SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC

SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC Randa Atta, Rehab F. Abdel-Kader, and Amera Abd-AlRahem Electrical Engineering Department, Faculty of Engineering, Port

More information

Implementation and analysis of Directional DCT in H.264

Implementation and analysis of Directional DCT in H.264 Implementation and analysis of Directional DCT in H.264 EE 5359 Multimedia Processing Guidance: Dr K R Rao Priyadarshini Anjanappa UTA ID: 1000730236 priyadarshini.anjanappa@mavs.uta.edu Introduction A

More information

STACK ROBUST FINE GRANULARITY SCALABLE VIDEO CODING

STACK ROBUST FINE GRANULARITY SCALABLE VIDEO CODING Journal of the Chinese Institute of Engineers, Vol. 29, No. 7, pp. 1203-1214 (2006) 1203 STACK ROBUST FINE GRANULARITY SCALABLE VIDEO CODING Hsiang-Chun Huang and Tihao Chiang* ABSTRACT A novel scalable

More information

Motion Vector Coding Algorithm Based on Adaptive Template Matching

Motion Vector Coding Algorithm Based on Adaptive Template Matching Motion Vector Coding Algorithm Based on Adaptive Template Matching Wen Yang #1, Oscar C. Au #2, Jingjing Dai #3, Feng Zou #4, Chao Pang #5,Yu Liu 6 # Electronic and Computer Engineering, The Hong Kong

More information

A SCALABLE COMPUTING AND MEMORY ARCHITECTURE FOR VARIABLE BLOCK SIZE MOTION ESTIMATION ON FIELD-PROGRAMMABLE GATE ARRAYS. Theepan Moorthy and Andy Ye

A SCALABLE COMPUTING AND MEMORY ARCHITECTURE FOR VARIABLE BLOCK SIZE MOTION ESTIMATION ON FIELD-PROGRAMMABLE GATE ARRAYS. Theepan Moorthy and Andy Ye A SCALABLE COMPUTING AND MEMORY ARCHITECTURE FOR VARIABLE BLOCK SIZE MOTION ESTIMATION ON FIELD-PROGRAMMABLE GATE ARRAYS Theepan Moorthy and Andy Ye Department of Electrical and Computer Engineering Ryerson

More information

An Adaptive Cross Search Algorithm for Block Matching Motion Estimation

An Adaptive Cross Search Algorithm for Block Matching Motion Estimation An Adaptive Cross Search Algorithm for Block Matching Motion Estimation Jiancong Luo', Ishfaq Ahmad' and Xzhang Luo' 1 Department of Computer Science and Engineering, University of Texas at Arlington,

More information

RISPP: Rotating Instruction Set Processing Platform

RISPP: Rotating Instruction Set Processing Platform RISPP: Rotating Instruction Set Processing Platform Lars Bauer, Muhammad Shafique and Jörg Henkel Chair for Embedded Systems (CES) University of Karlsruhe (TH) Development of Embedded Systems Typical:

More information

Comparative Study of Partial Closed-loop Versus Open-loop Motion Estimation for Coding of HDTV

Comparative Study of Partial Closed-loop Versus Open-loop Motion Estimation for Coding of HDTV Comparative Study of Partial Closed-loop Versus Open-loop Motion Estimation for Coding of HDTV Jeffrey S. McVeigh 1 and Siu-Wai Wu 2 1 Carnegie Mellon University Department of Electrical and Computer Engineering

More information

Low Power Set-Associative Cache with Single-Cycle Partial Tag Comparison

Low Power Set-Associative Cache with Single-Cycle Partial Tag Comparison Low Power Set-Associative Cache with Single-Cycle Partial Tag Comparison Jian Chen, Ruihua Peng, Yuzhuo Fu School of Micro-electronics, Shanghai Jiao Tong University, Shanghai 200030, China {chenjian,

More information

Efficient MPEG-2 to H.264/AVC Intra Transcoding in Transform-domain

Efficient MPEG-2 to H.264/AVC Intra Transcoding in Transform-domain MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Efficient MPEG- to H.64/AVC Transcoding in Transform-domain Yeping Su, Jun Xin, Anthony Vetro, Huifang Sun TR005-039 May 005 Abstract In this

More information

A Dedicated Hardware Solution for the HEVC Interpolation Unit

A Dedicated Hardware Solution for the HEVC Interpolation Unit XXVII SIM - South Symposium on Microelectronics 1 A Dedicated Hardware Solution for the HEVC Interpolation Unit 1 Vladimir Afonso, 1 Marcel Moscarelli Corrêa, 1 Luciano Volcan Agostini, 2 Denis Teixeira

More information

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1

6T- SRAM for Low Power Consumption. Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 6T- SRAM for Low Power Consumption Mrs. J.N.Ingole 1, Ms.P.A.Mirge 2 Professor, Dept. of ExTC, PRMIT &R, Badnera, Amravati, Maharashtra, India 1 PG Student [Digital Electronics], Dept. of ExTC, PRMIT&R,

More information

Coding of Coefficients of two-dimensional non-separable Adaptive Wiener Interpolation Filter

Coding of Coefficients of two-dimensional non-separable Adaptive Wiener Interpolation Filter Coding of Coefficients of two-dimensional non-separable Adaptive Wiener Interpolation Filter Y. Vatis, B. Edler, I. Wassermann, D. T. Nguyen and J. Ostermann ABSTRACT Standard video compression techniques

More information

Recent, Current and Future Developments in Video Coding

Recent, Current and Future Developments in Video Coding Recent, Current and Future Developments in Video Coding Jens-Rainer Ohm Inst. of Commun. Engineering Outline Recent and current activities in MPEG Video and JVT Scalable Video Coding Multiview Video Coding

More information

A General Sign Bit Error Correction Scheme for Approximate Adders

A General Sign Bit Error Correction Scheme for Approximate Adders A General Sign Bit Error Correction Scheme for Approximate Adders Rui Zhou and Weikang Qian University of Michigan-Shanghai Jiao Tong University Joint Institute Shanghai Jiao Tong University, Shanghai,

More information

ISSCC 2006 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1

ISSCC 2006 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1 ISSCC 26 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1 22.1 A 125µW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications Tsu-Ming Liu 1, Ting-An Lin 2, Sheng-Zen Wang 2, Wen-Ping Lee

More information

EE 5359 MULTIMEDIA PROCESSING SPRING Final Report IMPLEMENTATION AND ANALYSIS OF DIRECTIONAL DISCRETE COSINE TRANSFORM IN H.

EE 5359 MULTIMEDIA PROCESSING SPRING Final Report IMPLEMENTATION AND ANALYSIS OF DIRECTIONAL DISCRETE COSINE TRANSFORM IN H. EE 5359 MULTIMEDIA PROCESSING SPRING 2011 Final Report IMPLEMENTATION AND ANALYSIS OF DIRECTIONAL DISCRETE COSINE TRANSFORM IN H.264 Under guidance of DR K R RAO DEPARTMENT OF ELECTRICAL ENGINEERING UNIVERSITY

More information

Parallel Processing Deblocking Filter Hardware for High Efficiency Video Coding

Parallel Processing Deblocking Filter Hardware for High Efficiency Video Coding International Journal of Latest Research in Engineering and Technology (IJLRET) ISSN: 2454-5031 www.ijlret.com ǁ PP. 52-58 Parallel Processing Deblocking Filter Hardware for High Efficiency Video Coding

More information

A COMPARISON OF CABAC THROUGHPUT FOR HEVC/H.265 VS. AVC/H.264. Massachusetts Institute of Technology Texas Instruments

A COMPARISON OF CABAC THROUGHPUT FOR HEVC/H.265 VS. AVC/H.264. Massachusetts Institute of Technology Texas Instruments 2013 IEEE Workshop on Signal Processing Systems A COMPARISON OF CABAC THROUGHPUT FOR HEVC/H.265 VS. AVC/H.264 Vivienne Sze, Madhukar Budagavi Massachusetts Institute of Technology Texas Instruments ABSTRACT

More information

Adaptation of Scalable Video Coding to Packet Loss and its Performance Analysis

Adaptation of Scalable Video Coding to Packet Loss and its Performance Analysis Adaptation of Scalable Video Coding to Packet Loss and its Performance Analysis Euy-Doc Jang *, Jae-Gon Kim *, Truong Thang**,Jung-won Kang** *Korea Aerospace University, 100, Hanggongdae gil, Hwajeon-dong,

More information

A LOW-POWER VGA FULL-FRAME FEATURE EXTRACTION PROCESSOR. Dongsuk Jeon, Yejoong Kim, Inhee Lee, Zhengya Zhang, David Blaauw, and Dennis Sylvester

A LOW-POWER VGA FULL-FRAME FEATURE EXTRACTION PROCESSOR. Dongsuk Jeon, Yejoong Kim, Inhee Lee, Zhengya Zhang, David Blaauw, and Dennis Sylvester A LOW-POWER VGA FULL-FRAME FEATURE EXTRACTION PROCESSOR Dongsuk Jeon, Yejoong Kim, Inhee Lee, Zhengya Zhang, David Blaauw, and Dennis Sylvester University of Michigan, Ann Arbor ABSTRACT This paper proposes

More information

FAST MOTION ESTIMATION DISCARDING LOW-IMPACT FRACTIONAL BLOCKS. Saverio G. Blasi, Ivan Zupancic and Ebroul Izquierdo

FAST MOTION ESTIMATION DISCARDING LOW-IMPACT FRACTIONAL BLOCKS. Saverio G. Blasi, Ivan Zupancic and Ebroul Izquierdo FAST MOTION ESTIMATION DISCARDING LOW-IMPACT FRACTIONAL BLOCKS Saverio G. Blasi, Ivan Zupancic and Ebroul Izquierdo School of Electronic Engineering and Computer Science, Queen Mary University of London

More information

A NOVEL SCANNING SCHEME FOR DIRECTIONAL SPATIAL PREDICTION OF AVS INTRA CODING

A NOVEL SCANNING SCHEME FOR DIRECTIONAL SPATIAL PREDICTION OF AVS INTRA CODING A NOVEL SCANNING SCHEME FOR DIRECTIONAL SPATIAL PREDICTION OF AVS INTRA CODING Md. Salah Uddin Yusuf 1, Mohiuddin Ahmad 2 Assistant Professor, Dept. of EEE, Khulna University of Engineering & Technology

More information

A High Quality/Low Computational Cost Technique for Block Matching Motion Estimation

A High Quality/Low Computational Cost Technique for Block Matching Motion Estimation A High Quality/Low Computational Cost Technique for Block Matching Motion Estimation S. López, G.M. Callicó, J.F. López and R. Sarmiento Research Institute for Applied Microelectronics (IUMA) Department

More information

Final report on coding algorithms for mobile 3DTV. Gerhard Tech Karsten Müller Philipp Merkle Heribert Brust Lina Jin

Final report on coding algorithms for mobile 3DTV. Gerhard Tech Karsten Müller Philipp Merkle Heribert Brust Lina Jin Final report on coding algorithms for mobile 3DTV Gerhard Tech Karsten Müller Philipp Merkle Heribert Brust Lina Jin MOBILE3DTV Project No. 216503 Final report on coding algorithms for mobile 3DTV Gerhard

More information

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017 Design of Low Power Adder in ALU Using Flexible Charge Recycling Dynamic Circuit Pallavi Mamidala 1 K. Anil kumar 2 mamidalapallavi@gmail.com 1 anilkumar10436@gmail.com 2 1 Assistant Professor, Dept of

More information

EFFICIENT PU MODE DECISION AND MOTION ESTIMATION FOR H.264/AVC TO HEVC TRANSCODER

EFFICIENT PU MODE DECISION AND MOTION ESTIMATION FOR H.264/AVC TO HEVC TRANSCODER EFFICIENT PU MODE DECISION AND MOTION ESTIMATION FOR H.264/AVC TO HEVC TRANSCODER Zong-Yi Chen, Jiunn-Tsair Fang 2, Tsai-Ling Liao, and Pao-Chi Chang Department of Communication Engineering, National Central

More information

BANDWIDTH-EFFICIENT ENCODER FRAMEWORK FOR H.264/AVC SCALABLE EXTENSION. Yi-Hau Chen, Tzu-Der Chuang, Yu-Jen Chen, and Liang-Gee Chen

BANDWIDTH-EFFICIENT ENCODER FRAMEWORK FOR H.264/AVC SCALABLE EXTENSION. Yi-Hau Chen, Tzu-Der Chuang, Yu-Jen Chen, and Liang-Gee Chen BANDWIDTH-EFFICIENT ENCODER FRAMEWORK FOR H.264/AVC SCALABLE EXTENSION Yi-Hau Chen, Tzu-Der Chuang, Yu-Jen Chen, and Liang-Gee Chen DSP/IC Design Lab., Graduate Institute of Electronics Engineering, National

More information

Motion estimation for video compression

Motion estimation for video compression Motion estimation for video compression Blockmatching Search strategies for block matching Block comparison speedups Hierarchical blockmatching Sub-pixel accuracy Motion estimation no. 1 Block-matching

More information

Fast frame memory access method for H.264/AVC

Fast frame memory access method for H.264/AVC Fast frame memory access method for H.264/AVC Tian Song 1a), Tomoyuki Kishida 2, and Takashi Shimamoto 1 1 Computer Systems Engineering, Department of Institute of Technology and Science, Graduate School

More information

Chapter 11.3 MPEG-2. MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications:

Chapter 11.3 MPEG-2. MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications: Chapter 11.3 MPEG-2 MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications: Simple, Main, SNR scalable, Spatially scalable, High, 4:2:2,

More information

A 4-way parallel CAVLC design for H.264/AVC 4 Kx2 K 60 fps encoder

A 4-way parallel CAVLC design for H.264/AVC 4 Kx2 K 60 fps encoder A 4-way parallel CAVLC design for H.264/AVC 4 Kx2 K 60 fps encoder Huibo Zhong, Sha Shen, Yibo Fan a), and Xiaoyang Zeng State Key Lab of ASIC and System, Fudan University 825 Zhangheng Road, Shanghai,

More information

Stereo Image Compression

Stereo Image Compression Stereo Image Compression Deepa P. Sundar, Debabrata Sengupta, Divya Elayakumar {deepaps, dsgupta, divyae}@stanford.edu Electrical Engineering, Stanford University, CA. Abstract In this report we describe

More information

Deblocking Filter Algorithm with Low Complexity for H.264 Video Coding

Deblocking Filter Algorithm with Low Complexity for H.264 Video Coding Deblocking Filter Algorithm with Low Complexity for H.264 Video Coding Jung-Ah Choi and Yo-Sung Ho Gwangju Institute of Science and Technology (GIST) 261 Cheomdan-gwagiro, Buk-gu, Gwangju, 500-712, Korea

More information

BANDWIDTH REDUCTION SCHEMES FOR MPEG-2 TO H.264 TRANSCODER DESIGN

BANDWIDTH REDUCTION SCHEMES FOR MPEG-2 TO H.264 TRANSCODER DESIGN BANDWIDTH REDUCTION SCHEMES FOR MPEG- TO H. TRANSCODER DESIGN Xianghui Wei, Wenqi You, Guifen Tian, Yan Zhuang, Takeshi Ikenaga, Satoshi Goto Graduate School of Information, Production and Systems, Waseda

More information

Analysis of 3D and Multiview Extensions of the Emerging HEVC Standard

Analysis of 3D and Multiview Extensions of the Emerging HEVC Standard MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Analysis of 3D and Multiview Extensions of the Emerging HEVC Standard Vetro, A.; Tian, D. TR2012-068 August 2012 Abstract Standardization of

More information

HEVC based Stereo Video codec

HEVC based Stereo Video codec based Stereo Video B Mallik*, A Sheikh Akbari*, P Bagheri Zadeh *School of Computing, Creative Technology & Engineering, Faculty of Arts, Environment & Technology, Leeds Beckett University, U.K. b.mallik6347@student.leedsbeckett.ac.uk,

More information

Implementation of A Optimized Systolic Array Architecture for FSBMA using FPGA for Real-time Applications

Implementation of A Optimized Systolic Array Architecture for FSBMA using FPGA for Real-time Applications 46 IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.3, March 2008 Implementation of A Optimized Systolic Array Architecture for FSBMA using FPGA for Real-time Applications

More information

Intra-Mode Indexed Nonuniform Quantization Parameter Matrices in AVC/H.264

Intra-Mode Indexed Nonuniform Quantization Parameter Matrices in AVC/H.264 Intra-Mode Indexed Nonuniform Quantization Parameter Matrices in AVC/H.264 Jing Hu and Jerry D. Gibson Department of Electrical and Computer Engineering University of California, Santa Barbara, California

More information

Area Efficient SAD Architecture for Block Based Video Compression Standards

Area Efficient SAD Architecture for Block Based Video Compression Standards IJCAES ISSN: 2231-4946 Volume III, Special Issue, August 2013 International Journal of Computer Applications in Engineering Sciences Special Issue on National Conference on Information and Communication

More information

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California,

More information

A Novel Deblocking Filter Algorithm In H.264 for Real Time Implementation

A Novel Deblocking Filter Algorithm In H.264 for Real Time Implementation 2009 Third International Conference on Multimedia and Ubiquitous Engineering A Novel Deblocking Filter Algorithm In H.264 for Real Time Implementation Yuan Li, Ning Han, Chen Chen Department of Automation,

More information

DIGITAL TELEVISION 1. DIGITAL VIDEO FUNDAMENTALS

DIGITAL TELEVISION 1. DIGITAL VIDEO FUNDAMENTALS DIGITAL TELEVISION 1. DIGITAL VIDEO FUNDAMENTALS Television services in Europe currently broadcast video at a frame rate of 25 Hz. Each frame consists of two interlaced fields, giving a field rate of 50

More information

Fast Block-Matching Motion Estimation Using Modified Diamond Search Algorithm

Fast Block-Matching Motion Estimation Using Modified Diamond Search Algorithm Fast Block-Matching Motion Estimation Using Modified Diamond Search Algorithm Bichu Vijay 1, Ganapathi Hegde 2, Sanju S 3 Amrita School of Engineering, Bangalore, India Email: vijaybichu.in@gmail.com 1,

More information

Development of Low Power ISDB-T One-Segment Decoder by Mobile Multi-Media Engine SoC (S1G)

Development of Low Power ISDB-T One-Segment Decoder by Mobile Multi-Media Engine SoC (S1G) Development of Low Power ISDB-T One-Segment r by Mobile Multi-Media Engine SoC (S1G) K. Mori, M. Suzuki *, Y. Ohara, S. Matsuo and A. Asano * Toshiba Corporation Semiconductor Company, 580-1 Horikawa-Cho,

More information

High Efficiency Video Coding. Li Li 2016/10/18

High Efficiency Video Coding. Li Li 2016/10/18 High Efficiency Video Coding Li Li 2016/10/18 Email: lili90th@gmail.com Outline Video coding basics High Efficiency Video Coding Conclusion Digital Video A video is nothing but a number of frames Attributes

More information

H.264/AVC BASED NEAR LOSSLESS INTRA CODEC USING LINE-BASED PREDICTION AND MODIFIED CABAC. Jung-Ah Choi, Jin Heo, and Yo-Sung Ho

H.264/AVC BASED NEAR LOSSLESS INTRA CODEC USING LINE-BASED PREDICTION AND MODIFIED CABAC. Jung-Ah Choi, Jin Heo, and Yo-Sung Ho H.264/AVC BASED NEAR LOSSLESS INTRA CODEC USING LINE-BASED PREDICTION AND MODIFIED CABAC Jung-Ah Choi, Jin Heo, and Yo-Sung Ho Gwangju Institute of Science and Technology {jachoi, jinheo, hoyo}@gist.ac.kr

More information

Context based optimal shape coding

Context based optimal shape coding IEEE Signal Processing Society 1999 Workshop on Multimedia Signal Processing September 13-15, 1999, Copenhagen, Denmark Electronic Proceedings 1999 IEEE Context based optimal shape coding Gerry Melnikov,

More information

Video Coding Using Spatially Varying Transform

Video Coding Using Spatially Varying Transform Video Coding Using Spatially Varying Transform Cixun Zhang 1, Kemal Ugur 2, Jani Lainema 2, and Moncef Gabbouj 1 1 Tampere University of Technology, Tampere, Finland {cixun.zhang,moncef.gabbouj}@tut.fi

More information

Overview: motion-compensated coding

Overview: motion-compensated coding Overview: motion-compensated coding Motion-compensated prediction Motion-compensated hybrid coding Motion estimation by block-matching Motion estimation with sub-pixel accuracy Power spectral density of

More information

A full-pipelined 2-D IDCT/ IDST VLSI architecture with adaptive block-size for HEVC standard

A full-pipelined 2-D IDCT/ IDST VLSI architecture with adaptive block-size for HEVC standard LETTER IEICE Electronics Express, Vol.10, No.9, 1 11 A full-pipelined 2-D IDCT/ IDST VLSI architecture with adaptive block-size for HEVC standard Hong Liang a), He Weifeng b), Zhu Hui, and Mao Zhigang

More information

A Fast Intra/Inter Mode Decision Algorithm of H.264/AVC for Real-time Applications

A Fast Intra/Inter Mode Decision Algorithm of H.264/AVC for Real-time Applications Fast Intra/Inter Mode Decision lgorithm of H.64/VC for Real-time pplications Bin Zhan, Baochun Hou, and Reza Sotudeh School of Electronic, Communication and Electrical Engineering University of Hertfordshire

More information

ISSN: An Efficient Fully Exploiting Spatial Correlation of Compress Compound Images in Advanced Video Coding

ISSN: An Efficient Fully Exploiting Spatial Correlation of Compress Compound Images in Advanced Video Coding An Efficient Fully Exploiting Spatial Correlation of Compress Compound Images in Advanced Video Coding Ali Mohsin Kaittan*1 President of the Association of scientific research and development in Iraq Abstract

More information

Digital Image Stabilization and Its Integration with Video Encoder

Digital Image Stabilization and Its Integration with Video Encoder Digital Image Stabilization and Its Integration with Video Encoder Yu-Chun Peng, Hung-An Chang, Homer H. Chen Graduate Institute of Communication Engineering National Taiwan University Taipei, Taiwan {b889189,

More information

An Independent Motion and Disparity Vector Prediction Method for Multiview Video Coding

An Independent Motion and Disparity Vector Prediction Method for Multiview Video Coding Preprint Version (2011) An Independent Motion and Disparity Vector Prediction Method for Multiview Video Coding Seungchul Ryu a, Jungdong Seo a, Dong Hyun Kim a, Jin Young Lee b, Ho-Cheon Wey b, and Kwanghoon

More information

Vector Bank Based Multimedia Codec System-on-a-Chip (SoC) Design

Vector Bank Based Multimedia Codec System-on-a-Chip (SoC) Design 2009 10th International Symposium on Pervasive Systems, Algorithms, and Networks Vector Bank Based Multimedia Codec System-on-a-Chip (SoC) Design Ruei-Xi Chen, Wei Zhao, Jeffrey Fan andasaddavari Computer

More information

Low-cost Multi-hypothesis Motion Compensation for Video Coding

Low-cost Multi-hypothesis Motion Compensation for Video Coding Low-cost Multi-hypothesis Motion Compensation for Video Coding Lei Chen a, Shengfu Dong a, Ronggang Wang a, Zhenyu Wang a, Siwei Ma b, Wenmin Wang a, Wen Gao b a Peking University, Shenzhen Graduate School,

More information

RFCAVLC8t: a Reference Frame Compression Algorithm for Video Coding Systems

RFCAVLC8t: a Reference Frame Compression Algorithm for Video Coding Systems XXVII SIM - South Symposium on Microelectronics 1 RFCAVLC8t: a Reference Frame Compression Algorithm for Video Coding Systems Dieison Silveira, Mateus Grellert, Luciano Agostini {dssilveira, mgdsilva,

More information

By Charvi Dhoot*, Vincent J. Mooney &,

By Charvi Dhoot*, Vincent J. Mooney &, By Charvi Dhoot*, Vincent J. Mooney &, -Shubhajit Roy Chowdhury*, Lap Pui Chau # *International Institute of Information Technology, Hyderabad, India & School of Electrical and Computer Engineering, Georgia

More information