An Advanced Hierarchical Motion Estimation Scheme with Lossless Frame Recompression and Early Level Termination for Beyond High Definition Video Coding

Xuena Bao, Dajiang Zhou, Peilin Liu, and Satoshi Goto, Fellow, IEEE

Abstract—In this paper, we present a hardware-efficient fast algorithm with a lossless frame recompression scheme and an early level termination strategy for large search range (SR) motion estimation (ME) in beyond high definition video encoders. To achieve high ME quality for hierarchical motion search, we propose an advanced hierarchical ME scheme that processes the multi-resolution motion search with an efficient refining stage. This enables high data and hardware reuse for much lower bandwidth and memory cost, while achieving higher ME quality than previous works. In addition, a lossless frame recompression scheme based on this ME algorithm is presented to further reduce bandwidth. A hierarchical memory organization, together with a leveling two-step data fetching strategy, is applied to meet the random-access constraint of the hierarchical motion search structure, and a leveling compression strategy that allows a lower level to refer to a higher one for compression is proposed to efficiently reduce bandwidth. Furthermore, an early level termination method suited to the hierarchical ME structure is applied. This method terminates redundant high-level motion searches by establishing thresholds based on the current block mode and motion search level; it also applies early refinement termination in order to avoid unnecessary refinement for the high levels. Experimental results show that the total scheme yields a much smaller bit rate increase than previous works, especially for high-motion sequences, while achieving considerable savings in memory and bandwidth cost for a large SR of [-128, 127]. Index Terms—Beyond high definition, early level termination, hierarchical motion estimation, lossless frame recompression, video coding.
Manuscript received July 6th. This research was supported by the Waseda University Ambient SoC Global COE Program of MEXT, Japan, by the Knowledge Cluster Initiative (2nd Stage) of MEXT, Japan, and by CREST of the Japan Science and Technology Agency. Xuena Bao was with the Graduate School of Information, Production and Systems, Waseda University, 2-7 Hibikino, Kitakyushu, Japan. She is now with the Department of Electronic Engineering, Shanghai Jiao Tong University, No.800 Dong Chuan Road, Shanghai, China (e-mail: baoxuena@sjtu.edu.cn). Dajiang Zhou is with the Graduate School of Information, Production and Systems, Waseda University, 2-7 Hibikino, Kitakyushu, Japan (e-mail: zhou@fuji.waseda.jp). Peilin Liu is with the Department of Electronic Engineering, Shanghai Jiao Tong University, No.800 Dong Chuan Road, Shanghai, China (e-mail: liupeilin@sjtu.edu.cn). Satoshi Goto is with the Graduate School of Information, Production and Systems, Waseda University, 2-7 Hibikino, Kitakyushu, Japan (e-mail: goto@waseda.jp).

I. INTRODUCTION

In order to provide high-quality perception, TV resolution has grown dramatically, and beyond high definition (beyond HD) videos such as QFHD (quad full high definition, 3840x2160/2160p) and SHV (Super Hi-Vision, 7680x4320/4320p) sequences are becoming a trend for real applications. Although the H.264/AVC video coding standard, which provides good coding efficiency, has been widely adopted in many video devices, most of them only support high definition (HD) video or below because of the high computational complexity of the ME part. For sequences beyond high definition, motions among neighboring frames are larger than in lower-definition sequences with the same visual content ([1]); therefore, the required search range has to be up to [-128, 127] or even larger in order to capture the motions accurately.
For these beyond HD sized applications with large SR, the huge resource consumption of previous ME approaches becomes the bottleneck of encoder chip design. First and foremost, since the reference data are stored in dynamic random access memory (DRAM) and have to be accessed during the ME process, the DRAM bandwidth requirement becomes very large for encoding beyond HD sequences, exceeding the limits of current DDR2 and DDR3 techniques. Besides, huge DRAM traffic also means a significant share of the total system power. In addition, the area cost as well as the chip pin count grows dramatically, which leads to high chip design cost. Furthermore, previous approaches consume too many computational cycles, which makes them unsuitable for real-time encoder systems. Recently, some motion estimation algorithms such as [2, 3] have been proposed, whose estimation ranges target HD sequences or below. [2] proposes an update-type motion estimation scheme with a multi-resolution approach for motion compensated image interpolation. [3] presents a fast modified diamond search algorithm for motion estimation. In addition, some video coding strategies aiming to support high definition sequences have also been proposed ([4, 5, 6]). [4, 5] propose intra prediction schemes suitable for high definition videos, while [6] presents a memory interface architecture for high definition video coding.

In order to ensure performance for beyond high definition cases, [1, 7] propose strategies suitable for beyond high definition video coding. However, the motion estimation strategy in [1] is full search with an SR of 128, while [7] proposes a set of diagonal partition shapes for variable motion partitioning. Both algorithms consume considerable calculation time and bandwidth, and are not suitable for direct hardware implementation. Among the fast ME algorithms proposed to support large search ranges with hardware implementations, there are two promising ME structures: cache-based ME (CBME) in [8], integrated in a quad HDTV sized video encoder ([9]), and parallel multi-resolution ME (PMRME) in [10], integrated in [11]. In CBME, a [-16, +15] refinement is done around the best motion vector selected from several predicting candidates. Although it performs well for small and uniform motions, the coding efficiency loss becomes rather large for high motions due to the limited motion prediction range. PMRME uses three independent levels: two sub-sampling levels that cover large ranges and one fine level that covers a small range. This architecture solves the dependency problem with significant savings of on-chip memory and bandwidth. However, if the motion falls in the area of the sub-sampling levels, the motion vector (MV) from the coarse search is passed directly to the next stage without refinement, leading to non-negligible quality loss. In addition, although large bandwidth savings can be achieved in the integer ME part, the data in the sub-sampling levels cannot be reused in fraction ME, which costs extra bandwidth when reference data without sub-sampling have to be fetched for the fraction motion search. To solve the above problems, an advanced parallel multi-resolution ME algorithm with a coarse-to-fine strategy is proposed in this paper.
In order to avoid the quality loss of CBME for high motion sequences, the proposed algorithm applies a hierarchical motion estimation strategy that integrates large-area sub-sampling searches. Then, unlike PMRME, which only searches sub-sampled values in the high levels, the proposed algorithm applies a coarse search based on the average value of each sub-sampled block, after which the motion vector with minimum cost is refined inside the block. Since the refinement can reuse the low-level hardware, and the original (non-sub-sampled) reference data used in the refining stage can be reused in fraction ME, the area and bandwidth costs do not increase, and the latency problem can be solved by a suitable pipeline strategy, while the ME quality is much better. Furthermore, frame recompression is applied to reduce bandwidth. There are several existing works on this technique, such as [12, 13, 14, 15, 16, 17]. However, all of them divide the reference picture into small blocks and compress each block under the general assumption that all pixels of the block will be needed for motion search, which is not suitable for the hierarchical ME application. Therefore, we propose a lossless frame recompression scheme based on the proposed hierarchical ME algorithm. A hierarchical memory organization is proposed to meet the random-access constraint of the hierarchical ME search structure. Then a leveling two-step data fetching strategy, which fetches the needed data separately from the three levels, is applied to extend random access to a lower level without causing latency problems. Furthermore, in order to compress the reference data in each level, the differential values are calculated by referring the current values to the average values stored in the higher level, and an efficient coding method suitable for compressing these differential values is applied.
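The leveling compression idea just described, coding each pixel as a difference from the average stored one level higher, can be sketched as follows (a minimal illustration; the function names are ours, not the paper's):

```python
def block_diffs(pixels, avg):
    # level-0/level-1 pixels are coded as differences from the average of
    # the 2x2 block they belong to; that average is already stored in the
    # next-higher level, so only the differences need to be entropy-coded
    return [v - avg for v in pixels]

def reconstruct(diffs, avg):
    # decompression re-adds the average fetched from the higher level
    return [d + avg for d in diffs]
```

Because the averages are stored anyway for the coarse search, the differences are typically small and therefore cheap to code.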
In order to terminate unnecessary large-area searches for small motions, an early level termination method based on the hierarchical ME algorithm is applied. This method chooses different thresholds for different motion search levels based on the motion cost and the estimated motion search improvement, and is thus able to terminate redundant high-level searches according to these thresholds. Furthermore, early refinement termination is also applied to avoid redundant refining searches in the high levels, which saves the extra bandwidth cost of unnecessary refinement. Experimental results show that the proposed early level termination strategy can effectively reduce bandwidth with little quality degradation. In section 2, we introduce the hierarchical ME architecture, while the proposed frame recompression scheme is explained in section 3. The proposed early level termination strategy is presented in section 4. The experimental results are given in section 5. Finally, conclusions are drawn in section 6.

II. PROPOSED HIERARCHICAL ME SCHEME

A. The Basic Idea of Proposed Algorithm

The proposed parallel multi-resolution ME algorithm includes three levels: one fine level and two sub-sampling levels. The fine level, without data sub-sampling, covers the search range for small motions. The other two levels, with data sub-sampling, cover the large search range to find motion vectors for large motions. After the motion search, the results of the three levels are compared and the motion vector with minimum cost is chosen as the final ME result. This hierarchical searching strategy ensures a large prediction range for high motions, and so achieves better performance than previous strategies that only process small-area searches (such as CBME).
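The three-level structure and the final minimum-cost selection can be sketched as follows (an illustration under our own naming; `search_level` stands in for whatever per-level search is performed, and each call is assumed to return a `(mv, cost)` pair):

```python
def hierarchical_me(search_level):
    # three parallel searches with the paper's search ranges
    results = [
        search_level(level=0, sr=(-7, 6)),      # fine search on original data
        search_level(level=1, sr=(-32, 31)),    # 4:1 sub-sampled coarse search
        search_level(level=2, sr=(-128, 127)),  # 16:1 sub-sampled coarse search
    ]
    # compare the three candidates and keep the one with minimum motion cost
    return min(results, key=lambda r: r[1])
```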
In the sub-sampling levels of hierarchical ME, one big problem is that ME quality loss is inevitable compared with a large-area full search if only sub-sampled pixels are searched, as in the PMRME searching method proposed in [10]. In order to reduce the quality loss caused by the sub-sampling search, after the coarse search based on the average value of each block, the motion vector with minimum motion cost is further refined to find the best matched position. This refining strategy results in better coding performance than PMRME. In addition, since the original reference data used for the refinement can be reused in fraction ME, the extra bandwidth consumption of [10] for fetching the original data based on the sub-sampled MV during the fraction ME process can also be avoided.

Fig. 1. Hierarchical ME algorithm. [Level 0: SR=[-7,+6], centered on the predicted MV, fine search on original data; Level 1: SR=[-32,+31], centered on (0,0); Level 2: SR=[-128,+127], centered on (0,0); levels 1 and 2 perform a coarse search on average values followed by a fine search for the best MV.]

TABLE I
MODES SUPPORTED FOR DIFFERENT LEVELS

Level   Block Size
0       16x16, 16x8, 8x16, 8x8
1       16x16, 16x8, 8x16, 8x8
2       16x16

B. The Three Levels of Proposed Algorithm

The proposed hierarchical ME algorithm is illustrated in Fig. 1. In the lowest level, level 0, the SR is set to [-7, +6]. Since the predictive motion vector (PMV) has a relatively high probability of being the final MV in small-motion search, it is chosen as the search center. This level performs a fine search based on original reference data without sub-sampling. In addition, all variable block size modes are enabled (Table I). In order to save bandwidth while maintaining coding efficiency, we only support block sizes larger than or equal to 8x8, as in [7, 18]. According to [18], this approach maintains the coding performance in most cases, especially when RDO (rate distortion optimization) is off. In level 1, the SR is enlarged to [-32, +31]. It is centered on the current block position in the reference picture, which is defined as the original point (0, 0). This enables regular memory reuse between successive MB processing, as illustrated in [10]. In this level, 4:1 sampling is applied: only the average value of each 2x2 block from level 0 is searched, by comparing it to the average value of the corresponding current block, and all modes from 16x16 down to 8x8 are enabled. After the coarse search, the MV with minimum cost, which points to the center of a 2x2 block, is refined inside that block. The refinement calculates the motion cost based on the original reference data without sub-sampling. In level 2, the SR is the largest, [-128, +127], and is also centered on (0, 0).
In this level, the average value of each 2x2 block from level 1 (which is also the average value of each 4x4 block from level 0) is searched, and only the 16x16 mode is enabled, since other modes would contain too few average values for SAD calculation. The coarse MV is then refined inside the 4x4 block. Among the three parallel levels, level 2 provides a large search range for relatively high motions. According to [1], a higher definition sequence is generally more homogeneous than a lower definition sequence with the same video content. Therefore, the average values of the 16:1 sampling are used for SAD calculation in order to predict the MV without causing much quality loss, and the refinement then locates the MV down to integer-pixel precision. Similarly, level 1 provides finer precision for medium motions, while most small motion vectors fall in level 0.

C. The Calculation Structure of Proposed Algorithm

The ME calculation scheme is shown in Fig. 2. Each calculation component is decomposed into combinations of primitive calculation modules, one per 4x4 block. Each level comprises exactly 16 primitive modules to balance the area costs of the three levels, as illustrated in [10]. Furthermore, in the proposed ME structure with its coarse-to-fine strategy, the level 0 hardware is also used for the refining search of the coarse MVs resulting from the high levels. Therefore, after the high levels' coarse searches, the motion vectors with minimum costs are transmitted to the level 0 calculation module for further refinement. Since the refinement for the high levels follows a similar process to the level 0 search, the level 0 hardware can be directly reused for the refinement process.
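The 4:1/16:1 averaging and the refinement inside the winning block can be sketched as follows (illustrative only; integer averaging and the cost callback are our assumptions, not the paper's hardware arithmetic):

```python
def downsample_avg(frame, f):
    # average each f x f block: f=2 on level 0 gives the level-1 samples,
    # and f=2 applied again (or f=4 on level 0) gives the level-2 samples
    h, w = len(frame), len(frame[0])
    return [[sum(frame[y * f + dy][x * f + dx]
                 for dy in range(f) for dx in range(f)) // (f * f)
             for x in range(w // f)]
            for y in range(h // f)]

def refine(block_origin, block_size, cost):
    # after the coarse search, test every integer position inside the
    # winning 2x2 (level 1) or 4x4 (level 2) block against original data
    bx, by = block_origin
    candidates = [(bx + dx, by + dy)
                  for dy in range(block_size) for dx in range(block_size)]
    return min(candidates, key=cost)
```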
Fig. 2. Hierarchical ME structure. [The current MB and three reference buffers (level 0: original values; level 1: average values of each 2x2 block; level 2: average values of each 4x4 block) feed 16 SAD modules per level; level 0 SAD module 0 also performs the refining search for levels 1 and 2; 8x8 SAD trees and minimum selection produce the level 0/1/2 MVs and the best MV for each mode.]

Finally, the level 0 search result, as well as the refining

MV results for the two high levels, are compared, and the motion vector with minimum motion cost is chosen as the final ME result. In order to balance the calculation time of the three levels' calculation modules, the SR of level 0 is adjusted to [-7, +6]. Since the level 0 calculation module contains 16 primitive modules in total, it can calculate the SAD values of the variable sub-blocks inside the MB for one search point in one clock cycle. As a result, the level 0 search takes 196 cycles for an SR of [-7, +6] (14x14 search points). For the refining searches, the search centers of the sub-blocks inside the MB (for modes below 16x16) may differ, since they result from the high-level searches for the sub-block modes. As a result, the sub-blocks cannot be searched at the same time when calculating the SAD costs inside the MB. Hence the refining searches for the high levels cost at most 52 cycles: an SR of [-1, 0] in level 1 (4 cycles per block for 9 blocks: one 16x16, two 8x16, two 16x8 and four 8x8) and an SR of [-2, +1] in level 2 (16 cycles for one 16x16 block). As a result, the total calculation time of the level 0 calculation module including the refining search (248 cycles) almost equals the 256 cycles of the other two levels, which balances the calculation cycles of the three levels. The total calculation process of the three levels can be synchronized and pipelined by delaying the level 0 search of the first MB by 52 cycles for initialization, and this delay is negligible compared with the total motion estimation time for one frame. Therefore, the calculation cycles for each MB do not increase compared with the scheme proposed in [10].

III. FRAME RECOMPRESSION SCHEME

A. Variable Compression Ratio Based Lossless Frame Recompression Scheme

Frame recompression (FRC) is a technique that compresses the data before storing them into the frame memory and decompresses the data fetched back.
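As a generic illustration of that lossless contract (a toy run-length codec, not the codec proposed in this paper), any lossless FRC pair must satisfy decompress(compress(x)) == x:

```python
def rle_compress(samples):
    # toy lossless codec: collapse runs into [value, run-length] pairs
    out = []
    for v in samples:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out

def rle_decompress(pairs):
    # expand the pairs back to the original sample sequence
    return [v for v, n in pairs for _ in range(n)]
```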
The proposed frame recompression scheme applies a variable-compression-ratio strategy, which divides the reference picture into small blocks and compresses each block with any ratio. This strategy differs from the fixed-compression-ratio model that compresses each block into the same size ([14, 15, 16, 17]), and it avoids the shortcomings of the fixed-ratio strategy. On one hand, blocks with higher compression potential can be compressed with a relatively high compression ratio. On the other hand, blocks with lower compression potential do not have to be fitted into a designated compression ratio, so no quality loss occurs. In addition, the consequent drift error, i.e., error propagation due to the quality loss of reference frames, is also avoided. There are two published works based on variable compression ratio ([12, 13]). In their structure, the uncompressed reference frame is divided into groups, and each group is further divided into partitions. The compression of the frame is processed per partition. In order to support a variable compression rate, each partition can be compressed with any ratio, and the compressed partitions are stored compactly in their original groups, as shown in Fig. 3. When compressing each partition, its length is also recorded into DRAM, which can be used to derive the offset of the compressed partition inside the group. As a result, a two-step data fetching strategy, which fetches the length information and then the compressed partition, can be used to fetch a compressed partition.

Fig. 4. DPCM scanning modes. [Two scanning orders, <mode 0> and <mode 1>, over the partition samples.]
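The offset derivation from the recorded lengths described above (S0 = 0, Si = Si-1 + Li-1, cf. Fig. 3) can be sketched as:

```python
def partition_offsets(lengths):
    # start address of each compressed partition inside its group, in AU,
    # accumulated from the recorded lengths of the preceding partitions
    starts, s = [], 0
    for length in lengths:
        starts.append(s)
        s += length
    return starts
```

With the Fig. 3 example lengths of 0.75, 1.5, 1.5 and 0.75 AU, this yields start addresses 0, 0.75, 2.25 and 3.75 AU.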
To compress each partition, various scanning modes are used to calculate DPCM (Differential Pulse Code Modulation) values (as shown in Fig. 4), and then variable length coding (VLC) is applied to these values to express them in fewer bits. The mode with the shortest bit length is chosen as the final scanning order, and the coded bits are stored into DRAM.

B. Hierarchical ME Based Frame Recompression Scheme

The proposed frame recompression scheme is based on the hierarchical ME scheme, and thus applies a hierarchical compression strategy. Although the variable-compression-ratio frame recompression scheme proposed in [12] and [13] achieves considerable bandwidth reduction, it is based on a single-level motion estimation structure, which compresses and fetches all pixels of a block, so it is not suitable for integration into the hierarchical ME scheme. The proposed scheme applies the hierarchical compression strategy by allowing a lower level to refer to a higher one for compression, in order to minimize the total information to be stored. During the hierarchical ME process, the compressed data are fetched by applying the leveling data fetching strategy and decompressed by referring to the high-level average values.

Fig. 3. Memory organization for the reference picture; the compressed reference picture; and calculating the start address of a partition in a group (e.g., with lengths L0 = 0.75, L1 = 1.5, L2 = 1.5, L3 = 0.75 AU, the start addresses are S0 = 0, S1 = S0+L0, S2 = S1+L1, S3 = S2+L2). P# stands for partition #, L# for its length, and S# for its start address.

C. Hierarchical Memory Organization

In the proposed scheme, the DRAM bus width is set to 64 bits, which is taken as the access unit (AU). An uncompressed reference frame is divided into groups of 16x16 samples, and each group is further divided into 4 partitions of 8x8 samples. According to the hierarchical ME structure, the reference average values are stored in the high-level memories: level 0 stores all pixels of each partition, level 1 stores the average value of each 2x2 block in level 0, and level 2 stores the average value of each 2x2 block in level 1 (which is also the average value of each 4x4 block in level 0). These 5/16 additional average values are stored and used to compress the data of the lower level in the recompression part, so the total information to be stored after applying the compression strategy does not increase in comparison with previous works. The hierarchical memory organization is shown in Fig. 5. The compression of the frame is processed per partition in each level (except level 2, which is not compressed). After the frame is compressed, the compressed partitions of each level (partitions of level 2 contain original data) are stored compactly in their original groups, as shown in Fig. 6. This structure allows random location of a compressed partition down to the group level, with unused space not adding to the DRAM data transfer.

D. Leveling Two-Step Data Fetching Strategy

In order to fetch a compressed partition, the leveling two-step data fetching strategy is applied in this work. After each partition is compressed in level 0 and level 1, the length as well as the content of the compressed partition is recorded and stored into DRAM.
As a result, when a compressed partition in a level needs to be accessed, the two fetching steps, for the length information and for the compressed partition, can be processed in a pipeline to solve the latency problem. The start address of a partition can be obtained by accumulating the lengths of its preceding partitions in the same group, as shown in Fig. 6. Although the access to the length information may cause an extra bandwidth requirement, taking the 8 bpp (bits per pixel) picture format as an example, only 9 bits for level 0 and 7 bits for level 1 are needed to record the length of each compressed partition, compared with the original 512 bits (level 0) and 128 bits (level 1) of data. The analysis can be extended to other formats, and for higher pixel depths the ratio of the length bits to the original data becomes even lower. So the overhead is nearly negligible in comparison with the bandwidth saved by compression. In addition, the overhead caused by the length table in each level can be further reduced once the partition lengths of one group are buffered in a cache.

Fig. 5. Hierarchical memory organization for the reference picture. [Level 0 (full resolution, 1), level 1 (1/4) and level 2 (1/16); groups of 16x16 samples divided into four 8x8 partitions.]

E. Differential Value Calculation and Variable Length Coding

Since level 2 only includes 1/16 of the data of a partition, compressing it would contribute too little to the bandwidth reduction, so this level's data are not compressed. To compress each partition in level 0 and level 1, the difference value of each pixel is calculated by comparing it to the average value of the 2x2 block it belongs to, and then VLC is applied to these differential values to express them in fewer bits. In most cases, the pixel values of each block are normally distributed around the average value of the block.
Therefore, the compression ratio will be relatively high when the average value is subtracted from the current pixel and the differential value is encoded. Since the average values of the current level are stored in its higher level, it is not difficult to fetch the corresponding reference average values from the higher level when decompressing the data, without extra calculation time. To encode the differential values, a new method is chosen in this paper. Each partition is divided into 2x2 blocks (16 blocks for level 0 and 4 for level 1). As shown in Table II, the category of each 2x2 block is selected according to the block's maximum absolute value, and the category indicator is encoded with variable length coding according to its popularity. Then, each value inside the block is encoded using the corresponding method. From the table, we can see that the length of a coded value inside one block has only two possibilities, which leads to a decoding algorithm with much less dependency in comparison with conventional variable length coding methods. If the maximum absolute value of one block is greater than 20, the block is expressed with the original 8-bit samples. Finally, the category indicators and coded differential values of a partition are stored into DRAM.

Fig. 6. Hierarchical memory organization for the compressed reference picture; calculating the start address of a partition in a group (e.g., with lengths L0 = 2.25, L1 = 1.5, L2 = 1.25 AU, the start addresses are S0 = 0, S1 = S0+L0, S2 = S1+L1). P# stands for partition #, L# for its length, and S# for its start address.

However, if the total length of the compressed partition is

greater than the original length, the original samples are directly stored into DRAM without compression.

TABLE II
VARIABLE LENGTH CODING FOR DIFFERENTIAL VALUES. S STANDS FOR THE SIGN BIT OF THE DIFFERENTIAL VALUE

Category: A, B, C, D, E, F (indicators coded with VLC by popularity)
Max. Abs. ±1: 0S 1S 00S 001S 0001S
Max. Abs. ±2: 00S 01S 010S 0010S
Max. Abs. ±3: 10S 011S 0011S
Max. Abs. ±4: 000S 100S 0100S
Max. Abs. ±8: 0000S 1000S
Max. Abs. ±…: …S
Max. Abs. ±…: …S
Max. Abs. ±…: …S
Max. Abs. ±…: …S

In order to test the performance of the VLC method, we use it to compress and write 10 frames of two sequences into DRAM; the reduction of the written data bandwidth is calculated by comparing the compressed data to the original frame data without compression. Table III shows the bandwidth reduction of the VLC method compared with the Exponential-Golomb coding method of different orders. Exponential-Golomb is a universal code that does not take advantage of the redundancy between code words, but in real pictures this redundancy (in our design, each code word represents a difference between samples) is very high. Since the VLC method in the proposed FRC scheme exploits this redundancy to improve compression efficiency, it achieves a better compression effect, as shown in Table III.

TABLE III
COMPRESSION EFFICIENCY COMPARISON FOR THE PROPOSED VLC METHOD AND THE EXPONENTIAL-GOLOMB METHOD

Bandwidth Reduction (%)
Sequence   Proposed   Exp-Golomb (order 0)   Exp-Golomb (order 1)   Exp-Golomb (order 2)
…          -57.6%     -44.2%                 -46.4%                 -44.8%
…          -69.4%     -55.5%                 -55.8%                 -51.7%

F. Summary

The data flow of the proposed frame recompression scheme is summarized in Fig. 7. The reference information for the three levels is recompressed, and then the length tables and compressed data of the two lower levels, as well as the original data of level 2, are generated and stored into DRAM separately. In this scheme, the three levels' data buffers store decompressed data to avoid cross accesses between buffers.
As a result, in the process of parallel multi-resolution ME, the block of reference values needed by the current level is determined, and the corresponding data buffer is checked for the needed partitions of the reference block. If the data buffer in level 2 does not contain the needed data for the level 2 search, the data are directly fetched from DRAM and the cache is updated. If a miss is detected in the level 1 data buffer, the length buffer is checked and updated to get the length information, and then the compressed data are fetched; to decompress the data, the level 2 data buffer has to be checked and updated in order to get the average values. Similarly, for level 0, the level 0 length buffer and the level 1 data buffer are checked and updated if the level 0 data buffer misses the needed data, and updating the level 1 data buffer in turn requires checking the level 1 length buffer as well as the level 2 data buffer, as mentioned above. In this processing flow, there are two types of DRAM latency, for length fetching and for compressed-data fetching respectively. However, since there is no feedback from the fetched information to subsequent DRAM requests, it is not difficult to pipeline the whole flow by putting the DRAM requests in a queue for latency concealment. In order to implement the FRC strategy in the hierarchical ME structure (Fig. 2), several new modules (not included in Fig. 2) have to be implemented and integrated into the ME structure.
They include the length buffers of level 0 and level 1 that store the lengths of the compressed partitions, the recompressor that performs the compression and generates the compressed data for DRAM storage, and the decompressor that decompresses the fetched compressed partitions, according to the data flow in Fig. 7; the three levels' data buffers that store the decompressed reference data are the same as the original three data buffers in Fig. 2.

Fig. 7. Data flow of the proposed scheme. [The reconstructed picture passes through the recompressor; the length tables and compressed data of levels 0 and 1 and the original level 2 data are stored in DRAM; on fetch, the length buffers and the level 0/1 decompressors (using average values from the level above) feed the three level data buffers that supply reference samples to the level 0/1/2 searches.]

IV. EARLY LEVEL TERMINATION STRATEGY

A. Early Level Termination Method

Since the proposed hierarchical ME scheme is designed for the worst situation, in which motion is generally fast for beyond high definition videos, it applies a large prediction range as well as the refining strategy in order to maintain much better coding quality. However, for sequences with relatively low motion, a low-level motion search with a small search range can achieve similar ME performance in most cases. In this situation, applying all three levels' searches leads to extra bandwidth and calculation time. Therefore, an early level termination method is applied to terminate the large-area search in the high levels if the low level

search is predicted to generate an ideal MV. The proposed early level termination method first classifies each block based on both the motion cost and the expected motion search improvement ([19]) to identify the blocks that only need a small-area search. For the blocks that do need a large-area search, the method further applies thresholds for the two high motion search levels, so that the level 1 and level 2 searches can be enabled selectively.

In the ME process, we define COST as the ME cost of a designated MV:

    COST = SAD + λ × R(MV)    (1)

where SAD is the sum of absolute differences of the block matching error, R(MV) is the number of bits to code the MV, and λ is a constant factor with the same value as the lambda factor defined in H.264 for determining the motion cost during motion estimation.

In order to apply the early level termination method, the predicted motion cost is checked first. COST_pred is defined as:

    COST_pred = COST(PMV)    (2)

i.e., the COST of the PMV. For each block, the classification strategy is defined as:

    class(curblock) = 1, if COST_pred < th
                      2, if COST_pred ≥ th and MVdist(MV_col, PMV) ≤ th1
                      3, if COST_pred ≥ th and th1 < MVdist(MV_col, PMV) ≤ th2
                      4, if COST_pred ≥ th and MVdist(MV_col, PMV) > th2    (3)

where curblock is the current block and MV_col is the final MV of the co-located block. th is the threshold that decides whether COST_pred is a reasonable motion cost for the current block, while th1 and th2 are thresholds that decide the significance of the motion distance between the PMV and MV_col.

From Eqn. (3), the first class of blocks appears to have uniform motion with small COST_pred values; such blocks can be motion predicted within a small search range centered on the PMV, similarly to most previous early termination strategies ([20, 21, 22, 23]). If the predicted motion cost is larger than the threshold, the expected motion search improvement is considered. A small motion distance between the predicted final MV and the PMV (corresponding to class 2) means there is a high possibility that even a large-area search would result in an MV near the PMV. The large-area search would therefore not improve the ME performance much, and a small search range is generally enough to find a suitable nearby MV ([19]). In this situation we also apply only the level 0 motion search. The blocks that need a large-area search are further classified into two conditions. If a block falls into class 3, it is reasonable to assume that a moderately large search area is needed to find a more suitable MV away from the PMV. Therefore, the level 1 search is additionally applied for the blocks of class 3. For the blocks of class 4, which have the highest motion, the necessary search range is the largest, and all three level searches are turned on in order to find an accurate MV at a large distance.

According to the experimental results on detection rates in [19], MV_col estimates the real final MV of the current block with high accuracy most of the time. Since it is impossible to obtain the real final MV of the current block at the beginning of the actual ME process, MV_col is used to predict the actual final MV in order to calculate the motion distance error of the PMV. In addition, since there are two high sub-sampling levels in the proposed hierarchical motion estimation strategy, two thresholds (th1 and th2) are applied respectively to further classify the current block and decide the suitable search strategy. The setting of the thresholds is discussed in the next subsection.

The process of the proposed early level termination algorithm is shown in Fig. 8. First, COST_pred is calculated from the PMV of the current block (Eqn. (1) and Eqn. (2)). Then the classification strategy (Eqn. (3)) is applied to the current block.

B. Thresholds Selection

In the proposed method, three thresholds have to be decided: th, th1, and th2. For th, different values are applied according to the motion estimation mode of the current block. The th value of the current block is defined as:

    th(curblock) = (COST_col + COST_nb) / 2, if mode(curblock) = 16x16
                   COST_16x16,               otherwise    (4)

where COST_col is the COST of the co-located block in the previous frame, COST_nb is the COST of the previously encoded neighboring block, and COST_16x16 is the COST of the 16x16 mode of the current MB. In addition, mode is the current motion estimation mode of the block.

Fig. 8. Data flow of the proposed early level termination scheme (compute COST_pred; classify the current block into four classes by Eqn. (3); classes 1 and 2 perform only the level 0 search, class 3 performs the level 0 and level 1 searches, and class 4 performs all three level searches).
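As a compact sketch, the classification of Eqns. (1)-(3) and the level-selection rule of Fig. 8 can be written as follows. The distance measure (the larger MV component difference) and all function names are illustrative assumptions; th1 = 7 and th2 = 16 are the values selected in Section IV-B:

```python
# Sketch of Eqns. (1)-(3) and the level-selection rule of Fig. 8.
# The distance measure below (max of |dx|, |dy|) is an assumption.

def cost(sad, mv_bits, lam):
    # Eqn. (1): COST = SAD + lambda * R(MV)
    return sad + lam * mv_bits

def mv_dist(mv_a, mv_b):
    # Assumed distance measure between two MVs (components in pixels).
    return max(abs(mv_a[0] - mv_b[0]), abs(mv_a[1] - mv_b[1]))

def classify(cost_pred, th, mv_col, pmv, th1=7, th2=16):
    # Eqn. (3): four classes from COST_pred and the MV_col-to-PMV distance.
    if cost_pred < th:
        return 1
    d = mv_dist(mv_col, pmv)
    if d <= th1:
        return 2
    if d <= th2:
        return 3
    return 4

def levels_to_search(block_class):
    # Fig. 8: classes 1 and 2 use level 0 only; class 3 adds level 1;
    # class 4 enables all three levels.
    return {1: [0], 2: [0], 3: [0, 1], 4: [0, 1, 2]}[block_class]
```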

TABLE IV
BIT RATE AND TOTAL BANDWIDTH COMPARISON FOR DIFFERENT SELECTIONS OF th2
(bit rate increase vs. full search)

Sequence | th2=8  | th2=12 | th2=16 | th2=24 | th2=32
         | +3.11% | +3.13% | +3.18% | +3.29% | +3.36%
         | +2.06% | +2.24% | +2.62% | +3.75% | +6.63%
         | +0.43% | +0.60% | +0.83% | +2.33% | +2.97%
         | +0.07% | +0.09% | +0.12% | +3.17% | +5.04%
         | +0.49% | +0.54% | +0.56% | +1.63% | +3.45%
Total bandwidth (M Bytes) | | | | |

According to the simulation results in [20], the motion cost of the best search point for the 16x16 mode remains similar between consecutive frames and provides a good basis for predicting the COST of the current frame MB, while the COSTs of the 16x8, 8x16, and 8x8 modes are highly correlated with the 16x16 mode of the same MB. The threshold value th can therefore be defined from a combination of the motion costs of the most correlated blocks according to the current block mode. Since the search range of level 0 is 7, the threshold for motion distance prediction is set to 7 for th1. For level 1, with the sub-sampled search (SR 32) centered on (0, 0), a series of thresholds for th2 is tested by encoding five beyond-HD sequences, and the bit rate increase compared with full search as well as the total bandwidth is examined, as shown in Table IV. The table shows that increasing th2 further reduces the total bandwidth but at a higher quality loss. When th2 is small, such as 8, increasing it avoids more redundant level 2 searches, so the quality loss is small and negligible. However, when th2 grows large, such as 32 in Table IV, more and more necessary level 2 searches are terminated, which leads to a drastic increase of bit rate and quality loss. Although different values can be chosen according to the requirements, we chose 16 for th2 in our experiment in order to achieve a considerable bandwidth reduction while ensuring the ME performance.

C. Early Refinement Termination

In the proposed method, early refinement termination is further applied to reduce the bandwidth.
By applying the refining strategy to the sub-sampled levels in the proposed hierarchical ME scheme, the MV from the coarse search is further refined to integer-pixel precision and the ME quality is much better. However, the refining process reuses the level 0 module and has to read the original, non-sub-sampled values from DRAM whenever the level 0 data buffer misses the needed data. If the refinement does not improve the final result of the coarse search, this non-sub-sampled data cannot be reused in the fractional ME part, since the low-level search MV is chosen as the final result of integer ME; this causes extra bandwidth. In order to avoid unnecessary refining searches and reduce this part of the DRAM traffic, an early refinement termination strategy is proposed. In the high-level search process, the coarse search result is transmitted to the local refining search. After the coarse search based on the sub-sampled values, the motion cost of the coarse MV is compared with that of the level 0 fine search result, and the refinement is terminated if the COST of the coarse MV is larger than that of the MV resulting from the low-level search, as the level 0 search is then considered sufficient to predict the motion without causing too much quality loss.

Table V shows the hitting rates of the early refinement termination algorithm based on the encoding of two sequences.

TABLE V
HITTING RATES OF THE EARLY REFINEMENT TERMINATION ALGORITHM

Sequence | Level 1 | Level 2
         |       % | 93.56%
         | 90.48%  | 91.67%

In the experiment, the total number of times the early refinement termination condition is satisfied is counted separately for the level 1 and level 2 searches. For each high level, the hitting times are also counted, i.e., the cases in which the termination condition is satisfied and the motion cost of the final MV of the high level, even with the refining strategy, is still larger than that of the MV resulting from the low-level search.
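The termination rule described above can be sketched as follows; the function names and the (mv, cost) result tuples are illustrative, not the paper's interfaces:

```python
# Sketch of early refinement termination: after the coarse (sub-sampled)
# search at a high level, refinement is skipped whenever the coarse MV's
# motion cost already exceeds that of the level 0 fine search result.

def should_refine(coarse_mv_cost, level0_mv_cost):
    """Refine only if the coarse search still beats the level 0 result;
    otherwise the level 0 MV is kept and no extra DRAM traffic is spent."""
    return coarse_mv_cost <= level0_mv_cost

def high_level_search(coarse_search, refine, level0_result):
    # coarse_search() -> (mv, cost) on sub-sampled data
    # refine(mv) -> (mv, cost) on full-resolution data (reuses level 0 HW)
    mv, c = coarse_search()
    if not should_refine(c, level0_result[1]):
        return level0_result          # terminate: reuse the level 0 MV
    return refine(mv)
```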
The hitting rates are calculated as the ratio of hitting times to total times for each high level. Table V shows that the proposed method has high hitting rates and successfully avoids most unnecessary refining searches.

V. EXPERIMENTAL RESULTS

A. Simulation Condition

In order to evaluate the performance of the total proposed scheme, the simulation is based on JM 15.1 for typical sequences. Since we do not have SHV-sized sequences, the two QFHD-sized sequences (crowdrun and parkjoy) are expanded by bilinear interpolation to generate the 4320p sequences. It is hard to run the full 4320p sequences in the experiment due to the huge computer memory requirement, so we only encode the middle 4320x2160 areas of the two 4320p sequences. The encoded parts have the same motion distances as the original sequences, and the basic behavior of the proposed hierarchical ME algorithm is mainly influenced by the motion rather than by the picture definition. Therefore, this experimental strategy is expected to be able to

simulate the performance of the proposed scheme. In addition to the two 4320p sequences, three 2160p sequences as well as several 720p and 1080p sequences are also simulated. According to [1], SHV sequences have relatively higher motion than lower-definition sequences with similar content. Therefore, several frames of these test sequences are skipped in order to simulate the motion distance of SHV sequences. Since the MV precision of SHV sequences is doubled compared with the QFHD-sized sequences, each 2160p sequence skips 2 frames in order to achieve a similar motion vector distance. Similarly, according to the MV scaling relative to the SHV sequences, the number of skipped frames is estimated to be about 4 for 1080p and 8 for 720p sequences. We therefore skip 2 frames for 2160p sequences, 4 frames for 1080p sequences, and 8 frames for 720p sequences. All of the sequences use 4:2:0 color sampling at 8 bpp (bits per pixel), and the simulation is modeled on this condition. Since the basic method of the proposed hierarchical ME and FRC algorithms is not influenced by the frame format, it is expected to extend to other conditions. In the experiment, all sequences are encoded for 15 frames with IPPP frame structure and RDO off.

B. Cache Organization

In order to realize the total proposed scheme, two kinds of caches are used as pre-fetch buffers, storing the reference data (level 0-2 data buffers in Fig. 7) and the partition lengths (length buffers 0-1 in Fig. 7).

TABLE VI
BUFFER SIZE FOR THREE LEVELS
(Data buffer and length buffer sizes in Kbyte for levels 0, 1 and 2, their totals, the total of a direct design, and the saving in %.)

For the data buffers that store the decompressed reference data of the three levels, the cache organization shown in Fig. 9 is
applied. It is set to have 6x6 groups for level 0 and level 1, and 18x18 groups for level 2; according to our experiment, this organization achieves the best bandwidth result.

Fig. 9. Cache organization for data buffers. The cache size is 6x6 groups for level 0 and level 1, and 18x18 groups for level 2.

As a result, the remainder of the group address divided by the cache size, together with the partition address inside the group, is used as the index to locate each partition, while the quotient serves as the tag that determines whether the location stores the wanted partition. Two fully associative FIFO (first in, first out) caches are used to store the lengths of the compressed partitions of levels 0 and 1. For each level, since the lengths of the other partitions in the same group are always needed to calculate the start address, the partition lengths of one group are processed as a unit. To increase the efficiency of each DRAM access, the length information of every 4 horizontal groups is fetched to update the designated length cache when a miss is detected. Table VI shows the sizes of the three levels' data buffers as well as the length buffers, indicating only a 1% cache size increase for length buffering. The proposed ME scheme saves considerable on-chip memory compared with a direct design.

C. Performance of Frame Recompression Scheme

First, the lossless frame recompression scheme is integrated into the proposed hierarchical ME algorithm and its performance is evaluated. To date there is no FRC scheme based on a hierarchical ME structure, hence we compare the performance of the total ME scheme with the designated FRC strategy integrated. Since the FRC scheme proposed in [12] is designed to be integrated into an FS ME scheme, the total performance of that FRC-integrated motion search can be evaluated for comparison.
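The index/tag computation for the data buffers described in the cache organization above can be sketched as follows; the 2-D group address and the class shape are illustrative, with CACHE_GROUPS = 6 matching the 6x6-group organization of levels 0 and 1:

```python
# Sketch of the data-buffer addressing: the group address modulo the
# cache size (plus the partition address inside the group) indexes a
# cache slot, and the quotient serves as the tag that validates a hit.

CACHE_GROUPS = 6   # 6x6 groups for levels 0 and 1 (18 for level 2)

def cache_slot(group_x, group_y, part):
    """Return (index, tag) for a partition inside a group."""
    idx = (group_x % CACHE_GROUPS, group_y % CACHE_GROUPS, part)
    tag = (group_x // CACHE_GROUPS, group_y // CACHE_GROUPS)
    return idx, tag

class DataBuffer:
    def __init__(self):
        self.tags = {}     # index -> tag currently stored
        self.data = {}     # index -> partition samples

    def lookup(self, group_x, group_y, part):
        idx, tag = cache_slot(group_x, group_y, part)
        if self.tags.get(idx) == tag:
            return self.data[idx]          # hit
        return None                        # miss: caller fetches from DRAM

    def fill(self, group_x, group_y, part, samples):
        idx, tag = cache_slot(group_x, group_y, part)
        self.tags[idx] = tag
        self.data[idx] = samples
```

Two groups whose addresses differ by a multiple of the cache size map to the same slot, so the tag (the quotient) is what distinguishes them on lookup.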
As shown in Table VII, although the FRC scheme in [12] reduces a considerable amount of bandwidth for an FS video encoder, the total bandwidth of that ME scheme is still too large for a beyond-HD encoder chip design.

TABLE VII
PSNR, BIT RATE AND BANDWIDTH COMPARISON OF FRC-INTEGRATED ME SCHEMES
(PSNR increase, bit rate increase and total bandwidth in MBytes for "FS ME + FRC in [12]" and "Proposed Hier. ME + FRC"; sequences: Night (720p), Shields (720p), Crosswalk, Parkscene, Woman.)

Therefore, we propose the

hierarchical ME algorithm combined with the hierarchical data recompression strategy in order to reduce the DRAM bandwidth further. According to the results in Table VII, the proposed FRC-integrated hierarchical ME algorithm achieves much lower bandwidth than the former scheme. The quality loss of the proposed algorithm compared with the FS-based scheme comes from the hierarchical ME algorithm, which trades it for hardware efficiency; since the proposed FRC scheme is based on lossless compression, frame recompression itself introduces no quality loss.

D. Performance of Early Level Termination

The performance of the early level termination strategy is tested by integrating it into the proposed hierarchical ME scheme with lossless frame recompression. In the comparison, the PSNR and bit rate increases compared with FS, as well as the total bandwidth, are examined, as shown in Table VIII. In addition, the bandwidth reduction rate (ΔBW) is calculated by comparing the final total scheme, with the early level termination method, against the original proposed scheme without the terminating strategy.

TABLE VIII
PSNR, BIT RATE AND BANDWIDTH COMPARISON BY ADDING THE EARLY LEVEL TERMINATION STRATEGY
(PSNR increase, bit rate increase and total bandwidth in MBytes for "Proposed Hier. ME + FRC" and "Proposed Hier. ME + FRC + ELT", plus ΔBW, for sequences including Night (720p), Shields (720p), Crosswalk, Parkscene and Woman. ΔBW values: -20.6%, -21.4%, -21.7%, -21.9%, -22.8%, -18.8%, -18.3%, -21.7%, -21.4%, -19.6%.)

According to the results of Table VIII, the proposed early level termination strategy achieves a further 20% bandwidth reduction at the cost of little quality loss when integrated into the proposed ME scheme. Although the quality loss becomes larger when dramatically high motion occurs, as in the shields and woman sequences, this loss is still acceptable, and the total scheme also achieves the best ME performance in comparison with the other previous works in the following discussion.

E. Performance Comparison of Total Scheme

The performance of the total scheme, which includes the hierarchical ME structure, the frame recompression scheme, and the early level termination strategy, is tested by comparing it with CBME and PMRME. The total proposed scheme and the two former works are implemented, and the coding results of PSNR, bit rate, and bandwidth are compared with FS, as shown in Table IX. The experiment is modeled with a QP (quantization parameter) value of 24.

TABLE IX
PSNR, BIT RATE AND TOTAL BANDWIDTH COMPARED WITH FULL SEARCH (QP: 24; RDO: OFF; FRAME STRUCTURE: IPPP)
(Bit rate increase (%), PSNR increase and total bandwidth in MBytes for the direct design, CBME, PMRME, and the proposed total scheme, for sequences including Night (720p), Shields (720p), Crosswalk, Parkscene and Woman.)

According to the experimental results, the following two observations can be made by comparing the ME performance and the total bandwidth separately:

(1) The performance of CBME is not as good as that of the proposed scheme due to its limited

prediction range. The results show that the bit rate increase of CBME becomes rather large when high motion occurs, which greatly reduces the coding efficiency of the total encoder system. Since the proposed scheme aims to ensure the ME quality for the high motions that exist in beyond-HD sequences, it adopts the hierarchical ME strategy with integrated large-area searches. As a result, it achieves a much smaller bit rate increase than CBME for high motion sequences. In addition, the proposed algorithm applies the refining strategy to the high levels during the hierarchical ME process; hence it also outperforms PMRME in ME quality. Fig. 10(a) compares the bit rate increase of CBME, PMRME, and the proposed scheme for different sequences, according to the results in Table IX.

(2) For the bandwidth, the proposed total scheme achieves more than 80% bandwidth reduction and outperforms PMRME according to the experimental results. Its bandwidth is larger than that of CBME for some sequences because the proposed scheme applies a much larger search range in order to ensure the ME performance for high motions, sacrificing some bandwidth consumption. Fig. 10(b) shows the total bandwidth of the three methods relative to that of the direct design for different sequences.

Fig. 10. Comparison of (a) bit rate increase and (b) total bandwidth between CBME, PMRME and the proposed scheme.

Besides, since the proposed ME architecture always stores the original reference frames at the low level, it is very suitable for applying the bit truncation method to the average values of the high levels without causing any drift error between reference frames.
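As a minimal illustration of the bit truncation idea mentioned above; the truncation width of 2 bits is an arbitrary example, not a value from the paper:

```python
# Sketch of bit truncation for the high-level average values: because
# the full-precision originals are kept at level 0, truncated averages
# at the high levels introduce no drift between reference frames.

def truncate_avg(avg, bits=2):
    """Drop the `bits` least significant bits of an 8-bit average sample."""
    return (avg >> bits) << bits
```

Truncating 2 of 8 bits would cut the high-level storage and transfer for the averages by a quarter, at the price of a coarser coarse-search cost.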
Therefore, the bandwidth will be further reduced in our future work.

F. Performance of the Total Scheme with Different QP Values

The total proposed scheme is simulated with different QP values for all of the 4320p and 2160p sequences. Table X shows the PSNR and bit rate increases, as well as the total bandwidth reduction, obtained by comparing the proposed scheme with the direct design for QP values of 24, 28, 32, and 36. When the QP value rises, the details of the reference frames are lost, and hence the frames become easier to compress.

TABLE X
PSNR, BIT RATE AND TOTAL BANDWIDTH COMPARED WITH FULL SEARCH (QP: 24, 28, 32, 36)
(PSNR increase, bit rate increase (%) and bandwidth reduction (%) for each QP; rows: the 4320p and 2160p sequences.)

TABLE XI
BDPSNR AND BD BIT RATE COMPARISON
(BDPSNR and BD bit rate (%) for CBME, PMRME and the proposed scheme.)

The

experimental results show that the bandwidth reduction of the proposed scheme grows as the QP value becomes high. On the other hand, the ME quality degrades due to the increased blocking artifacts at higher QP, as also described in [10]. With the PSNR and bit rate results of the 4 QP values, we calculate the BDPSNR and BD bit rate as the average differences between the FS scheme and the proposed one, according to the method in [25]. We also calculate these values for CBME and PMRME by running the simulations at the 4 QP values for three sequences. Finally, the results of the three schemes are compared, as shown in Table XI. From the table, the proposed scheme also achieves better ME performance than the two former works when the RD costs are considered together.

G. Subjective Comparison

In the subjective comparison, the total scheme as well as the two former works is encoded, and typical reconstructed frames of three sequences are compared. According to Fig. 11, there are no distinct differences between the three works. From the experimental results shown in Table IX, the PSNR values of the three schemes are also very close. Therefore, the subjective and objective differences between the three schemes are not distinctive, and the main difference lies in the final coding bit rates. As discussed in the performance comparison, the total proposed scheme is able to restrain the increase of the total coding bit rate, especially for high motion sequences, and achieves better ME performance than the two former works.

TABLE XII
HARDWARE CLOCK CYCLES COMPARISON
(Clock cycles for each MB for CBME, PMRME and the proposed scheme.)

H. Total Running Time Comparison

Since the proposed algorithm is hardware-oriented, the JM software is only used in our work to model the coding performance and bandwidth consumption; it is not optimized for execution time reduction.
So we did not directly compare software running times. Instead, we calculate the expected hardware clock cycles of the proposed ME scheme and compare them with the two previous works. The bottleneck of the running clock cycles for the proposed scheme is the ME calculation for each MB, since the whole flow

Fig. 11. Comparison of reconstructed frames between CBME, PMRME and the proposed scheme (QP: 24). (a) Sequence: Woman, 16th frame. (b) Sequence: , 11th frame. (c) Sequence: , 3rd frame.
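For reference, the BDPSNR computation of [25] used in Section V-F can be sketched in pure Python. With exactly four QP points the third-order fit reduces to interpolation; all function names here are ours:

```python
import math

def _cubic_coeffs(xs, ys):
    """Solve the 4x4 Vandermonde system for an exact cubic through 4 points."""
    n = 4
    a = [[xs[i] ** j for j in range(n)] + [ys[i]] for i in range(n)]
    for col in range(n):                      # Gaussian elimination, partial pivoting
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):            # back substitution
        s = a[r][n] - sum(a[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = s / a[r][r]
    return coef                               # c0 + c1*x + c2*x^2 + c3*x^3

def _integral(coef, lo, hi):
    return sum(c / (i + 1) * (hi ** (i + 1) - lo ** (i + 1))
               for i, c in enumerate(coef))

def bd_psnr(rates1, psnrs1, rates2, psnrs2):
    """Average PSNR difference (curve 2 minus curve 1) over the overlapping
    log-rate range, following the Bjontegaard-delta method of [25]."""
    x1 = [math.log10(r) for r in rates1]
    x2 = [math.log10(r) for r in rates2]
    c1 = _cubic_coeffs(x1, psnrs1)
    c2 = _cubic_coeffs(x2, psnrs2)
    lo, hi = max(min(x1), min(x2)), min(max(x1), max(x2))
    return (_integral(c2, lo, hi) - _integral(c1, lo, hi)) / (hi - lo)
```

A curve that is uniformly 1 dB above another at the same rates yields a BDPSNR of 1.0 dB; the BD bit rate is obtained analogously by fitting log-rate as a function of PSNR.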


Multimedia Systems Video II (Video Coding) Mahdi Amiri April 2012 Sharif University of Technology Course Presentation Multimedia Systems Video II (Video Coding) Mahdi Amiri April 2012 Sharif University of Technology Video Coding Correlation in Video Sequence Spatial correlation Similar pixels seem

More information

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM

More information

A SCALABLE COMPUTING AND MEMORY ARCHITECTURE FOR VARIABLE BLOCK SIZE MOTION ESTIMATION ON FIELD-PROGRAMMABLE GATE ARRAYS. Theepan Moorthy and Andy Ye

A SCALABLE COMPUTING AND MEMORY ARCHITECTURE FOR VARIABLE BLOCK SIZE MOTION ESTIMATION ON FIELD-PROGRAMMABLE GATE ARRAYS. Theepan Moorthy and Andy Ye A SCALABLE COMPUTING AND MEMORY ARCHITECTURE FOR VARIABLE BLOCK SIZE MOTION ESTIMATION ON FIELD-PROGRAMMABLE GATE ARRAYS Theepan Moorthy and Andy Ye Department of Electrical and Computer Engineering Ryerson

More information

Motion Estimation for Video Coding Standards

Motion Estimation for Video Coding Standards Motion Estimation for Video Coding Standards Prof. Ja-Ling Wu Department of Computer Science and Information Engineering National Taiwan University Introduction of Motion Estimation The goal of video compression

More information

Addressing the Memory Wall

Addressing the Memory Wall Lecture 26: Addressing the Memory Wall Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Cage the Elephant Back Against the Wall (Cage the Elephant) This song is for the

More information

2014 Summer School on MPEG/VCEG Video. Video Coding Concept

2014 Summer School on MPEG/VCEG Video. Video Coding Concept 2014 Summer School on MPEG/VCEG Video 1 Video Coding Concept Outline 2 Introduction Capture and representation of digital video Fundamentals of video coding Summary Outline 3 Introduction Capture and representation

More information

implementation using GPU architecture is implemented only from the viewpoint of frame level parallel encoding [6]. However, it is obvious that the mot

implementation using GPU architecture is implemented only from the viewpoint of frame level parallel encoding [6]. However, it is obvious that the mot Parallel Implementation Algorithm of Motion Estimation for GPU Applications by Tian Song 1,2*, Masashi Koshino 2, Yuya Matsunohana 2 and Takashi Shimamoto 1,2 Abstract The video coding standard H.264/AVC

More information

Coding of Coefficients of two-dimensional non-separable Adaptive Wiener Interpolation Filter

Coding of Coefficients of two-dimensional non-separable Adaptive Wiener Interpolation Filter Coding of Coefficients of two-dimensional non-separable Adaptive Wiener Interpolation Filter Y. Vatis, B. Edler, I. Wassermann, D. T. Nguyen and J. Ostermann ABSTRACT Standard video compression techniques

More information

Sample Adaptive Offset Optimization in HEVC

Sample Adaptive Offset Optimization in HEVC Sensors & Transducers 2014 by IFSA Publishing, S. L. http://www.sensorsportal.com Sample Adaptive Offset Optimization in HEVC * Yang Zhang, Zhi Liu, Jianfeng Qu North China University of Technology, Jinyuanzhuang

More information

Digital Video Processing

Digital Video Processing Video signal is basically any sequence of time varying images. In a digital video, the picture information is digitized both spatially and temporally and the resultant pixel intensities are quantized.

More information

CONTENT ADAPTIVE COMPLEXITY REDUCTION SCHEME FOR QUALITY/FIDELITY SCALABLE HEVC

CONTENT ADAPTIVE COMPLEXITY REDUCTION SCHEME FOR QUALITY/FIDELITY SCALABLE HEVC CONTENT ADAPTIVE COMPLEXITY REDUCTION SCHEME FOR QUALITY/FIDELITY SCALABLE HEVC Hamid Reza Tohidypour, Mahsa T. Pourazad 1,2, and Panos Nasiopoulos 1 1 Department of Electrical & Computer Engineering,

More information

Lecture 5: Error Resilience & Scalability

Lecture 5: Error Resilience & Scalability Lecture 5: Error Resilience & Scalability Dr Reji Mathew A/Prof. Jian Zhang NICTA & CSE UNSW COMP9519 Multimedia Systems S 010 jzhang@cse.unsw.edu.au Outline Error Resilience Scalability Including slides

More information

Complexity Reduced Mode Selection of H.264/AVC Intra Coding

Complexity Reduced Mode Selection of H.264/AVC Intra Coding Complexity Reduced Mode Selection of H.264/AVC Intra Coding Mohammed Golam Sarwer 1,2, Lai-Man Po 1, Jonathan Wu 2 1 Department of Electronic Engineering City University of Hong Kong Kowloon, Hong Kong

More information

Texture Compression. Jacob Ström, Ericsson Research

Texture Compression. Jacob Ström, Ericsson Research Texture Compression Jacob Ström, Ericsson Research Overview Benefits of texture compression Differences from ordinary image compression Texture compression algorithms BTC The mother of all texture compression

More information

Research on Transcoding of MPEG-2/H.264 Video Compression

Research on Transcoding of MPEG-2/H.264 Video Compression Research on Transcoding of MPEG-2/H.264 Video Compression WEI, Xianghui Graduate School of Information, Production and Systems Waseda University February 2009 Abstract Video transcoding performs one or

More information

Upcoming Video Standards. Madhukar Budagavi, Ph.D. DSPS R&D Center, Dallas Texas Instruments Inc.

Upcoming Video Standards. Madhukar Budagavi, Ph.D. DSPS R&D Center, Dallas Texas Instruments Inc. Upcoming Video Standards Madhukar Budagavi, Ph.D. DSPS R&D Center, Dallas Texas Instruments Inc. Outline Brief history of Video Coding standards Scalable Video Coding (SVC) standard Multiview Video Coding

More information

Memory. Objectives. Introduction. 6.2 Types of Memory

Memory. Objectives. Introduction. 6.2 Types of Memory Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts

More information

Fingerprint Image Compression

Fingerprint Image Compression Fingerprint Image Compression Ms.Mansi Kambli 1*,Ms.Shalini Bhatia 2 * Student 1*, Professor 2 * Thadomal Shahani Engineering College * 1,2 Abstract Modified Set Partitioning in Hierarchical Tree with

More information

Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Chapter 10 ZHU Yongxin, Winson

Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Chapter 10 ZHU Yongxin, Winson Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Chapter 10 ZHU Yongxin, Winson zhuyongxin@sjtu.edu.cn Basic Video Compression Techniques Chapter 10 10.1 Introduction to Video Compression

More information

A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames

A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames Ki-Kit Lai, Yui-Lam Chan, and Wan-Chi Siu Centre for Signal Processing Department of Electronic and Information Engineering

More information

Video Compression An Introduction

Video Compression An Introduction Video Compression An Introduction The increasing demand to incorporate video data into telecommunications services, the corporate environment, the entertainment industry, and even at home has made digital

More information

Motion Vector Coding Algorithm Based on Adaptive Template Matching

Motion Vector Coding Algorithm Based on Adaptive Template Matching Motion Vector Coding Algorithm Based on Adaptive Template Matching Wen Yang #1, Oscar C. Au #2, Jingjing Dai #3, Feng Zou #4, Chao Pang #5,Yu Liu 6 # Electronic and Computer Engineering, The Hong Kong

More information

CMPT 365 Multimedia Systems. Media Compression - Video

CMPT 365 Multimedia Systems. Media Compression - Video CMPT 365 Multimedia Systems Media Compression - Video Spring 2017 Edited from slides by Dr. Jiangchuan Liu CMPT365 Multimedia Systems 1 Introduction What s video? a time-ordered sequence of frames, i.e.,

More information

MPEG-4: Simple Profile (SP)

MPEG-4: Simple Profile (SP) MPEG-4: Simple Profile (SP) I-VOP (Intra-coded rectangular VOP, progressive video format) P-VOP (Inter-coded rectangular VOP, progressive video format) Short Header mode (compatibility with H.263 codec)

More information

Fast Implementation of VC-1 with Modified Motion Estimation and Adaptive Block Transform

Fast Implementation of VC-1 with Modified Motion Estimation and Adaptive Block Transform Circuits and Systems, 2010, 1, 12-17 doi:10.4236/cs.2010.11003 Published Online July 2010 (http://www.scirp.org/journal/cs) Fast Implementation of VC-1 with Modified Motion Estimation and Adaptive Block

More information

Professor Laurence S. Dooley. School of Computing and Communications Milton Keynes, UK

Professor Laurence S. Dooley. School of Computing and Communications Milton Keynes, UK Professor Laurence S. Dooley School of Computing and Communications Milton Keynes, UK How many bits required? 2.4Mbytes 84Kbytes 9.8Kbytes 50Kbytes Data Information Data and information are NOT the same!

More information

In the name of Allah. the compassionate, the merciful

In the name of Allah. the compassionate, the merciful In the name of Allah the compassionate, the merciful Digital Video Systems S. Kasaei Room: CE 315 Department of Computer Engineering Sharif University of Technology E-Mail: skasaei@sharif.edu Webpage:

More information

Chapter Seven. Large & Fast: Exploring Memory Hierarchy

Chapter Seven. Large & Fast: Exploring Memory Hierarchy Chapter Seven Large & Fast: Exploring Memory Hierarchy 1 Memories: Review SRAM (Static Random Access Memory): value is stored on a pair of inverting gates very fast but takes up more space than DRAM DRAM

More information

An Efficient Mode Selection Algorithm for H.264

An Efficient Mode Selection Algorithm for H.264 An Efficient Mode Selection Algorithm for H.64 Lu Lu 1, Wenhan Wu, and Zhou Wei 3 1 South China University of Technology, Institute of Computer Science, Guangzhou 510640, China lul@scut.edu.cn South China

More information

Error Concealment Used for P-Frame on Video Stream over the Internet

Error Concealment Used for P-Frame on Video Stream over the Internet Error Concealment Used for P-Frame on Video Stream over the Internet MA RAN, ZHANG ZHAO-YANG, AN PING Key Laboratory of Advanced Displays and System Application, Ministry of Education School of Communication

More information

CODING METHOD FOR EMBEDDING AUDIO IN VIDEO STREAM. Harri Sorokin, Jari Koivusaari, Moncef Gabbouj, and Jarmo Takala

CODING METHOD FOR EMBEDDING AUDIO IN VIDEO STREAM. Harri Sorokin, Jari Koivusaari, Moncef Gabbouj, and Jarmo Takala CODING METHOD FOR EMBEDDING AUDIO IN VIDEO STREAM Harri Sorokin, Jari Koivusaari, Moncef Gabbouj, and Jarmo Takala Tampere University of Technology Korkeakoulunkatu 1, 720 Tampere, Finland ABSTRACT In

More information

Semi-Hierarchical Based Motion Estimation Algorithm for the Dirac Video Encoder

Semi-Hierarchical Based Motion Estimation Algorithm for the Dirac Video Encoder Semi-Hierarchical Based Motion Estimation Algorithm for the Dirac Video Encoder M. TUN, K. K. LOO, J. COSMAS School of Engineering and Design Brunel University Kingston Lane, Uxbridge, UB8 3PH UNITED KINGDOM

More information

Professor, CSE Department, Nirma University, Ahmedabad, India

Professor, CSE Department, Nirma University, Ahmedabad, India Bandwidth Optimization for Real Time Video Streaming Sarthak Trivedi 1, Priyanka Sharma 2 1 M.Tech Scholar, CSE Department, Nirma University, Ahmedabad, India 2 Professor, CSE Department, Nirma University,

More information

Rate Distortion Optimization in Video Compression

Rate Distortion Optimization in Video Compression Rate Distortion Optimization in Video Compression Xue Tu Dept. of Electrical and Computer Engineering State University of New York at Stony Brook 1. Introduction From Shannon s classic rate distortion

More information

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB

Memory Technology. Caches 1. Static RAM (SRAM) Dynamic RAM (DRAM) Magnetic disk. Ideal memory. 0.5ns 2.5ns, $2000 $5000 per GB Memory Technology Caches 1 Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per GB Ideal memory Average access time similar

More information

EFFICIENT PU MODE DECISION AND MOTION ESTIMATION FOR H.264/AVC TO HEVC TRANSCODER

EFFICIENT PU MODE DECISION AND MOTION ESTIMATION FOR H.264/AVC TO HEVC TRANSCODER EFFICIENT PU MODE DECISION AND MOTION ESTIMATION FOR H.264/AVC TO HEVC TRANSCODER Zong-Yi Chen, Jiunn-Tsair Fang 2, Tsai-Ling Liao, and Pao-Chi Chang Department of Communication Engineering, National Central

More information

Memory. Lecture 22 CS301

Memory. Lecture 22 CS301 Memory Lecture 22 CS301 Administrative Daily Review of today s lecture w Due tomorrow (11/13) at 8am HW #8 due today at 5pm Program #2 due Friday, 11/16 at 11:59pm Test #2 Wednesday Pipelined Machine Fetch

More information

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager Optimizing for DirectX Graphics Richard Huddy European Developer Relations Manager Also on today from ATI... Start & End Time: 12:00pm 1:00pm Title: Precomputed Radiance Transfer and Spherical Harmonic

More information

SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC

SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC Randa Atta, Rehab F. Abdel-Kader, and Amera Abd-AlRahem Electrical Engineering Department, Faculty of Engineering, Port

More information

H.264 STANDARD BASED SIDE INFORMATION GENERATION IN WYNER-ZIV CODING

H.264 STANDARD BASED SIDE INFORMATION GENERATION IN WYNER-ZIV CODING H.264 STANDARD BASED SIDE INFORMATION GENERATION IN WYNER-ZIV CODING SUBRAHMANYA MAIRA VENKATRAV Supervising Professor: Dr. K. R. Rao 1 TABLE OF CONTENTS 1. Introduction 1.1. Wyner-Ziv video coding 1.2.

More information

View Synthesis for Multiview Video Compression

View Synthesis for Multiview Video Compression View Synthesis for Multiview Video Compression Emin Martinian, Alexander Behrens, Jun Xin, and Anthony Vetro email:{martinian,jxin,avetro}@merl.com, behrens@tnt.uni-hannover.de Mitsubishi Electric Research

More information

VIDEO COMPRESSION STANDARDS

VIDEO COMPRESSION STANDARDS VIDEO COMPRESSION STANDARDS Family of standards: the evolution of the coding model state of the art (and implementation technology support): H.261: videoconference x64 (1988) MPEG-1: CD storage (up to

More information

SR college of engineering, Warangal, Andhra Pradesh, India 1

SR college of engineering, Warangal, Andhra Pradesh, India   1 POWER OPTIMIZATION IN SYSTEM ON CHIP BY IMPLEMENTATION OF EFFICIENT CACHE ARCHITECTURE 1 AKKALA SUBBA RAO, 2 PRATIK GANGULY 1 Associate Professor, 2 Senior Research Fellow, Dept. of. Electronics and Communications

More information

FAST: A Framework to Accelerate Super- Resolution Processing on Compressed Videos

FAST: A Framework to Accelerate Super- Resolution Processing on Compressed Videos FAST: A Framework to Accelerate Super- Resolution Processing on Compressed Videos Zhengdong Zhang, Vivienne Sze Massachusetts Institute of Technology http://www.mit.edu/~sze/fast.html 1 Super-Resolution

More information

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager Optimizing DirectX Graphics Richard Huddy European Developer Relations Manager Some early observations Bear in mind that graphics performance problems are both commoner and rarer than you d think The most

More information

Process size is independent of the main memory present in the system.

Process size is independent of the main memory present in the system. Hardware control structure Two characteristics are key to paging and segmentation: 1. All memory references are logical addresses within a process which are dynamically converted into physical at run time.

More information

Low Power Set-Associative Cache with Single-Cycle Partial Tag Comparison

Low Power Set-Associative Cache with Single-Cycle Partial Tag Comparison Low Power Set-Associative Cache with Single-Cycle Partial Tag Comparison Jian Chen, Ruihua Peng, Yuzhuo Fu School of Micro-electronics, Shanghai Jiao Tong University, Shanghai 200030, China {chenjian,

More information

A Low Power 720p Motion Estimation Processor with 3D Stacked Memory

A Low Power 720p Motion Estimation Processor with 3D Stacked Memory A Low Power 720p Motion Estimation Processor with 3D Stacked Memory Shuping Zhang, Jinjia Zhou, Dajiang Zhou and Satoshi Goto Graduate School of Information, Production and Systems, Waseda University 2-7

More information

CS161 Design and Architecture of Computer Systems. Cache $$$$$

CS161 Design and Architecture of Computer Systems. Cache $$$$$ CS161 Design and Architecture of Computer Systems Cache $$$$$ Memory Systems! How can we supply the CPU with enough data to keep it busy?! We will focus on memory issues,! which are frequently bottlenecks

More information

H.264/AVC BASED NEAR LOSSLESS INTRA CODEC USING LINE-BASED PREDICTION AND MODIFIED CABAC. Jung-Ah Choi, Jin Heo, and Yo-Sung Ho

H.264/AVC BASED NEAR LOSSLESS INTRA CODEC USING LINE-BASED PREDICTION AND MODIFIED CABAC. Jung-Ah Choi, Jin Heo, and Yo-Sung Ho H.264/AVC BASED NEAR LOSSLESS INTRA CODEC USING LINE-BASED PREDICTION AND MODIFIED CABAC Jung-Ah Choi, Jin Heo, and Yo-Sung Ho Gwangju Institute of Science and Technology {jachoi, jinheo, hoyo}@gist.ac.kr

More information

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science 2 Outline Chapter 5 - Memory hierarchies (5.1-5.5) Temporal and spacial locality Hits and misses Direct-mapped, set associative,

More information

A COMPARISON OF CABAC THROUGHPUT FOR HEVC/H.265 VS. AVC/H.264. Massachusetts Institute of Technology Texas Instruments

A COMPARISON OF CABAC THROUGHPUT FOR HEVC/H.265 VS. AVC/H.264. Massachusetts Institute of Technology Texas Instruments 2013 IEEE Workshop on Signal Processing Systems A COMPARISON OF CABAC THROUGHPUT FOR HEVC/H.265 VS. AVC/H.264 Vivienne Sze, Madhukar Budagavi Massachusetts Institute of Technology Texas Instruments ABSTRACT

More information

FPGA Provides Speedy Data Compression for Hyperspectral Imagery

FPGA Provides Speedy Data Compression for Hyperspectral Imagery FPGA Provides Speedy Data Compression for Hyperspectral Imagery Engineers implement the Fast Lossless compression algorithm on a Virtex-5 FPGA; this implementation provides the ability to keep up with

More information

Reducing The De-linearization of Data Placement to Improve Deduplication Performance

Reducing The De-linearization of Data Placement to Improve Deduplication Performance Reducing The De-linearization of Data Placement to Improve Deduplication Performance Yujuan Tan 1, Zhichao Yan 2, Dan Feng 2, E. H.-M. Sha 1,3 1 School of Computer Science & Technology, Chongqing University

More information

Lecture 6: Texturing Part II: Texture Compression and GPU Latency Hiding Mechanisms. Visual Computing Systems CMU , Fall 2014

Lecture 6: Texturing Part II: Texture Compression and GPU Latency Hiding Mechanisms. Visual Computing Systems CMU , Fall 2014 Lecture 6: Texturing Part II: Texture Compression and GPU Latency Hiding Mechanisms Visual Computing Systems Review: mechanisms to reduce aliasing in the graphics pipeline When sampling visibility?! -

More information

Video encoders have always been one of the resource

Video encoders have always been one of the resource Fast Coding Unit Partition Search Satish Lokkoju # \ Dinesh Reddl2 # Samsung India Software Operations Private Ltd Bangalore, India. l l.satish@samsung.com 2 0inesh.reddy@samsung.com Abstract- Quad tree

More information

Ch. 2: Compression Basics Multimedia Systems

Ch. 2: Compression Basics Multimedia Systems Ch. 2: Compression Basics Multimedia Systems Prof. Ben Lee School of Electrical Engineering and Computer Science Oregon State University Outline Why compression? Classification Entropy and Information

More information

Real-Time Buffer Compression. Michael Doggett Department of Computer Science Lund university

Real-Time Buffer Compression. Michael Doggett Department of Computer Science Lund university Real-Time Buffer Compression Michael Doggett Department of Computer Science Lund university Project 3D graphics project Demo, Game Implement 3D graphics algorithm(s) C++/OpenGL(Lab2)/iOS/android/3D engine

More information

Tutorial T5. Video Over IP. Magda El-Zarki (University of California at Irvine) Monday, 23 April, Morning

Tutorial T5. Video Over IP. Magda El-Zarki (University of California at Irvine) Monday, 23 April, Morning Tutorial T5 Video Over IP Magda El-Zarki (University of California at Irvine) Monday, 23 April, 2001 - Morning Infocom 2001 VIP - Magda El Zarki I.1 MPEG-4 over IP - Part 1 Magda El Zarki Dept. of ICS

More information

EFFICIENT DEISGN OF LOW AREA BASED H.264 COMPRESSOR AND DECOMPRESSOR WITH H.264 INTEGER TRANSFORM

EFFICIENT DEISGN OF LOW AREA BASED H.264 COMPRESSOR AND DECOMPRESSOR WITH H.264 INTEGER TRANSFORM EFFICIENT DEISGN OF LOW AREA BASED H.264 COMPRESSOR AND DECOMPRESSOR WITH H.264 INTEGER TRANSFORM 1 KALIKI SRI HARSHA REDDY, 2 R.SARAVANAN 1 M.Tech VLSI Design, SASTRA University, Thanjavur, Tamilnadu,

More information

Spline-Based Motion Vector Encoding Scheme

Spline-Based Motion Vector Encoding Scheme Spline-Based Motion Vector Encoding Scheme by Parnia Farokhian A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Master of

More information

A Image Comparative Study using DCT, Fast Fourier, Wavelet Transforms and Huffman Algorithm

A Image Comparative Study using DCT, Fast Fourier, Wavelet Transforms and Huffman Algorithm International Journal of Engineering Research and General Science Volume 3, Issue 4, July-August, 15 ISSN 91-2730 A Image Comparative Study using DCT, Fast Fourier, Wavelet Transforms and Huffman Algorithm

More information

Scalable Extension of HEVC 한종기

Scalable Extension of HEVC 한종기 Scalable Extension of HEVC 한종기 Contents 0. Overview for Scalable Extension of HEVC 1. Requirements and Test Points 2. Coding Gain/Efficiency 3. Complexity 4. System Level Considerations 5. Related Contributions

More information

5LSE0 - Mod 10 Part 1. MPEG Motion Compensation and Video Coding. MPEG Video / Temporal Prediction (1)

5LSE0 - Mod 10 Part 1. MPEG Motion Compensation and Video Coding. MPEG Video / Temporal Prediction (1) 1 Multimedia Video Coding & Architectures (5LSE), Module 1 MPEG-1/ Standards: Motioncompensated video coding 5LSE - Mod 1 Part 1 MPEG Motion Compensation and Video Coding Peter H.N. de With (p.h.n.de.with@tue.nl

More information

Image Compression for Mobile Devices using Prediction and Direct Coding Approach

Image Compression for Mobile Devices using Prediction and Direct Coding Approach Image Compression for Mobile Devices using Prediction and Direct Coding Approach Joshua Rajah Devadason M.E. scholar, CIT Coimbatore, India Mr. T. Ramraj Assistant Professor, CIT Coimbatore, India Abstract

More information

Video Coding Using Spatially Varying Transform

Video Coding Using Spatially Varying Transform Video Coding Using Spatially Varying Transform Cixun Zhang 1, Kemal Ugur 2, Jani Lainema 2, and Moncef Gabbouj 1 1 Tampere University of Technology, Tampere, Finland {cixun.zhang,moncef.gabbouj}@tut.fi

More information

Anatomy of a Video Codec

Anatomy of a Video Codec Anatomy of a Video Codec The inner workings of Ogg Theora Dr. Timothy B. Terriberry Outline Introduction Video Structure Motion Compensation The DCT Transform Quantization and Coding The Loop Filter Conclusion

More information

Hardware-driven visibility culling

Hardware-driven visibility culling Hardware-driven visibility culling I. Introduction 20073114 김정현 The goal of the 3D graphics is to generate a realistic and accurate 3D image. To achieve this, it needs to process not only large amount

More information

Video Compression MPEG-4. Market s requirements for Video compression standard

Video Compression MPEG-4. Market s requirements for Video compression standard Video Compression MPEG-4 Catania 10/04/2008 Arcangelo Bruna Market s requirements for Video compression standard Application s dependent Set Top Boxes (High bit rate) Digital Still Cameras (High / mid

More information