Low-Complexity Block-Based Motion Estimation via One-Bit Transforms

702 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 7, NO. 4, AUGUST 1997
Balas Natarajan, Vasudev Bhaskaran, and Konstantinos Konstantinides

Manuscript received September 30, 1996; revised January 31, 1997. This paper was recommended by Guest Editors B. Sheu, C.-Y. Wu, H.-D. Lin, and M. Ghanbari. The authors are with Hewlett-Packard Laboratories, Palo Alto, CA 94304 USA. Publisher Item Identifier S 1051-8215(97)05878-3.

Abstract: We present an algorithm and a hardware architecture for block-based motion estimation that involves transforming video sequences from a multibit to a one-bit/pixel representation and then applying conventional motion estimation search strategies. This results in substantial reductions in arithmetic and hardware complexity and reduced power consumption, while maintaining good compression performance. Experimental results and a custom hardware design using a linear array of processing elements are also presented.

Index Terms: Architectures, CPU performance, instructions, motion estimation, multimedia, video compression standards.

I. INTRODUCTION

Digital video is typically stored and transmitted in compressed form conforming to the MPEG standards for motion sequences [1]. These standards utilize block-based motion estimation as a technique for exploiting the temporal redundancy in a sequence of images, thereby achieving increased compression.

The simplest abstraction of the motion estimation problem is as follows. Given two blocks of pixels, a source block of size $b \times b$ and a search window larger than the source block, find the $b \times b$ subblock in the search window that is closest to the source block. The distance between two blocks can be measured by a number of different metrics [2], and typically the $\ell_1$ metric (mean absolute deviation) is used. Using this metric and a search strategy, we can evaluate candidate subblocks of the search window to find the subblock that is closest to the source block.
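As a rough illustration of this abstraction, the following Python sketch evaluates the mean absolute deviation for every candidate subblock of a search window and keeps the closest one. The function names (`mad`, `subblock`, `full_search`) and the list-of-lists block layout are assumptions made for the example; they do not come from the paper.

```python
def mad(u, v):
    """Mean absolute deviation (l1 metric) between two equal-size blocks."""
    rows, cols = len(u), len(u[0])
    total = sum(abs(u[i][j] - v[i][j]) for i in range(rows) for j in range(cols))
    return total / (rows * cols)


def subblock(w, x, y, b):
    """b x b subblock of window w whose top-left pixel is at row x, column y."""
    return [row[y:y + b] for row in w[x:x + b]]


def full_search(s, w):
    """Exhaustive search: (x, y, distance) of the subblock of w closest to s."""
    b = len(s)
    best = None
    for x in range(len(w) - b + 1):
        for y in range(len(w[0]) - b + 1):
            d = mad(s, subblock(w, x, y, b))
            # Strict comparison keeps the first candidate in case of ties.
            if best is None or d < best[2]:
                best = (x, y, d)
    return best
```

Every candidate costs $b^2$ subtractions, absolute values, and additions on full-resolution pixels, which is the expense that a one-bit representation is meant to reduce.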
The search strategy may be exhaustive search, evaluating each one of the candidate blocks from the search window and selecting the one that is closest in appearance to the source block. Or we may employ faster but approximate strategies, such as logarithmic search [1], to find a subblock that is close in appearance to the source block but is not necessarily the closest. Whatever the search strategy, evaluating the $\ell_1$ metric on pixels of full intensity resolution is computationally expensive.

To overcome this obstacle, we propose to transform the current and reference frames to frames of binary-valued pixels. We then apply one of the conventional search strategies to these frames. The $\ell_1$ metric then amounts to computing the exclusive-or of a sequence of bits and adding up the number of ones in the result. This can result in substantial savings in software implementations as well as reduced complexity and power consumption in hardware implementations. Our experiments show that a careful choice of the one-bit transform can realize these gains with a small sacrifice in compression efficiency.

Previously, a one-bit modification of the $\ell_1$ metric was proposed in [3], and we will compare our approach to theirs later in this paper. Recently and independently, Feng et al. [4] proposed a one-bit transform similar to ours, but exploited it as a preprocessing step to exhaustive search with the $\ell_1$ metric. Their approach differs from ours on three counts.
1) They use the block mean as the threshold. However, we have found that the block mean does not offer the best results in our experiments.
2) The complexity of their strategy is roughly six times that of ours.
3) Their strategy is adaptive and not suited for simple hardware implementation at low power consumption.

In [5], Mizuki et al. describe a binary block matching architecture where block matching is performed on the binary edge maps of the current and the reference frames.
They also present a custom hardware implementation that includes circuitry for edge detection and a two-dimensional (2-D) array of elementary processors, where the number of elementary processors is equal to the number of candidate blocks for full-search motion estimation. Compared to conventional block matching schemes, they estimate that binary block matching for motion estimation reduces the silicon area required by a factor of five.

In Section II, we establish the preliminaries and define the problem; in Section III, we give details of the proposed one-bit transform; in Section IV, we present a custom architecture for the one-bit motion estimation strategy; and in Section V, we present experimental results from applying our technique to sample video sequences.

II. PRELIMINARIES

Let $s$ denote the source block of $b \times b$ pixels, with $s_{i,j}$ being the pixel at row $i$ and column $j$. Similarly, let $w$ denote the search window, with $w_{i,j}$ being the pixel at row $i$ and column $j$. The subblock of $w$ at position $(x, y)$ is denoted by $w^{x,y}$ and is the block of $b \times b$ pixels $w_{x+i,\,y+j}$, for $i = 1, 2, \ldots, b$ and $j = 1, 2, \ldots, b$. The distance between two blocks $u$ and $v$ can be measured in many metrics, but typically the mean absolute deviation is used. The mean absolute deviation or $\ell_1$ metric is given by

$$\|u, v\|_1 = \frac{1}{b^2} \sum_{i,j} |u_{i,j} - v_{i,j}|. \tag{1}$$

1051-8215/97$10.00 © 1997 IEEE

For two one-bit images this metric reduces to

$$\|u, v\|_1 = \frac{1}{b^2} \sum_{i,j} u_{i,j} \oplus v_{i,j} \tag{2}$$

where $\oplus$ denotes the exclusive-or operation. The problem of motion estimation is to find the position $(x, y)$ so that the subblock $w^{x,y}$ is closest to the source block $s$, in that $\|s, w^{x,y}\|$ is minimum over all subblocks of $w$.

III. THE ONE-BIT STRATEGY

We now construct a transform $Q$ that maps a frame of multivalued pixels to a frame of binary-valued pixels. $Q$ is defined with respect to a convolution kernel $K$ and is denoted by $Q_K$. Let $F$ denote a frame and let $\hat{F}$ denote the filtered version of $F$ obtained by applying the convolution kernel $K$ to $F$. Let $G = Q_K(F)$ be the frame obtained by applying $Q_K$ to $F$. The pixels of $G$ are given by

$$G_{i,j} = \begin{cases} 1, & \text{if } F_{i,j} \ge \hat{F}_{i,j} \\ 0, & \text{otherwise.} \end{cases} \tag{3}$$

In this paper, we use the 17 × 17 convolution kernel $K$ given below:

$$K_{i,j} = \begin{cases} 1/25, & \text{if } i, j \in [1, 4, 8, 12, 16] \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$

Fig. 1. Operations for thresholding 8-b frames to 1-b frames.

Fig. 2. Flow of operations for 1-b block-based motion estimation.

Fig. 3. Pixel coordinates for a 16 × 16 source block and a [-8, 7] search range.

The motivation behind our method rests on the observation that the edges in an image are key to accurate motion estimation. A simple way to extract the edges is to carry out a high-pass thresholding, that is, compare the frame pixel by pixel to a high-pass filtered version of the frame, and threshold the pixels to zero or one, depending on the outcome of the comparison. Unfortunately, this would also cause the thresholded frame to track the high-frequency noise in the original frame. To overcome this, we use band-pass thresholding, wherein the smoothed version is a band-pass filtered version of the original frame, so that the thresholded frame represents the mid-frequency content of the original frame.
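The thresholding described above can be sketched in Python as follows. This is a minimal illustration under several assumptions that the text does not settle: the kernel is centered on the pixel being thresholded, it is applied without flipping (correlation form), frame borders are handled by clamping coordinates, and the comparison is taken as "greater than or equal". The tap positions follow the index set as printed in the text, and the name `one_bit_transform` is hypothetical.

```python
TAPS = (1, 4, 8, 12, 16)   # nonzero kernel rows/columns, as printed in the text
KERNEL_SIZE = 17


def one_bit_transform(frame, taps=TAPS, size=KERNEL_SIZE):
    """Map a frame of multivalued pixels to a frame of 0/1 pixels."""
    rows, cols = len(frame), len(frame[0])
    half = size // 2            # assumption: kernel centered on the pixel
    n = len(taps) ** 2          # 25 taps, each weighted 1/25
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for a in taps:
                for c in taps:
                    # Assumption: clamp coordinates at the frame borders.
                    r = min(max(i + a - half, 0), rows - 1)
                    q = min(max(j + c - half, 0), cols - 1)
                    acc += frame[r][q]
            # F >= acc / 25, written in integers to avoid rounding issues.
            out[i][j] = 1 if frame[i][j] * n >= acc else 0
    return out
```

Only 25 of the 289 kernel entries are nonzero, so the filter costs 25 additions per pixel, and the threshold differs from pixel to pixel because it is the locally filtered value.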
The convolution kernel that we propose is motivated by this band-pass requirement, as well as the need to minimize the number of arithmetic operations. For comparison, [3] uses a block averaging kernel, which corresponds to using low-pass thresholding. The operations for the one-bit transform are shown in Fig. 1. Note that there is no global threshold for all pixels in a frame.

For video coding, our one-bit motion estimation strategy consists of the following steps (Fig. 2):
1) apply the one-bit transform $Q$ to both the current frame and the reference frame;
2) use any motion-vector search strategy in combination with the metric defined in (2).

IV. THE ARCHITECTURE

The proposed one-bit motion estimation strategy is amenable to both single-processor-based and multiprocessor-based implementations. Experimental results using full-search and logarithmic search strategies on a single-processor, 32-b architecture are presented in the next section. In this section, we consider the design and performance of a custom architecture based on a linear array of custom, but simple, processing elements.

Without loss of generality, let us consider the architecture for a block-matching full-search motion estimator for blocks of 16 × 16 pixels and a search range of [-8, 7] pixels. If we shift the coordinate system so that there are only positive pixel indexes, Fig. 3 shows the pixel coordinates for the 16 × 16 source block and the 31 × 31 search window. We assume that the one-bit transform has been completed and all pixels have binary values. For example, if $r_{i,j}$ denotes a pixel in the source block, then $r_{i,j}$ is either zero or one. Let $R_i = [r_{i,0}, r_{i,1}, \ldots, r_{i,15}]$ denote the $i$th row of the source block, and $V_{i,j} = [v_{i,j}, v_{i,j+1}, \ldots, v_{i,j+15}]$ denote a 16-b vector from the search window, starting from location $(i, j)$, where $i$ and $j \in [0, 15]$. The problem of motion estimation can be expressed as

finding $k$ and $l \in [0, 15]$ for which

$$D(k, l) = \sum_{i=0}^{15} f(R_i, V_{i+k,l}) \tag{5}$$

is minimized, where $f(\cdot)$ is defined as

$$f(R_i, V_{i+k,l}) = \sum_{j=0}^{15} r_{i,j} \oplus v_{i+k,j+l}. \tag{6}$$

That is, the $f(\cdot)$ function computes the number of bit positions in which the $R$ and $V$ binary vectors differ. From (5), each $V$ vector is used in the computation of multiple distortion values. For example, $V_{15,0}$ (Fig. 3) is used in the computation of 16 distortions, namely $D(0, 0), D(1, 0), \ldots, D(15, 0)$. Hence, if the $V$ vectors are distributed to multiple processors, then one can compute multiple distortions in parallel. Fig. 4 shows such an implementation using an array of 16 processors. This is similar to the implementation in [6], except that each processor operates on 16-b vectors instead of on 8-b pixels.

Fig. 4. Linear array for motion estimation.

The architecture of each processor is shown in detail in Fig. 5. The $f(\cdot)$ function defined in (6) is computed using two 8-b exclusive-or arrays, a dual-port look-up table (LUT) with 256 entries, and a 4-b adder. The look-up table yields the total number of ones (that is, mismatched bits) at the output of each exclusive-or array. One xor-array operates on the eight most significant bits of the $R$ and $V$ vectors and the other one on the eight least significant bits.

Fig. 5. Processor architecture for motion estimation.

TABLE I. PIXEL FLOW FOR COMPUTING THE FIRST 16 DISTORTIONS USING A 16-PROCESSOR ARRAY

TABLE II. NUMBER OF OPERATIONS PER PIXEL FOR FULL SEARCH AND LOGARITHMIC SEARCH WITH AND WITHOUT THE ONE-BIT TRANSFORM

TABLE III. AVERAGE PSNR VALUES FOR THE SEARCH STRATEGIES (BLOCK SIZE: 16 × 16; MOTION-VECTOR SEARCH RANGE: ±15 PIXELS; VIDEO FRAME SIZE: 352 × 240)

Table I shows in more detail the data flow of operations on processors PE-0, PE-1, and PE-15 for the computation of the first 16 distortion values.
At $t = 0$, only PE-0 is active, with binary vectors $R_0$ and $V_{0,0}$ as inputs. At $t = 1$, PE-0 processes $R_1$ and $V_{1,0}$, and PE-1 processes $R_0$ and $V_{1,0}$. Following this approach, $D(0, 0)$, in PE-0, will be ready after $t = 15$, and all the first 16 distortion values will be computed in 16 + 15 cycles. However, as shown in Table I, by using two ports for the search memory, processing of the next set of distortions can begin at $t = 16$. As shown in Fig. 5, a multiplexor in each processor selects the appropriate input from the search memory. The complete set of 256 distortion values can then be computed in 16 × 16 + 15 = 271 cycles. In contrast, the traditional architecture [6] requires 4111 cycles. Thus, the one-bit transform allows for a roughly 15:1 speed improvement. This is consistent with the fact that at each cycle we now process 16 binary pixels instead of one 8-b pixel.

For higher throughput, multiple arrays could be used. For example, in pipelined mode, two such arrays (which is equivalent to using a 16-processor array of 32-b processors) could compute all distortions in 128 cycles.
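The per-processor computation and the cycle count above can be mimicked in software. The Python sketch below models one processing element as two 8-bit exclusive-or results passed through a 256-entry table of one-counts and then added, and it reproduces the cycle estimate for a 16-processor array. The names `ONES_LUT`, `f`, `distortion`, and `array_cycles`, and the representation of each 16-b vector as an integer, are assumptions for illustration only; the sketch does not model the pipelined data flow of Table I.

```python
# 256-entry look-up table: number of ones in every 8-bit value.
ONES_LUT = [bin(n).count("1") for n in range(256)]


def f(r_row, v_vec):
    """Count the ones in the XOR of two 16-bit vectors, as in (6)."""
    x = (r_row ^ v_vec) & 0xFFFF
    # One LUT access for the upper byte, one for the lower byte, then add.
    return ONES_LUT[x >> 8] + ONES_LUT[x & 0xFF]


def distortion(R, V, k, l):
    """D(k, l) of (5). R holds 16 row words; V[i][j] is the word at (i, j)."""
    return sum(f(R[i], V[i + k][l]) for i in range(16))


def array_cycles(num_distortions, num_pe=16, rows=16):
    """Cycle estimate: 'rows' cycles per batch of num_pe distortions,
    plus num_pe - 1 cycles for the last processor to finish."""
    batches = num_distortions // num_pe
    return rows * batches + (num_pe - 1)
```

With these definitions, `array_cycles(256)` evaluates to 271, matching the figure quoted for the [-8, 7] search range.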

Consider now the case of motion estimation using a search range of [-16, 15] pixels. Then, the 1-b search window is 47 × 47 pixels and we need to compute 1024 distortion values. Since our architecture can compute 16 distortion values in 16 cycles, we can estimate that the 16-processor linear array will now require 16 × 64 + 15 = 1039 cycles to compute all 1024 distortion values.

Fig. 6. Motion-compensated prediction residual for the Miss America sequence using various search schemes.

V. EXPERIMENTAL RESULTS

Custom architectures may provide the highest level of performance for motion estimation; however, binary block matching schemes are also ideally suited for software-only implementations on a general-purpose processor. We studied the performance of the following search strategies on several sequences from the MPEG video test suite.
1) Full8: Full search on 8-b data, $\ell_1$ metric.
2) Log8: Logarithmic search on 8-b data, $\ell_1$ metric.
3) Full1: Full search after 1-b transform, distance metric of (2).
4) Log1: Logarithmic search after 1-b transform, distance metric of (2).

Table II shows the computational complexity of the four strategies for two different 32-b architectures. The first one, referred to as 32-b ops, has a native instruction for counting the population of ones in a register. This allows for 32 binary comparisons per instruction. The second one, referred to as 1-b ops, is a traditional one where only one binary comparison per instruction is performed. We also include estimates for the pixel distance criterion (PDC) metric, which is the scheme proposed in [3]. Note that the calculations for the PDC metric in this table are for the full-search scheme. For Full1 and Log1, the additional expense of the filtering and compare operations of Fig. 1 is also included in the number of operations.
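To make the word-parallel idea concrete, the Python sketch below packs each row of one-bit pixels into an integer, so that a single exclusive-or followed by a population count compares a whole row of a candidate block at once. It is an illustration only: Python integers stand in for machine registers, `bin(x).count("1")` stands in for a native population-count instruction, and the function names are hypothetical. The returned distance is the raw bit count of (2) without the $1/b^2$ factor, which does not change the ranking of candidates.

```python
def pack_rows(bits):
    """Pack each row of 0/1 pixels into one integer (leftmost pixel = MSB)."""
    return [int("".join(str(p) for p in row), 2) for row in bits]


def popcount(x):
    """Number of ones in x (stand-in for a native population-count instruction)."""
    return bin(x).count("1")


def full_search_1bit(src_bits, win_bits):
    """Full search on one-bit frames: (x, y, mismatching bits) of the best match."""
    b = len(src_bits)
    width = len(win_bits[0])
    src = pack_rows(src_bits)
    win = pack_rows(win_bits)
    mask = (1 << b) - 1
    best = None
    for x in range(len(win_bits) - b + 1):
        for y in range(width - b + 1):
            shift = width - b - y   # brings columns y..y+b-1 to the low b bits
            d = sum(popcount(src[i] ^ ((win[x + i] >> shift) & mask))
                    for i in range(b))
            if best is None or d < best[2]:
                best = (x, y, d)
    return best
```

For a 16 × 16 block this replaces 256 absolute differences per candidate with 16 exclusive-or and population-count pairs, which is where the savings in the 32-b ops column come from.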
From Table II, we note at least a 200-fold reduction in complexity for Log1 compared to Full8.

The effectiveness of each of the search strategies can be measured in terms of the peak signal-to-noise ratio (PSNR) and entropy values of the motion-compensated difference frames. We compute PSNR as

$$10 \log_{10} \frac{\sum_{i,j} 255^2}{\sum_{i,j} F_{i,j}^2} \ \text{dB} \tag{7}$$

where $F$ is the motion-compensated prediction residual image. Table III shows the PSNR of several video sequences, averaged over 100 motion-compensated difference frames for each sequence. It is clear that the performance of full search after the 1-b transform (Full1) is comparable to or better than that of logarithmic search on 8-b data (Log8). Also, for typical video sequences, the performance of Log1 compares quite favorably with Full8, considering the 200-fold reduction in complexity.

TABLE IV. ENTROPY OF MOTION-COMPENSATED DIFFERENCES FOR THE SEARCH STRATEGIES (BLOCK SIZE: 16 × 16; MOTION-VECTOR SEARCH RANGE: ±15 PIXELS; VIDEO FRAME SIZE: 352 × 240)

In Table IV, we show the entropy of the motion-compensated difference image. From this table, we note that the one-bit transform scheme with a suboptimum search strategy such as logarithmic search results, on the average, in an 8% increase in the entropy of the motion-compensated prediction residual relative to Full8 at substantially lower complexity.

In Fig. 6 we show motion-compensated prediction residuals for several search strategies. To improve the performance of the Log1 scheme, we also examined simple extensions to this approach, namely a multicandidate logarithmic search scheme. For example, in a three-candidate logarithmic search [Log1(3)], instead of selecting a single candidate (usually the one that yields the minimum error) after each searching stage, we select three. For the next stage, each of these three locations is then used as a starting seed. From Fig. 6, Log1(3) yields a motion-compensated residual that is comparable to that obtained using the one-bit exhaustive search scheme (Full1). However, in terms of one-bit operations, it has nearly six times lower complexity.

In a fixed-rate coder, quantization noise can mask the prediction error resulting from a suboptimum search strategy. This is illustrated in Fig. 7, which shows the output quality of an H.263 decoder, measured in terms of the PSNR, versus bit rate for various search strategies used in macroblock motion estimation. The results were obtained using the Telenor TMN software coder (version 1.5) [7] with a fixed quantizer, fixed frame skipping (one), in arithmetic coding mode, with no advanced prediction modes, and a motion vector search range of ±15 pixels. At 28 kb/s, PSNR values for Full8, Full1, Log8, and Log1(3) are 28.48, 27.98, 27.65, and 26.83 dB, respectively. Thus, Full1 is only 0.5 dB worse than Full8, and the one-bit, three-candidate logarithmic search [Log1(3)] yields only 1.65 dB lower performance than the exhaustive full-search (Full8) method, while its complexity is 65 times lower.

Fig. 7. PSNR versus bit rate for an H.263 coder and various search strategies. Test sequence: Foreman, QCIF, 12 frames/s, fixed quantizer.

VI. CONCLUSIONS

We presented a motion estimation strategy for digital video based on a one-bit transform and gave an architecture for its hardware implementation. The strategy can effectively integrate low-complexity search schemes, such as logarithmic search, to obtain complexity reductions as large as 200-fold relative to classical exhaustive search. The complexity reduction can translate into a proportionate reduction in the power consumption of custom hardware. Experimental results indicate that the reduced arithmetic complexity is accompanied by acceptable levels of performance degradation.

REFERENCES

[1] V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards: Algorithms and Architectures. Boston: Kluwer, 1995.
[2] K. R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications. New York: Academic, 1990.
[3] H. Gharavi and M. Mills, "Blockmatching motion estimation algorithms: New results," IEEE Trans. Circuits Syst., vol. 37, pp. 649-651, May 1990.
[4] J. Feng, K.-T. Lo, H. Mehrpour, and A. E. Karbowiak, "Adaptive block matching motion estimation algorithm using bit-plane matching," in IEEE Int. Conf. Image Processing, Washington, DC, 1995, pp. 496-499.
[5] M. M. Mizuki, U. Y. Desai, I. Masaki, and A. Chandrakasan, "A binary block matching architecture with reduced power consumption and silicon area requirements," in IEEE ICASSP-96, Atlanta, 1996, vol. 6, pp. 3248-3251.
[6] K.-M. Yang, M.-T. Sun, and L. Wu, "A family of VLSI designs for the motion compensation block-matching algorithm," IEEE Trans. Circuits Syst., vol. 36, pp. 1317-1325, Oct. 1989.
[7] Telenor Research, H.263 coder, http://www.nta.no/brukere/dvc/h263_software/.