A Parallel Reconfigurable Architecture for DCT of Lengths N=32/16/8

Similar documents
2016, IJARCSSE All Rights Reserved Page 441

Reconfigurable Architecture for Efficient and Scalable Orthogonal Approximation of DCT in FPGA Technology

FPGA Implementation of Low Complexity Video Encoder using Optimized 3D-DCT

DESIGN OF DCT ARCHITECTURE USING ARAI ALGORITHMS

Reconfigurable Architecture and an Algorithm for Scalable And Efficient Orthogonal Approximation of Dct

Orthogonal Approximation of DCT in Video Compressing Using Generalized Algorithm

IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 05, 2016 ISSN (online):

MRT based Fixed Block size Transform Coding

A Image Comparative Study using DCT, Fast Fourier, Wavelet Transforms and Huffman Algorithm

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi

Video Compression An Introduction

ROI Based Image Compression in Baseline JPEG

Implementation of Discrete Wavelet Transform for Image Compression Using Enhanced Half Ripple Carry Adder

-

A NEW ENTROPY ENCODING ALGORITHM FOR IMAGE COMPRESSION USING DCT

IMAGE COMPRESSION. Image Compression. Why? Reducing transportation times Reducing file size. A two way event - compression and decompression

Multimedia Communications. Transform Coding

Statistical Image Compression using Fast Fourier Coefficients

ECE 533 Digital Image Processing- Fall Group Project Embedded Image coding using zero-trees of Wavelet Transform

AN ANALYTICAL STUDY OF LOSSY COMPRESSION TECHINIQUES ON CONTINUOUS TONE GRAPHICAL IMAGES

ISSN Vol.06,Issue.10, November-2014, Pages:

CS 335 Graphics and Multimedia. Image Compression

THE TRANSFORM AND DATA COMPRESSION HANDBOOK

Design of Area Efficient and High performance of Scalable and Reconfigurable Approximation of DCT

Image Compression Techniques

International Research Journal of Engineering and Technology (IRJET) e-issn:

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

A full-pipelined 2-D IDCT/ IDST VLSI architecture with adaptive block-size for HEVC standard

Efficient Implementation of Low Power 2-D DCT Architecture

Image Compression Algorithm and JPEG Standard

Haar Wavelet Image Compression

Lecture 8 JPEG Compression (Part 3)

IMAGE COMPRESSION USING HYBRID TRANSFORM TECHNIQUE

IMAGE COMPRESSION TECHNIQUES

Transmission and Reception of Image using Encryption and Decryption Technique Anita Khandelwal 1, Bhavneesh Malik 2 1 M.Tech Scholar, RIEM Rohtak

Index. 1. Motivation 2. Background 3. JPEG Compression The Discrete Cosine Transformation Quantization Coding 4. MPEG 5.

An introduction to JPEG compression using MATLAB

Scaled Discrete Cosine Transform (DCT) using AAN Algorithm on FPGA

Image Compression. CS 6640 School of Computing University of Utah

JPEG Compression Using MATLAB

2014 Summer School on MPEG/VCEG Video. Video Coding Concept

CHAPTER 6. 6 Huffman Coding Based Image Compression Using Complex Wavelet Transform. 6.3 Wavelet Transform based compression technique 106

Image Compression for Mobile Devices using Prediction and Direct Coding Approach

Digital Image Representation Image Compression

FPGA IMPLEMENTATION OF HIGH SPEED DCT COMPUTATION OF JPEG USING VEDIC MULTIPLIER

Implementation of Lifting-Based Two Dimensional Discrete Wavelet Transform on FPGA Using Pipeline Architecture

HYBRID TRANSFORMATION TECHNIQUE FOR IMAGE COMPRESSION

EFFICIENT DEISGN OF LOW AREA BASED H.264 COMPRESSOR AND DECOMPRESSOR WITH H.264 INTEGER TRANSFORM

Lecture 5: Compression I. This Week s Schedule

Enhancing the Image Compression Rate Using Steganography

VLSI Computational Architectures for the Arithmetic Cosine Transform

A Comparative Study of DCT, DWT & Hybrid (DCT-DWT) Transform

Priyanka Dixit CSE Department, TRUBA Institute of Engineering & Information Technology, Bhopal, India

Chapter 1. Digital Data Representation and Communication. Part 2

Application of Daubechies Wavelets for Image Compression

Department of electronics and telecommunication, J.D.I.E.T.Yavatmal, India 2

Fundamentals of Video Compression. Video Compression

Comparative Study between DCT and Wavelet Transform Based Image Compression Algorithm

Performance analysis of Integer DCT of different block sizes.

DCT SVD Based Hybrid Transform Coding for Image Compression

Volume 2, Issue 9, September 2014 ISSN

Reversible Wavelets for Embedded Image Compression. Sri Rama Prasanna Pavani Electrical and Computer Engineering, CU Boulder

signal-to-noise ratio (PSNR), 2

Audio Compression Using DCT and DWT Techniques

Department of Electronics and Communication KMP College of Engineering, Perumbavoor, Kerala, India 1 2

Variable Size 2D DCT with FPGA Implementation

Interactive Progressive Encoding System For Transmission of Complex Images

The PackBits program on the Macintosh used a generalized RLE scheme for data compression.

Design of an Area and Power Efficient 8- Point Approximate DCT Architecture Requiring Only 14 Additions

ISSN (ONLINE): , VOLUME-3, ISSUE-1,

CHAPTER 9 INPAINTING USING SPARSE REPRESENTATION AND INVERSE DCT

VLSI Implementation of Daubechies Wavelet Filter for Image Compression

Redundant Data Elimination for Image Compression and Internet Transmission using MATLAB

Final Review. Image Processing CSE 166 Lecture 18

Compression of RADARSAT Data with Block Adaptive Wavelets Abstract: 1. Introduction

A New Approach to Compressed Image Steganography Using Wavelet Transform

DIGITAL IMAGE PROCESSING WRITTEN REPORT ADAPTIVE IMAGE COMPRESSION TECHNIQUES FOR WIRELESS MULTIMEDIA APPLICATIONS

IMAGE COMPRESSION USING HYBRID QUANTIZATION METHOD IN JPEG

JPEG: An Image Compression System. Nimrod Peleg update: Nov. 2003

Lecture 8 JPEG Compression (Part 3)

A Novel VLSI Architecture for Digital Image Compression Using Discrete Cosine Transform and Quantization

A Novel VLSI Architecture for Digital Image Compression using Discrete Cosine Transform and Quantization

IMAGE COMPRESSION TECHNIQUES

Enhanced Implementation of Image Compression using DWT, DPCM Architecture

IMAGE PROCESSING USING DISCRETE WAVELET TRANSFORM

AUDIO COMPRESSION USING WAVELET TRANSFORM

Image Compression using Discrete Wavelet Transform Preston Dye ME 535 6/2/18

Image Gap Interpolation for Color Images Using Discrete Cosine Transform

Scalable Perceptual and Lossless Audio Coding based on MPEG-4 AAC

Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding.

Ian Snyder. December 14, 2009

Multimedia Systems Image III (Image Compression, JPEG) Mahdi Amiri April 2011 Sharif University of Technology

HIGH LEVEL SYNTHESIS OF A 2D-DWT SYSTEM ARCHITECTURE FOR JPEG 2000 USING FPGAs

International Journal of Research in Computer and Communication Technology, Vol 4, Issue 11, November- 2015

Performance Comparison between DWT-based and DCT-based Encoders

Digital Watermarking with Copyright Authentication for Image Communication

Design of 2-D DWT VLSI Architecture for Image Processing

Outline Introduction MPEG-2 MPEG-4. Video Compression. Introduction to MPEG. Prof. Pratikgiri Goswami

Topic 5 Image Compression

Performance Analysis of Various Transforms Based Methods for ECG Data

Transcription:

Page20 A Parallel Reconfigurable Architecture for DCT of Lengths N=32/16/8 ABSTRACT: Parthiban K G* & Sabin.A.B ** * Professor, M.P. Nachimuthu M. Jaganathan Engineering College, Erode, India ** PG Scholar, M.P. Nachimuthu M. Jaganathan Engineering College, Erode, India Discrete Cosine Transform (DCT) algorithm is very effective due to its symmetry and simplicity. It is good replacement of Fast Fourier Transform (FFT) due to consideration of real component of image data. This paper introduces an orthogonal approximation for the 32point DCT, using recursive sparse matrix decomposition and makes use of the symmetries of DCT basis vectors. The proposed transformation matrix contains only ones and zeros. Bit shift operations and multiplication operations are absent. The approximate transform of DCT is obtained to meet the low complexity requirements. The proposed image compression algorithm is comprehended using Matlab code. A fully scalable reconfigurable parallel architecture for the computation of approximate DCT of a 32-point DCT or for parallel computation of two 16-point DCTs or four 8-point DCTs with a marginal control overhead based on the proposed algorithm. Index terms- Discrete Cosine Transform, Fast Fourier Transform (FFT), Matlab INTRODUCTION THE DISCRETE COSINE TRANSFORM (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies. DCTs are important to numerous applications in science and engineering, from lossy compression of audio (e.g. MP3) and images (e.g. JPEG) (where small high-frequency components can be discarded), to spectral methods for the numerical solution of partial differential equations. The use of cosine rather than sine functions is critical for compression, since it turns out (as described below) that fewer cosine functions are needed to approximate a typical signal, whereas for differential equations the cosines express a particular choice of boundary conditions. DCT is mainly used in Image and Video compression. In particular, a DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers. Like any Fourier-related transform, DCTs express a function or a signal in terms of a sum of sinusoids with different frequencies and amplitudes. Like the Discrete Fourier Transforms (DFT), a DCT operates on a function at a finite number of discrete data points. The obvious distinction between a DCT and a DFT is that the former uses only cosine functions, while the latter uses both cosines and sines (in the form of complex exponentials). However, this visible difference is merely a consequence of a deeper distinction: a DCT implies different boundary conditions from the DFT or other related transforms. The Fourierrelated transforms that operate on a function over a finite domain, such as the DFT or DCT or a Fourier series, can be thought of as implicitly defining an extension of that function outside the domain. That is, once you write a function f(x) as a sum of sinusoids, you can evaluate that

Page21 sum at any x, even for x where the original f(x) was not specified. The DFT, like the Fourier series, implies a periodic extension of the original function. A DCT, like a cosine transform, implies an even extension of the original function. Image data compression has been an active research area for image processing over the last decade and has been used in a variety of applications. Image and video data compression refers to a process in which the amount of data used to represent image and video is reduced to meet a bit rate requirement (below or at most equal to the maximum available bit rate), while the quality of the reconstructed image or video satisfies a requirement for a certain application and the complexity of computation involved is affordable for the application. Recently, two new transforms have been proposed for 8-point DCT approximation: Cintra have proposed a low-complexity 8-point approximate DCT based on integer functions and Potluri have proposed a novel 8-point DCT approximation that requires only 14 additions. On the other hand, Bouguezel have proposed two methods for multiplication-free approximate form of DCT. The first method is for length N=8, 16 and 32, and is based on the appropriate extension of integer DCT. Cintra have proposed a new 16X16 matrix also for approximation of 16-point DCT and have validated it experimentally. II. IMAGE AND VIDEO COMPRESSION Image and video data compression refers to a process in which the amount of data used to represent image and video is reduced to meet a bit rate requirement (below or at most equal to the maximum available bit rate), while the quality of the reconstructed image or video satisfies a requirement for a certain application and the complexity of computation involved is affordable for the application. Image compression system has three main blocks (i) a transform (usually DCT on 8x8 blocks), (ii) a quantizer, (iii) a lossless (Entropy) coder; each tries to throw away information which is not essential to understand the image but costs bits. The DCT transform throws away correlations, if we make a plot of the value of a pixel as a function of one of its neighbors, we can see that the pixels are highly correlated (i.e. most of the time they are very similar), and this is just a consequence of the fact that surfaces are smooth. The advantage of working in the frequency domain is that our visual system is less sensitive to distortion around edges. The transition associated with the edge masks our ability to perceive the noise. Fig.1 Image Compression Technique Compression is a reversible conversion (encoding) of data that contains fewer bits. This allows a more efficient storage and transmission of the data. The inverse process is called decompression (decoding). Software and hardware that can encode and decode are called decoders. Both combined form a codec and should not be confused with the terms data container or compression algorithms. Lossless compression allows a 100% recovery of the

Page22 original data. It is usually used for text or executable files, where a loss of information is a major damage. These compression algorithms often use statistical information to reduce redundancies. Huffman-Coding and Run Length Encoding are the two popular examples allowing high compression ratios depending on the data. Using lossy compression does not allow an exact recovery of the original data. Lossy compression allows higher compression ratios than lossless compression. III. EXISTING METHOD The 8-point discrete cosine transform (DCT) is a key step in many image and video processing applications. This particular block length is widely adopted in several image and video coding standards, such as JPEG, MPEG-1, MPEG-2, and H.261. This is mainly due to its good energy compaction properties. This 8-point DCT is used as the basis for the proposed system for 16, 32 and 64 point DCTs. This correspondence introduced an approximation algorithm for the DCT computation based on matrix polar decomposition. This method could outperform the BAS-2008 method in high and low-compression ratios scenarios. Moreover, this method possesses constructive formulation based on the round-off function. Therefore, generalizations are more readily possible. This existing transformation matrix contains only zeros and ones, multiplications and bit shift operations are absent. Fig. 2 Signal Flow Graph of DCT-8 Most of the existing algorithms for approximation of the DCT target only the DCT of small transform lengths such as 16-point and 32-point is not possible and some of them are nonorthogonal. If the transform is orthogonal, we can always find its inverse and the kernel matrix of the inverse transform is obtained by just transposing the kernel matrix of the forward transform. As specified in the recently adopted HEVC, DCT of different lengths such as N=8, 16, 32 are required to be used in video coding applications. Therefore, a given DCT architecture should be potentially reused for the DCT of different lengths instead of using separate structures for different lengths. Here proposing such reconfigurable DCT structures which could be reused for the computation of DCT of different lengths. The proposed reconfigurable architecture for the implementation of approximated 32-point DCT is shown in Fig. 3. It consists of four computing units, two 16-point approximated DCT units, output permutation unit and a 32- point input adder unit that generates a(i) and b(i), i E [1:15]. The input to the first 16-point DCT approximation unit is fed through 16 MUXes that select either [a(0), a(1),...., a(14), a(15)] or [X(0),X(1),....,X(14), X(15)], depending on whether it is used for 32-point DCT calculation or 16-point DCT calculation. Similarly, the input to the second 16-point DCT unit (Fig. 4.3) is fed through 16 MUXes that select either [b(0),b(1),.....,b(14), b(15)] or

Page23 [X(0),X(1),...,X(14), X(15)], depending on whether it is used for 32-point DCT calculation or 16-point DCT calculation. Fig.3 Proposed Reconfigurable Architecture for N=8, 16, 32 Then outputs from the 16-point adder units and output from the 32-point adder units are fed to another section of MUXes in which, input to the first 8-point DCT approximation unit is fed through 8 MUXes that select either [a(0), a(1),...., a(7)] or [m(0),m(1),....,m(7)], depending on whether it is used for 16-point DCT calculation or 8-point DCT calculation. Then the MUXes for the next 8-point DCT approximation unit selects either [a(8), a(9),...., a(15)] or [m(8),m(9),....,m(15)], depending on whether it is used for 16-point DCT calculation or 8-point DCT calculation. Similarly, the input to the third and fourth 8-point DCT units(fig. 3) are fed through 16 MUXes, in which first 8 MUXes select either [b(0),b(1),,b(7)] or [m(0),m(1),.. m(7)] )], depending on whether it is used for 16- point DCT calculation or 8-point DCT calculation, and the next 8 MUXes select either [b(8),b(9),.... b(15)] or [m(8),m(9),...,m(15)], depending on whether it is used for 16- point DCT calculation or 8-point DCT calculation. On the other hand, the output permutation unit uses thirty 3-input MUXes to select and re-order the output depending on the size of the selected DCT. Sel32 is used as control input for the first 32 MUXes used in the 32/16 computation unit, sel16 is used as control input for the second 32 MUXes used in the 16/8 computation unit to select inputs and to perform permutation according to the size of the DCT to be computed. Specifically, Sel32=1 & Sel16=1, enables the computation of 32-point DCT, Sel32=0 & Sel16=1, enables the computation of a pair of 16-point DCTs in parallel and finally when Sel32=0 & Sel16=0 enables the computation of four 8-point DCTs in parallel. Consequently, the architecture of Fig. 4.3 allows the calculation of a 32-point DCT or two 16-point DCTs in parallel or four 8-point DCTs in parallel.

Page24 IV. PROPOSED SYSTEM For increasing the Maximum Operating Frequency (MOF), reducing the number of registers and LUTs used, proposing a parallel pipelined structure in which each set of input values (first 32 inputs) are first computed using the first 32-point block, at the same time all the different 32 inputs are computed parallel so that achieving more MOFs and increased speed. Parallel reconfigurable structure for the computation of DCTs of lengths N= 32/16/8 are shown in the Fig.4 given below. Fig.4 Proposed Parallel Reconfigurable Architecture for N=8, 16, 32 V. SIMULATION RESULTS The input to the first 16-point DCT approximation unit is fed through 16 MUXes that select either [a(0), a(1),.., a(14), a(15)] or [X(0),X(1),....X(14), X(15)], depending on whether it is used for 32-point DCT calculation or 16-point DCT calculation. Similarly, the input to the second 16-point DCT unit is fed through 16 MUXes that select either [b(0),b(1),.b(7)] or [X(0),X(1),,...,X(14), X(15)], depending on whether it is used for 32-point DCT calculation or 16-point DCT calculation. Fig.5.1 Input of Proposed 32 point DCT Transform(X0 to X16)

Page25 Fig.5.2 Input of Proposed 32 point DCT Transform(X15 to X31) The output of 32-Point DCT transform when both the select signal is 1 shown in Fig.5.3 & 5.4. Outputs are given by F0, F1, F2, F3......, F15..., F30, F31. The energy of the image is compacted towards the first row of the output register. Also most of the remaining values are zero (0), so that image can be easily transmitted. Fig.5.3 Output of Proposed 32-point DCT Transform when Sel32=1 & Sel16=1 [F0 to F15]

Page26 Fig.5.4 Output of Proposed 32-point DCT Transform when Sel32=1 & Sel16=1 [F16 to F31] The output of 32-point DCT transform when Sel32=0 & Sel16=1 is shown in Fig.5.5 & 5.6. The computation of two 16-point DCTs in parallel occurs in which most of the remaining values are zero (0), so that image can be easily transmitted. Fig.5.5 Output of Proposed 32-point DCT Transform when Sel32=0 & Sel16=1 [F0 to F15]

Page27 Fig.5.6 Output of Proposed 32-point DCT Transform when Sel32=0 & Sel16=1 [F16 to F31] The output of 32-point DCT transform when Sel32=0 & Sel16=0 is shown in Fig.5.7 & 5.8. The computation of four 8-point DCTs in parallel occurs in which most of the remaining values are zero (0), so that image can be easily transmitted. Fig.5.7 Output of Proposed 32-point DCT Transform when Sel32=0 & Sel16=1 [F0 to F16]

Page28 Fig.5.8 Output of Proposed 32-point DCT Transform when Sel32=0 & Sel16=1 [F0 to F16] VI. REDUCED DELAY Proposed design involves nearly 7% less area for 16-point DCT, 6% less area for 32-point DCT, also will have a 5% less area for 64-point DCT on comparing with existing systems like BDCT. VII. COMPARISON OF EXISTING AND PROPOSED SYSTEMS TABLE 7.1 Comparisons of Existing and Proposed Systems PARAMETER EXISTING SYSTEM PROPOSED SYSTEM Computation complexity is low Delay is less MOF 22 adders for 8-point. 72 adders for 16-point 160 adders for 32-point 22 adders for 8-point 72 adders for 16-point 160 adders for 32-point 215.3 for 8-point 167.0 for 16-point 136.4 for 32-point 22 adders for 8-point 60 adders for 16-point 152 adders for 32-point 22 adders for 8-point 60 adders for 16-point 152 adders for 32-point 500.9 for 8-point 496.2 for 16-point 490.4 for 32-point

Page29 VIII. CONCLUSION AND FUTURE SCOPE In this paper, I have proposed a generalized recursive algorithm to obtain orthogonal approximation of DCTs of larger lengths where the computation of 32-point DCT is configured for parallel computation of two 16-point DCTs and four 8-point DCTs. Future enhancement can be to propose a fully scalable reconfigurable architecture for approximate DCT computation where the computation of 64-point DCT could be configured for parallel computation of two 32-point DCTs and four 16-point DCTs and also can be configured for the parallel computation of eight 8-point DCTs. REFERENCES i. Maher Jridi, Ayman Alfalou and Pramod Kumar Meher (2015), A Generalized Algorithm and Reconfigurable Architecture for Efficient and Scalable Orthogonal Approximation of DCT, IEEE transactions on circuits and systems- vol. 62, no. 2, page no: 449-457. ii. iii. iv. Arjuna Madanayake, Cintra R.J, Denis Onen, Dimitrov V.S, Nilanka Rajapaksha, Bruton L.T and Amila Edirisuriya (2012), A Row-Parallel 8 8 2-D DCT Architecture Using Algebraic Integer-Based Exact Computation, IEEE transactions on circuits and systems for video technology, vol. 22, no. 6, page no: 915-939 Renato J. Cintra, Fábio M. Bayer (2011), A DCT Approximation for Image Compression, IEEE SIGNAL processing letters, vol. 18, no. 10, page no: 579-582. Saraswathy K, Vaithiyanathan D and Seshasayanan R (2013), A DCT Approximation with Low Complexity for Image Compression, International conference on Communication and Signal Processing, page no: 3-5. v. Uma Sadhvi Potluri, Arjuna Madanayake, Cintra R.J, Bayer F.M, Sunera Kulasekera and Amila Edirisuriya (2014), Improved 8-Point Approximate DCT for Image and Video Compression Requiring Only 14 Additions, IEEE transactions on circuits and systems- vol. 61, no. 6,page no: 1727-1736. vi. Vijaya Prakash.A.M, Gurumurthy K.S (2010), A Novel VLSI Architecture for Image Compression Model Using Low power Discrete Cosine Transform, World Academy of Science, Engineering and Tech, page no:72-77