Implementation of Discrete Wavelet Transform for Image Compression Using Enhanced Half Ripple Carry Adder

Volume 118 No. 20 2018, 2821-2827
ISSN: 1314-3395 (on-line version)
url: http://www.ijpam.eu

Dr. M. Anto Bennet 1, V. Mahalakshmi 2
1,2 Faculty of Electronics and Communication Engineering, Vel Tech, Chennai, India.
* Corresponding author's e-mail: bennetmab@gmail.com

Abstract: The aim is to design an efficient image compression technique based on the two-dimensional Discrete Wavelet Transformation (DWT). To achieve the best performance, an Enhanced Half-Ripple Carry Adder (EHRCA) has been designed. Verilog Hardware Description Language (Verilog HDL) is used to model the EHRCA and the DWT technique. The DWT is built from two types of filter, a Low Pass Filter (LPF) and a High Pass Filter (HPF). The DWT process performs three levels of decomposition, and each level consists of two compression stages, Row Wise Compression and Column Wise Compression. In the proposed DWT models, adders are recognized as having a greater influence on performance than the other components. To improve the efficiency of the DWT process, an efficient adder called the Enhanced Half-Ripple Carry Adder (EHRCA) has therefore been designed in this research work. The proposed EHRCA circuit offers a 10.71% improvement in hardware slice utilization and an 11.78% improvement in total power consumption over the traditional Binary to Excess-1 Conversion (BEC) based Square Root Carry Select Adder (SQRT CSLA). The proposed adder has further been incorporated into Row Wise Compression and Column Wise Compression to improve the architectural performance of the DWT. In future, the proposed EHRCA-based DWT will be useful in Discrete Cosine Transformation (DCT), hybrid, and lifting-based DWT techniques.

Keywords: Discrete Wavelet Transform (DWT), Discrete Cosine Transformation (DCT)

1. Introduction

Discrete Wavelet Transformation (DWT) is a technique for decomposing and compressing images.
The DWT represents an image as a sum of wavelet functions (wavelets) with different locations and scales. It represents the data as a set of low-pass and high-pass coefficients. The input data is passed through a set of low-pass and high-pass filters, and the outputs of both filters are down-sampled by 2. The output of the low-pass filter is an average coefficient and the output of the high-pass filter is a detail coefficient. In the 2-D DWT, the input data is passed through the low-pass and high-pass filters in two directions, along both rows and columns. As in the 1-D DWT, the filter outputs are down-sampled by 2 in each direction.

Two-dimensional (2-D) DWT techniques are widely used for image and video compression [5]. The 2-D DWT has a multi-resolution decomposition capability, which is why it plays a role in many engineering fields. However, the large amount of data accumulated across the decomposition levels of the transform makes it computationally very intensive. Considerable effort has gone into designing architectures that provide high-speed 2-D DWT computation with reasonable hardware utilization. These architectures can be classified as separable and non-separable.

In a separable architecture, the 2-D filtering operation is performed as two 1-D filtering operations, one processing the data row-wise and the other processing it column-wise. The decomposition levels of the input image can be computed using either a Recursive Pyramid Algorithm (RPA) or a lifting operation. Because a separable filtering architecture uses a 1-D filtering structure to perform the 2-D DWT, it requires additional computation between the two 1-D filtering processes. This increases both the latency and the memory size of the architecture.
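The filtering and down-sampling steps described above can be illustrated with a short sketch. It assumes the Haar filter pair in its averaging form (pairwise mean as the low-pass output, pairwise half-difference as the high-pass output); the function names haar_dwt_1d and haar_dwt_2d and the sub-band labels LL, LH, HL and HH are illustrative choices and are not taken from the proposed hardware design.

```python
def haar_dwt_1d(signal):
    """One level of a 1-D Haar DWT (averaging form, an assumption).

    Returns (average, detail): the low-pass and high-pass outputs,
    each already down-sampled by 2.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = range(0, len(signal), 2)
    # Low-pass filter followed by down-sampling: pairwise averages.
    average = [(signal[i] + signal[i + 1]) / 2 for i in pairs]
    # High-pass filter followed by down-sampling: pairwise half-differences.
    detail = [(signal[i] - signal[i + 1]) / 2 for i in pairs]
    return average, detail


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def haar_dwt_2d(image):
    """One level of a separable 2-D Haar DWT: row pass, then column pass.

    Returns the four sub-bands (LL, LH, HL, HH). The first letter names
    the filter applied along rows, the second the filter along columns
    (a labelling assumption; conventions differ between texts).
    """
    # Row-wise pass: filter every row with the 1-D transform.
    low_rows, high_rows = [], []
    for row in image:
        average, detail = haar_dwt_1d(row)
        low_rows.append(average)
        high_rows.append(detail)

    def column_pass(block):
        # Column-wise pass: filter every column of the row-filtered block.
        lows, highs = [], []
        for col in transpose(block):
            average, detail = haar_dwt_1d(col)
            lows.append(average)
            highs.append(detail)
        return transpose(lows), transpose(highs)

    ll, lh = column_pass(low_rows)
    hl, hh = column_pass(high_rows)
    return ll, lh, hl, hh
```

For the 1-D input [4, 2, 6, 8], this sketch yields the averages [3.0, 7.0] and the details [1.0, -1.0]. Each output has half the length of the input, which is the down-sampling by 2 mentioned in the text.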
Non-separable architectures are used to overcome the limitations of separable architectures, since in a non-separable architecture the 2-D DWT is computed directly using 2-D filters. However, the speed of the DWT process is very low for non-separable architectures. To overcome this problem, pipelining is used in the DWT architecture. In general, the Haar Discrete Wavelet Transform (HDWT) is used to compress the signal or image [6]. To increase the compression ability for images, precision-aware self-quantizing architectures can be used. To generate the DWT coefficients, Distributed Arithmetic (DA) based multiplication is used. A DA-based multiplier performs multiplication with the help of Look-Up Tables (LUTs), and its performance is therefore reported to be better than that of conventional multipliers. One-dimensional DWT techniques can be implemented in a Very Large Scale Integration (VLSI) system design environment, and high-speed 2-D DWT can likewise be implemented in VLSI.

In this work, the 2-D DWT is designed using the Enhanced Half Ripple Carry Adder (EHRCA). The EHRCA is a type of Ripple Carry Adder (RCA) whose hardware complexity and power consumption are lower than those of the traditional RCA circuit. When the EHRCA is incorporated into the DWT process, the performance of the DWT also improves in terms of silicon area and power consumption.

2. Survey

The multimedia market's need for low-energy video processing systems such as HEVC has led to extensive development of fast algorithms for the efficient approximation of 2-D DCT transforms. The DCT is employed in a multitude of compression standards because of its remarkable energy compaction properties. Multiplier-free approximate DCT transforms have been proposed that offer superior compression performance at very low circuit complexity. Such approximations can be realized in digital VLSI hardware using additions and subtractions only, leading to significant reductions in chip area and power consumption compared to conventional DCTs and integer transforms. One such work introduces a novel 8-point DCT approximation that requires only 14 addition operations and no multiplications. The proposed transform has low computational complexity and is compared to state-of-the-art DCT approximations in terms of both algorithm complexity and peak signal-to-noise ratio.
The proposed DCT approximation is a candidate for reconfigurable video standards such as HEVC. The proposed transform and several other DCT approximations are mapped to systolic-array digital architectures, physically realized as digital prototype circuits using FPGA technology, and mapped to 45 nm CMOS technology [1].

Transformation and quantization play a critical role in video codecs. Recently proposed algebraic integer (AI) based discrete cosine transform (DCT) algorithms are analyzed in the presence of quantization, using the High Efficiency Video Coding (HEVC) standard. The AI DCT is implemented and tested on asynchronous quasi delay-insensitive logic, using an Achronix SPD60 field programmable gate array (FPGA), which leads to lower complexity, higher speed of operation, and insensitivity to process-voltage-temperature variations. The performance of the AI DCT with HEVC is measured in terms of the accuracy of the transform coefficients and the overall rate-distortion (R-D) characteristics, using the HM 7.1 reference software. Results indicate a 31% improvement over the integer DCT in the number of transform coefficients having error within 1%. The speed of operation of the 65 nm asynchronous hardware is investigated and compared with that of a 65 nm synchronous Xilinx FPGA. For word lengths of 5 and 6 bits, speed increases of 230% and 199% are observed, respectively. These results indicate that the AI DCT can potentially be used in HEVC for applications demanding high accuracy as well as high throughput. However, novel quantization schemes are required to make use of the accuracy improvements obtained [2].

An algebraic integer (AI) based time-multiplexed row-parallel architecture and two final reconstruction step (FRS) algorithms are proposed for the implementation of the bivariate AI-encoded 2-D discrete cosine transform (DCT).
The architecture directly realizes an error-free 2-D DCT without using FRSs between the row and column transforms, leading to an 8×8 2-D DCT that is entirely free of quantization errors in the AI basis. As a result, the accuracy of each of the 64 coefficients is user-selectable in the FRS [3].

The discrete cosine transform (DCT) is a central mathematical operation in several digital signal processing methods and image/video standards. One work presents a collection of twelve approximations for the 8-point DCT based on integer functions. The functions considered include the floor, ceiling, truncation, and rounding-off functions. The approximations sought are required to meet the following criteria: (i) very low arithmetic complexity, (ii) orthogonality or quasi-orthogonality, and (iii) low-complexity inversion. By varying a scaling parameter, approximations could be obtained systematically, and several existing approximations were identified as particular cases of the proposed methodology. Particular cases include the signed DCT and the rounded DCT. Four new quasi-orthogonal approximations were introduced and their practical relevance was demonstrated. All approximations were given fast algorithms based on matrix factorization methods.

The proposed approximations are multiplierless; their computation requires only additions and bit-shifting operations. Additive complexity ranged from 18 to 24 additions. The approximations obtained were compared with the exact DCT and assessed in the context of JPEG-like image compression, using the peak signal-to-noise ratio and the structural similarity index as quality measures. Because of their low complexity and good performance, the proposed approximations are suitable for hardware implementation in dedicated architectures [4].

A new class of matrices based on a parametrization of the Feig-Winograd factorization of the 8-point DCT is proposed. This parametrization induces a matrix subspace, which unifies a number of existing methods for DCT approximation. By solving a comprehensive multicriteria optimization problem, several new DCT approximations were identified. The solutions were sought to possess the following properties: (i) low multiplierless computational complexity, (ii) orthogonality or near orthogonality, (iii) low-complexity invertibility, and (iv) close proximity and performance to the exact DCT. The proposed approximations were assessed in terms of proximity to the DCT, coding performance, and suitability for image compression. In terms of Pareto efficiency, particular new approximations could outperform various existing methods from the literature [5].

A multiplierless architecture based on algebraic integer representation is proposed for computing the Daubechies 4-tap wavelet transform for 1-D/2-D signal processing. This architecture improves on previous designs in the sense that it minimizes the number of parallel 2-input adder circuits. The algorithm was obtained using numerical optimization based on exhaustive search over the algebraic integer representation.
The proposed architecture furnishes exact computation up to the final reconstruction step, which is the operation that maps the exactly computed filtered results from the algebraic integer representation to fixed-point [6].

A flipping-based high-speed VLSI architecture for lifting-based 2-D DWT is proposed. The direct implementation of the lifting equations has a long critical path delay. To reduce the critical path, the flipping structure is widely used. In the proposed architecture, the multipliers in the flipping structure are replaced by a shift-and-add algorithm. This reduces the critical path delay to one adder (Ta), which is the minimum possible delay any DWT architecture can have. Thus, the proposed architecture is suitable for high-speed applications and has 100% hardware utilization with low control complexity. The architecture is described using VHDL and implemented on an FPGA [7].

The discrete wavelet transform (DWT) has been used in a wide range of real-time applications. Algebraic integer quantization (AIQ) encoding has been proposed to represent the irrational transform basis of the wavelet transform as polynomials with integer coefficients. One work proposes restating these polynomials to obtain simpler values for both the integer coefficients and the polynomial basis, while keeping numerical equivalence with the original AIQ coefficients. An integer linear programming (ILP) model is presented for restating these linear expressions. The results show that for the considered DAUB66 wavelet, the number of additions required can be reduced by up to 18% compared to earlier work [8].

3. Proposed System

3.1 Enhanced Half Ripple Carry Adder

It is possible to create a logical circuit using multiple full adders to add N-bit numbers. Each full adder takes as its input Cin the Cout of the previous adder. This kind of adder is called a ripple-carry adder, since each carry bit "ripples" to the next full adder.
Note that the first (and only the first) full adder may be replaced by a half adder, under the assumption that Cin = 0. The layout of a ripple-carry adder is simple, which allows for fast design time; however, the ripple-carry adder is relatively slow, since each full adder must wait for the carry bit to be calculated by the previous full adder. The gate delay can be calculated by inspection of the full adder circuit. Each full adder requires three levels of logic. In a 32-bit ripple-carry adder there are 32 full adders, so the critical path (worst-case) delay is 3 (from input to carry in the first adder) + 31 × 2 (for carry propagation in the later adders) = 65 gate delays. The general equation for the worst-case delay of an n-bit ripple-carry adder is

-----(1)

The delay from bit position 0 to the carry-out is a little different:

-----(2)

The carry-in must travel through n carry-generator blocks to have an effect on the carry-out:

-----(3)

A design with alternating carry polarities and optimized AND-OR-Invert gates can be about twice as fast.

The RCA is one of the basic adders for performing binary addition. However, the Carry Propagation Delay (CPD) is the main disadvantage of the RCA circuit: every stage must wait for the carry signal from the previous stage. To reduce the CPD problem of the RCA circuit, the Enhanced Half Ripple Carry Adder (EHRCA) is developed. The circuit diagram of the developed 4-bit EHRCA is illustrated in Figure 1. It consists of half adders (HAs), OR gates, AND gates and multiplexers. As the name suggests, only the final half of the circuit (the multiplexer part) must wait for the carry signal from the previous stage; the remaining circuits execute in parallel. Hence, this adder circuit is named the Enhanced Half Ripple Carry Adder. The structure of this circuit is similar to that of the SQRT CSLA. In place of the CSLA's RCA and BEC pair, which handle Cin = 0 and Cin = 1 respectively, a simplified circuit is designed, as shown in Figure 1. The carry input is considered only in the final stage of the EHRCA, whereas the remaining circuit performs its computation in parallel using the available input data. The EHRCA circuits for 8-bit and 16-bit operands can be designed in the same way as the circuit in Figure 1. Further, the EHRCA is incorporated into the addition process of Equation (3) to increase the performance of the 2-D DWT. Three levels of decomposition are performed for image compression. The performances of the conventional SQRT CSLA and the developed EHRCA circuits are analyzed in the Experimental Results section.

The input image used to determine the DWT coefficients is shown in Figure 2. The three levels of decomposed images are illustrated in Figure 4, Figure 5 and Figure 6 respectively. The overall MATLAB results are shown in Figure 7. The images have a multiresolution decomposition capability. Three levels of decomposition are performed to compress the image with the help of the EHRCA. In reconstruction, the input data can be obtained at multiple resolutions by decomposing the LL coefficient further for different levels.
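The ripple-carry behaviour and the gate-delay count of Section 3.1 can be sketched in software as follows. This is a behavioural model only, not the Verilog design of this work. The function names are hypothetical, the delay function simply generalizes the 3 + 31 × 2 = 65 count quoted for the 32-bit case (it is not a reconstruction of Equations (1) to (3)), and the carry-select block models the conventional BEC-based baseline, not the EHRCA circuit of Figure 1.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x, y, n_bits, cin=0):
    """n-bit ripple-carry adder: each stage waits for the previous carry.

    Returns (sum, carry_out), with sum truncated to n_bits.
    """
    carry = cin
    result = 0
    for i in range(n_bits):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)  # carry ripples to stage i + 1
        result |= s << i
    return result, carry


def rca_worst_case_gate_delays(n_bits):
    """Gate delays on the critical path, assuming 3 delays in the first
    stage and 2 in every later stage (the count used in the text)."""
    return 3 + 2 * (n_bits - 1)


def bec_carry_select_block(x, y, n_bits, cin):
    """Conventional BEC-based carry-select block (baseline, not EHRCA).

    An RCA computes the Cin = 0 result; a binary-to-excess-1 step,
    modelled here as "+1" on the (n_bits + 1)-bit value {cout, sum},
    derives the Cin = 1 result; a multiplexer picks one using cin.
    """
    sum0, cout0 = ripple_carry_add(x, y, n_bits, cin=0)
    excess_one = ((cout0 << n_bits) | sum0) + 1
    sum1 = excess_one & ((1 << n_bits) - 1)
    cout1 = (excess_one >> n_bits) & 1
    # Only this final selection depends on the incoming carry.
    return (sum1, cout1) if cin else (sum0, cout0)
```

In this model, rca_worst_case_gate_delays(32) returns 65, matching the figure in the text. The carry-select block shows why only the multiplexer stage has to wait for the carry: both candidate results are available before cin arrives.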
The compressed data is up-sampled by a factor of 2, with interpolation, in order to reconstruct the original input data.

Figure 1. Circuit diagram for 4-bit EHRCA circuit
Figure 2. Input image
Figure 3. Row wise compression in MATLAB

4. Experimental Results

The input image is shown in Figure 2. The simulation result for 2-D DWT compression is shown in Figure 3. The Level-1 compression is shown in Figure 4, the Level-2 compression in Figure 5 and the Level-3 compression in Figure 6. The input image is converted into pixels, and these pixels are shown in Figure 7. Three levels of decomposition are performed for image compression with the help of the DWT and the EHRCA.

Figure 4. DWT LEVEL-1 compression in MATLAB
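The three-level decomposition and the up-sampling reconstruction described above can be sketched as follows. The sketch assumes Haar filters in averaging form and an image whose side lengths are divisible by 8; the names dwt2, idwt2, decompose and reconstruct are hypothetical and do not refer to the MATLAB or Xilinx implementations used for the figures.

```python
def analysis_1d(v):
    """Haar analysis: low-pass (averages) and high-pass (half-differences),
    each down-sampled by 2."""
    low = [(v[i] + v[i + 1]) / 2 for i in range(0, len(v), 2)]
    high = [(v[i] - v[i + 1]) / 2 for i in range(0, len(v), 2)]
    return low, high


def synthesis_1d(low, high):
    """Inverse of analysis_1d: up-sample by 2 and recombine."""
    out = []
    for a, d in zip(low, high):
        out.extend([a + d, a - d])
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def dwt2(image):
    """One 2-D level: row-wise pass, then column-wise pass."""
    row_pairs = [analysis_1d(row) for row in image]
    low = [p[0] for p in row_pairs]
    high = [p[1] for p in row_pairs]

    def column_pass(block):
        col_pairs = [analysis_1d(col) for col in transpose(block)]
        return (transpose([p[0] for p in col_pairs]),
                transpose([p[1] for p in col_pairs]))

    ll, lh = column_pass(low)
    hl, hh = column_pass(high)
    return ll, (lh, hl, hh)


def idwt2(ll, details):
    """Undo one 2-D level: column-wise synthesis, then row-wise synthesis."""
    lh, hl, hh = details

    def column_synthesis(lo, hi):
        cols = [synthesis_1d(a, d) for a, d in zip(transpose(lo), transpose(hi))]
        return transpose(cols)

    low = column_synthesis(ll, lh)
    high = column_synthesis(hl, hh)
    return [synthesis_1d(a, d) for a, d in zip(low, high)]


def decompose(image, levels=3):
    """Repeatedly decompose the LL sub-band; keep the details per level."""
    ll, pyramid = image, []
    for _ in range(levels):
        ll, details = dwt2(ll)
        pyramid.append(details)
    return ll, pyramid


def reconstruct(ll, pyramid):
    """Rebuild the image from the coarsest LL band, level by level."""
    for details in reversed(pyramid):
        ll = idwt2(ll, details)
    return ll
```

Calling decompose on an 8×8 image with levels=3 leaves a single LL coefficient equal to the mean of the image, and reconstruct(ll, pyramid) returns the original pixel values. Stopping the reconstruction loop early would give the image at a lower resolution, which is the multiresolution property mentioned in Section 3.1.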

Figure 5. DWT LEVEL-2 compression
Figure 6. DWT LEVEL-3 compression
Figure 7. Overall DWT compression outputs
Figure 8. DWT LEVEL-1 compression
Figure 9. DWT LEVEL-2 compression
Figure 10. DWT LEVEL-3 compression

Table 1. Comparison of 16-bit conventional BEC based SQRT CSLA and developed 16-bit EHRCA circuits

Type                                       Slices   LUT   Delay (ns)   Power (mW)
16-bit conventional BEC based SQRT CSLA    28       47    15.971       280
16-bit developed EHRCA                     25       42    16.707       247

The Enhanced Half Ripple Carry Adder (EHRCA) circuit is designed using Verilog Hardware Description Language (Verilog HDL). The proposed adder circuit is validated using ModelSim 6.3c, and synthesis results are obtained using the Xilinx 10.1i design tool. The levels of decomposition of the image under the 2-D DWT, obtained with the Xilinx 10.1i tool, are shown in Figures 8, 9 and 10. The RCA circuit was analyzed to identify redundant logic operations, and the EHRCA circuit was designed on the basis of the redundant logic identified. The EHRCA circuit closely resembles the conventional BEC-based SQRT CSLA. Hence, the performance of the 16-bit conventional BEC-based SQRT CSLA and the 16-bit developed EHRCA is compared in Table 1. The 16-bit EHRCA offers a 10.71% reduction in silicon area (28 to 25 slices) and an 11.78% reduction in power consumption (280 mW to 247 mW) relative to the conventional BEC-based SQRT CSLA, with a delay of 16.707 ns against 15.971 ns. The developed EHRCA circuit is therefore a good choice for 2-D DWT implementations in which area and power are the main constraints. Further, the developed EHRCA circuit is incorporated into the 2-D DWT addition process to improve its performance.

5. Conclusion

An Enhanced Half Ripple Carry Adder (EHRCA) circuit has been designed using Verilog HDL, validated using ModelSim 6.3c, and synthesized using the Xilinx 10.1i design tool. The EHRCA was derived by identifying redundant logic operations in the RCA circuit, and its structure closely resembles that of the conventional BEC-based SQRT CSLA. As Table 1 shows, the 16-bit EHRCA offers a 10.71% reduction in silicon area and an 11.78% reduction in power consumption compared with the conventional BEC-based SQRT CSLA. The developed EHRCA circuit has been incorporated into the addition process of the 2-D DWT, whose three levels of decomposition are shown in Figures 8, 9 and 10, to improve its performance.

References

[1] A. Madanayake, R. J. Cintra, D. Onen, V. S. Dimitrov, N. Rajapaksha, L. T. Bruton, and A. Edirisuriya, A row-parallel 8×8 2-D DCT architecture using algebraic integer-based exact computation, IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 6, pp. 915-929, June 2012.
[2] N. Rajapaksha, A. Edirisuriya, A. Madanayake, R. J. Cintra, D. Onen, I. Amer, and V. S. Dimitrov, Asynchronous realization of algebraic integer-based 2D DCT using Achronix Speedster SPD60 FPGA, J. Electr. Comput. Eng. [Online], 2013, pp. 1-9, 2013.
[3] S. K. Madishetty, A. Madanayake, R. J. Cintra, V. S. Dimitrov, and D. H. Mugler, VLSI architectures for the 4-tap and 6-tap 2-D Daubechies wavelet filters using algebraic integers, IEEE Trans. Circuits Syst. I, vol. 60, no. 6, pp. 1455-1468, June 2013.
[4] U. S. Potluri, A. Madanayake, R. J. Cintra, F. M. Bayer, and N. Rajapaksha, Multiplier-free DCT approximations for RF multi-beam digital aperture-array space imaging and directional sensing, Measure. Sci. Technol. [Online], 23(11), p. 114003, 2012.
[5] K. A. Wahid, M. A. Islam, and S. Ko, Lossless implementation of Daubechies 8-tap wavelet transform, in Proc. IEEE Int. Symp. Circuits Systems (ISCAS), 2011, pp. 2157-2160.
[6] Dr. AntoBennet M, Sankar Babu G, Suresh R, Mohammed Sulaiman S, Sheriff M, Janakiraman G, Natarajan S, Design & Testing of Tcam Faults Using T H Algorithm, Middle-East Journal of Scientific Research 23(08): 1921-1929, August 2015.
[7] Dr. AntoBennet M, Power Optimization Techniques for sequential elements using pulse triggered flipflops, International Journal of Computer & Modern Technology, Issue 01, Volume 01, pp. 29-40, June 2015.
[8] Dr. AntoBennet M, Manimaraboopathy M, Maragathavalli P, Dinesh Kumar T R, Low Complexity Multiplier For Gf(2m) Based All One Polynomial, Middle-East Journal of Scientific Research 21(11): 2064-2071, October 2014.
