IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 4, APRIL 2012

A High Performance Video Transform Engine by Using Space-Time Scheduling Strategy

Yuan-Ho Chen, Student Member, IEEE, and Tsin-Yuan Chang, Member, IEEE

Abstract: In this paper, a spatial and time scheduling strategy, called the space-time scheduling (STS) strategy, that achieves high image resolutions in real-time systems is proposed. The proposed spatial scheduling strategy includes the ability to choose the distributed arithmetic (DA)-precision bit length and a hardware sharing architecture that reduces the hardware cost. The proposed time scheduling strategy arranges the computations of the different dimensions so that the first-dimensional and second-dimensional transformations are calculated simultaneously in a single 1-D discrete cosine transform (DCT) core, reaching a hardware utilization of 100%. The DA-precision bit length is chosen as 9 bits instead of the traditional 12 bits based on test image simulations. In addition, the proposed hardware sharing architecture employs a binary signed-digit DA architecture that enables the arithmetic resources to be shared during the four time slots. For this reason, the proposed 2-D DCT core achieves high accuracy with a small area and a high throughput rate, and it is verified using a TSMC 0.18-μm 1P6M CMOS process chip implementation. Measurement results show that the core has a latency of 84 clock cycles with a 52 dB peak signal-to-noise ratio and is operated at 167 MHz with 158 K gate counts.

Index Terms: Binary signed-digit (BSD), discrete cosine transform (DCT), distributed arithmetic (DA)-based, space-time scheduling (STS).

I. INTRODUCTION

DISCRETE COSINE TRANSFORM (DCT) is a widely used transform engine for image and video compression applications [1]. In recent years, visual media have progressed towards high-resolution specifications, such as high definition television (HDTV). Consequently, a high-accuracy and high-throughput rate
component is needed to meet future specifications. In addition, in order to reduce the manufacturing costs of the integrated circuit (IC), a low hardware cost design is also required. Therefore, a high performance video transform engine that offers high accuracy, a small area, and a high throughput rate is desired for VLSI designs.

Manuscript received February 22, 2010; revised August 09, 2010 and December 30, 2010; accepted January 24, 2011. Date of publication February 28, 2011; date of current version March 12, 2012. This work was supported in part by the Chip Implementation Center and the National Science Council under project numbers CIC T18-98C-12a and NSC E, respectively. The authors are with the Department of Electrical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan (e-mail: yhchen@larc.ee.nthu.edu.tw). Color versions of one or more of the figures in this paper are available online. Digital Object Identifier /TVLSI

The 2-D DCT core design has often been implemented using either direct [2]-[4] or indirect [5]-[13] methods. The direct methods include fast algorithms that reduce the computational complexity by mapping and rotating the DCT transform into a complex number [3]. However, the structure of the DCT is not as regular as that of the fast Fourier transform (FFT). In [4], a regular 2-D DCT core is implemented using the direct method: it derives the 2-D shifted FFT (SFFT) format from the 2-D DCT algorithm and shares the hardware among the FFT, IFFT, and 2-D DCT computations. On the other hand, the 2-D DCT cores using the indirect method are implemented based on transpose memory and have one of the following two structures: 1) two 1-D DCT cores and one transpose memory (TMEM) and 2) a single 1-D DCT core and one TMEM. In the first structure, the 2-D DCT core has a high throughput rate, because the two 1-D DCT cores compute the transformation simultaneously [5]-[9]. In order to reduce the area overhead, a single 1-D DCT core is applied in the second structure [10]-[13], thereby
saving hardware costs. Hsia et al. [10] present a 2-D DCT with a single 1-D DCT core and one custom transpose memory that transposes data sequences using serial input and parallel output. The design requires only a small area, but the speed is reduced because the single DCT core performs the first-dimensional (1st-D) and second-dimensional (2nd-D) transformations at different times. Guo et al. therefore use two parallel paths to deal with the reduction of throughput rate in the second structure [11]. However, additional input buffers are needed in order to temporarily store the input data during the 2nd-D transformation. Madisetti et al. present a hardware sharing technique to implement the 2-D DCT core using a single 1-D DCT core [13]. The 1-D DCT core can calculate the 1st-D and 2nd-D DCT computations simultaneously, and the throughput reaches 100 Mpels/s. However, the multiplier-based processing element in [13] requires a large amount of circuit area. As a result, a tradeoff is required between the hardware cost and the speed. Tumeo et al. find a balance point between the required area and the speed for 2-D DCT designs and present a multiplier-based pipelined fast 2-D DCT accelerator implemented on a field-programmable gate array (FPGA) platform [14]. Additionally, several 2-D DCT designs are implemented based on FPGA for fast verification [14]-[17].

In this paper, an 8 × 8 2-D DCT core that consists of a single 1-D DCT core and one TMEM is proposed using a strategy known as space-time scheduling (STS). Based on accuracy simulations of the DA-based binary signed-digit (BSD) expression, a 9-bit DA precision is chosen in order to meet the peak signal-to-noise ratio (PSNR) requirements outlined in previous works [5]-[7]. Furthermore, the proposed DCT core is designed based on a hardware sharing architecture with BSD DA-based computation so as to reduce the area cost. The arithmetic units share the hardware resources during the four time slots in the DCT
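core design. A 100% hardware utilization is also achieved by using the proposed time scheduling strategy, so the 1st-D and 2nd-D DCT computations can be calculated at the same time. Therefore, a high performance transform engine with high accuracy, a small area, and a high throughput rate has been achieved in this research.

This paper is organized as follows. In Section II, the mathematical derivation of the BSD format distributed arithmetic is given. The proposed 8 × 8 2-D DCT architecture, including an analysis of the coefficient bits, the hardware sharing architecture, and the proposed time scheduling strategy, is discussed in Section III. Comparisons and discussions are presented in Section IV, and conclusions are drawn in Section V.

II. MATHEMATICAL DERIVATION OF BSD FORMAT DISTRIBUTED ARITHMETIC

The inner product for a general matrix multiplication-and-accumulation can be written as follows:

$$Y = \mathbf{A}^{T}\mathbf{X} = \sum_{i=1}^{L} A_i X_i \tag{1}$$

where $A_i$ is a fixed coefficient and $X_i$ is the input data. Assume that the coefficient $A_i$ is a $Q$-bit BSD number. Equation (1) can be expressed as follows:

$$Y = \begin{bmatrix} 2^{0} & 2^{-1} & \cdots & 2^{-(Q-1)} \end{bmatrix} \begin{bmatrix} A_{1,0} & A_{2,0} & \cdots & A_{L,0} \\ A_{1,1} & A_{2,1} & \cdots & A_{L,1} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1,Q-1} & A_{2,Q-1} & \cdots & A_{L,Q-1} \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_L \end{bmatrix} = \begin{bmatrix} 2^{0} & 2^{-1} & \cdots & 2^{-(Q-1)} \end{bmatrix} \begin{bmatrix} Y_0 \\ Y_1 \\ \vdots \\ Y_{Q-1} \end{bmatrix} \tag{2}$$

where $Y_j = \sum_{i=1}^{L} A_{i,j} X_i$, $A_{i,j} \in \{-1, 0, 1\}$, and $j = 0, 1, \ldots, Q-1$. The matrix $[A_{i,j}]$ is called the DA coefficient matrix. In (2), $Y_j$ can be calculated by adding or subtracting the $X_i$ with $A_{i,j} \neq 0$, and then the transform output $Y$ can be obtained by shifting and adding every non-zero $Y_j$. Thus, the inner product computation in (1) can be implemented by using shifters and adders instead of multipliers. Therefore, a smaller area can be achieved by using the BSD DA-based architecture.

III. PROPOSED 8 × 8 2-D DCT CORE DESIGN

This section introduces the proposed 8 × 8 2-D DCT core implementation. The 2-D DCT is defined as

$$Z_{u,v} = \frac{2}{N}\,k_u k_v \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x_{i,j}\cos\frac{(2i+1)u\pi}{2N}\cos\frac{(2j+1)v\pi}{2N} \tag{3}$$

where $0 \le u, v \le N-1$, $k_u = 1/\sqrt{2}$ for $u = 0$, and $k_u = 1$ for $u \neq 0$ (and likewise for $k_v$). Consider the 1-D 8-point DCT first:

$$Z_n = \frac{1}{2}\,k_n \sum_{m=0}^{7} x_m \cos\frac{(2m+1)n\pi}{16} \tag{4}$$

where $0 \le n \le 7$, $k_n = 1/\sqrt{2}$ for $n = 0$, $k_n = 1$ for non-zero $n$, and $x_m$ and $Z_n$ denote the input data and the transform output, respectively. By neglecting the scaling factor 1/2, the 1-D 8-point DCT in (4) can be divided into even and odd parts, $\mathbf{Z}_e$ and $\mathbf{Z}_o$, as listed in (5) and (6), respectively:

$$\mathbf{Z}_e = \begin{bmatrix} Z_0 & Z_2 & Z_4 & Z_6 \end{bmatrix}^{T} = \mathbf{C}_e\,\mathbf{a} \tag{5}$$

$$\mathbf{Z}_o = \begin{bmatrix} Z_1 & Z_3 & Z_5 & Z_7 \end{bmatrix}^{T} = \mathbf{C}_o\,\mathbf{b} \tag{6}$$

where

$$\mathbf{C}_e = \begin{bmatrix} c_4 & c_4 & c_4 & c_4 \\ c_2 & c_6 & -c_6 & -c_2 \\ c_4 & -c_4 & -c_4 & c_4 \\ c_6 & -c_2 & c_2 & -c_6 \end{bmatrix} \tag{7}$$

and

$$\mathbf{C}_o = \begin{bmatrix} c_1 & c_3 & c_5 & c_7 \\ c_3 & -c_7 & -c_1 & -c_5 \\ c_5 & -c_1 & c_7 & c_3 \\ c_7 & -c_5 & c_3 & -c_1 \end{bmatrix} \tag{8}$$

with $c_k = \cos(k\pi/16)$, $\mathbf{a} = [a_0\ a_1\ a_2\ a_3]^{T}$, $\mathbf{b} = [b_0\ b_1\ b_2\ b_3]^{T}$, and

$$a_i = x_i + x_{7-i}, \qquad b_i = x_i - x_{7-i}, \qquad i = 0, 1, 2, 3. \tag{9}$$

For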
the 2-D DCT core implementation, three main strategies for increasing the hardware and time utilization, together called the space-time scheduling (STS) strategy, are proposed as follows:
1) find the DA-precision bit length for the BSD representation that achieves the system accuracy requirements;
2) share the hardware resources in time to reduce the area cost;
3) plan a 100% hardware utilization by using the time scheduling strategy.

A. Analysis of the Coefficient Bits

The seven internal coefficients from $c_1$ to $c_7$ (with $c_k = \cos(k\pi/16)$) for the 2-D DCT transformation are expressed as BSD representations in order to save computation time and hardware cost, as well as to meet the PSNR requirements. The system PSNR is defined [1] as follows:

$$\mathrm{PSNR} = 10\log_{10}\frac{255^{2}}{\mathrm{MSE}}\ \text{(dB)} \tag{10}$$

where the MSE is the mean-square-error between the original image and the reconstructed image for each pixel.
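The two ingredients of this analysis, a signed-digit coefficient of limited bit length and the PSNR metric, can be illustrated with a minimal sketch. It is not the authors' implementation: the non-adjacent form is assumed here as one possible BSD encoding (the paper's own 9-bit expressions are those of Table II and may differ), 8 fractional bits are assumed, the peak value 255 assumes 8-bit pixels, and the names `to_bsd`, `quantize_coefficient`, `shift_add_multiply`, and `psnr` are hypothetical.

```python
import math

def to_bsd(n):
    """Signed digits (-1, 0, 1) of integer n in non-adjacent form, least significant first."""
    digits = []
    while n != 0:
        if n % 2:
            d = 2 - (n % 4)   # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

def quantize_coefficient(c, frac_bits=8):
    """Round c to frac_bits fractional bits and return its signed digits (assumed encoding)."""
    return to_bsd(round(c * (1 << frac_bits)))

def shift_add_multiply(x, digits, frac_bits=8):
    """Multiply integer x by the encoded coefficient using only shifts, adds, and subtracts."""
    acc = 0
    for j, d in enumerate(digits):
        if d == 1:
            acc += x << j
        elif d == -1:
            acc -= x << j
    return acc / (1 << frac_bits)   # final scaling back to the coefficient's weight

def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized 2-D lists of pixel values."""
    errors = [(o - r) ** 2
              for row_o, row_r in zip(original, reconstructed)
              for o, r in zip(row_o, row_r)]
    mse = sum(errors) / len(errors)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

# cos(4*pi/16) rounded to 8 fractional bits is 181/256, which needs nine signed digits here.
c4_digits = quantize_coefficient(math.cos(4 * math.pi / 16))
product = shift_add_multiply(100, c4_digits)
```

Only the non-zero digits cost an adder or subtracter, which is why a shorter signed-digit expression lowers the hardware cost while a longer one lowers the quantization error.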
Fig. 1. Simulation for system PSNR in different BSD bit length expressions.
TABLE I. NORMALIZED HW COST IN DIFFERENT BSD BIT LENGTH EXPRESSIONS
TABLE II. 9-BIT BSD EXPRESSIONS FOR DCT COEFFICIENTS

First, the popular test images Lena, Peppers, and Baboon are used to simulate the system PSNR for BSD coefficients of different bit lengths. After the original test image pixels are passed through the forward and inverse DCT, the system PSNR versus the BSD bit length is illustrated in Fig. 1(a), and the reconstructed images, i.e., those that have passed through the forward and inverse DCT for different BSD bit lengths, are shown in Fig. 1(b). The normalized hardware (HW) cost of the processing element (PE) for the different BSD bit lengths is shown in Table I. The HW cost is the synthesized area for each BSD bit length, obtained using the Synopsys Design Compiler with the Artisan TSMC 0.18-μm standard cell library. The gap in accuracy between the 8-bit and the 9-bit BSD expressions is shown in Fig. 1(a), and the HW cost increases as the BSD bit length increases. Consequently, the 9-bit BSD expression is chosen as the tradeoff between the hardware cost and the system accuracy. The BSD expressions and the accuracy of the DCT coefficients are listed in Table II, where the digit $\bar{1}$ indicates the BSD digit $-1$. The coefficient accuracy is defined as the signal-to-quantization-noise ratio (SQNR) shown in (11):

$$\mathrm{SQNR} = 10\log_{10}\frac{P_S}{P_N}\ \text{(dB)} \tag{11}$$

where $P_S$ and $P_N$ are the power of the desired floating-point signal and of the quantization noise, respectively.

B. Hardware Sharing Strategy

All modules in the 1-D DCT core, including the modified two-input butterfly (MBF2), the Pre-Reorder, the process element even (PEE), the process element odd (PEO), and the Post-Reorder, share the hardware resources in order to reduce the area cost. Moreover, the 8 × 8 2-D DCT core is implemented using a single 1-D DCT core and one TMEM. The
architecture of the 2-D DCT core is described in the following sections.

1) Modified Butterfly Module: Equation (9) is easily implemented using a two-input butterfly module [18] called BF2. In general, the adder and subtracter in the BF2 have a hardware utilization rate of 50%. In order to enable the hardware resources to be shared, additional multiplexers and Reorder Registers are added to the proposed MBF2 module, as illustrated in Fig. 2. The Reorder Registers consist of four word registers that use control signals to select the input data and enable signals to output or hold the data. Similar to the BF2, the operation of the proposed MBF2 has an eight-clock-cycle period. In the first four cycles, the 1st-D input data shift into Reorder Registers 1, and the 2nd-D input data undergo addition and subtraction. In the next four cycles, the roles of the 1st-D and 2nd-D data are exchanged: the 1st-D data are used to calculate the butterfly sums $a$ and differences $b$ of (9) by using the adder and subtracter. In these second four cycles, the $a$ results are fed into the next stage, and the
$b$ results shift to Reorder Registers 1 in each cycle. At the same time, the 2nd-D input data shift to Reorder Registers 2. Therefore, the adder and subtracter in the MBF2 perform the 1st-D and 2nd-D operations in turns to achieve 100% hardware utilization.

Fig. 2. Proposed MBF2 architecture.
TABLE III. COEFFICIENTS AND INPUT DATA FROM MBF2 FOR THE FOUR SEPARATE TIME SLOT OUTPUTS
TABLE IV. EVEN PART TRANSFORMATION TABLE

2) Pre-Reorder Module: The even part transform output in (5) can be modified into the form (12), and the odd part transform output in (6) can be rewritten as (13). From (12) and (13), the transform output can be calculated using four separate time slots in order to share the hardware resources. Hence, (12) and (13) can be expressed as (14) and (15), respectively, in terms of the time slot, the coefficient elements, and the transform inputs from the MBF2 stage. Equations (14) and (15) for the four time slots are summarized in Table III. The proposed Pre-Reorder module orders its inputs according to the time slot, as indicated in Table III. The hardware resources can be shared by reordering the inputs during the four separate time slots.

3) Process Element Module: For DA-based computation, the even part and odd part transformations can be implemented using the PEE and PEO, respectively. From Table III, the even part transformation can be expanded into DA-based computation formats and can share the hardware resources at the bit level. The coefficient vector has two combinations, one
for each group of transform outputs. Table IV expresses (14) in the bit-level formulation. Using the four given input data, the transform output needs only three adders, a Data-Distributed module, and one even part adder-tree (EAT) to obtain the result during the four time slots. The Data-Distributed module consists of seven two-input multiplexers that assign the different non-zero values to each weighted input during the four separate time slots, as shown in Table IV, and the EAT sums the different weighted values in tree-like adders to complete the transform output. Similarly, the odd part transformation in (15) can be implemented in a DA-based format using four adders and one odd part adder-tree (OAT). The EAT and OAT can be implemented using the error-compensated adder tree [19] to improve the computation accuracy. Moreover, there are three pipeline stages (two in both the EAT and the OAT) in each PEE and PEO module, which enable high speed computation. The proposed architecture for both the PEE and the PEO is shown in Fig. 3.

Fig. 3. Architecture for the proposed PEE and PEO.

4) Post-Reorder Module: In the last stage of the 1-D DCT computation, the data sequences from the PEE and the PEO must be merged and re-permuted in the Post-Reorder module, which permutes the 8-point data sequence into the required output order. Fig. 4 shows the proposed Post-Reorder architecture. Two multiplexers select the data that is fed into the different Reorder Registers in order to permute the output order. The 1st-D DCT transform output is input into the TMEM, and the 2nd-D DCT transform output is completed after the permutation by Reorder Registers 4.

Fig. 4. Proposed post-reorder architecture.

5) 2-D DCT Core Architecture: To save hardware costs, the proposed 2-D DCT core, shown in Fig. 5, is implemented using a single 1-D DCT core and one TMEM. The design comprises an MBF2, a Pre-Reorder module, a PEE, a PEO, a Post-Reorder module, and one TMEM. The TMEM is implemented using 64-word 12-bit dual-port registers and has a latency of 52 cycles. Based on the time scheduling strategy described in Section III-C, a hardware utilization of 100% can be achieved.

C. Time Scheduling Strategy

As a result of the time scheduling strategy, the 1st-D and 2nd-D transforms can be computed simultaneously, which increases the hardware utilization to 100%. The timing flow chart for the proposed strategy is illustrated in Fig. 6.

1) 1st-D Data Computation: In the first four cycles, the first four-point input data shift into Reorder Registers 1. During cycles 5-8, the MBF2 performs additions and subtractions using the first eight-point input data. In the 9th cycle, the even part of the Pre-Reorder module obtains the input data, reorders it, and outputs it to the PEE based on Table III. During cycles 10-13, the even part transformation is calculated in the PEE module, and its results are completed after the latency of the three pipeline stages in the PEE. The odd part transformation is likewise calculated and completed in the PEO module. Therefore, after the 16th cycle, the Post-Reorder module permutes the 1st-D transform results and inputs them into the TMEM.

2) 2nd-D Data Computation: From the 17th cycle, the 1st-D transform data is input into the TMEM. Because the TMEM has a latency of 52 cycles, the 2nd-D computation data transposed by the TMEM is sent to the input of Reorder Registers 2 in the MBF2 at the 69th cycle. Then, the adder and subtracter compute the first 2nd-D 8-point data sequence, and the first 2nd-D data is permuted by the Pre-Reorder module. The PEE and PEO then calculate the first 2nd-D data, with the PEO calculation occupying cycles 82-85. The Post-Reorder module finishes the 2nd-D transform results from the 84th cycle. In addition, the 2-D DCT transform output data is
obtained at the end of the 84th cycle.

3) Hardware Utilization: In Fig. 6, the adder and subtracter in the MBF2 are at 50% utilization during cycles 1-68. However, after the 68th cycle, the MBF2 achieves a 100% hardware utilization because the 2nd-D data are being input. Likewise, the utilization of the PEE and PEO reaches 100% from the 74th and 78th cycles, respectively. The Post-Reorder module does not work during cycles 1-12, but after the 16th cycle the 1st-D transform outputs are completed using Reorder Registers 3 in the Post-Reorder module. Similarly, the 2nd-D transform results are finished using Reorder Registers 4 in the Post-Reorder module from the 84th cycle. At the end of the 84th cycle, the 1st-D and 2nd-D transform outputs are obtained simultaneously. Hence, the hardware utilization increases to 100% after the 84th
cycle, and the latency of the proposed 8 × 8 2-D DCT core is 84 clock cycles. Based on the proposed time scheduling strategy, a 100% hardware utilization is achieved, and computation of the data sequence is straightforward.

Fig. 5. Proposed 2-D DCT architecture.
Fig. 6. Timing scheduling for the proposed 2-D DCT core.

In summary, following an accuracy analysis of the coefficients, the DA-precision bit length was chosen as a 9-bit BSD expression. In this way, the hardware cost is reduced by using the 9-bit BSD DA precision instead of the 12 bits used in previous works. Using the proposed hardware sharing strategy, the DCT computation shares the hardware resources during the four time slots, thereby reducing the area cost. In addition, a 100% hardware utilization can be achieved based on the proposed time scheduling strategy, which yields a high-throughput-rate design. Therefore, a high-accuracy, small-area, and high-throughput-rate 2-D DCT core has been obtained by applying the proposed STS strategy.

IV. DISCUSSION AND COMPARISONS

In this section, the system accuracy test for the proposed 2-D DCT core is discussed. Then, a comparison of the proposed 2-D DCT core with previous works is presented. Finally, the characteristics of the implemented chip and FPGA are described.

A. System Accuracy Test

Seven test images of identical size are used to check system accuracy, with each pixel represented by 8-bit, 256-gray-level data. After the original test image pixels are fed
into the proposed 2-D DCT core, the transform output data is captured and passed to MATLAB in order to compute the inverse DCT using 64-bit double-precision operations. The verification flow, illustrated in Fig. 7, follows the test flow of previous works [4], [7], [20]. The PSNR values of the test images verified by this flow are listed in Table V, and the average PSNR is close to 52 dB using the proposed 8 × 8 2-D DCT core. Furthermore, the proposed 2-D DCT core was inserted into the JPEG compression flow [21], [22] to evaluate it in a real compression system. The quality factor (QF) determines the compression quality and data size, and therefore the QF affects the system PSNR of the JPEG compression. Table VI lists the PSNR versus QF for the proposed 2-D DCT core applied to JPEG, and Fig. 8 illustrates both the original and the compressed versions of the test image Lena using the JPEG compression flow. The quality of the compressed image is excellent.

Fig. 7. Accuracy verification flow for the proposed DCT core.
TABLE V. PSNR PERFORMANCE FOR EACH TEST PATTERN
TABLE VI. PSNR VERSUS QF FOR THE PROPOSED 2-D DCT APPLIED TO THE JPEG COMPRESSION FLOW

B. Comparison With Other 2-D DCT Architectures

Table VII compares the proposed 8 × 8 2-D DCT core with previous works. In [4], Lin et al. derived an SFFT format from a 2-D DCT and shared the hardware resources for the FFT/IFFT/2-D DCT computation in a triple-mode processor design. That 2-D DCT core is implemented in a direct 2-D method based on a pipeline radix- single delay feedback path architecture. In [7], a 2-D DCT core used the NEDA architecture with two 1-D cores and one TMEM to achieve a low-cost design. Its speed is limited by the serial shifting and addition in the NEDA operations. Huang et al. improved the throughput rate by using an adder-tree (AT) instead of the shifting and addition operations of the NEDA architecture [9]. However,
the parallel input and output in the NEDA architecture require a large amount of I/O resources. A 2-D DCT core using a single 1-D core and one TMEM for a multiplier-based DCT implementation was addressed in [10]. An area-saving serial-input, parallel-output transpose memory was also implemented in that core to achieve a small-area design operated at a frequency of 55 MHz. However, the 1-D core could not compute the 1st-D and 2nd-D transformations simultaneously; hence, the throughput rate was half of the operating frequency. A hardware sharing technique was therefore adopted in the 2-D DCT core of [13] to deal with the throughput rate reduction, and that core achieved a throughput of 100 Mpels/s.

The proposed 2-D DCT core employs a single 1-D core and one TMEM in order to reduce the core area. Also, the 1-D core uses 9 adders and 2 adder-trees (AT) under the proposed hardware sharing architecture in order to reduce hardware cost. The throughput rate of the proposed 2-D DCT core is enhanced by the time scheduling strategy: the proposed core is able to compute the 1st-D and 2nd-D transformations simultaneously, and a throughput rate of 167 Mpels/s (one pel per cycle at 167 MHz) is achieved. Therefore, the proposed 2-D DCT core has the highest hardware efficiency, defined as follows:

$$\text{Hardware Efficiency} = \frac{\text{Throughput Rate}}{\text{Gate Counts}}\ \text{(pels/s-gate)} \tag{16}$$

A comparison with previous works is summarized in Table VII. Note that the previous works presented only simulation results, while the proposed work has a chip implementation with design for test (DFT) and measured results. For a fair comparison, the simulation results of the proposed work are also shown, that is, without the DFT circuitry, whose memory built-in self-test circuit and scan flip-flops need extra area in the chip implementation. The gate count of the STS DCT without DFT insertion is 122 K, and its hardware efficiency is 1369, which is the largest in Table VII. On the other hand, the proposed 2-D DCT achieves a PSNR of 52 dB, which is a medium
accuracy compared with other previous works, as verified by the test flow in Fig. 7.

Power consumption is another issue in circuit design. The DCT cores in previous works are implemented in many different technologies. In order to have a fair comparison, the power is normalized to the 0.18-μm technology process in a manner similar to [23], as given in (17), and the result is also summarized in Table VII. The proposed 2-D DCT core has a medium power consumption among the previous works.
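The hardware-efficiency figure quoted in the comparison above can be checked with a short calculation. This is a sketch under stated assumptions: hardware efficiency is taken as throughput divided by gate count, as the unit pels/s-gate suggests, and the throughput is taken as one pel per clock cycle at 167 MHz. The function name `hardware_efficiency` is hypothetical, and the value computed for the 158 K gate count with DFT insertion is derived here for illustration rather than quoted from the paper.

```python
def hardware_efficiency(throughput_pels_per_s, gate_count):
    """Throughput per gate in pels/s-gate (assumed reading of the efficiency metric)."""
    return throughput_pels_per_s / gate_count

THROUGHPUT = 167e6        # assumes one pel per clock cycle at 167 MHz
GATES_WITHOUT_DFT = 122e3  # simulation result quoted in the text
GATES_WITH_DFT = 158e3     # chip measurement quoted in the text

eff_without_dft = hardware_efficiency(THROUGHPUT, GATES_WITHOUT_DFT)
eff_with_dft = hardware_efficiency(THROUGHPUT, GATES_WITH_DFT)

print(round(eff_without_dft))  # matches the 1369 pels/s-gate stated in the text
print(round(eff_with_dft))     # illustrative value for the chip with DFT insertion
```

Under these assumptions the DFT circuitry costs roughly a quarter of the efficiency, which is why the text reports the figure without DFT when comparing against simulation-only designs.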
Fig. 8. Compressed image using JPEG compression processing with the proposed 2-D DCT core. (a) Original image. (b) Compressed image with QF = 99.
TABLE VII. COMPARISON OF DIFFERENT 2-D DCT ARCHITECTURES WITH THE PROPOSED ARCHITECTURE
Notes to Table VII: gate counts assume four transistors per NAND2 gate across the different technologies. Where two values are given for the proposed core, the first is the chip measurement result with DFT insertion and the value in parentheses is the simulation result without DFT insertion. The power result is estimated for the 1-D DCT operation.

C. Chip Implementation

Implemented in a 1.8-V TSMC 0.18-μm 1P6M CMOS process, the proposed 8 × 8 2-D DCT core uses the Synopsys Design Compiler to synthesize the RTL code and the Cadence SoC Encounter for placement and routing (P&R). The proposed DCT core has a latency of 84 clock cycles and is operated at 167 MHz to meet HDTV specifications. For the design for test (DFT) considerations, the proposed core uses the SynTest SRAMBIST to generate the memory built-in self-test (BIST) circuit and the Synopsys Design Compiler to synthesize scan flip-flops, and the total gate count of the proposed core with DFT insertion is 158 K. The photomicrograph of the proposed core, with each module's boundary marked, is shown in Fig. 9, and the measured characteristics are listed in Table VIII.

D. FPGA Implementation

The proposed 2-D DCT core, synthesized using Xilinx ISE 10.1 for the Xilinx XC2VP30 FPGA, can be operated at a clock frequency of 110 MHz. Table IX shows a comparison of the proposed 2-D DCT core with previous FPGA implementations. The multiplier-less coordinate rotation digital computer (CORDIC) architecture was presented in [15]. Sun et al. utilized 120 adders and 80 shifters to perform a 2-D quantized discrete cosine integer transform (QDCIT) and achieved a high clock rate of 149 MHz, but with large FPGA area resources. Also, a 2-D DCT core that was optimized in both delay and area was addressed by Tumeo et al. [14], who presented a balance point between the delay
and the area for a multiplier-based pipelined DCT design, with 19 adders/subtracters and 4 multipliers in their DCT core
operated at a clock frequency of 107 MHz. The proposed DCT core has lower area resources and a medium operating frequency among the XC2VP30 FPGA implementations.

Fig. 9. Photomicrograph of the proposed DCT core.
TABLE VIII. CHIP CHARACTERISTICS OF THE PROPOSED 2-D DCT ARCHITECTURE
TABLE IX. COMPARISONS OF 2-D DCT ARCHITECTURES IN FPGAS

V. CONCLUSION

This paper proposes a high performance video transform engine using the STS strategy. The proposed 2-D DCT core employs a single 1-D DCT core and one TMEM with a small area. Based on the test image simulations, a 9-bit DA precision is chosen in order to meet the PSNR requirements, and the hardware sharing architecture enables the arithmetic resources to be shared in time so as to reduce the area cost. Hence, the reduced number of adders/subtracters in the 1-D DCT core allows a 74% saving in area over the NEDA architecture for a DA-based DCT design. Furthermore, the proposed time scheduling strategy arranges the computation time for each process element. The 1-D core can calculate the 1st-D and 2nd-D transformations simultaneously, thereby achieving a high throughput rate. Therefore, a high performance 8 × 8 2-D DCT core with high accuracy, a small area, and a high throughput rate has been achieved using the proposed STS strategy. Finally, the proposed high performance 2-D DCT core is fabricated using a TSMC 0.18-μm 1P6M CMOS process and implemented in the XC2VP30 FPGA.

REFERENCES

[1] Y. Wang, J. Ostermann, and Y. Zhang, Video Processing and Communications, 1st ed. Englewood Cliffs, NJ: Prentice-Hall, 2002.
[2] E. Feig and S. Winograd, Fast algorithms for the discrete cosine transform, IEEE Trans. Signal Process., vol. 40, no. 9, Sep. 1992.
[3] Y. P. Lee, T. H. Chen, L. G. Chen, M. J. Chen, and C. W. Ku, A cost-effective architecture for 8 × 8 two-dimensional DCT/IDCT using direct method, IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 3, Jun. 1997.
[4] C. T. Lin, Y. C. Yu, and L. D. Van, Cost-effective triple-mode
reconfigurable pipeline FFT/IFFT/2-D DCT processor, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 8, Aug. 2008.
[5] M. Alam, W. Badawy, and G. Jullien, A new time distributed DCT architecture for MPEG-4 hardware reference model, IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 5, May 2005.
[6] T. Xanthopoulos and A. P. Chandrakasan, A low-power DCT core using adaptive bitwidth and arithmetic activity exploiting signal correlations and quantization, IEEE J. Solid-State Circuits, vol. 35, no. 5, May 2000.
[7] A. M. Shams, A. Chidanandan, W. Pan, and M. A. Bayoumi, NEDA: A low-power high-performance DCT architecture, IEEE Trans. Signal Process., vol. 54, no. 3, Mar. 2006.
[8] C. Peng, X. Cao, D. Yu, and X. Zhang, A 250 MHz optimized distributed architecture of 2D DCT, in Proc. Int. Conf. ASIC, 2007.
[9] C. Y. Huang, L. F. Chen, and Y. K. Lai, A high-speed 2-D transform architecture with unique kernel for multi-standard video applications, in Proc. IEEE Int. Symp. Circuits Syst., 2008.
[10] S. C. Hsia and S. H. Wang, Shift-register-based data transposition for cost-effective discrete cosine transform, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 6, Jun. 2007.
[11] J. I. Guo, R. C. Ju, and J. W. Chen, An efficient 2-D DCT/IDCT core design using cyclic convolution and adder-based realization, IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 4, Apr. 2004.
[12] H. C. Hsu, K. B. Lee, N. Y. Chang, and T. S. Chang, Architecture design of shape-adaptive discrete cosine transform and its inverse for MPEG-4 video coding, IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 3, Mar. 2008.
[13] A. Madisetti and A. N. Willson, A 100 MHz 2-D DCT/IDCT processor for HDTV applications, IEEE Trans. Circuits Syst. Video Technol., vol. 5, no. 2, Apr. 1995.
[14] A. Tumeo, M. Monchiero, G. Palermo, F. Ferrandi, and D. Sciuto, A pipelined fast 2D-DCT accelerator for FPGA-based SOCs, in Proc. IEEE Comput. Soc. Annu. Symp. VLSI, 2007.
[15] C. C. Sun, P. Donner, and J. Gotze, Low-complexity multi-purpose IP
core for quantized discrete cosine and integer transform, in Proc. IEEE Int. Symp. Circuits Syst., 2009.
[16] S. Ghosh, S. Venigalla, and M. Bayoumi, Design and implementation of a 2D-DCT architecture using coefficient distributed arithmetic, in Proc. IEEE Comput. Soc. Annu. Symp. VLSI, 2005.
[17] L. V. Agostini, I. S. Silva, and S. Bampi, Pipelined fast 2D DCT architecture for JPEG image compression, in Proc. IEEE Symp. Integr. Circuits Syst. Des., 2001.
[18] C. H. Chang, C. L. Wang, and Y. T. Chang, Efficient VLSI architectures for fast computation of the discrete Fourier transform and its inverse, IEEE Trans. Signal Process., vol. 48, no. 11, Nov. 2000.
[19] Y. H. Chen, T. Y. Chang, and C. Y. Li, High throughput DA-based DCT with high accuracy error-compensated adder tree, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2010, to be published.
[20] W. Pan, A fast 2-D DCT algorithm via distributed arithmetic optimization, in Proc. IEEE Int. Conf. Image Process., 2000, vol. 3.
[21] Information Technology: Digital Compression and Coding of Continuous-Tone Still Images: Requirements and Guidelines, ISO/IEC, 1991.
[22] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard. New York: Van Nostrand Reinhold, 1992.
[23] S. N. Tang, J. W. Tsai, and T. Y. Chang, A 2.4-GS/s FFT processor for OFDM-based WPAN applications, IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 6, Jun. 2010.

Tsin-Yuan Chang (S'87-M'90) received the B.S. degree in electrical engineering from National Tsing Hua University, Hsinchu, Taiwan, in 1982, and the M.S. and Ph.D. degrees from Michigan State University, East Lansing, in 1987 and 1989, respectively, both in electrical engineering. He is an Associate Professor with the Department of Electrical Engineering, National Tsing Hua University. His research interests include IC design and testing, computer arithmetic, and VLSI DSP.

Yuan-Ho Chen (S'10) received the B.S. degree from Chang Gung University, Taoyuan, Taiwan, in 2002, and the M.S. degree from the National Tsing Hua University (NTHU), Hsinchu, Taiwan, in 2004, where he is currently pursuing the Ph.D. degree in electrical engineering. His research interests include video/image and digital signal processing, VLSI architecture design, VLSI implementation, computer arithmetic, and signal processing for wireless communication.