

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 45, NO. 11, NOVEMBER 1998

Fixed-Point Optimization Utility for C and C++ Based Digital Signal Processing Programs

Seehyun Kim, Member, IEEE, Ki-Il Kum, and Wonyong Sung, Member, IEEE

Abstract: Fixed-point optimization utility software is developed that can aid scaling and wordlength determination of digital signal processing algorithms written in C or C++. This utility consists of two programs: the range estimator and the fixed-point simulator. The former estimates the ranges of floating-point variables for purposes of automatic scaling, and the latter translates floating-point programs into fixed-point equivalents to evaluate the fixed-point performance by simulation. By exploiting the operator overloading characteristics of C++, the range estimation and the fixed-point simulation can be conducted by simply modifying the variable declarations of the original program. This utility is easily applicable to nearly all types of digital signal processing programs, including nonlinear, time-varying, multirate, and multidimensional signal processing algorithms. In addition, this software can be used to compare the fixed-point characteristics of different implementation architectures. An optimization example for an inverse discrete cosine transform (IDCT) architecture that conforms to the IEEE standard specifications is presented. The optimized results require 8% fewer gates when compared with the previous best implementation.

Index Terms: Finite wordlength effects, fixed-point optimization, fixed-point simulation, range estimation.

I. INTRODUCTION

WHILE most digital signal processing algorithms are developed using floating-point arithmetic, their implementation using very large scale integration (VLSI) or fixed-point digital signal processors usually requires fixed-point arithmetic for the sake of hardware cost and speed.
However, fixed-point implementation can suffer from excessive finite wordlength effects, due to overflows and quantization noise, unless all signals are scaled properly and sufficient wordlengths are assigned [1]. Previously known analytical methods can be used for scaling and wordlength optimization of linear digital filters and some specific algorithms [2]-[4]. However, it is very difficult to apply these analytical methods to general digital signal processing algorithms, and it is usually necessary to simulate digital signal processing algorithms extensively using fixed-point arithmetic before implementation. Determining scaling information and preparing fixed-point simulation models of complex signal processing algorithms, which may contain nonlinear and time-varying blocks, has been considered a tedious process.

Fig. 1. Proposed fixed-point algorithm development procedure.

C is still most popular for describing digital signal processing algorithms, although there are several programming languages and block diagram based CAD tools that support fixed-point data types, such as Silage [5], DSP/C [6], DSP Station [7], and SPW [8]. In particular, C is more flexible for the development of digital signal processing programs containing control intensive algorithms. Although there are some previous works that introduce different formats in C++ using operator overloading, such as the variable precision floating-point simulator [9], C++ does not support fixed-point formats.

Manuscript received August 5, 1996; revised April 8. This paper was recommended by Associate Editor T. Q. Nguyen. S. Kim was with the School of Electrical Engineering, Seoul National University, Seoul, Korea. He is now with LG Corporate Institute of Technology, Seoul, Korea. K.-I. Kum and W. Sung are with the School of Electrical Engineering, Seoul National University, Seoul, Korea (e-mail: wysung@dsp.snu.ac.kr).
As a result, the conversion of a floating-point C program into a fixed-point version requires much effort. To solve this problem, we have developed an automatic scaling and fixed-point simulation utility for digital signal processing programs written in C or C++. This utility consists of the range estimator and the fixed-point simulator. The proposed procedure for developing fixed-point algorithms is shown in Fig. 1. Users develop C or C++ models with floating-point arithmetic and mark the variables whose fixed-point behavior is to be examined with the range estimation directives. The range estimator then finds the statistics of internal signals throughout the floating-point simulation using real inputs and determines scaling parameters. A C++ data class for range estimation is developed for this purpose. The fixed-point simulator converts a floating-point digital signal processing program with fixed-point simulation directives to a fixed-point equivalent by introducing two fixed-point data classes, one for bit-accurate simulation and the other for fast execution. In order to overcome the representational limit of a fractional or integer format, a fixed-point data format [10], [11] is employed that can support an arbitrary representation range by scaling, as shown in (1). The arithmetic and assignment operations associated with the fixed-point data class are also defined at the class declaration. Then, fixed-point arithmetic operations, instead of floating-point arithmetic, are conducted automatically due to the operator overloading capability of C++ [12].

In Section II, the fixed-point data representation method is described. An algorithm for estimating the range is discussed in Section III. In Section IV, the range estimation utility for determining the integer wordlength is explained. Details of the fixed-point simulator are presented in Section V. In Section VI, a fast fixed-point simulator employing a hardware floating-point data-path is explained. As an example of implementation, the internal wordlengths of an 8 × 8 inverse discrete cosine transform (IDCT) architecture are optimized in Section VII. Concluding remarks follow in Section VIII.

II. FIXED-POINT DATA REPRESENTATION

For the representation of the fixed-point data, a generalized format [10] is employed using the attributes specified in the following:

$$\langle \text{wordlength},\ \text{integer-wordlength},\ \text{sign-overflow-quantization-mode} \rangle \qquad (1)$$

In the fixed-point data format, two numbers can be added or subtracted only when their hypothetical binary points are aligned. Let us consider an addition of two signals $x$ and $y$ to produce a signal $z$, where $x$, $y$, and $z$ have the fixed-point formats of $\langle 10, 2, \mathrm{tsr} \rangle$, $\langle 9, 3, \mathrm{tsr} \rangle$, and $\langle 10, 3, \mathrm{tsr} \rangle$, respectively. In order to calculate $z$, $x$ must have one more integer bit by sign extension, while $y$ needs to have two more fractional bits. Then, $N$-bit addition, where $N$ is greater than or equal to 11, is conducted, and the saturation and rounding operation is applied. The above procedure can be implemented in hardware, as shown in Fig. 2.

Fig. 2. A hardware model for adding two fixed-point variables of different integer wordlengths.

III. RANGE ESTIMATION USING STATISTICS

The minimum integer wordlength for a variable can be determined from its range, as given in (2).

Previously known analytical methods try to determine the range by calculating the L1 norm of a transfer function [1]. The range estimated using the L1 norm guarantees no overflow for any signal, but it is a very conservative estimate for most applications. In our comparison for a fourth-order infinite impulse response (IIR) digital filter applied to speech signal processing, the L1 norm requires four extra bits when compared with the optimum scaling result obtained using a real input speech signal [10]. It is also very difficult to calculate the L1 norm of adaptive or nonlinear systems.

In our approach, a simulation-based method is adopted to estimate ranges, where the range of each signal is measured during the floating-point simulation using realistic input signal files. This method is applicable to both nonlinear and linear systems, but requires an adequate estimation of the range from a finite length of simulation results. It is easy to parameterize simple distributions, such as uniform, Gaussian, or Laplacian, by applying a few statistics. It is well known that speech signals have a Laplacian distribution. However, it is not possible to model all signals in practical systems using a simple distribution [13]. For example, they may be nonsymmetric or multimodal; note that the estimated range of a multimodal signal could be too small if we employ the rule for unimodal distributions. The scheme for estimating the range should therefore vary according to the distribution. In this section, we describe a method for range estimation based on identifying the distribution of a signal.

A. Statistical Characteristics of a Signal

Skewness for a given set of samples $x_i$ is defined as [14]

$$s = \frac{m_3}{m_2^{3/2}} \qquad (3)$$

where $m_k$ is the $k$th-order central moment, $m_k = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^k$. It vanishes if the distribution of the samples is symmetrical about the mean, $\mu$. Nonzero skewness implies that the distribution spreads more widely to the left or right from the mean. On the other hand, the kurtosis is defined as follows [14]:

$$k = \frac{m_4}{m_2^{2}} - 3 \qquad (4)$$

This indicates how many samples are close to the mean, and becomes zero if the $x_i$ have a Gaussian distribution.

Modes represent local maxima of the distribution. While unimodal distributions have a single peak, multimodal distributions have several peaks. The standard deviation $\sigma$ of a unimodal distribution can be taken as the width of the peak, as shown in Fig. 3(a). For instance, 99.99% of a Gaussian distribution is included in a range of four times $\sigma$. Thus, we can estimate the range of a unimodal and symmetric signal by means of $\mu$ and $n$ times $\sigma$, where $n$ is highly dependent on the distribution. For the multimodal situation, however, $\sigma$ no longer indicates how widely the distribution spreads. As an example, let us consider a bimodal distribution, shown in Fig. 3(b). Most internal signals of adaptive systems have this distribution, where two modes indicate the initial and the final states, respectively. In this case, it is not possible to simply estimate the range using $\mu$ and $\sigma$. Not only the mean and the standard deviation but also the characteristics of the distribution are important for estimating the range.

Fig. 3. (a) Unimodal and (b) bimodal distributions.

B. Estimation of the Range

In order to estimate the range effectively, distributions can be characterized as follows: unimodal/multimodal; symmetric/nonsymmetric; zero mean/nonzero mean. Although symmetry and zero mean can be verified easily by the skewness and the mean, respectively, it is harder to estimate the number of modes. From the experimental results shown in Table I, we can derive a heuristic method to discriminate unimodal distributions from multimodal ones. That is, if the kurtosis is below a threshold, the distribution is approximately unimodal. As an example, the LMS error and LMS coeff signals in Table I all have nonzero skewness and a very large kurtosis. Thus, they are estimated to have nonsymmetric and multimodal distributions. In fact, they have two modes, which represent the initial and final steady states, respectively.

TABLE I. STATISTICAL INFORMATION OF A FEW SIGNALS

For unimodal and symmetric distributions, the range can be estimated effectively by

$$R = |\mu| + n\sigma \qquad (5)$$

Note that for two symmetric distributions that have an identical variance, the one with the larger kurtosis spreads more widely than the other. Thus, a greater value of $n$ is needed in order to estimate the range of the signal having the larger kurtosis. Specifically, $n$ is chosen as a function of the kurtosis $k$. For example, the distribution of IIR speech in Table I can be covered effectively by five times $\sigma$ according to its kurtosis.

The above rule is not satisfactory for multimodal or nonsymmetric distributions. For such cases, we can consider the alternative rule (6), which is based on a submaximum value of the signal and a guard for the range. A submaximum value is one that covers a given percentage of the entire sample, and various submaxima are collected during the simulation. The greater the difference between the submaxima is, the larger the guard value must be. A scale factor also controls the guard value and is currently set to two.

Note that the statistical information obtained is dependent on the input data set. Thus, it is necessary to use several input signal sample files for a more reliable estimation of the range. In order to measure the variation of each distribution parameter according to the input signal sample files, four sensitivities are calculated for the mean, variance, skewness, and kurtosis, respectively. They are defined in (7)-(10) in terms of the maximum and the minimum value of each statistic among the multiple simulation results. Then, the statistics are modified as in (11)-(14), with a scale factor that is currently chosen as 0.1. When the modified skewness and kurtosis indicate a symmetric and unimodal distribution, (5) can be used for estimating ranges. Otherwise, (6) is used.
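The statistics of Section III can be sketched in a few lines of C++. The following is a minimal illustration, not the authors' implementation: the names `SignalStats`, `compute_stats`, and `estimate_range` are hypothetical, the skewness and kurtosis follow the usual moment-based definitions (with the kurtosis offset so that it is zero for a Gaussian), and the multiplier `n` of the unimodal, symmetric rule is left to the caller because its exact dependence on the kurtosis is not reproduced here.

```cpp
#include <cmath>
#include <vector>

// Moment-based statistics of a set of samples (hypothetical helper type).
struct SignalStats {
    double mean;      // mu
    double stddev;    // sigma = sqrt(m2)
    double skewness;  // m3 / m2^(3/2), zero for a symmetric distribution
    double kurtosis;  // m4 / m2^2 - 3, zero for a Gaussian distribution
};

// Compute the statistics from the raw power sums of the samples.
// Power sums are used because they can be accumulated one sample at a time.
inline SignalStats compute_stats(const std::vector<double>& x) {
    SignalStats st{0.0, 0.0, 0.0, 0.0};
    if (x.empty()) return st;
    double s1 = 0.0, s2 = 0.0, s3 = 0.0, s4 = 0.0;
    for (double v : x) {
        s1 += v;
        s2 += v * v;
        s3 += v * v * v;
        s4 += v * v * v * v;
    }
    const double n = static_cast<double>(x.size());
    const double mu = s1 / n;
    const double r2 = s2 / n, r3 = s3 / n, r4 = s4 / n;  // raw moments
    // Central moments expressed through the raw moments.
    const double m2 = r2 - mu * mu;
    const double m3 = r3 - 3.0 * mu * r2 + 2.0 * mu * mu * mu;
    const double m4 = r4 - 4.0 * mu * r3 + 6.0 * mu * mu * r2
                      - 3.0 * mu * mu * mu * mu;
    st.mean = mu;
    st.stddev = std::sqrt(m2 > 0.0 ? m2 : 0.0);
    if (m2 > 0.0) {
        st.skewness = m3 / std::pow(m2, 1.5);
        st.kurtosis = m4 / (m2 * m2) - 3.0;
    }
    return st;
}

// Range rule for unimodal, symmetric signals: |mu| + n * sigma.
// The choice of n is an input here (assumption of this sketch).
inline double estimate_range(const SignalStats& st, double n) {
    return std::fabs(st.mean) + n * st.stddev;
}
```

Accumulating power sums rather than storing all samples keeps the memory cost constant, at the price of some numerical cancellation when the mean is large compared with the standard deviation.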

IV. RANGE ESTIMATION UTILITY

Since we employ the simulation-based approach for estimating the range, a program needs to be generated for collecting the statistical information during the simulation. To avoid changing the original floating-point C or C++ program for range estimation, the operator overloading characteristics of C++ are utilized. A new data class is introduced for tracing the possible maximum value of a signal, i.e., the range. In order to prepare a range estimation model of a C or C++ digital signal processing program, it is only necessary to change the type of variables from the floating-point type to this class, since a class in C++ is also a type of variable defined by users. The class not only computes the current value, but also keeps records of the variable using private members. Thus, when the simulation is completed, the range of a variable declared with this class is readily available from the records stored in the class.

The class has several private members, as shown in Fig. 4. One member keeps the current value, while two others record the summation and the squared summation of past values, respectively. Two further members record the third and fourth moments, respectively. This information is needed to calculate the statistics of a variable, including its mean, standard deviation, skewness, and kurtosis. Another member stores the absolute maximum value of a variable during the simulation. The class also keeps the number of modifications during the simulation in a counter field.

Fig. 4. Declaration of the range estimation class.

The class overloads arithmetic and relational operators. Hence, basic arithmetic operations such as addition, subtraction, multiplication, and division are conducted automatically for variables of this class. This property also applies to the relational operators. Therefore, any instance of the class can be compared with floating-point variables and constants. The contents, or private members, of a variable declared with the class are updated when the variable is assigned by one of the assignment operators. For example, the absolute maximum is updated when the absolute value of the present value is larger than the previously recorded maximum.

After the simulation model for estimating the ranges is prepared by modifying the variable declarations, the range estimator is executed. It compiles the simulation model and links it with the simulation driver and the overloaded operators of the class. In our work, the GNU C++ compiler, version 2.6.3, is used throughout the development [15]. Next, the simulation driver executes the simulation model, and the range information is gathered during the simulation. After the simulation is completed, the mean ($\mu$), standard deviation ($\sigma$), skewness ($s$), and kurtosis ($k$) can be calculated using the recorded sums, moments, and sample count. Then, the statistical range of a variable can be estimated by the procedure shown in the previous section. Finally, the integer wordlengths of all signals are obtained from their ranges, as shown in (2).

As an example, let us consider a first-order digital IIR filter. The overall procedure to estimate the ranges of internal variables can be summarized as follows.

1) Develop a C++ program for the first-order IIR filter.
2) Insert the range estimation directives, as shown in Fig. 5(a). Since the ranges of the input (Xin) and the coefficient (Coeff) are known already, only the output (Yout) and the state variable (Ydly) are to be examined.
3) Invoke the range estimator. The estimator generates the simulation model [Fig. 5(b)] and runs it. After the simulation, we obtain the ranges and the integer wordlengths of the variables, as shown in Fig. 6.

Fig. 5. C++ programs for a first-order IIR filter. (a) The original C++ program. (b) An automatically translated version for the range estimation.

Fig. 6. The result of the range estimator for the IIR filter.
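The idea of Section IV, swapping a floating-point type for a class that records statistics on every assignment, can be sketched as follows. This is an illustrative reduction, not the class of Fig. 4: the name `RangeTracker`, its member names, and the helper `iir1` are hypothetical, and only the sum, squared sum, absolute maximum, and update count are kept. The signal names Xin, Coeff, Yout, and Ydly follow the text.

```cpp
#include <cmath>

// A floating-point stand-in that records statistics whenever it is assigned.
class RangeTracker {
public:
    RangeTracker() = default;
    RangeTracker(const RangeTracker&) = default;

    // Every assignment is treated as one new sample of the signal.
    RangeTracker& operator=(double v) { record(v); return *this; }
    // Assigning from another tracker records its value, not its history.
    RangeTracker& operator=(const RangeTracker& o) { record(o.value_); return *this; }
    RangeTracker& operator+=(double v) { record(value_ + v); return *this; }
    RangeTracker& operator-=(double v) { record(value_ - v); return *this; }
    RangeTracker& operator*=(double v) { record(value_ * v); return *this; }
    RangeTracker& operator/=(double v) { record(value_ / v); return *this; }

    // Implicit conversion lets built-in arithmetic and relational
    // operators work unchanged on tracked variables.
    operator double() const { return value_; }

    double abs_max() const { return abs_max_; }
    long count() const { return count_; }
    double mean() const { return count_ ? sum_ / count_ : 0.0; }
    double stddev() const {
        if (!count_) return 0.0;
        const double m = mean();
        const double var = sum2_ / count_ - m * m;
        return std::sqrt(var > 0.0 ? var : 0.0);
    }

private:
    void record(double v) {
        value_ = v;
        sum_ += v;
        sum2_ += v * v;
        if (std::fabs(v) > abs_max_) abs_max_ = std::fabs(v);
        ++count_;
    }
    double value_ = 0.0, sum_ = 0.0, sum2_ = 0.0, abs_max_ = 0.0;
    long count_ = 0;
};

// First-order IIR filter written once and instantiated either with
// double (original program) or RangeTracker (range estimation model).
template <typename Sig>
void iir1(const double* Xin, int n, double Coeff, Sig& Yout, Sig& Ydly) {
    for (int i = 0; i < n; ++i) {
        Yout = Coeff * Ydly + Xin[i];
        Ydly = Yout;
    }
}
```

Calling `iir1` with `double` variables gives the original floating-point behavior, while calling it with `RangeTracker` variables produces the same output values and leaves the range statistics in `Yout` and `Ydly`. The explicit copy assignment matters: without it, `Ydly = Yout` would copy the accumulated history of `Yout` instead of recording one new sample.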

TABLE II. EXECUTION SPEED OF THE RANGE ESTIMATOR

The simulation time required for the range estimation is approximately two to four times that for the original floating-point C program. The execution speeds of the original floating-point program, the developed range estimator, and the Autoscaler for a fourth-order IIR digital filtering program, run on a 90-MHz Pentium-based PC, are compared in Table II. The Autoscaler is a range estimation and automatic scaling program for the fixed-point assembly program development of the TMS320C25 and C50 [10], [16]. The comparison shows that the range estimator developed here is quite fast, because it conducts the simulation using a high-level program. Thus, it is practical to obtain a reliable range estimation by simulating with several input signal files.

V. FIXED-POINT SIMULATION UTILITY

Previously developed analytical methods for evaluating the fixed-point performance of a digital signal processing algorithm are not easily applicable to practical systems containing nonlinear and time-varying blocks [17], [18]. The analysis is more complicated when a specific kind of input signal, such as speech, is required for the evaluation. To alleviate these problems, we employed a simulation-based method for evaluating the fixed-point characteristics of a digital signal processing algorithm. However, most high-level language compilers do not support fixed-point arithmetic, which needs a variable wordlength for each arithmetic operation. Therefore, a new fixed-point data class and its operators are developed to prepare a bit-accurate fixed-point version of a floating-point program and to evaluate its finite wordlength and scaling effects by simulation.

The declaration part of the class is shown in Fig. 7. As shown in this figure, the fixed-point data class has several private members, which are the mantissa, the wordlength, the integer wordlength, and the attributes. In order to represent mantissa values requiring a larger precision than that of the built-in integer type, e.g., 32 bits, the fixed-point class employs a data class of the GNU C++ library that provides multiple-precision integer arithmetic facilities [19].

Fig. 7. Declaration of the fixed-point data class.

The class supports all of the assignment and arithmetic operations supported in C or C++. The list of the operations supported can be found in the operator list in Fig. 7. Brief explanations of them are as follows.

1) The assignment operator converts the input data according to the fixed-point format of the left side variable and assigns the format converted data to this variable. The input data, which is the evaluated result of the right side, can be either floating-point or fixed-point data. If the given format of the left side variable does not provide enough precision for representing the input data, the data is modified according to the fixed-point attributes of the left side variable, such as saturation, overflow, rounding, or truncation.
2) The operation of the fixed-point add operator is shown in Fig. 8. In order to prevent any loss of accuracy during the operation, it first computes the maximum integer and fractional wordlengths of the two input data. For example, assume that the integer and fractional wordlengths of the first operand are two and eight, and those of the second operand are four and six, respectively. Then, the internal data has an integer wordlength of five in order to prevent overflows and a fractional wordlength of eight in order not to lose any accuracy. The input data are then aligned by using shift operations and added in fixed-point.
3) The fixed-point multiply operator is also described in Fig. 8. For two's complement data, the wordlength of the product is the sum of the wordlengths of the two input data minus one, in order to eliminate the superfluous sign bit. The integer wordlength becomes the sum of the two input integer wordlengths.
4) The arithmetic right shift operator shifts signals right at the bit level. Since the wordlength and the integer wordlength are not affected by this operator, a one-bit right shift halves the real value of a variable.
5) The arithmetic left shift operator shifts signals left at the bit level. In a similar way, a one-bit left shift doubles the real value of a variable.
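The format rules in items 1) to 3) can be illustrated with a short C++ sketch. It is not the declaration of Fig. 7: `FixFormat`, `add_format`, `mul_format`, and `quantize` are hypothetical names, the wordlength is assumed to be one sign bit plus the integer and fractional wordlengths, and only the saturation mode of the assignment is modeled (wrap-around overflow is omitted).

```cpp
#include <algorithm>
#include <cmath>

// Fixed-point format: integer and fractional wordlengths, sign bit excluded.
struct FixFormat {
    int iwl;  // integer wordlength
    int fwl;  // fractional wordlength
};

// Total wordlength, assuming one sign bit (assumption of this sketch).
inline int wordlength(FixFormat f) { return 1 + f.iwl + f.fwl; }

// Addition: take the larger integer and fractional wordlengths and add
// one integer bit so that the sum cannot overflow.
inline FixFormat add_format(FixFormat a, FixFormat b) {
    return FixFormat{std::max(a.iwl, b.iwl) + 1, std::max(a.fwl, b.fwl)};
}

// Multiplication: product wordlength is WLa + WLb - 1 and the integer
// wordlength is the sum of the operand integer wordlengths.
inline FixFormat mul_format(FixFormat a, FixFormat b) {
    const int wl = wordlength(a) + wordlength(b) - 1;
    const int iwl = a.iwl + b.iwl;
    return FixFormat{iwl, wl - 1 - iwl};
}

// Assignment: convert a real value to the target format with saturation
// and either rounding (round = true) or truncation (round = false).
inline double quantize(double v, FixFormat f, bool round = true) {
    const double scaled = std::ldexp(v, f.fwl);            // v * 2^fwl
    double q = round ? std::floor(scaled + 0.5) : std::floor(scaled);
    const double qmax = std::ldexp(1.0, f.iwl + f.fwl) - 1.0;
    const double qmin = -std::ldexp(1.0, f.iwl + f.fwl);
    q = std::min(std::max(q, qmin), qmax);                 // saturate
    return std::ldexp(q, -f.fwl);                          // back to real value
}
```

With the operands of item 2), `add_format({2, 8}, {4, 6})` yields an integer wordlength of five and a fractional wordlength of eight, matching the text. Precision is lost only in `quantize`, which mirrors the statement that format conversion happens at the assignment operator.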

Fig. 8. Operators of the fixed-point class. (a) Addition. (b) Multiplication.

Fig. 9. Three operand addition employing different architectures.

As described above, there is no loss of accuracy during the fixed-point add or multiply operations. However, arithmetic right shift or arithmetic left shift may cause loss of accuracy or overflows. The fixed-point format conversion or precision degradation occurs only at the assignment operator. Thus, the two programs shown in Fig. 9(a) and (b) can have different fixed-point results. In Fig. 9(b), the result of the first addition is format converted to 10-bit data, added to the remaining operand, and then format converted again. The fixed-point performance of different implementation architectures that are based on the same algorithm can be compared by utilizing the above characteristics.

The relational and assignment-based operators are also supported. Since fixed-point objects are compared after being interpreted as real values, the relational operators can be used with variables of other types as well as with fixed-point variables. The compound assignment operators are supported in the same way.

Let us reconsider the simple IIR filter presented in Section IV. The overall procedure to investigate the fixed-point behavior can be summarized as follows.

1) Develop a C++ program for the first-order IIR filter. Users can use the same filter developed in Section IV.
2) Insert the fixed-point simulation directives in the same manner as in Fig. 5(a).
3) Invoke the fixed-point simulator. The simulator generates the simulation model shown in Fig. 10 and runs it. Note that only the type of the variables is converted to the fixed-point class; the other parts of the program are not changed. After the simulation, we obtain the fixed-point output of the filter and compute a performance measure such as the signal-to-quantization-noise ratio (SQNR). The SQNR of the fixed-point version is measured against the floating-point implementation results.

Fig. 10. A fixed-point C++ program for a first-order IIR filter.

VI. FAST FIXED-POINT SIMULATION LIBRARY

The execution speed of the fixed-point library is very important because the optimization of wordlengths may require iterative simulations, with different fixed-point formats, using a long input signal [20]. However, as shown in Table III, the execution speed of the bit-accurate fixed-point class is quite slow compared to the floating-point case. Most of today's computers, such as Pentium-based PCs and SPARC-based workstations, are equipped with fast floating-point hardware units. In fact, the add, subtract, and multiply operations of the kind shown in Fig. 8 can be conducted using floating-point hardware units when the input and internal wordlengths are not very large. For example, the addition of two fixed-point data in the bit-accurate class consists of the integer wordlength comparison, fixed-point data shift, and fixed-point addition, as shown in Fig. 8. These operations correspond, respectively, to the exponent comparison, mantissa alignment, and mantissa addition in a floating-point arithmetic unit [21]. We developed a pfix (pseudo fixed-point) library for implementing fast fixed-point operations using a hardware floating-point data-path.

TABLE III. COMPARISON OF SIMULATION TIME FOR A FOURTH-ORDER IIR FILTER

In this pfix library, the assignment operator plays the most important role: it limits the accuracy of floating-point data according to the fixed-point format of the result operand. The assignment operator converts the input data according to the fixed-point format of the left side variable and assigns the format converted data to this variable. If the given format of the left side variable does not provide enough precision for representing the input data, the data is modified by saturation, overflow, rounding, or truncation, according to the fixed-point attributes of the left side variable.

The wordlength of the fixed-point data is limited in the pfix library when a bit-accurate simulation result is required. The wordlength of fixed-point data, including the temporary results, should not exceed the wordlength of the mantissa in a floating-point unit. Note that the IEEE standard double precision floating-point format assigns 53 bits for the mantissa. When two fixed-point data having wordlengths of $WL_1$ and $WL_2$ are multiplied, the product has a wordlength of $WL_1 + WL_2 - 1$ for the two's complement signed case. A wordlength of 53 is sufficient for modeling 16- or 24-bit programmable digital signal processors, but is not sufficient in general. For example, the pfix-library-based simulation of a digital filter having a signal wordlength of 32 bits and a coefficient wordlength of 24 bits does not produce a bit-exact result.

The accuracy of the pfix library when compared with the bit-exact library can be categorized into three cases. First, when the wordlengths of all the variables, including temporary ones, are less than 53 bits, the simulation results are bit-accurate. Second, when the input and output signal wordlengths are considerably smaller than 53 bits, but the wordlengths of temporary variables, such as products or accumulated results, are larger than 53 bits, the results are not bit-accurate. However, the results are still accurate enough to be used for the computation of the SQNR or other quantization effects. For example, the pfix-based simulation of a digital filter having a signal wordlength of 32 bits and a coefficient wordlength of 24 bits does not produce a bit-accurate result when compared to the bit-accurate simulation, but the SQNR measured against the double precision simulation result is nearly the same for both libraries. Finally, when the wordlengths of data or coefficients are larger than 53 bits, the pfix library is not suitable for the fixed-point simulation.

The execution speeds of various simulation methods are compared quantitatively in Table III. The simulation of the fourth-order IIR filter using the pfix library takes only 7.4 times the execution time of the original floating-point program. Although the bit-accurate library or a VHDL-based simulation can guarantee bit-accurate results, it usually takes 50 to 200 times longer than the floating-point simulation, as shown in Table III.

VII. EXAMPLE: WORDLENGTH OPTIMIZATION OF AN 8 × 8 IDCT ARCHITECTURE

The developed utility is very useful for the fixed-point performance evaluation of large C-based digital signal processing programs, such as the FS-CELP vocoder and the MPEG-2 audio decoder [16], [22]. In addition, this program can be used for the wordlength optimization of a digital signal processing algorithm based on a specific architecture. Note that the finite wordlength performance is affected not only by the algorithm but by the architecture as well. In this section, the wordlength optimization of the multiplier-adder-based 8 × 8 IDCT architecture conforming to the IEEE standard specification will be illustrated.
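The 53-bit limit discussed in Section VI can be checked directly. The sketch below is only an illustration with hypothetical function names: it applies the product wordlength rule $WL_1 + WL_2 - 1$ and then multiplies the largest positive operands through the floating-point data path to see whether the result still equals the exact integer product. It assumes IEEE double precision, which is what `std::numeric_limits<double>::digits` reports on common platforms.

```cpp
#include <cstdint>
#include <limits>

// Wordlength of a two's complement product of WL1-bit and WL2-bit data.
constexpr int product_wordlength(int wl1, int wl2) { return wl1 + wl2 - 1; }

// True when the product wordlength does not exceed the significand
// width of double (53 bits for IEEE double precision).
constexpr bool product_fits_double(int wl1, int wl2) {
    return product_wordlength(wl1, wl2) <= std::numeric_limits<double>::digits;
}

// Multiply two integers through the floating-point data path and report
// whether the result equals the exact 64-bit integer product.
// The caller must keep the exact product within the int64 range.
inline bool double_product_is_exact(std::int64_t a, std::int64_t b) {
    const std::int64_t exact = a * b;
    const double approx = static_cast<double>(a) * static_cast<double>(b);
    return static_cast<std::int64_t>(approx) == exact;
}
```

For 16-bit operands the product needs 31 bits and `double_product_is_exact(32767, 32767)` holds. For a 32-bit signal and a 24-bit coefficient the product needs 55 bits, and multiplying the largest positive values 2147483647 and 8388607 gives an odd number above $2^{53}$ that a double cannot represent, which is the loss of bit-exactness described above.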
The two-dimensional discrete cosine transform has been used widely for various image and video processing standards, such as JPEG, H.261 for video telephone, MPEG, and HDTV. Fixed-point implementation of the algorithm may result in a noticeable mismatch between the encoder and the decoder. Especially, this problem can be magnified when the IDCT algorithm is used in a feedback loop for motion compensation because the quantization error is accumulated. To solve this problem, IEEE specifies the numerical characteristics of the 8 8 IDCT for use in visual telephone and similar applications using the IEEE Standard [18]. The test procedure is also described in [23], and the output errors shall meet the following specifications when samples of random input data sequence are applied. 1) For any pixel location, the peak error (ppe) shall not exceed one in magnitude. 2) For any pixel location, the mean square error (pmse) shall not exceed ) Overall, the mean square error (omse) shall not exceed ) For any pixel location, the mean error (pme) shall not exceed in magnitude. 5) Overall, the mean error (ome) shall not exceed in magnitude. 6) For all-zero input, the proposed IDCT shall generate all-zero output. There have been several studies to analyze the finite wordlength effects of several fast DCT/IDCT algorithms [18], [24], [25]. But these studies that compared different algorithms are not readily applicable to the hardware optimization mainly because the implementation architecture was not considered. We optimized the wordlengths of a multiplier adder-based implementation for the 8 8 row column-based IDCT algorithm. The simulation-based wordlength optimization method is employed which uses the input sequences specified in the IEEE standard and the accurate hardware model of the architecture to be evaluated. The bit-accurate hardware model is derived from the floating-point model by using the developed fixed-point simulation utility. A. 
Multiplier Adder-Based 8 8 IDCT Architecture The row-column decomposition method is most popular for implementing the 8 8 IDCT algorithm because of the structural and computational regularities. The block diagram of the row column decomposition-based 2-D IDCT architecture is shown in Fig. 11. In order to reduce the number of arithmetic operations and keep the regularity of the 1-D IDCT unit, Chen s algorithm [26] is employed. Then, the eight-point

8 1462 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 45, NO. 11, NOVEMBER 1998 Fig. 11. Block diagram of 2-D IDCT using the row column decomposition. IDCT can be calculated as follows: Fig. 12. Block diagram of a multiplier adder-based 2-D IDCT. (15) Fig. 13. Overall procedure for wordlength optimization. where (16) As shown above, we can obtain the eight-point IDCT results by conducting two 4 4 matrix vector multiplications and butterfly operations. In order to calculate matrix vector products, the multiplier adder-based architecture, shown in Fig. 12, is employed. In this figure, there are five quantization error sources: quantization of coefficients for the row-wise and the column-wise transform (Coeff1, Coeff2), wordlength reduction for the outputs of the first and the second multipliers (Adder1, Adder2), and the output of the limiter for the rowwise transform (1D_Out). The conventional rounding scheme is used for quantization except for the output of multiplier. Since the multiplier adder chain usually comprises the critical path, we assume that the output of the multiplier is simply truncated in order not to employ an additional adder for rounding. B. Wordlength Optimization The overall wordlength determination procedure using the fixed-point optimization utility is shown in Fig. 13 [27]. In order to determine the integer wordlength, the range estimation utility has been used. First, C or C based programs are developed for modeling the various architectures using the floating-point arithmetic. Variables and coefficients are declared as the double precision floating-point. Then, the range estimation utility estimates the range of internal signals during the floating-point simulation by using the new floating-point data class and the operator overloading characteristic of the C. The minimum integer wordlength that prevents overflows can be determined from the estimated range information. 
A set of cost-optimum wordlengths should require the minimum hardware cost while satisfying the IEEE standard specifications. The fixed-point performance is measured using the developed fixed-point simulator [28]. The wordlength optimization method described in [20] is employed to find the optimum wordlengths using a small number of simulations. First, the minimum wordlengths of all signals are determined. The minimum wordlength of a signal is the smallest wordlength guaranteeing the specified system performance when all the other signals have enough precision, such as the double-precision floating-point format [20]. Starting from this lower bound, the set of optimal wordlengths that minimizes the hardware cost while satisfying the given specifications is determined. To model the hardware cost, the cell libraries of VLSI Technologies, Inc. are used [29]. A C program for computing the 1-D IDCT using the multiplier-adder chain is shown in Fig. 14. From the simulation results, it was found that the most crucial condition for determining the minimum wordlengths of Coeff1, Coeff2, and 1D_Out is the overall mean square error, omse. However, the peak mean error, pme, and the overall mean error, ome, play the key role in determining the minimum wordlengths of Adder1 and Adder2, because the means of the quantization errors are not zero due to truncation. The minimum and optimum wordlengths are shown in Table IV. The numbers inside parentheses show the wordlengths of the previous implementation [30]. As shown in the table, the internal wordlengths can be substantially reduced when compared with the previous work. This fixed-point utility can also be used to optimize a bit-serial arithmetic-based implementation of the 8 × 8 IDCT architecture.

Fig. 14. A multiplier-adder-based 1-D IDCT program using floating-point arithmetic.
TABLE IV. OPTIMIZED WORDLENGTHS FOR THE MULTIPLIER-ADDER-BASED ARCHITECTURE

VIII. CONCLUDING REMARKS

Fixed-point utility software that aids scaling and wordlength optimization of algorithms written in C or C++ has been developed. The integer wordlength for each fixed-point signal is automatically determined using the developed range estimator, and the finite wordlength effects can also be evaluated using the fixed-point simulator. In order to obtain reliable scaling information from simulation results of finite length, a statistical model of the range, which covers both unimodal and multimodal signals, is developed. The range estimator is very fast when compared with our previously developed Autoscaler because it collects the range information from the simulation of C programs instead of assembly programs. The fixed-point simulator can meet the requirements for bit-accuracy and fast simulation by employing two specific fixed-point classes. The library models the fixed-point arithmetic using software routines, but can be used to obtain bit-exact results without any practical limitation in the wordlength.
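The earlier observation that truncation leaves a nonzero error mean, whereas rounding does not, can be checked with a small quantizer backed by floating-point arithmetic. The sketch below is an assumption-laden illustration and not one of the utility's fixed-point classes: `quantizeTruncate`, `quantizeRound`, and `meanError` are hypothetical names, values are held in `double`, truncation is modeled as two's-complement truncation (rounding toward minus infinity), and the input is a uniform sweep over [0, 1).

```cpp
#include <cassert>
#include <cmath>

// Quantize x onto a grid with fwl fractional bits (step = 2^-fwl).
// Two's-complement truncation discards the low bits, i.e. rounds toward -infinity.
double quantizeTruncate(double x, int fwl) {
    const double scale = std::ldexp(1.0, fwl);     // 2^fwl
    return std::floor(x * scale) / scale;
}

// Conventional rounding: add half a step, then truncate.
double quantizeRound(double x, int fwl) {
    const double scale = std::ldexp(1.0, fwl);
    return std::floor(x * scale + 0.5) / scale;
}

// Mean of (quantized - exact) over n evenly spaced points in [0, 1).
double meanError(double (*quantize)(double, int), int fwl, int n) {
    assert(n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        const double x = (i + 0.5) / n;
        sum += quantize(x, fwl) - x;
    }
    return sum / n;
}
```

With 4 fractional bits and 4096 sample points, `meanError(quantizeTruncate, 4, 4096)` evaluates to -1/32, half a quantization step below zero, while `meanError(quantizeRound, 4, 4096)` evaluates to 0. A bias of this kind accumulates along a multiplier-adder chain, which is consistent with the mean-error criteria, rather than the mean-square-error criterion, limiting the wordlengths of Adder1 and Adder2.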
The pfix library utilizes floating-point hardware for fast fixed-point simulation. This library is useful for simulation-based wordlength optimization of digital signal processing algorithms, which requires iterative fixed-point simulation with different wordlengths assigned [20]. This work can be extended to efficient VLSI or fixed-point digital signal processor-based development tools because the optimized fixed-point digital signal processing programs can be converted easily to VHDL code or integer programs. This software has been used for the fixed-point performance evaluation of complex digital signal processing algorithms such as the CELP vocoder [16] and MPEG audio [22]. This utility is freely available to academics through our web site.

REFERENCES

[1] L. B. Jackson, "On the interaction of roundoff noise and dynamic range in digital filters," Bell Syst. Tech. J., vol. 49, pp. , Feb.
[2] B. Liu, "Effect of finite word-length on the accuracy of digital filters: A review," IEEE Trans. Circuit Theory, vol. CT-18, pp. , Nov.
[3] H. K. Kwan, "Amplitude scaling of arbitrary linear digital networks," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. , Dec.
[4] C. Caraiscos and B. Liu, "A roundoff error analysis of the LMS adaptive algorithm," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. , Feb.
[5] P. Hilfinger, "A high-level language and silicon compiler for digital signal processing," in Proc. Custom Integrated Circuits Conf., Los Alamitos, CA, 1985, pp.
[6] K. W. Leary and W. Waddington, "DSP/C: A standard high level language for DSP and numeric processing," in Proc. Int. Conf. Acoustics, Speech and Signal Processing, 1990, pp.
[7] Mentor Graphics, DSP Station Design Architect User's Manual,
[8] Comdisco Systems, Inc., SPW The DSP Framework Hardware Design System User's Guide, Aug.
[9] D. M. Samani, J. Ellinger, E. J. Powers, and E. E. Swartzlander, Jr., "Simulation of variable precision IEEE floating point using C++ and its application in digital signal processor design," in Proc. 27th Annual Asilomar Conf. Signals, Systems, and Computer, 1993, pp.
[10] S. Kim and W. Sung, "A floating-point to fixed-point assembly program translator for the TMS320C25," IEEE Trans. Circuits Syst. II, vol. 41, pp. , Nov.
[11] S. Kim and W. Sung, "Fixed-point simulation utility for C and C++ based digital signal processing programs," in Proc. 28th Annual Asilomar Conf. Signals, Systems, and Computer, 1994, pp.
[12] B. Stroustrup, The C++ Programming Language, 2nd ed. Reading, MA: Addison Wesley,
[13] K. Baudendistel, "Compiler development for fixed-point processors," Ph.D. dissertation, Georgia Inst. Technol., Sept.
[14] S. M. Kendall and A. Stuart, The Advanced Theory of Statistics. London, U.K.: Griffin, 1987, vol. 1.
[15] R. M. Stallman, Using and Porting GNU CC, Free Software Foundation, Inc.,
[16] W. Sung, J. Sohn, J. Kang, and S. Kim, "Fixed-point implementation of the FS-CELP vocoder using the autoscaler for the TMS320C50," in Proc. Int. Conf. Signal Processing Applications and Technology, 1995, pp.
[17] P. W. Wong, "Quantization and roundoff noises in fixed-point FIR digital filters," IEEE Trans. Signal Processing, vol. 39, pp. , July
[18] I. D. Yun and S. U. Lee, "On the fixed-point-error analysis of several fast DCT algorithms," IEEE Trans. Circuits Syst. Video Technol., vol. 3, pp. , Feb.
[19] D. Lea, User's Guide to the GNU C++ Library, version 2.0, Apr.
[20] W. Sung and K.-I. Kum, "Simulation-based word-length optimization method for fixed-point digital signal processing systems," IEEE Trans. Signal Processing, pp. , Dec.
[21] I. Koren, Computer Arithmetic Algorithms. New York: Wiley,
[22] M. S. Jeong, S. Kim, J. S. Sohn, J. Y. Kang, and W. Sung, "Finite wordlength effects evaluation of the MPEG audio decoder," in Proc. Int. Conf. Signal Processing Applications and Technology, Oct. 1996, to be published.
[23] IEEE Standard Specifications for the Implementations of Inverse Discrete Cosine Transform,
[24] Y. C. Yao and C. Y. Hsu, "Comparative performance of fast cosine transform with fixed-point roundoff error analysis," IEEE Trans. Signal Processing, vol. 42, pp. , May
[25] I. D. Yun and S. U. Lee, "On the fixed-point-error analysis of several fast IDCT algorithms," IEEE Trans. Circuits Syst., vol. 42, pp. , Nov.
[26] W. H. Chen, C. H. Smith, and S. C. Fralick, "A fast computational algorithm for the discrete cosine transform," IEEE Trans. Commun., vol. COM-25, pp. , Sept.
[27] W. Sung and K.-I. Kum, "Wordlength determination and scaling software for a signal flow block diagram," in Proc. Int. Conf. Acoustics, Speech and Signal Processing, Apr. 1994, vol. 2, pp.
[28] S. Kim, K.-I. Kum, and W. Sung, "Fixed-point optimization utility for C and C++ based digital signal processing programs," in Proc. IEEE Workshop on VLSI Signal Processing, Oct. 1995, pp.
[29] VLSI Technologies, Inc., 1-Micron Cell Compiler Library, Nov.
[30] T. Miyazaki, T. Nishitani, M. Edahiro, and I. Ono, "DCT/IDCT processor for HDTV developed with DSP silicon compiler," J. VLSI Signal Processing, vol. 5, pp. ,

Seehyun Kim (S'91-M'97) received the B.S., M.S., and Ph.D. degrees in electrical engineering from Seoul National University, Korea, in 1990, 1992, and 1996, respectively.
From 1996 to 1997, he was with the University of California at Berkeley as a Postdoctoral Researcher, where he was involved in the Ptolemy project and studied an infrastructure for finite precision implementation of digital signal processing algorithms. In 1997, he joined LG Corporate Institute of Technology and has been involved in developing a high-definition TV (HDTV) decoder LSI. His research interests include VLSI algorithms and architectures for signal processing and concurrent hardware and software design of embedded real-time systems.

Ki-Il Kum received the B.S. and M.S. degrees in control and instrumentation engineering from Seoul National University, Korea, in 1991 and 1994, respectively. He is working toward the Ph.D. degree at the School of Electrical Engineering, Seoul National University. His research interests include VLSI design and multiprocessor implementation of DSP algorithms and computer-aided design for digital signal processing systems, especially wordlength optimization for fixed-point DSP systems.

Wonyong Sung (S'84-M'87) received the B.S. degree in electronic engineering from Seoul National University in 1978, the M.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST) in 1980, and the Ph.D. degree in electrical and computer engineering from the University of California, Santa Barbara, in He has been a member of the faculty of Seoul National University since From 1980 to 1983, he worked in the Central Research Laboratory of Gold Star (currently LG Electronics) in Korea. During his Ph.D. study, he developed parallel processing algorithms, vector and multiprocessor implementation, and low-complexity FIR filter design. From May 1993 to June 1994, he consulted with the Alta Group for the development of the fixed-point optimizer, automatic wordlength determination and scaling software. His major research interests are the development of fixed-point optimization tools, implementation of VLSI for digital signal processing, and multiprocessor-based implementations. Since January 1998, he has worked as Chief of the SEED (System Engineering and Design Center) at Seoul National University.

Floating-point to Fixed-point Conversion. Digital Signal Processing Programs (Short Version for FPGA DSP)

Floating-point to Fixed-point Conversion. Digital Signal Processing Programs (Short Version for FPGA DSP) Floating-point to Fixed-point Conversion for Efficient i Implementation ti of Digital Signal Processing Programs (Short Version for FPGA DSP) Version 2003. 7. 18 School of Electrical Engineering Seoul

More information

DUE to the high computational complexity and real-time

DUE to the high computational complexity and real-time IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 15, NO. 3, MARCH 2005 445 A Memory-Efficient Realization of Cyclic Convolution and Its Application to Discrete Cosine Transform Hun-Chen

More information

MANY image and video compression standards such as

MANY image and video compression standards such as 696 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL 9, NO 5, AUGUST 1999 An Efficient Method for DCT-Domain Image Resizing with Mixed Field/Frame-Mode Macroblocks Changhoon Yim and

More information

Analytical Approach for Numerical Accuracy Estimation of Fixed-Point Systems Based on Smooth Operations

Analytical Approach for Numerical Accuracy Estimation of Fixed-Point Systems Based on Smooth Operations 2326 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL 59, NO 10, OCTOBER 2012 Analytical Approach for Numerical Accuracy Estimation of Fixed-Point Systems Based on Smooth Operations Romuald

More information

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter A.S. Sneka Priyaa PG Scholar Government College of Technology Coimbatore ABSTRACT The Least Mean Square Adaptive Filter is frequently

More information

Design and Implementation of Signed, Rounded and Truncated Multipliers using Modified Booth Algorithm for Dsp Systems.

Design and Implementation of Signed, Rounded and Truncated Multipliers using Modified Booth Algorithm for Dsp Systems. Design and Implementation of Signed, Rounded and Truncated Multipliers using Modified Booth Algorithm for Dsp Systems. K. Ram Prakash 1, A.V.Sanju 2 1 Professor, 2 PG scholar, Department of Electronics

More information

Fixed Point Representation And Fractional Math. By Erick L. Oberstar Oberstar Consulting

Fixed Point Representation And Fractional Math. By Erick L. Oberstar Oberstar Consulting Fixed Point Representation And Fractional Math By Erick L. Oberstar 2004-2005 Oberstar Consulting Table of Contents Table of Contents... 1 Summary... 2 1. Fixed-Point Representation... 2 1.1. Fixed Point

More information

COMPUTER ARCHITECTURE AND ORGANIZATION. Operation Add Magnitudes Subtract Magnitudes (+A) + ( B) + (A B) (B A) + (A B)

COMPUTER ARCHITECTURE AND ORGANIZATION. Operation Add Magnitudes Subtract Magnitudes (+A) + ( B) + (A B) (B A) + (A B) Computer Arithmetic Data is manipulated by using the arithmetic instructions in digital computers. Data is manipulated to produce results necessary to give solution for the computation problems. The Addition,

More information

FIR Filter Synthesis Algorithms for Minimizing the Delay and the Number of Adders

FIR Filter Synthesis Algorithms for Minimizing the Delay and the Number of Adders 770 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 48, NO. 8, AUGUST 2001 FIR Filter Synthesis Algorithms for Minimizing the Delay and the Number of Adders Hyeong-Ju

More information

Area And Power Efficient LMS Adaptive Filter With Low Adaptation Delay

Area And Power Efficient LMS Adaptive Filter With Low Adaptation Delay e-issn: 2349-9745 p-issn: 2393-8161 Scientific Journal Impact Factor (SJIF): 1.711 International Journal of Modern Trends in Engineering and Research www.ijmter.com Area And Power Efficient LMS Adaptive

More information

Wordlength Optimization

Wordlength Optimization EE216B: VLSI Signal Processing Wordlength Optimization Prof. Dejan Marković ee216b@gmail.com Number Systems: Algebraic Algebraic Number e.g. a = + b [1] High level abstraction Infinite precision Often

More information

Rapid Prototyping System for Teaching Real-Time Digital Signal Processing

Rapid Prototyping System for Teaching Real-Time Digital Signal Processing IEEE TRANSACTIONS ON EDUCATION, VOL. 43, NO. 1, FEBRUARY 2000 19 Rapid Prototyping System for Teaching Real-Time Digital Signal Processing Woon-Seng Gan, Member, IEEE, Yong-Kim Chong, Wilson Gong, and

More information

RECENTLY, researches on gigabit wireless personal area

RECENTLY, researches on gigabit wireless personal area 146 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 55, NO. 2, FEBRUARY 2008 An Indexed-Scaling Pipelined FFT Processor for OFDM-Based WPAN Applications Yuan Chen, Student Member, IEEE,

More information

Optimization Method for Broadband Modem FIR Filter Design using Common Subexpression Elimination

Optimization Method for Broadband Modem FIR Filter Design using Common Subexpression Elimination Optimization Method for Broadband Modem FIR Filter Design using Common Subepression Elimination Robert Pasko 1, Patrick Schaumont 2, Veerle Derudder 2, Daniela Durackova 1 1 Faculty of Electrical Engineering

More information

Optimization of Vertical and Horizontal Beamforming Kernels on the PowerPC G4 Processor with AltiVec Technology

Optimization of Vertical and Horizontal Beamforming Kernels on the PowerPC G4 Processor with AltiVec Technology Optimization of Vertical and Horizontal Beamforming Kernels on the PowerPC G4 Processor with AltiVec Technology EE382C: Embedded Software Systems Final Report David Brunke Young Cho Applied Research Laboratories:

More information

Express Letters. A Simple and Efficient Search Algorithm for Block-Matching Motion Estimation. Jianhua Lu and Ming L. Liou

Express Letters. A Simple and Efficient Search Algorithm for Block-Matching Motion Estimation. Jianhua Lu and Ming L. Liou IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 7, NO. 2, APRIL 1997 429 Express Letters A Simple and Efficient Search Algorithm for Block-Matching Motion Estimation Jianhua Lu and

More information

Digital Filter Synthesis Considering Multiple Adder Graphs for a Coefficient

Digital Filter Synthesis Considering Multiple Adder Graphs for a Coefficient Digital Filter Synthesis Considering Multiple Graphs for a Coefficient Jeong-Ho Han, and In-Cheol Park School of EECS, Korea Advanced Institute of Science and Technology, Daejeon, Korea jhhan.kr@gmail.com,

More information

Optimization Method for Broadband Modem FIR Filter Design using Common Subexpression Elimination

Optimization Method for Broadband Modem FIR Filter Design using Common Subexpression Elimination Optimization Method for Broadband Modem FIR Filter Design using Common Subepression Elimination Robert Pasko *, Patrick Schaumont **, Veerle Derudder **, Daniela Durackova * * Faculty of Electrical Engineering

More information

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator A.Sindhu 1, K.PriyaMeenakshi 2 PG Student [VLSI], Dept. of ECE, Muthayammal Engineering College, Rasipuram, Tamil Nadu,

More information

ISSN (Online), Volume 1, Special Issue 2(ICITET 15), March 2015 International Journal of Innovative Trends and Emerging Technologies

ISSN (Online), Volume 1, Special Issue 2(ICITET 15), March 2015 International Journal of Innovative Trends and Emerging Technologies VLSI IMPLEMENTATION OF HIGH PERFORMANCE DISTRIBUTED ARITHMETIC (DA) BASED ADAPTIVE FILTER WITH FAST CONVERGENCE FACTOR G. PARTHIBAN 1, P.SATHIYA 2 PG Student, VLSI Design, Department of ECE, Surya Group

More information

THE orthogonal frequency-division multiplex (OFDM)

THE orthogonal frequency-division multiplex (OFDM) 26 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 57, NO. 1, JANUARY 2010 A Generalized Mixed-Radix Algorithm for Memory-Based FFT Processors Chen-Fong Hsiao, Yuan Chen, Member, IEEE,

More information

Low Power Floating-Point Multiplier Based On Vedic Mathematics

Low Power Floating-Point Multiplier Based On Vedic Mathematics Low Power Floating-Point Multiplier Based On Vedic Mathematics K.Prashant Gokul, M.E(VLSI Design), Sri Ramanujar Engineering College, Chennai Prof.S.Murugeswari., Supervisor,Prof.&Head,ECE.,SREC.,Chennai-600

More information

REAL-TIME DIGITAL SIGNAL PROCESSING

REAL-TIME DIGITAL SIGNAL PROCESSING REAL-TIME DIGITAL SIGNAL PROCESSING FUNDAMENTALS, IMPLEMENTATIONS AND APPLICATIONS Third Edition Sen M. Kuo Northern Illinois University, USA Bob H. Lee Ittiam Systems, Inc., USA Wenshun Tian Sonus Networks,

More information

/$ IEEE

/$ IEEE IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 56, NO. 1, JANUARY 2009 81 Bit-Level Extrinsic Information Exchange Method for Double-Binary Turbo Codes Ji-Hoon Kim, Student Member,

More information

Multiframe Blocking-Artifact Reduction for Transform-Coded Video

Multiframe Blocking-Artifact Reduction for Transform-Coded Video 276 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 12, NO. 4, APRIL 2002 Multiframe Blocking-Artifact Reduction for Transform-Coded Video Bahadir K. Gunturk, Yucel Altunbasak, and

More information

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST SAKTHIVEL Assistant Professor, Department of ECE, Coimbatore Institute of Engineering and Technology Abstract- FPGA is

More information

VHDL Implementation of Multiplierless, High Performance DWT Filter Bank

VHDL Implementation of Multiplierless, High Performance DWT Filter Bank VHDL Implementation of Multiplierless, High Performance DWT Filter Bank Mr. M.M. Aswale 1, Prof. Ms. R.B Patil 2,Member ISTE Abstract The JPEG 2000 image coding standard employs the biorthogonal 9/7 wavelet

More information

Variable Temporal-Length 3-D Discrete Cosine Transform Coding

Variable Temporal-Length 3-D Discrete Cosine Transform Coding 758 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 6, NO. 5, MAY 1997 [13] T. R. Fischer, A pyramid vector quantizer, IEEE Trans. Inform. Theory, pp. 568 583, July 1986. [14] R. Rinaldo and G. Calvagno, Coding

More information

PERFORMANCE ANALYSIS OF INTEGER DCT OF DIFFERENT BLOCK SIZES USED IN H.264, AVS CHINA AND WMV9.

PERFORMANCE ANALYSIS OF INTEGER DCT OF DIFFERENT BLOCK SIZES USED IN H.264, AVS CHINA AND WMV9. EE 5359: MULTIMEDIA PROCESSING PROJECT PERFORMANCE ANALYSIS OF INTEGER DCT OF DIFFERENT BLOCK SIZES USED IN H.264, AVS CHINA AND WMV9. Guided by Dr. K.R. Rao Presented by: Suvinda Mudigere Srikantaiah

More information

SCALABLE IMPLEMENTATION SCHEME FOR MULTIRATE FIR FILTERS AND ITS APPLICATION IN EFFICIENT DESIGN OF SUBBAND FILTER BANKS

SCALABLE IMPLEMENTATION SCHEME FOR MULTIRATE FIR FILTERS AND ITS APPLICATION IN EFFICIENT DESIGN OF SUBBAND FILTER BANKS SCALABLE IMPLEMENTATION SCHEME FOR MULTIRATE FIR FILTERS AND ITS APPLICATION IN EFFICIENT DESIGN OF SUBBAND FILTER BANKS Liang-Gee Chen Po-Cheng Wu Tzi-Dar Chiueh Department of Electrical Engineering National

More information

Heuristic Algorithms for Multiconstrained Quality-of-Service Routing

Heuristic Algorithms for Multiconstrained Quality-of-Service Routing 244 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 10, NO 2, APRIL 2002 Heuristic Algorithms for Multiconstrained Quality-of-Service Routing Xin Yuan, Member, IEEE Abstract Multiconstrained quality-of-service

More information

Fixed Point LMS Adaptive Filter with Low Adaptation Delay

Fixed Point LMS Adaptive Filter with Low Adaptation Delay Fixed Point LMS Adaptive Filter with Low Adaptation Delay INGUDAM CHITRASEN MEITEI Electronics and Communication Engineering Vel Tech Multitech Dr RR Dr SR Engg. College Chennai, India MR. P. BALAVENKATESHWARLU

More information

Carry-Free Radix-2 Subtractive Division Algorithm and Implementation of the Divider

Carry-Free Radix-2 Subtractive Division Algorithm and Implementation of the Divider Tamkang Journal of Science and Engineering, Vol. 3, No., pp. 29-255 (2000) 29 Carry-Free Radix-2 Subtractive Division Algorithm and Implementation of the Divider Jen-Shiun Chiang, Hung-Da Chung and Min-Show

More information

Twiddle Factor Transformation for Pipelined FFT Processing

Twiddle Factor Transformation for Pipelined FFT Processing Twiddle Factor Transformation for Pipelined FFT Processing In-Cheol Park, WonHee Son, and Ji-Hoon Kim School of EECS, Korea Advanced Institute of Science and Technology, Daejeon, Korea icpark@ee.kaist.ac.kr,

More information

A deblocking filter with two separate modes in block-based video coding

A deblocking filter with two separate modes in block-based video coding A deblocing filter with two separate modes in bloc-based video coding Sung Deu Kim Jaeyoun Yi and Jong Beom Ra Dept. of Electrical Engineering Korea Advanced Institute of Science and Technology 7- Kusongdong

More information

AMONG various transform techniques for image compression,

AMONG various transform techniques for image compression, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 7, NO. 3, JUNE 1997 459 A Cost-Effective Architecture for 8 8 Two-Dimensional DCT/IDCT Using Direct Method Yung-Pin Lee, Student Member,

More information

Performance analysis of Integer DCT of different block sizes.

Performance analysis of Integer DCT of different block sizes. Performance analysis of Integer DCT of different block sizes. Aim: To investigate performance analysis of integer DCT of different block sizes. Abstract: Discrete cosine transform (DCT) has been serving

More information

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California,

More information

Yui-Lam CHAN and Wan-Chi SIU

Yui-Lam CHAN and Wan-Chi SIU A NEW ADAPTIVE INTERFRAME TRANSFORM CODING USING DIRECTIONAL CLASSIFICATION Yui-Lam CHAN and Wan-Chi SIU Department of Electronic Engineering Hong Kong Polytechnic Hung Hom, Kowloon, Hong Kong ABSTRACT

More information

Two High Performance Adaptive Filter Implementation Schemes Using Distributed Arithmetic

Two High Performance Adaptive Filter Implementation Schemes Using Distributed Arithmetic Two High Performance Adaptive Filter Implementation Schemes Using istributed Arithmetic Rui Guo and Linda S. ebrunner Abstract istributed arithmetic (A) is performed to design bit-level architectures for

More information

Pipelined Fast 2-D DCT Architecture for JPEG Image Compression

Pipelined Fast 2-D DCT Architecture for JPEG Image Compression Pipelined Fast 2-D DCT Architecture for JPEG Image Compression Luciano Volcan Agostini agostini@inf.ufrgs.br Ivan Saraiva Silva* ivan@dimap.ufrn.br *Federal University of Rio Grande do Norte DIMAp - Natal

More information

MULTICHANNEL image processing is studied in this

MULTICHANNEL image processing is studied in this 186 IEEE SIGNAL PROCESSING LETTERS, VOL. 6, NO. 7, JULY 1999 Vector Median-Rational Hybrid Filters for Multichannel Image Processing Lazhar Khriji and Moncef Gabbouj, Senior Member, IEEE Abstract In this

More information

Using Streaming SIMD Extensions in a Fast DCT Algorithm for MPEG Encoding

Using Streaming SIMD Extensions in a Fast DCT Algorithm for MPEG Encoding Using Streaming SIMD Extensions in a Fast DCT Algorithm for MPEG Encoding Version 1.2 01/99 Order Number: 243651-002 02/04/99 Information in this document is provided in connection with Intel products.

More information

ARITHMETIC operations based on residue number systems

ARITHMETIC operations based on residue number systems IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 2, FEBRUARY 2006 133 Improved Memoryless RNS Forward Converter Based on the Periodicity of Residues A. B. Premkumar, Senior Member,

More information

II. MOTIVATION AND IMPLEMENTATION

II. MOTIVATION AND IMPLEMENTATION An Efficient Design of Modified Booth Recoder for Fused Add-Multiply operator Dhanalakshmi.G Applied Electronics PSN College of Engineering and Technology Tirunelveli dhanamgovind20@gmail.com Prof.V.Gopi

More information

Evaluating MMX Technology Using DSP and Multimedia Applications

Evaluating MMX Technology Using DSP and Multimedia Applications Evaluating MMX Technology Using DSP and Multimedia Applications Ravi Bhargava * Lizy K. John * Brian L. Evans Ramesh Radhakrishnan * November 22, 1999 The University of Texas at Austin Department of Electrical

More information

Improving Area Efficiency of Residue Number System based Implementation of DSP Algorithms

Improving Area Efficiency of Residue Number System based Implementation of DSP Algorithms Improving Area Efficiency of Residue Number System based Implementation of DSP Algorithms M.N.Mahesh, Satrajit Gupta Electrical and Communication Engg. Indian Institute of Science Bangalore - 560012, INDIA

More information

Design and Implementation of Effective Architecture for DCT with Reduced Multipliers

Design and Implementation of Effective Architecture for DCT with Reduced Multipliers Design and Implementation of Effective Architecture for DCT with Reduced Multipliers Susmitha. Remmanapudi & Panguluri Sindhura Dept. of Electronics and Communications Engineering, SVECW Bhimavaram, Andhra

More information

Power Optimized Programmable Truncated Multiplier and Accumulator Using Reversible Adder

Power Optimized Programmable Truncated Multiplier and Accumulator Using Reversible Adder Power Optimized Programmable Truncated Multiplier and Accumulator Using Reversible Adder Syeda Mohtashima Siddiqui M.Tech (VLSI & Embedded Systems) Department of ECE G Pulla Reddy Engineering College (Autonomous)

More information

IMPLEMENTATION OF DOUBLE PRECISION FLOATING POINT RADIX-2 FFT USING VHDL

IMPLEMENTATION OF DOUBLE PRECISION FLOATING POINT RADIX-2 FFT USING VHDL IMPLEMENTATION OF DOUBLE PRECISION FLOATING POINT RADIX-2 FFT USING VHDL Tharanidevi.B 1, Jayaprakash.R 2 Assistant Professor, Dept. of ECE, Bharathiyar Institute of Engineering for Woman, Salem, TamilNadu,

More information

An efficient multiplierless approximation of the fast Fourier transform using sum-of-powers-of-two (SOPOT) coefficients

An efficient multiplierless approximation of the fast Fourier transform using sum-of-powers-of-two (SOPOT) coefficients Title An efficient multiplierless approximation of the fast Fourier transm using sum-of-powers-of-two (SOPOT) coefficients Author(s) Chan, SC; Yiu, PM Citation Ieee Signal Processing Letters, 2002, v.

More information

Efficient Method for Half-Pixel Block Motion Estimation Using Block Differentials
