On Designs of Radix Converters Using Arithmetic Decompositions

Similar documents
Application of LUT Cascades to Numerical Function Generators

BDD Representation for Incompletely Specified Multiple-Output Logic Functions and Its Applications to the Design of LUT Cascades

CHAPTER 3 METHODOLOGY. 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier

Numerical Function Generators Using Edge-Valued Binary Decision Diagrams

A Parallel Branching Program Machine for Emulation of Sequential Circuits

FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard

4DM4 Lab. #1 A: Introduction to VHDL and FPGAs B: An Unbuffered Crossbar Switch (posted Thursday, Sept 19, 2013)

An Architecture for IPv6 Lookup Using Parallel Index Generation Units

PROGRAMMABLE NUMERICAL FUNCTION GENERATORS: ARCHITECTURES AND SYNTHESIS METHOD. Tsutomu Sasao, Shinobu Nagayama, Jon T. Butler

A Memory-Based Programmable Logic Device Using Look-Up Table Cascade with Synchronous Static Random Access Memories

HIGH PERFORMANCE QUATERNARY ARITHMETIC LOGIC UNIT ON PROGRAMMABLE LOGIC DEVICE

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN

Carry-Free Radix-2 Subtractive Division Algorithm and Implementation of the Divider

Implementing FIR Filters

International Journal for Research in Applied Science & Engineering Technology (IJRASET) IIR filter design using CSA for DSP applications

Floating Point Square Root under HUB Format

ARITHMETIC operations based on residue number systems

BDD Representation for Incompletely Specified Multiple-Output Logic Functions and Its Applications to Functional Decomposition

Efficient Computation of Canonical Form for Boolean Matching in Large Libraries

Multiple-valued Combinational Circuits with Feedback*

Paper ID # IC In the last decade many research have been carried

Implementation of Lifting-Based Two Dimensional Discrete Wavelet Transform on FPGA Using Pipeline Architecture

Adaptive FIR Filter Using Distributed Airthmetic for Area Efficient Design

Minimization of Multiple-Valued Functions in Post Algebra

Iterative Synthesis Techniques for Multiple-Valued Logic Functions A Review and Comparison

On-line Algorithms for Complex Number Arithmetic

Area Efficient, Low Power Array Multiplier for Signed and Unsigned Number. Chapter 3

A comparative study of Floating Point Multipliers Using Ripple Carry Adder and Carry Look Ahead Adder

Design and Implementation of CVNS Based Low Power 64-Bit Adder

VLSI for Multi-Technology Systems (Spring 2003)

THE INTERNATIONAL JOURNAL OF SCIENCE & TECHNOLEDGE

Design and Implementation of 3-D DWT for Video Processing Applications

Delay Time Analysis of Reconfigurable. Firewall Unit

Design of Delay Efficient Distributed Arithmetic Based Split Radix FFT

Fault Tolerant Parallel Filters Based On Bch Codes

Laboratory Exercise 3 Comparative Analysis of Hardware and Emulation Forms of Signed 32-Bit Multiplication

Comparison of Decision Diagrams for Multiple-Output Logic Functions

CET4805 Component and Subsystem Design II. EXPERIMENT # 5: Adders. Name: Date:

Power and Area Efficient Implementation for Parallel FIR Filters Using FFAs and DA

An HEVC Fractional Interpolation Hardware Using Memory Based Constant Multiplication

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>


Area-Time Efficient Square Architecture

A Novel Design of 32 Bit Unsigned Multiplier Using Modified CSLA

VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila Khan 1 Uma Sharma 2

MCM Based FIR Filter Architecture for High Performance

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE

A Review on Optimizing Efficiency of Fixed Point Multiplication using Modified Booth s Algorithm

VLSI Implementation of Fast Addition Using Quaternary Signed Digit Number System

A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithmetic with Decomposed LUT

Piecewise Arithmetic Expressions of Numeric Functions and Their Application to Design of Numeric Function Generators

CHAPTER 4. DIGITAL DOWNCONVERTER FOR WiMAX SYSTEM

Fast Block LMS Adaptive Filter Using DA Technique for High Performance in FGPA

II. MOTIVATION AND IMPLEMENTATION

An Optimized Montgomery Modular Multiplication Algorithm for Cryptography

COPYRIGHTED MATERIAL INDEX

A Single/Double Precision Floating-Point Reciprocal Unit Design for Multimedia Applications

FPGA Implementation of Discrete Fourier Transform Using CORDIC Algorithm

High Speed Multiplication Using BCD Codes For DSP Applications

Tutorial 2 Implementing Circuits in Altera Devices

OPTIMIZATION OF AREA COMPLEXITY AND DELAY USING PRE-ENCODED NR4SD MULTIPLIER.

Handouts. FPGA-related documents

ABSTRACT I. INTRODUCTION. 905 P a g e

Boolean Algebra and Logic Gates

PROJECT REPORT IMPLEMENTATION OF LOGARITHM COMPUTATION DEVICE AS PART OF VLSI TOOLS COURSE

An Efficient Implementation of Floating Point Multiplier

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

High Performance Architecture for Reciprocal Function Evaluation on Virtex II FPGA

Handouts. 1. Project Guidelines and DSP Function Generator Design Specifications. (We ll discuss the project at the beginning of lab on Wednesday)

Implementation of a Fast Sign Detection Algoritm for the RNS Moduli Set {2 N+1-1, 2 N -1, 2 N }, N = 16, 64

IEEE-754 compliant Algorithms for Fast Multiplication of Double Precision Floating Point Numbers

A High-Speed FPGA Implementation of an RSD-Based ECC Processor

THE DESIGN OF HIGH PERFORMANCE BARREL INTEGER ADDER S.VenuGopal* 1, J. Mahesh 2

Simulation Results Analysis Of Basic And Modified RBSD Adder Circuits 1 Sobina Gujral, 2 Robina Gujral Bagga

Design Optimization Techniques Evaluation for High Performance Parallel FIR Filters in FPGA

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter

Low Power Design Techniques

VLSI Design Of a Novel Pre Encoding Multiplier Using DADDA Multiplier. Guntur(Dt),Pin:522017

Lecture 7. Standard ICs FPGA (Field Programmable Gate Array) VHDL (Very-high-speed integrated circuits. Hardware Description Language)

Programmable Architectures and Design Methods for Two-Variable Numeric Function Generators 1

OPTIMIZATION OF FIR FILTER USING MULTIPLE CONSTANT MULTIPLICATION

Sine/Cosine using CORDIC Algorithm

Improved Design of High Performance Radix-10 Multiplication Using BCD Codes

Digital Design Using Digilent FPGA Boards -- Verilog / Active-HDL Edition

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Implementation of Floating Point Multiplier Using Dadda Algorithm

NH 67, Karur Trichy Highways, Puliyur C.F, Karur District UNIT-II COMBINATIONAL CIRCUITS

Verilog for High Performance

VARUN AGGARWAL

On-Line Error Detecting Constant Delay Adder

Low-Power FIR Digital Filters Using Residue Arithmetic

ISSN (Online), Volume 1, Special Issue 2(ICITET 15), March 2015 International Journal of Innovative Trends and Emerging Technologies

FIR Filter Architecture for Fixed and Reconfigurable Applications

High speed Integrated Circuit Hardware Description Language), RTL (Register transfer level). Abstract:

DISTRIBUTED ARITHMETIC (DA) provides multiplierless

x 1 x 2 ( x 1, x 2 ) X 1 X 2 two-valued output function. f 0 f 1 f 2 f p-1 x 3 x 4

Aiyar, Mani Laxman. Keywords: MPEG4, H.264, HEVC, HDTV, DVB, FIR.

Chap.3 3. Chap reduces the complexity required to represent the schematic diagram of a circuit Library

Area And Power Efficient LMS Adaptive Filter With Low Adaptation Delay

High Speed Radix 8 CORDIC Processor

Transcription:

On Designs of Radix Converters Using Arithmetic Decompositions Yukihiro Iguchi 1 Tsutomu Sasao Munehiro Matsuura 1 Dept. of Computer Science, Meiji University, Kawasaki 1-51, Japan Dept. of Computer Science and Electronics, Kyushu Institute of Technology, Iizuka -5, Japan February, Abstract In arithmetic circuits for digital signal processing, radixes other than two are often used to make circuits faster. In such cases, radix converters are necessary. However, in general, radix converters tend to be complex. This paper considers design methods for p-nary to binary converters. It introduces a new design technique called arithmetic decomposition. It also compares the amount of hardware and performance of radix converters implemented on s. 1 Introduction Arithmetic operations of digital systems usually use radix two []. However, in digital signal processing, for high speed operations, p-nary (p > ) numbers are often used [1, ]. In such cases, the conversion between binary numbers and p-nary numbers are necessary. Such operation is radix conversion [, ]. Various methods exist to convert p-nary numbers into binary numbers. Many of them require large amount of computations. Especially when the radix conversion is implemented by a random logic circuit, the network tends to be quite complex [5]. Radix converters can be implemented by table lookup. That is, to store the conversion table in the memory. This method is fast but requires a large amount of memory. In [9], LUT cascade realizations [] of binary to ternary converters, ternary to binary converters, binary to decimal converters, and decimal to binary converters are presented. In [11], the concept of Weighted-Sum functions (WS functions) is used to design radix converters by using LUT cascades. In this paper, we consider the design of circuits that convert p-nary numbers into binary numbers by using arithmetic decomposition [1]. We also consider the implementations on s (Field Programmable Gate Arrays). For the readability, we use examples for p =, however the method can be easily extended to any value of p. Radix Converter.1 Radix Conversion Definition.1 Let a p-nary number of n-digit be x = (x,x n,...,x ) p, and let a q-nary number of m-digit be y =(y m 1,y m,...,y ) q. Given a vector x, the radix conversion is the operation that obtains y that satisfies the relation: x i p i = i= m 1 y j q j, (.1) j= where x i P, y j Q, P = {,1,...,p 1}, and Q = {,1,...,q 1}. Let y =(y m 1,y m,...,y ), y i {,1} be the output functions of p-nary to binary converter. Then, when p is a prime number, y i depends on all the inputs x i (i =,1,...,). Whenp is not a power of two, we have an incompletely specified function. When we implement a p- nary to binary converter, unused combinations exist. Usually, we assign to the undefined outputs. Example.1 In the case of ternary to binary converter, we use the binary-coded-ternary code to represent a ternary number. That is is represented by (), 1 is represented by (1), and is represented by (1). Note that (11) is the unused code. Table.1 is the truth table of two-digit ternary to binary converter. In the binary-coded-ternary representation, (11) is an undefined input, and corresponding output is don t care. In Table.1, the inputs in the binary-codedternary representation are denoted by z =(z,z,z 1,z ). The inputs in the ternary representations are denoted by x =(x 1,x ). The output in the binary representations are denoted by y =(y,y,y 1,y ). (End of Example). Direct Method A straightforward way to implement a radix converter is to realize the circuit for equation (.1). Example. Consider the -digit ternary to binary converter (terbin). In this case, we implement x x 5 x 5 x x x 1 x 1 x. (.)

Table.1: Truth table of a ternary to binary converter. Binary-coded Ternary Binary Decimal -ternary z z z 1 z x 1 x y y y 1 y 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 5 1 1 1 1 1 1 1 1 1 1 1 1 x x 1 x x 5 * x * x5 5 * x 1 * x 1 * 5 Figure.1: -digit ternary to binary converter: Direct Method. Fig..1 shows the circuit for terbin produced by a logic synthesis program. Note that x is implemented by x x and 1 x 1 is implemented by x 1 x 1,but i x i ( < i ) are implemented by multipliers. Also, a cascade adder is used to obtain the result. Since the coefficients are constants, the multipliers can be replaced by adders. However, the wiring of resulting circuit is quite complex. (End of Example) 9 1 1 11 Arithmetic Decomposition In the direct implementation of a p-nary to binary converter, the amount of hardware and the propagation delay increase with the number of input digits. To reduce the amount of hardware, we can use arithmetic decomposition [1]. In this section, we introduce arithmetic decompositions that are suitable for radix converters. 1.1 WS functions and Their Arithmetic Decomposition The Weighted Sum function (WS function) is a mathematical model of radix converters, bit-counting circuits, and convolution operations [11, 1]. Definition.1 An n input WS function [11] is defined as WS( x)= i= w i x i, (.1) where x =(x,x n,...,x 1,x ) is the input vector, w = (w,w n,...,w 1,w ) is the weight vector, and each element is an integer. Theorem.1 A WS function can be represented as a sum of two WS functions as follows: WS( x)= i= w i x i = αws A ( x)ws B ( x), where WS A ( x) = i= a ix i,ws B ( x) = i= b ix i, and α is an integer. This is the arithmetic decomposition, and α is a decomposition coefficient.. Arithmetic Decompositions using Different Coefficients A radix converter can be represented as a WS function. Thus, by using arithmetic decompositions with different decomposition coefficients α, we can implement various radix converters. In this part, we consider two cases: One uses k as the decomposition coefficient, and the other uses p k as the decomposition coefficient. When the decomposition coefficient is k : In this case, the radix converter is realized as WS( x)= k WS A ( x)ws B ( x). Since the multiplication of k can be implemented by the shift (wiring), it is implemented compactly. However, WS B depends on all the input variables, and the total size of the circuit is not so small. When the decomposition coefficient is p k : In this case, the radix converter is realized as WS( x)=p k WS A ( x)ws B ( x). The multiplication of p k increases the number of outputs for p k WS A. However, the numbers of inputs for WS A and WS B are half of the original function, so the total network will be much smaller. Example.1 Let us design the -digit ternary to binary converter (terbin). Consider two cases where the decomposition coefficients are α= = and α= = 1. The

Table.1: Coefficients of arithmetic decompositions of the powers of. Decomposition Coefficients i i α = = α = = 1 1 1 1 1 1 1 9 9 1 9 1 1 1 1 1 1 5 51 1 9 11 5 1 9 1 11 1 ternary number is represented by the binary-coded-ternary code. Table.1 shows the coefficients of arithmetic decompositions of i, (i =,1,,...,). Note that these coefficients are equal to the weights for WS A ( x) and WS B ( x). We assume that 11-input cells are available for cascade realization. From Table.1, we have two different realizations for terbin. When the decomposition coefficient is. WS( x)= WS A ( x)ws B ( x). WS A ( x)=x 11x x 5 1x x x x 1 x. WS B ( x) =11x 5x 51x 5 1x x 9x x 1 1x. WS A ( x) depends on the inputs x x. The number of inputs is. Since the output takes values from to (1 11 )=9, bits are necessary to represent the output. WS A ( x) has inputs and outputs, so it is implemented by a single cell. WS B ( x) depends on all the inputs x x, so the number of inputs is 1. Since the output takes the values from to (1 9 151 5 11)=, 9 bits are necessary to represent the output. W S B ( x) has 1 inputs and 9 outputs. It is implemented by a cascade with 11-input cells. The multiplication by canbesimplyimplemented by shifting bits positions. We add the upper bits of WS B and the outputs of WS A by a -bit adder. Fig..1 shows the network, which uses memory with 9.5K bits and a -bit adder. When the decomposition coefficient is. WS( x)= WS A ( x)ws B ( x). WS A ( x)= x x 1 x 5 x. WS B ( x)= x x 1 x 1 x. WS A ( x) depends on inputs x x, so the number of the inputs is. Since the output takes the values from to(1 9 ) =, bits are necessary to represent the output. Thus, WS A ( x) is implemented by a single cell. x x 1 x5 x x x 9 9 x 11 -bit adder 9 5 Figure.1: -digit ternary to binary converter: Arithmetic decomposition with coefficient. x x x x 5 -bit adder 5 Figure.: -digit ternary to binary converter: Arithmetic decomposition with coefficient. WS B ( x) depends on inputs x x, so the number of inputs is. The output takes values from to (1 9 )=. The output range of WS A is. So bits are necessary to represent the output. We directly implement WS A by a cell. We add WS A with W S B by a -bit adder. Fig.. shows the network, which uses 5K bits memories and a -bit adder. (End of Example). Arithmetic Decomposition using the Binary Representation of Inputs In this part, we will introduce an arithmetic decomposition with respect to the binary representation of inputs. Definition. Letibeaninteger.BIT(i, j) denotes the j- th bit of the binary representation of i, where the LSB is the -th bit. Example. BIT (,1)=1,BIT(,)=,BIT(1,1)=, and BIT(1, )=1. An integer number i can be represented by log i bits. Thus, we have the relation: log i 1 i = j= j BIT (i, j).

BIT( x,) BIT( x,) 1 BIT( x,1) BIT( x,1) 1 -bit adder Figure.: -digit ternary to binary converter: Decomposed using the binary representation of inputs. From this, we have the following: Theorem. A p-nary to binary converter can be represented as the form (Proof) WS( x) = WS( x)= i= log p 1 j= p i x i = = i= j p i BIT(x i, j) i= log p 1 p i j BIT (x i, j) j= log p 1 j= j p i BIT (x i, j) i= (Q.E.D.) Example. Consider the -digit ternary to binary converter (terbin). By Theorem., WS( x) can be represented as: WS( x) = i= i BIT(x i,1) i= i BIT(x i,). Fig.. is the circuit corresponding to the above decomposition. Each module has inputs. Since 5 1 =, each module has 1 outputs. The multiplication by two is implemented by shifting one bit position. The circuit uses K bits of memories and a -bit adder. We can further reduce the circuit by using Theorem.1, where is the decomposition coefficient: WS( x)= [ 1 [ i= i= i BIT (x i,1) i BIT (x i,) i= i= i BIT(x i,1)] i BIT(x i,)]. BIT( x,) BIT( x,) BIT( x,) BIT( x,) BIT( x,1) BIT( x,1) BIT( x,1) BIT( x,1) 1 1 1 -bit adder 1 Figure.: -digit ternary to binary converter: Decomposed using the binary representation of inputs and further decomposed with coefficient. 1 1 (i=,1,,,,5) (i=,,,9,1,11,1) Figure.1: -digit ternary to binary converter: Outputs are partitioned into two groups. Fig.. is the circuit corresponding the above decomposition, where each logic block has only inputs. In this case, the total amount of memory is only 5 bits. (End of Example) Partition of Outputs The decomposition of a logic function is to design the circuit by partitioning the inputs, while the partition of a logic function is to design the circuit by partitioning the outputs. The arithmetic decomposition requires an adder in the outputs, while the partition of the outputs requires no adder in the outputs. So, the circuit can be faster. Example.1 [9] Consider the -digit ternary to binary converter (terbin). Let the output values for the don t care inputs be. Then, the column multiplicity of the decomposition chart of the radix converter will be = 551. This shows that the single cascade realization of the radix converter requires cells with log 551 1 = 1 inputs, which is rather large. So, we partition the output to reduce the amount of hardware. Assume that LUT cascades with 1-input cells are used. We can implement the radix converter as shown in Fig..1. In this implementation, the outputs are partitioned into two groups: the upper bits and the lower bits are implemented separately. As shown in Fig..1, the amount of memory in the cascades is 1 ( ) ()= 5, (bits). (End of Example)

Direct Method START Cascade Method and Arithmetic Decomposition Method START Table 5.1: Amount of hardware and performance of -digit ternary to binary converters on Cyclone II. Radix: p #of digits: n Generator Source File Radix: p #of digits: n Generator Source File Memory Contents File Design Method LE MK EM Delay [nsec] DM1 With EM Fig..1. DM W/O EM Fig..1 195. AD1 Fig..1. AD Fig.. 1. AD BIT Fig.. 1 1. AD BIT Fig.. 1. PAR MK only Fig..1 11. Development Tool QuartusII v..1 Configuration Data STOP Development Tool QuartusII v..1 Configuration Data STOP Figure 5.1: Development system for radix converters. 5 Implementation on s To see the effectiveness of the approach, we implemented various designs of ternary to binary converters on s, and compared the amount of hardware and performance. 5.1 s and Their Development System We used Altera Cyclone II (EPC5T1C) device, having Embedded Multipliers (EMs) that perform the multiply-and-sum operations, embedded memories (MKs), and logic elements (LEs). Each MK contains 9 bits. We used Altera Quartus II V..1 as the development tool. We also developed a radix converter synthesis system shown in Fig. 5.1 that generates codes describing various designs, and data for MKs. In the s, LUTs (cells) were implemented by MKs, while adders were implemented by LEs. 5. -digit Ternary to Binary Converters Table 5.1 compares different designs of -digit ternary to binary converters (terbin). Direct Method (DM): The system generated Verilog- HDL code from the specification: radix p and the number of digits n. DM1 directly implements i= i x i. Fig..1 is the circuit generated by the Quartus. After mapping, the Quartus replaced the multipliers with EMs, and adders with LEs. DM also corresponds to Fig..1. In this case, however, the Quartus replaced multipliers with LEs instead of EMs. So, the circuit consists of LEs only. It has 195 LEs, which means 19 LEs replaced EMs. It is faster than DM1, since LEs perform constant multiplications faster than EMs. Arithmetic Decomposition Method (AD): The system generated code and data for MKs. AD1 corresponds to Fig..1, which was obtained with the decomposition coefficient. The Quartus replaced four LUTs with MKs, and the adder with LEs. AD corresponds to Fig.., which was obtained with the decomposition coefficient. The Quartus replaced two LUTs with two MKs, and the adder with LEs. AD corresponds to Fig.., which was obtained with the arithmetic decomposition using binary representation of inputs. The Quartus replaced two LUTs with two MKs, and the adder with 1 LEs. AD corresponds to Fig.., which was obtained by the arithmetic decomposition with the coefficient and using binary representation of inputs. The Quartus replaced four LUTs with four MKs, and adders with LEs. It is slower than AD since the adder is more complex. Partition of Outputs Method (PAR): The system generated code and data for MKs. PAR corresponds to Fig..1 consisting of LUTs. The Quartus replaced LUTs with 11 MKs. In the case of terbin, we can conclude that AD is the best realizations: It is the fastest and requires smaller amount of hardware. 5

Table 5.: Amount of hardware and performance of 1-digit ternary to binary converters on Cyclone II. Design Method LE MK EM Delay [nsec] DM1 With EM Fig..1 9 15 5. DM W/O EM Fig..1 5. AD Fig.. 1. AD BIT Fig.. 19 1. AD BIT Fig.. 5 1. :Used EPCT1C :Used EPCF5C 5. 1-digit Ternary to Binary Converters Table 5. compares 5 different designs of 1-digit ternary to binary converters (1terbin). Direct Method (DM): DM1 is similar to Fig..1, but uses 15 EMs. DM is also similar to Fig..1. Also in this case, it is faster than DM1. Arithmetic Decomposition Method (AD): AD is similar to Fig.., but the decomposition coefficient is. In this case, we need a 1-input -output LUT, a 1-input 1-output LUT, and a -bit adder. The Quartus replaced these LUTs with MKs, and the adder with LEs. So, we had to use a larger, EPCT1C which contains MKs. AD is similar to Fig.., but uses a pair of 1-input 19-output LUTs and a -bit adder. To replace these LUTs, the Quartus required MKs. So, we had to use a larger, EPCF5C which contains 5 MKs. AD is similar to Fig.., but uses the decomposition coefficient. It uses four LUTs with inputs. The Quartus replaced these LUTs with four MKs, and the adders with 5 LEs. In the case of 1terbin, AD and AD are faster, but require larger s, so AD is the best choice. Conclusion and Comments In this paper, we presented arithmetic decompositions to design p-nary to binary converter. We used ternary to binary converters to illustrate the idea. We also implemented the converts on s to confirm the effectiveness of the methods. Note that Fig..1 is a non-disjoint decomposition, while Fig.., Fig.., and Fig.. are disjoint decompositions. The disjoint decomposition in Fig.. is easy to find from equation (.) or Fig..1, while the disjoint decomposition in Fig.. is not so easy to find. Also, the decomposition in Fig.. produces similar but different sub-circuits, while the decomposition in Fig.. produces two identical sub-circuits. The circuit in Fig.. is faster than Fig... These techniques can be combined to design radix converts with more digits, and other arithmetic circuits []. Acknowledgments This research is supported in part by the Grant in Aid for Scientific Research of MEXT, and the Kitakyushu Area Innovative Cluster Project of MEXT. References [1] T. Hanyu and M. Kameyama, A MHz pipelined multiplier using 1.5 V-supply multiplevalued MOS current-mode circuits with dual-rail source-coupled logic, IEEE Journal of Solid-State Circuits, 11, (1995), 9-15. [] C. H. Huang, A fully parallel mixed-radix conversion algorithm for residue number applications, IEEE Trans. Comput., vol., pp. 9-, 19. [] K. Ishida, N. Homma, T. Aoki, and T. Higuchi, Design and verification of parallel multipliers using arithmetic description language: ARITH, th International Symposium on Multiple-Valued Logic, Toronto, Canada, May, pp.-9. [] I. Koren, Computer Arithmetic Algorithms, nd Edition, A. K. Peters, Natick, MA,. [5] S. Muroga, VLSI System Design, John Wiley & Sons, 19, pp. 9- [] D. Olson, and K. W. Current, Hardware implementation of supplementary symmetrical logic circuit structure concepts, th IEEE International Symposium on Multiple- Valued Logic Portland, Oregon, May -5,. [] T. Sasao, Switching Theory for Logic Synthesis, Kluwer Academic Publishers, 1999. [] T. Sasao, M. Matsuura, and Y. Iguchi, A cascade realization of multiple-output function for reconfigurable hardware, International Workshop on Logic and Synthesis(IWLS1), Lake Tahoe, CA, June 1-15, 1, pp. 5-. [9] T. Sasao, Radix converters: Complexity and implementation by LUT cascades, 5th International Symposium on Multiple-Valued Logic, Calgary, Canada, May 19-1, 5, pp.5-. [1] T. Sasao, Y. Iguchi, and T. Suzuki, On LUT cascade realizations of FIR filters, DSD5, th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, Porto, Portugal, Aug. - Sept., 5, pp.-. [11] T. Sasao, Analysis and synthesis of weighted-sum functions, IEEE Trans. on CAD (to be published).