Procedural Functional Partitioning for Low Power

Size: px
Start display at page:

Download "Procedural Functional Partitioning for Low Power"

Transcription

1 Procedural Functional Partitioning for Low Power Enoch Hwang Frank Vahid Yu-Chin Hsu Department of Computer Science Department of Computer Science La Sierra University, Riverside, CA University of California, Riverside, CA Abstract Power consumption in VLSI systems has become a critical metric for design evaluation. Although power reduction techniques can be applied at every level of design abstraction, most techniques are applied to the lower levels such as the register-transfer or gate levels and are limited to reducing power only to a portion of the entire circuit. We demonstrate power reduction using a coarsegrained procedural functional partitioning technique. This technique allows us to easily partition the entire processor (controller and datapath) by separating entire procedures into several smaller, mutually exclusive, interacting processors at a much higher level of the design abstraction. Power reduction is accomplished because only one smaller processor needs to be active at a time. Our results show that procedural functional partitioning can reduce average power consumption by as much as 78% with an average of 51%. 1. Introduction Power reduction of VLSI systems is becoming an increasingly important goal for system design. The shrinking sizes of integrated circuits calls for reduced power consumption, in order to extend battery life and prevent excessive heat generation. While power reduction techniques can be applied at nearly every design abstraction level, most of the recent power estimation and reduction tools focus on the lower levels of abstraction, such as the register-transfer (RT) or gate levels [1]. There is therefore still much opportunity to reduce power by developing tools that focus on higher levels, such as the behavioral level, where major reductions may be gained. Power consumption of a CMOS circuit is composed mainly from dynamic power [2] due to capacitive charging and discharging when a signal toggles as computed from the equation 1 2 P d = C V f N (1) 2 where C is the loading capacitance in the circuit, V is the supply voltage, f is the clock frequency, and N is the switching frequency of the gate. For a given behavioral level circuit design, all the variables except for N are usually fixed by a design decision. Thus, the power reduction of a circuit at this level is obtained mainly by reducing the switching frequency N from the outputs of the gates, and this is made possible by shutting down unnecessary portions of the circuit. This is accomplished either by adding new control circuitry to the circuit or by partitioning the circuit. Shut down techniques that add new control circuitry include [3], [4], [5], [6], [7], [8], [9], [1] and [11]. [3] isolates the portion of the circuit that is responsible for most of the computation complexity and tries to shut down the rest of the circuit as much as possible. [4], [5] and [6] reduces the switching activities in flip-flops, registers, and memories. [7], [8], [9], [1] and [11] modifies the datapath to reduce switching activities during calculations. Partitioning techniques include [12], [13], [14] and [15]. [12] and [13] partitions resources in the datapath, whereas [14] partitions the controller. The idle parts, whether in the datapath or the controller, can then be shut down. All the above techniques, however, try to reduce the switching activities within a localized subset of gates either in the datapath or the controller alone. [15] partitions both the controller and the datapath so that when a part is idle, switching activities are reduced for the entire processor and not just to a localized subset of gates. The technique described in [15], however, is applied to the finite state machine with datapath (FSMD) model where the resources have already been scheduled, so the timings are already determined and fixed. In this paper, we demonstrate power reductions at the behavioral level using procedural functional partitioning. Procedural functional partitioning allows us to easily partition the entire processor (both the controller and the datapath together) into several smaller, mutually exclusive, interacting processors at an even higher level of the design abstraction. The smaller inactive processors are then shut down and power reduction is accomplished because only one smaller processor is active at a time. Furthermore, this technique is applied before any of the resources are scheduled so timing issues are not of a concern. Our results show that procedural functional partitioning can reduce average power consumption by as much as 78% with an average of 51%. In addition to power reduction, this technique has been shown to reduce I/O and synthesis runtimes by nearly an order of magnitude, as well as being suited for hardware/software partitioning [16]. The rest of this paper is organized as follows. In section 2, we give a background of how functional ISLPED2 2/18/ 11:32 AM Page 1 of 5

2 partitioning can reduce power. Our procedural functional partitioning technique is discussed in section 3. Section 4 shows our experimental results, and we conclude in section Background Circuit partitioning can occur either before synthesis (as in functional partitioning) or after synthesis (as in structural partitioning). When a circuit is synthesized from a behavioral description to a gate level netlist, only one control unit and one datapath is generated. The implication of having only one control unit and one datapath is that when there is a signal change at the primary input, the entire datapath and control unit may be affected. This results in much unnecessary switching activities and therefore, wasted power. In structural partitioning, as shown in Figure 1 (a), we first synthesize a behavioral process written in a behavioral hardware description language (HDL), down to a registertransfer or gate level netlist. Next, we partition the netlist into two or more parts. From a power reduction perspective, structural partitioning does not reduce switching activity. The reason is that synthesis converts one process into one custom hardware processor, consisting of a controller and a datapath. Even though partitioning creates more than one physical partition, there is still logically only one datapath and one control unit. When a primary input signal changes, it may affect many of the gates in the datapath regardless of which partition they are in. Similarly, all the gates in the control unit must also be active, regardless of which part they are in, in order to provide the correct control signals to the datapath. In functional partitioning, the behavioral process, written in HDL, is first partitioned into several smaller processes, as shown in Figure 1 (b). Each process is then synthesized to its own custom processor, each with its own controller and datapath. From a power reduction perspective, functional partitioning can significantly reduce switching activity. The reason is that each processor, having its own controller and datapath, is smaller than one large processor implementing the entire process, and only one processor is executing a computation at any given time while the other processors will be idle. When a processor is Behavioral Specification High-Level Synthesis Structural Cont- Un- Data (a) Partitioning path rol it Behavioral Fn 1 Fn Specification 2 Functional Partitioning Fn 1 High-Level Synthesis Figure 1: Partitioning: (a) structurally; (b) functionally. (b) Fn 2 idle, we have, in effect, shut down both the controller and the datapath for that processor. Thus, power reduction is possible through functional partitioning because it reduces the overall switching activities of the entire system by localizing the activities within smaller processors, hence, power consumed per operation is less. 3. Procedural functional partitioning In contrast to FSMD functional partitioning [15] which performs a partitioning of FSMD states, procedural functional partitioning performs a coarse-grained partitioning of procedures and functions. Given a HDL description of a design with two or more procedures and functions, we partition this HDL design into two or more parts by roughly dividing the procedures and functions equally between the parts. In our current model there is no explicit power metric in the partitioning cost function. Since we are dealing with homogeneous parts, we have found that minimizing communication between parts, while maintaining balanced part sizes, is a good approach to minimizing power. Our future work will replace this indirect approach with a more direct power metric. The parts are then individually synthesized down to the gate level with each part having its own controller and datapath. Finally, the parts are reconnected back together via a common communication bus as explained in section 3.2. Working together, these parts are functionally equivalent to the original unpartitioned design. Partitioning will introduce new switching activity for interprocessor communication. Therefore, the problem that must be solved is one of partitioning such that the reduction in switching activity for computations far outweighs the switching activity increase for communication. Communication will also increase the execution time because extra cycles must be added for each handshaking. This, however, is not a problem in regards to preserving the cycle-by-cycle behavior of the original unpartitioned design because our technique is applied to the HDL design before scheduling. 3.1 Example Given the pseudo-code in Figure 2, this unpartitioned process calls three separate procedures, mod, divide, and is_prime. These procedures execute sequentially but they are all consuming power for the entire duration of the execution of the process. We can partition this process into three parts as in Figure 3 by placing mod and divide in part two, is_prime in part three and the remaining code in part one. Part one controls the entire program flow, performing the primary I/O, and calling the procedures in the other parts when required. Initially, part one executes while parts two and three remains idle. Thus, only part one is consuming power. When the procedures Mod and Divide are required, part one passes the control to part two, after which only part two is active. When part two completes its execution, it returns the results and control back to part one. ISLPED2 2/18/ 11:32 AM Page 2 of 5

3 Part one then calls part three to perform the is_prime function. Finally, part three returns back to part one and execution stops. Figure 4 and Figure 5 show a plot of the switching activities for the unpartitioned and partitioned examples as shown in Figure 2 and Figure 3 respectively. Here we actually see the decreased in switching activities as a result of the partitioning. In the unpartitioned case, Figure 4, the number of toggles can go as high as 2 in a clock cycle. The total power consumption is 597 µjoules. Whereas, in the partitioned case, Figure 5, the maximum number of toggles in a clock cycle is only 65. The total power consumption for this case is only 2,243 µjoules, a 56% reduction. In the partitioned graph, Figure 5, parts two and three are called twice each. Part one is the main controlling module which is active only during the communication with the other two parts. We see that while part two is active, parts one and three do not have any switching activities whatsoever. Similarly, when part three is active, parts one and two are completely inactive. The only time when all three parts have switching activities is when the handshaking is occurring over the communication bus. 3.2 Communication bus A common communication bus is added between all the parts as shown in Figure 6. This bus is used for all handshaking and data transfers between the parts. The bus input x; count := 2; while ((count * count) <= x) loop mod(x,count,mod_result); divide(x,count,divide_result); is_prime(count,prime_result1); is_prime(divide_result,prime_result2); if (mod_result = ) and (prime_result1 = 1) and (prime_result2 = 1) then answer1 <= divide_result; answer2 <= count; exit; else count := count + 1; end if; end loop; Figure 2: Sample unpartitioned pseudo-code for Fac1. part 1 while ((count * count) <= x) loop call and wait for result from part 2; call and wait for result from part 3; if end loop; part 2 wait for part 1 to call; get parameters x and count; mod(x,count,mod_result); divide(x,count,divide_result); return mod_result and divide_result; part 3 wait for part 1 to call; get parameters count and divide_result; is_prime(count,prime_result1); is_prime(divide_result,prime_result2); return prime_result1 and prime_result2; Figure 3: Sample partitioned pseudo-code for Fac1. is composed of n address lines and m data lines. The size of n is determined by the number of parts being connected together, and m is determined by the maximum of all the procedure call parameter sizes. Every procedure call will have some input and output parameters and each parameter has a known bit-width. We simply sum the bit-widths for all the parameters in a procedure call to get the total bit-width for that particular procedure call. Since the data bus is shared between all the parts, we therefore need the maximum of the total bitwidths for all the procedure calls. Different bus configurations will result in different power consumption as reported in [19]. In our current experiment, we have used only a simple bus configuration. We should get better results if we also try to optimize the bus configuration. Only one part will control the bus at a time, with the others providing high-impedance values. There is activity on the communication bus only when one part is passing the control to another part. During this short handshaking time, all parts are active. A system executes as a series of procedure calls. If part 1 calls part 2, part 1 sends part 2 s address over the address bus, followed by input parameters to part 2. When part 2 sees its address on the address bus, it wakes up and captures the input parameters and then Number of Toggles Time (cycles) Figure 4: Plot of switching activities for the Fac1 unpartitioned example. Total power is 597 µjoules. Number of Toggles Part 1 Part 2 Part Time (cycles) Figure 5: Plot of switching activities for the Fac1 partitioned example. Total power is 2243 µjoules. External ports Part 1 Part 2 Part n Data Lines Address Lines... Figure 6: Communication bus architecture. ISLPED2 2/18/ 11:32 AM Page 3 of 5

4 executes. While part 2 is executing, all the other parts are idle. Upon completion, part 2 sends part 1 s address over the address bus, followed by any output parameters. Upon seeing its address on the address bus, part 1 wakes up, captures the output parameters, and resumes execution. With this model, a system executes as a series of procedure calls. Such an execution model has the feature that only one part is active at a time. As long as there is no activity on the communication bus, the inputs to all the parts will be at a constant value. If the inputs to a part do not change, then there will be no switching activities within that part. Thus, all switching activities will be localized to just the one active part at a time as shown in Figure Experiment We want to compare the power consumption of the partitioned system with that of the unpartitioned system. First, we describe a system at the behavioral level using VHDL. For the unpartitioned system, we simply synthesize the behavioral description to the gate level and calculate the power consumed. For the partitioned system, we first partition the behavioral description into two or more functional parts as explained in section 3. We then add the communication bus with the necessary handshaking protocol to each part. The parts are individually synthesized to the gate level. The same synthesis optimizations applied to the unpartitioned system are also applied to the partitioned system. At the gate level, we join these parts back together by adding common wires for the communication bus. Power consumption is then calculated. In our experiments, we described eight examples at the behavioral level using VHDL. After applying our partitioning technique, the system is synthesized and simulated to obtain the switching activity data. Power results are calculated using the switching activity obtained from the simulation and the loading capacitance of the netlist obtained from the technology library. The unpartitioned and partitioned systems are compared in terms of their average and total power usage, area, and execution time. The results take into consideration the fact that the capacitance for the communication bus is larger than the internal capacitance. In our power calculation, we have used a bus capacitance that is four times the internal Table 1: Example statistics. Ex gate count comm loops Unpart Part= P1+P2+P3+... calls Fac = /n/n 1/2/2 Fac = /n/n/n 1/1/1/1 Ch = /1/1 4/3/n Ch = /1/1/1 1/3/3/n Ch = /3/3/1 1/1/1/n Diffeq = /12 1/n Volsyn = /n n/n NLoops = /n n/n capacitance. Table 1 shows the statistics for the examples. Fac is a factorization program as shown in Figure 2. Ch evaluates the Chinese Remainder Theorem. Diffeq is an example from the HLSynth MCNC benchmark. Volsyn is a volumemeasuring medical instrument controller. NLoops is an example with nested while loops. For the Fac example, two different ways of partitioning the system were used. The first way is shown in Figure 3. For the Ch example, three different ways of partitioning the system was done. The Unpart and Part columns show the gate count for the unpartitioned and partitioned system respectively. In the Part column, the gate count for the individual parts are further broken down. The comm calls column shows the number of times each part is called via the communication bus. For example, for the Fac1 example, part one is called one time and parts two and three are called n times to denote that it is dependent on the input value. The loops column shows whether there is a loop in that part and if so how many times it loops around. Again, n denotes that it is dependent on the input value. The results are summarized in Table 2. The Area and Time columns show the percent increase in area (in terms of gate count) and execution time respectively. The absolute average and total power for the partitioned systems are shown in columns 4 and 5 respectively. The percent average and total power savings are shown in columns 6 and 7 respectively. In all cases, both the average and total power is reduced. The reduction in average power ranges from 27% to as much as 78% with an average of 51%. The reduction in total power ranges from 5% to 66% with an average of 41%. The tradeoff for the area on the average is 32% and 22% on average for the execution time. The reason for the size and execution time increase is because of the extra communication overhead. It is interesting to note that in the Fac2 example, the gate count for the partitioned system is increased by only 4%. This increase is very insignificant. In fact, a decrease Table 2: Power reduction results. % Overhead Absolute Partitioned Ex Area Time Average Total % % Power Power (µw) (µj) % Power Savings Average Power % Total Power % Fac Fac Ch Ch Ch Diffeq Volsyn NLoops Average ISLPED2 2/18/ 11:32 AM Page 4 of 5

5 was observed in [2]. A possible reason is that the synthesizer can optimize a small design much better than a large design. For both the Fac and Ch examples, we see that power is further reduced just by adding an extra part. For the Fac example, as much as 1% total power was further reduced just by partitioning four parts instead of three parts. By adding an extra part, we have increased the total size because of the communication overhead. However, the individual size of each part is smaller. This causes the switching activity to be even more localized and confined within fewer gates. Thus, more reduction in the power is seen. From this result, we see that it is better to have more smaller parts rather than bigger but less parts, as long as the size and performance overhead are kept within the limits. The rationale is that smaller parts will have less switching activities when the part is active. The rest of the dormant parts do not contribute any dynamic power due to capacitive charging and discharging because there are no switching activities. Of course, more parts mean more communication on the communication bus and longer execution time. 5. Conclusions We have considered the problem of reducing power consumption at the behavioral level using a coarse-grained procedural functional partitioning technique that produces two or more mutually exclusive parts. Unlike many previous power reduction shutdown techniques that focus only on either the datapath or the controller separately, we partition the entire processor to shut down both the controller and the datapath. Furthermore, our technique is applied at a much higher level. We take a process with several procedures and partition the process by putting entire procedures into different parts. We achieved up to 78% average power reduction with an average of 51%. The area overhead is 32% with a 22% increase in the execution time. Proceedings of the International Conference on Computer Design, pp , October [9] Vivek Tiwari, Sharad Malik, & Pranav Ashar, Guarded Evaluation: Pushing Power Management to Logic Synthesis/Design, International Symposium on Low Power Design, [1] A. Raghunathan, & N. Jha, ILP formulation for Low Power Based on Minimizing Switched Capacitance during Data Path Allocation, Proc. of the Int. Sym. on Circuits & Systems, [11] T. Lang, E. Musoll, & J. Cortadella. Redundant Adder for Reduced Output Transitions, Universitat Politècnica de Catalunya DAC Technical Report 22, [12] J. Henkel, "A low power hardware/software partitioning approach for corebased embedded systems," DAC, pp , [13] Vaishnav & Pedram, Delay Optimal Partitioning Targeting Low Power VLSI Circuits, Intl. Conf. on CAD, pp , [14] L. Benini, P. Vuillod, G. De Micheli & C. Coelho, Synthesis of Low- Power Selectively-Clocked Systems from High-Level Specification, International Symposium on System Synthesis, pp , Nov [15] Enoch Hwang, Frank Vahid, & Yu-Chin Hsu, FSMD Functional Partitioning for Low Power, Proceedings of the Design, Automation and Test in Europe, pp , March [16] F. Vahid, T. Le, & Y.C. Hsu, A Comparison of Functional and Structural Partitioning, International Symposium on System Synthesis, pp , November [17] Y. Hsu, T. Liu, F. Tsai, S. Lin, & C. Yu, Digital design from concept to prototype in hours, in Asia-Pacific Conference on Circuits and Systems, December [18] PureSpeed Simulator, An event-driven simulator from FrontLine Design Automation, Inc. [19] T. Givargis and F. Vahid, "Interface Exploration for Reduced Power in Core-Based Systems," International Symposium on System Synthesis, pp , December [2] F. Vahid, "I/O and Performance Tradeoffs with the FunctionBus during Multi-FPGA Partitioning," Int. Sym. on FPGA, pp , February, References [1] Srinivas Devadas & Sharad Malik, A Survey of Optimization Techniques Targeting Low power VLSI Circuits, Proceedings of the Design Automation Conference, pp , [2] A. Chandrakasan, T. Sheng, & R. Brodersen, Low Power CMOS Digital Design, Journal of Solid State Circuits, Vol. 27, No. 4, pp , April [3] G. Lakshminarayana, A. Raghunathan, K. Khouri, N. Jha, and S. Dey, "Common-Case Computation: A High-Level Technique for Power and Performance Optimization," DAC, pp , June [4] Wen-Tsong Shiue and Chaitali Chakrabarti, "Memory exploration for low power, embedded systems," DAC, pp , June [5] T. Lang, E. Musoll, & J. Cortadella. Reducing Energy Consumption of Flip-Flops, Universitat Politècnica de Catalunya DAC Technical Report 4, [6] J. Chan and M. Pedram, Register allocation and binding for low power, Proceedings of the Design Automation Conference, pp , June [7] M Muench, N When, B Wurth, R Mehra and J Sproch, Automating RT- Level Operand Isolation to Minimize Power Consumption in s, Proceedings of the Design, Automation and Test in Europe, March 2. [8] Mazhar Alidina, Jose Monteiro, Srinivas Devadas, & Abhijit Ghosh, Precomputation-Based Sequential Logic Optimization for Low Power, ISLPED2 2/18/ 11:32 AM Page 5 of 5

Stacked FSMD: A Power Efficient Micro-Architecture for High Level Synthesis

Stacked FSMD: A Power Efficient Micro-Architecture for High Level Synthesis Stacked FSMD: A Power Efficient Micro-Architecture for High Level Synthesis Khushwinder Jasrotia, Jianwen Zhu Department of Electrical and Computer Engineering University of Toronto, Ontario MS 3G4, Canada

More information

1 Introduction Data format converters (DFCs) are used to permute the data from one format to another in signal processing and image processing applica

1 Introduction Data format converters (DFCs) are used to permute the data from one format to another in signal processing and image processing applica A New Register Allocation Scheme for Low Power Data Format Converters Kala Srivatsan, Chaitali Chakrabarti Lori E. Lucke Department of Electrical Engineering Minnetronix, Inc. Arizona State University

More information

Reference Caching Using Unit Distance Redundant Codes for Activity Reduction on Address Buses

Reference Caching Using Unit Distance Redundant Codes for Activity Reduction on Address Buses Reference Caching Using Unit Distance Redundant Codes for Activity Reduction on Address Buses Tony Givargis and David Eppstein Department of Information and Computer Science Center for Embedded Computer

More information

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 8, AUGUST

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 8, AUGUST IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL 23, NO 8, AUGUST 2004 1175 Register Binding-Based RTL Power Management for Control-Flow Intensive Designs Jiong Luo, Lin

More information

Instruction-Based System-Level Power Evaluation of System-on-a-Chip Peripheral Cores

Instruction-Based System-Level Power Evaluation of System-on-a-Chip Peripheral Cores 856 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 10, NO. 6, DECEMBER 2002 Instruction-Based System-Level Power Evaluation of System-on-a-Chip Peripheral Cores Tony Givargis, Associate

More information

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017 Design of Low Power Adder in ALU Using Flexible Charge Recycling Dynamic Circuit Pallavi Mamidala 1 K. Anil kumar 2 mamidalapallavi@gmail.com 1 anilkumar10436@gmail.com 2 1 Assistant Professor, Dept of

More information

Behavioral Array Mapping into Multiport Memories Targeting Low Power 3

Behavioral Array Mapping into Multiport Memories Targeting Low Power 3 Behavioral Array Mapping into Multiport Memories Targeting Low Power 3 Preeti Ranjan Panda and Nikil D. Dutt Department of Information and Computer Science University of California, Irvine, CA 92697-3425,

More information

VERY large scale integration (VLSI) design for power

VERY large scale integration (VLSI) design for power IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 7, NO. 1, MARCH 1999 25 Short Papers Segmented Bus Design for Low-Power Systems J. Y. Chen, W. B. Jone, Member, IEEE, J. S. Wang,

More information

MOST computations used in applications, such as multimedia

MOST computations used in applications, such as multimedia IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 9, SEPTEMBER 2005 1023 Pipelining With Common Operands for Power-Efficient Linear Systems Daehong Kim, Member, IEEE, Dongwan

More information

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech)

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech) DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech) K.Prasad Babu 2 M.tech (Ph.d) hanumanthurao19@gmail.com 1 kprasadbabuece433@gmail.com 2 1 PG scholar, VLSI, St.JOHNS

More information

How Much Logic Should Go in an FPGA Logic Block?

How Much Logic Should Go in an FPGA Logic Block? How Much Logic Should Go in an FPGA Logic Block? Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecgutorontoca

More information

Eliminating False Loops Caused by Sharing in Control Path

Eliminating False Loops Caused by Sharing in Control Path Eliminating False Loops Caused by Sharing in Control Path ALAN SU and YU-CHIN HSU University of California Riverside and TA-YUNG LIU and MIKE TIEN-CHIEN LEE Avant! Corporation In high-level synthesis,

More information

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS Waqas Akram, Cirrus Logic Inc., Austin, Texas Abstract: This project is concerned with finding ways to synthesize hardware-efficient digital filters given

More information

Instruction Subsetting: Trading Power for Programmability

Instruction Subsetting: Trading Power for Programmability Instruction Subsetting: Trading Power for Programmability William E. Dougherty, David J. Pursley, Donald E. Thomas [wed,pursley,thomas]@ece.cmu.edu Department of Electrical and Computer Engineering Carnegie

More information

Trace-driven System-level Power Evaluation of Systemon-a-chip

Trace-driven System-level Power Evaluation of Systemon-a-chip Trace-driven System-level Power Evaluation of Systemon-a-chip Peripheral Cores Tony D. Givargis, Frank Vahid* Department of Computer Science and Engineering University of California, Riverside, CA 92521

More information

Improving the Efficiency of Monte Carlo Power Estimation

Improving the Efficiency of Monte Carlo Power Estimation 584 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 8, NO. 5, OCTOBER 2000 Improving the Efficiency of Monte Carlo Power Estimation Chih-Shun Ding, Cheng-Ta Hsieh, and Massoud Pedram

More information

High-level Variable Selection for Partial-Scan Implementation

High-level Variable Selection for Partial-Scan Implementation High-level Variable Selection for Partial-Scan Implementation FrankF.Hsu JanakH.Patel Center for Reliable & High-Performance Computing University of Illinois, Urbana, IL Abstract In this paper, we propose

More information

Power-Mode-Aware Buffer Synthesis for Low-Power Clock Skew Minimization

Power-Mode-Aware Buffer Synthesis for Low-Power Clock Skew Minimization This article has been accepted and published on J-STAGE in advance of copyediting. Content is final as presented. IEICE Electronics Express, Vol.* No.*,*-* Power-Mode-Aware Buffer Synthesis for Low-Power

More information

A Three-Step Approach to the Functional Partitioning of Large Behavioral Processes

A Three-Step Approach to the Functional Partitioning of Large Behavioral Processes A Three-Step Approach to the Functional Partitioning of Large Behavioral Processes Frank Vahid Department of Computer Science and Engineering University of California, Riverside, CA 92521 Abstract Earlier

More information

Efficient Power Estimation Techniques for HW/SW Systems

Efficient Power Estimation Techniques for HW/SW Systems Efficient Power Estimation Techniques for HW/SW Systems Marcello Lajolo Anand Raghunathan Sujit Dey Politecnico di Torino NEC USA, C&C Research Labs UC San Diego Torino, Italy Princeton, NJ La Jolla, CA

More information

A First-step Towards an Architecture Tuning Methodology for Low Power

A First-step Towards an Architecture Tuning Methodology for Low Power A First-step Towards an Architecture Tuning Methodology for Low Power Greg Stitt, Frank Vahid*, Tony Givargis Department of Computer Science and Engineering University of California, Riverside {gstitt,

More information

ED&TC 97 on CD-ROM Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided

ED&TC 97 on CD-ROM Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided Accurate High Level Datapath Power Estimation James E. Crenshaw and Majid Sarrafzadeh Department of Electrical and Computer Engineering Northwestern University, Evanston, IL 60208 Abstract The cubic switching

More information

Power Estimation of System-Level Buses for Microprocessor-Based Architectures: A Case Study

Power Estimation of System-Level Buses for Microprocessor-Based Architectures: A Case Study Power Estimation of System-Level Buses for Microprocessor-Based Architectures: A Case Study William Fornaciari Politecnico di Milano, DEI Milano (Italy) fornacia@elet.polimi.it Donatella Sciuto Politecnico

More information

Low Power Bus Binding Based on Dynamic Bit Reordering

Low Power Bus Binding Based on Dynamic Bit Reordering Low Power Bus Binding Based on Dynamic Bit Reordering Jihyung Kim, Taejin Kim, Sungho Park, and Jun-Dong Cho Abstract In this paper, the problem of reducing switching activity in on-chip buses at the stage

More information

Power Efficient Arithmetic Operand Encoding

Power Efficient Arithmetic Operand Encoding Power Efficient Arithmetic Operand Encoding Eduardo Costa, Sergio Bampi José Monteiro UFRGS IST/INESC P. Alegre, Brazil Lisboa, Portugal ecosta,bampi@inf.ufrgs.br jcm@algos.inesc.pt Abstract This paper

More information

A Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique

A Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique A Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique P. Durga Prasad, M. Tech Scholar, C. Ravi Shankar Reddy, Lecturer, V. Sumalatha, Associate Professor Department

More information

ARITHMETIC operations based on residue number systems

ARITHMETIC operations based on residue number systems IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 2, FEBRUARY 2006 133 Improved Memoryless RNS Forward Converter Based on the Periodicity of Residues A. B. Premkumar, Senior Member,

More information

Efficient Computation of Canonical Form for Boolean Matching in Large Libraries

Efficient Computation of Canonical Form for Boolean Matching in Large Libraries Efficient Computation of Canonical Form for Boolean Matching in Large Libraries Debatosh Debnath Dept. of Computer Science & Engineering Oakland University, Rochester Michigan 48309, U.S.A. debnath@oakland.edu

More information

Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver

Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver E.Kanniga 1, N. Imocha Singh 2,K.Selva Rama Rathnam 3 Professor Department of Electronics and Telecommunication, Bharath

More information

VHDL for Synthesis. Course Description. Course Duration. Goals

VHDL for Synthesis. Course Description. Course Duration. Goals VHDL for Synthesis Course Description This course provides all necessary theoretical and practical know how to write an efficient synthesizable HDL code through VHDL standard language. The course goes

More information

CHAPTER 3 METHODOLOGY. 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier

CHAPTER 3 METHODOLOGY. 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier CHAPTER 3 METHODOLOGY 3.1 Analysis of the Conventional High Speed 8-bits x 8-bits Wallace Tree Multiplier The design analysis starts with the analysis of the elementary algorithm for multiplication by

More information

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding N.Rajagopala krishnan, k.sivasuparamanyan, G.Ramadoss Abstract Field Programmable Gate Arrays (FPGAs) are widely

More information

RTL Power Estimation and Optimization

RTL Power Estimation and Optimization Power Modeling Issues RTL Power Estimation and Optimization Model granularity Model parameters Model semantics Model storage Model construction Politecnico di Torino Dip. di Automatica e Informatica RTL

More information

A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors

A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors Murali Jayapala 1, Francisco Barat 1, Pieter Op de Beeck 1, Francky Catthoor 2, Geert Deconinck 1 and Henk Corporaal

More information

A New Optimal State Assignment Technique for Partial Scan Designs

A New Optimal State Assignment Technique for Partial Scan Designs A New Optimal State Assignment Technique for Partial Scan Designs Sungju Park, Saeyang Yang and Sangwook Cho The state assignment for a finite state machine greatly affects the delay, area, and testabilities

More information

ALGORITHM FOR POWER MINIMIZATION IN SCAN SEQUENTIAL CIRCUITS

ALGORITHM FOR POWER MINIMIZATION IN SCAN SEQUENTIAL CIRCUITS ALGORITHM FOR POWER MINIMIZATION IN SCAN SEQUENTIAL CIRCUITS 1 Harpreet Singh, 2 Dr. Sukhwinder Singh 1 M.E. (VLSI DESIGN), PEC University of Technology, Chandigarh. 2 Professor, PEC University of Technology,

More information

Using a Victim Buffer in an Application-Specific Memory Hierarchy

Using a Victim Buffer in an Application-Specific Memory Hierarchy Using a Victim Buffer in an Application-Specific Memory Hierarchy Chuanjun Zhang Depment of lectrical ngineering University of California, Riverside czhang@ee.ucr.edu Frank Vahid Depment of Computer Science

More information

Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks

Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks Zhining Huang, Sharad Malik Electrical Engineering Department

More information

Static Compaction Techniques to Control Scan Vector Power Dissipation

Static Compaction Techniques to Control Scan Vector Power Dissipation Static Compaction Techniques to Control Scan Vector Power Dissipation Ranganathan Sankaralingam, Rama Rao Oruganti, and Nur A. Touba Computer Engineering Research Center Department of Electrical and Computer

More information

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST

FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST FPGA IMPLEMENTATION OF FLOATING POINT ADDER AND MULTIPLIER UNDER ROUND TO NEAREST SAKTHIVEL Assistant Professor, Department of ECE, Coimbatore Institute of Engineering and Technology Abstract- FPGA is

More information

The Impact of Pipelining on Energy per Operation in Field-Programmable Gate Arrays

The Impact of Pipelining on Energy per Operation in Field-Programmable Gate Arrays The Impact of Pipelining on Energy per Operation in Field-Programmable Gate Arrays Steven J.E. Wilton 1, Su-Shin Ang 2 and Wayne Luk 2 1 Dept. of Electrical and Computer Eng. University of British Columbia

More information

Reduced Delay BCD Adder

Reduced Delay BCD Adder Reduced Delay BCD Adder Alp Arslan Bayrakçi and Ahmet Akkaş Computer Engineering Department Koç University 350 Sarıyer, İstanbul, Turkey abayrakci@ku.edu.tr ahakkas@ku.edu.tr Abstract Financial and commercial

More information

At-Speed Scan Test with Low Switching Activity

At-Speed Scan Test with Low Switching Activity 21 28th IEEE VLSI Test Symposium At-Speed Scan Test with Low Switching Activity Elham K. Moghaddam Department of ECE, University of Iowa, Iowa City, IA 52242 ekhayatm@engineering.uiowa.edu Janusz Rajski

More information

Design and Low Power Implementation of a Reorder Buffer

Design and Low Power Implementation of a Reorder Buffer Design and Low Power Implementation of a Reorder Buffer J.D. Fisher, C. Romo, E. John, W. Lin Department of Electrical and Computer Engineering, University of Texas at San Antonio One UTSA Circle, San

More information

PICo Embedded High Speed Cache Design Project

PICo Embedded High Speed Cache Design Project PICo Embedded High Speed Cache Design Project TEAM LosTohmalesCalientes Chuhong Duan ECE 4332 Fall 2012 University of Virginia cd8dz@virginia.edu Andrew Tyler ECE 4332 Fall 2012 University of Virginia

More information

Saving Power by Mapping Finite-State Machines into Embedded Memory Blocks in FPGAs

Saving Power by Mapping Finite-State Machines into Embedded Memory Blocks in FPGAs Saving Power by Mapping Finite-State Machines into Embedded Memory Blocks in FPGAs Anurag Tiwari and Karen A. Tomko Department of ECECS, University of Cincinnati Cincinnati, OH 45221-0030, USA {atiwari,

More information

Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor, EC Department, Bhabha College of Engineering

Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor, EC Department, Bhabha College of Engineering A Review: Design of 16 bit Arithmetic and Logical unit using Vivado 14.7 and Implementation on Basys 3 FPGA Board Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor,

More information

ECE 587 Hardware/Software Co-Design Lecture 23 Hardware Synthesis III

ECE 587 Hardware/Software Co-Design Lecture 23 Hardware Synthesis III ECE 587 Hardware/Software Co-Design Spring 2018 1/28 ECE 587 Hardware/Software Co-Design Lecture 23 Hardware Synthesis III Professor Jia Wang Department of Electrical and Computer Engineering Illinois

More information

Evaluating Power Consumption of Parameterized Cache and Bus Architectures in System-on-a-Chip Designs

Evaluating Power Consumption of Parameterized Cache and Bus Architectures in System-on-a-Chip Designs 500 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 4, AUGUST 2001 Evaluating Power Consumption of Parameterized Cache and Bus Architectures in System-on-a-Chip Designs Tony

More information

Implementation of Ripple Carry and Carry Skip Adders with Speed and Area Efficient

Implementation of Ripple Carry and Carry Skip Adders with Speed and Area Efficient ISSN (Online) : 2278-1021 Implementation of Ripple Carry and Carry Skip Adders with Speed and Area Efficient PUSHPALATHA CHOPPA 1, B.N. SRINIVASA RAO 2 PG Scholar (VLSI Design), Department of ECE, Avanthi

More information

Implementing Logic in FPGA Memory Arrays: Heterogeneous Memory Architectures

Implementing Logic in FPGA Memory Arrays: Heterogeneous Memory Architectures Implementing Logic in FPGA Memory Arrays: Heterogeneous Memory Architectures Steven J.E. Wilton Department of Electrical and Computer Engineering University of British Columbia Vancouver, BC, Canada, V6T

More information

High Speed Special Function Unit for Graphics Processing Unit

High Speed Special Function Unit for Graphics Processing Unit High Speed Special Function Unit for Graphics Processing Unit Abd-Elrahman G. Qoutb 1, Abdullah M. El-Gunidy 1, Mohammed F. Tolba 1, and Magdy A. El-Moursy 2 1 Electrical Engineering Department, Fayoum

More information

Actel s SX Family of FPGAs: A New Architecture for High-Performance Designs

Actel s SX Family of FPGAs: A New Architecture for High-Performance Designs Actel s SX Family of FPGAs: A New Architecture for High-Performance Designs A Technology Backgrounder Actel Corporation 955 East Arques Avenue Sunnyvale, California 94086 April 20, 1998 Page 2 Actel Corporation

More information

Register Transfer Level in Verilog: Part I

Register Transfer Level in Verilog: Part I Source: M. Morris Mano and Michael D. Ciletti, Digital Design, 4rd Edition, 2007, Prentice Hall. Register Transfer Level in Verilog: Part I Lan-Da Van ( 范倫達 ), Ph. D. Department of Computer Science National

More information

A Novel Pseudo 4 Phase Dual Rail Asynchronous Protocol with Self Reset Logic & Multiple Reset

A Novel Pseudo 4 Phase Dual Rail Asynchronous Protocol with Self Reset Logic & Multiple Reset A Novel Pseudo 4 Phase Dual Rail Asynchronous Protocol with Self Reset Logic & Multiple Reset M.Santhi, Arun Kumar S, G S Praveen Kalish, Siddharth Sarangan, G Lakshminarayanan Dept of ECE, National Institute

More information

Specification (VHDL) Specification (VHDL) 2.2 Functional partitioning. Spec1 (VHDL) Spec2 (VHDL) Behavioral Synthesis (MEBS)

Specification (VHDL) Specification (VHDL) 2.2 Functional partitioning. Spec1 (VHDL) Spec2 (VHDL) Behavioral Synthesis (MEBS) A Comparison of Functional and Structural Partitioning Frank Vahid Thuy Dm Le Yu-chin Hsu Department of Computer Science University of California, Riverside, CA 92521 vahid@cs.ucr.edu Abstract Incorporating

More information

RTL LEVEL POWER OPTIMIZATION OF ETHERNET MEDIA ACCESS CONTROLLER

RTL LEVEL POWER OPTIMIZATION OF ETHERNET MEDIA ACCESS CONTROLLER RTL LEVEL POWER OPTIMIZATION OF ETHERNET MEDIA ACCESS CONTROLLER V. Baskar 1 and K.V. Karthikeyan 2 1 VLSI Design, Sathyabama University, Chennai, India 2 Department of Electronics and Communication Engineering,

More information

Reduce Your System Power Consumption with Altera FPGAs Altera Corporation Public

Reduce Your System Power Consumption with Altera FPGAs Altera Corporation Public Reduce Your System Power Consumption with Altera FPGAs Agenda Benefits of lower power in systems Stratix III power technology Cyclone III power Quartus II power optimization and estimation tools Summary

More information

Lossless Compression using Efficient Encoding of Bitmasks

Lossless Compression using Efficient Encoding of Bitmasks Lossless Compression using Efficient Encoding of Bitmasks Chetan Murthy and Prabhat Mishra Department of Computer and Information Science and Engineering University of Florida, Gainesville, FL 326, USA

More information

FPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS

FPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS FPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS 1 RONNIE O. SERFA JUAN, 2 CHAN SU PARK, 3 HI SEOK KIM, 4 HYEONG WOO CHA 1,2,3,4 CheongJu University E-maul: 1 engr_serfs@yahoo.com,

More information

Vdd Programmability to Reduce FPGA Interconnect Power

Vdd Programmability to Reduce FPGA Interconnect Power Vdd Programmability to Reduce FPGA Interconnect Power Fei Li, Yan Lin and Lei He Electrical Engineering Department University of California, Los Angeles, CA 90095 ABSTRACT Power is an increasingly important

More information

Retiming Arithmetic Datapaths using Timed Taylor Expansion Diagrams

Retiming Arithmetic Datapaths using Timed Taylor Expansion Diagrams Retiming Arithmetic Datapaths using Timed Taylor Expansion Diagrams Daniel Gomez-Prado Dusung Kim Maciej Ciesielski Emmanuel Boutillon 2 University of Massachusetts Amherst, USA. {dgomezpr,ciesiel,dukim}@ecs.umass.edu

More information

A High Performance Bus Communication Architecture through Bus Splitting

A High Performance Bus Communication Architecture through Bus Splitting A High Performance Communication Architecture through Splitting Ruibing Lu and Cheng-Kok Koh School of Electrical and Computer Engineering Purdue University,West Lafayette, IN, 797, USA {lur, chengkok}@ecn.purdue.edu

More information

High Speed ACSU Architecture for Viterbi Decoder Using T-Algorithm

High Speed ACSU Architecture for Viterbi Decoder Using T-Algorithm High Speed ACSU Architecture for Viterbi Decoder Using T-Algorithm Atish A. Peshattiwar & Tejaswini G. Panse Department of Electronics Engineering, Yeshwantrao Chavan College of Engineering, E-mail : atishp32@gmail.com,

More information

Design of memory efficient FIFO-based merge sorter

Design of memory efficient FIFO-based merge sorter LETTER IEICE Electronics Express, Vol.15, No.5, 1 11 Design of memory efficient FIFO-based merge sorter Youngil Kim a), Seungdo Choi, and Yong Ho Song Department of Electronics and Computer Engineering,

More information

A Controller Testability Analysis and Enhancement Technique

A Controller Testability Analysis and Enhancement Technique A Controller Testability Analysis and Enhancement Technique Xinli Gu Erik Larsson, Krzysztof Kuchinski and Zebo Peng Synopsys, Inc. Dept. of Computer and Information Science 700 E. Middlefield Road Linköping

More information

High Performance Memory Read Using Cross-Coupled Pull-up Circuitry

High Performance Memory Read Using Cross-Coupled Pull-up Circuitry High Performance Memory Read Using Cross-Coupled Pull-up Circuitry Katie Blomster and José G. Delgado-Frias School of Electrical Engineering and Computer Science Washington State University Pullman, WA

More information

A Novel Design of High Speed and Area Efficient De-Multiplexer. using Pass Transistor Logic

A Novel Design of High Speed and Area Efficient De-Multiplexer. using Pass Transistor Logic A Novel Design of High Speed and Area Efficient De-Multiplexer Using Pass Transistor Logic K.Ravi PG Scholar(VLSI), P.Vijaya Kumari, M.Tech Assistant Professor T.Ravichandra Babu, Ph.D Associate Professor

More information

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions 04/15/14 1 Introduction: Low Power Technology Process Hardware Architecture Software Multi VTH Low-power circuits Parallelism

More information

Power and Locality Aware Request Distribution Technical Report Heungki Lee, Gopinath Vageesan and Eun Jung Kim Texas A&M University College Station

Power and Locality Aware Request Distribution Technical Report Heungki Lee, Gopinath Vageesan and Eun Jung Kim Texas A&M University College Station Power and Locality Aware Request Distribution Technical Report Heungki Lee, Gopinath Vageesan and Eun Jung Kim Texas A&M University College Station Abstract With the growing use of cluster systems in file

More information

Design and Development of Vedic Mathematics based BCD Adder

Design and Development of Vedic Mathematics based BCD Adder International Journal of Applied Information Systems (IJAIS) ISSN : 229-0868 Volume 6 No. 9, March 201 www.ijais.org Design and Development of Vedic Mathematics based BCD Adder C. Sundaresan School of

More information

Performance Evaluation of Guarded Static CMOS Logic based Arithmetic and Logic Unit Design

Performance Evaluation of Guarded Static CMOS Logic based Arithmetic and Logic Unit Design International Journal of Engineering Research and General Science Volume 2, Issue 3, April-May 2014 Performance Evaluation of Guarded Static CMOS Logic based Arithmetic and Logic Unit Design FelcyJeba

More information

Stacked FSMD: A New Microarchitecture Model for High-Level Synthesis. Khushwinder Singh Jasrotia

Stacked FSMD: A New Microarchitecture Model for High-Level Synthesis. Khushwinder Singh Jasrotia Stacked FSMD: A New Microarchitecture Model for High-Level Synthesis by Khushwinder Singh Jasrotia A thesis submitted in conformity with the requirements for the Degree of Master of Applied Science in

More information

Low-Power Technology for Image-Processing LSIs

Low-Power Technology for Image-Processing LSIs Low- Technology for Image-Processing LSIs Yoshimi Asada The conventional LSI design assumed power would be supplied uniformly to all parts of an LSI. For a design with multiple supply voltages and a power

More information

Power Optimization in FPGA Designs

Power Optimization in FPGA Designs Mouzam Khan Altera Corporation mkhan@altera.com ABSTRACT IC designers today are facing continuous challenges in balancing design performance and power consumption. This task is becoming more critical as

More information

LOW POWER FPGA IMPLEMENTATION OF REAL-TIME QRS DETECTION ALGORITHM

LOW POWER FPGA IMPLEMENTATION OF REAL-TIME QRS DETECTION ALGORITHM LOW POWER FPGA IMPLEMENTATION OF REAL-TIME QRS DETECTION ALGORITHM VIJAYA.V, VAISHALI BARADWAJ, JYOTHIRANI GUGGILLA Electronics and Communications Engineering Department, Vaagdevi Engineering College,

More information

CAD Technology of the SX-9

CAD Technology of the SX-9 KONNO Yoshihiro, IKAWA Yasuhiro, SAWANO Tomoki KANAMARU Keisuke, ONO Koki, KUMAZAKI Masahito Abstract This paper outlines the design techniques and CAD technology used with the SX-9. The LSI and package

More information

Minimizing Power Dissipation during. University of Southern California Los Angeles CA August 28 th, 2007

Minimizing Power Dissipation during. University of Southern California Los Angeles CA August 28 th, 2007 Minimizing Power Dissipation during Write Operation to Register Files Kimish Patel, Wonbok Lee, Massoud Pedram University of Southern California Los Angeles CA August 28 th, 2007 Introduction Outline Conditional

More information

RTL Scan Design for Skewed-Load At-Speed Test under Power Constraints

RTL Scan Design for Skewed-Load At-Speed Test under Power Constraints RTL Scan Design for Skewed-Load At-Speed Test under Power Constraints Ho Fai Ko and Nicola Nicolici Department of Electrical and Computer Engineering McMaster University, Hamilton, ON, L8S 4K1, Canada

More information

A Self-Optimizing Embedded Microprocessor using a Loop Table for Low Power

A Self-Optimizing Embedded Microprocessor using a Loop Table for Low Power A Self-Optimizing Embedded Microprocessor using a Loop Table for Low Power Frank Vahid* and Ann Gordon-Ross Department of Computer Science and Engineering University of California, Riverside http://www.cs.ucr.edu/~vahid

More information

Designing Heterogeneous FPGAs with Multiple SBs *

Designing Heterogeneous FPGAs with Multiple SBs * Designing Heterogeneous FPGAs with Multiple SBs * K. Siozios, S. Mamagkakis, D. Soudris, and A. Thanailakis VLSI Design and Testing Center, Department of Electrical and Computer Engineering, Democritus

More information

A Direct Memory Access Controller (DMAC) IP-Core using the AMBA AXI protocol

A Direct Memory Access Controller (DMAC) IP-Core using the AMBA AXI protocol SIM 2011 26 th South Symposium on Microelectronics 167 A Direct Memory Access Controller (DMAC) IP-Core using the AMBA AXI protocol 1 Ilan Correa, 2 José Luís Güntzel, 1 Aldebaro Klautau and 1 João Crisóstomo

More information

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE Anni Benitta.M #1 and Felcy Jeba Malar.M *2 1# Centre for excellence in VLSI Design, ECE, KCG College of Technology, Chennai, Tamilnadu

More information

Power Optimized Programmable Truncated Multiplier and Accumulator Using Reversible Adder

Power Optimized Programmable Truncated Multiplier and Accumulator Using Reversible Adder Power Optimized Programmable Truncated Multiplier and Accumulator Using Reversible Adder Syeda Mohtashima Siddiqui M.Tech (VLSI & Embedded Systems) Department of ECE G Pulla Reddy Engineering College (Autonomous)

More information

Behavioural Transformation to Improve Circuit Performance in High-Level Synthesis*

Behavioural Transformation to Improve Circuit Performance in High-Level Synthesis* Behavioural Transformation to Improve Circuit Performance in High-Level Synthesis* R. Ruiz-Sautua, M. C. Molina, J.M. Mendías, R. Hermida Dpto. Arquitectura de Computadores y Automática Universidad Complutense

More information

Low-Power FIR Digital Filters Using Residue Arithmetic

Low-Power FIR Digital Filters Using Residue Arithmetic Low-Power FIR Digital Filters Using Residue Arithmetic William L. Freking and Keshab K. Parhi Department of Electrical and Computer Engineering University of Minnesota 200 Union St. S.E. Minneapolis, MN

More information

WORD LEVEL FINITE FIELD MULTIPLIERS USING NORMAL BASIS

WORD LEVEL FINITE FIELD MULTIPLIERS USING NORMAL BASIS WORD LEVEL FINITE FIELD MULTIPLIERS USING NORMAL BASIS 1 B.SARGUNAM, 2 Dr.R.DHANASEKARAN 1 Assistant Professor, Department of ECE, Avinashilingam University, Coimbatore 2 Professor & Director-Research,

More information

Shift Invert Coding (SINV) for Low Power VLSI

Shift Invert Coding (SINV) for Low Power VLSI Shift Invert oding (SINV) for Low Power VLSI Jayapreetha Natesan* and Damu Radhakrishnan State University of New York Department of Electrical and omputer Engineering New Paltz, NY, U.S. email: natesa76@newpaltz.edu

More information

PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO. IRIS Lab National Chiao Tung University

PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO. IRIS Lab National Chiao Tung University PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO IRIS Lab National Chiao Tung University Outline Introduction Problem Formulation Algorithm -

More information

VHDL Implementation of High-Performance and Dynamically Configured Multi- Port Cache Memory

VHDL Implementation of High-Performance and Dynamically Configured Multi- Port Cache Memory 2010 Seventh International Conference on Information Technology VHDL Implementation of High-Performance and Dynamically Configured Multi- Port Cache Memory Hassan Bajwa, Isaac Macwan, Vignesh Veerapandian

More information

Datapath Allocation. Zoltan Baruch. Computer Science Department, Technical University of Cluj-Napoca

Datapath Allocation. Zoltan Baruch. Computer Science Department, Technical University of Cluj-Napoca Datapath Allocation Zoltan Baruch Computer Science Department, Technical University of Cluj-Napoca e-mail: baruch@utcluj.ro Abstract. The datapath allocation is one of the basic operations executed in

More information

Verilog for High Performance

Verilog for High Performance Verilog for High Performance Course Description This course provides all necessary theoretical and practical know-how to write synthesizable HDL code through Verilog standard language. The course goes

More information

DESIGN AND IMPLEMENTATION OF BIT TRANSITION COUNTER

DESIGN AND IMPLEMENTATION OF BIT TRANSITION COUNTER DESIGN AND IMPLEMENTATION OF BIT TRANSITION COUNTER Amandeep Singh 1, Balwinder Singh 2 1-2 Acadmic and Consultancy Services Division, Centre for Development of Advanced Computing(C-DAC), Mohali, India

More information

Synthesis of Combinational and Sequential Circuits with Verilog

Synthesis of Combinational and Sequential Circuits with Verilog Synthesis of Combinational and Sequential Circuits with Verilog What is Verilog? Hardware description language: Are used to describe digital system in text form Used for modeling, simulation, design Two

More information

Hardware Modeling using Verilog Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Hardware Modeling using Verilog Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Hardware Modeling using Verilog Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture 01 Introduction Welcome to the course on Hardware

More information

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis

More information

EECS150 - Digital Design Lecture 10 Logic Synthesis

EECS150 - Digital Design Lecture 10 Logic Synthesis EECS150 - Digital Design Lecture 10 Logic Synthesis February 13, 2003 John Wawrzynek Spring 2003 EECS150 Lec8-synthesis Page 1 Logic Synthesis Verilog and VHDL started out as simulation languages, but

More information

A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning

A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning By: Roman Lysecky and Frank Vahid Presented By: Anton Kiriwas Disclaimer This specific

More information

STUDY OF SRAM AND ITS LOW POWER TECHNIQUES

STUDY OF SRAM AND ITS LOW POWER TECHNIQUES INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN ISSN 0976 6464(Print)

More information

INTERCONNECT TESTING WITH BOUNDARY SCAN

INTERCONNECT TESTING WITH BOUNDARY SCAN INTERCONNECT TESTING WITH BOUNDARY SCAN Paul Wagner Honeywell, Inc. Solid State Electronics Division 12001 State Highway 55 Plymouth, Minnesota 55441 Abstract Boundary scan is a structured design technique

More information