Design of a clockless MSP430 core using mixed asynchronous design flow

LETTER
IEICE Electronics Express, Vol.14, No.8, 1-12

Ziho Shin 1,3 a), Myeong-Hoon Oh 1,3 b), Jeong-Gun Lee 2, Hag Young Kim 3, and Young Woo Kim 1,3

1 Dept. of Computer SW, University of Science and Technology (UST), 217, Gajeong-ro, Yuseong-gu, Daejeon, 34113, Republic of Korea
2 Dept. of Computer Engineering, Hallym University, 1, Hallimdaehak-gil, Chuncheon-si, Gangwon-do, 24252, Republic of Korea
3 Cloud Computing Research Group, Electronics and Telecommunication Research Institute (ETRI), 218, Gajeong-ro, Yuseong-gu, Daejeon, 34129, Republic of Korea
a) zshin@ust.ac.kr
b) mhoonoh@etri.re.kr, Corresponding Author

Abstract: There are various limitations on the supporting tools and design methodologies for the implementation of an asynchronous delay-insensitive model. In this paper, we propose a new design flow by exploiting a mixed model, which combines a bounded delay model and a delay-insensitive model. To develop the design flow, we use an asynchronous finite-state machine for the bounded delay model and the null convention logic for the delay-insensitive model. Further, we designed an MSP430 core to verify the proposed design flow, and the simulation results show that it exhibits a performance improvement of 30.34% over its synchronous counterpart.

Keywords: asynchronous circuit, AFSM, NCL, UNCLE, delay insensitive, bounded delay
Classification: Integrated circuits

References
[1] Y. I. Ismail: Interconnect design and limitations in nanoscale technologies, IEEE ISCAS (2008) 780 (DOI: /ISCAS ).
[2] C. J. Anderson, et al.: Physical design of a fourth-generation POWER GHz microprocessor, Proc. ISSCC2001 (2001) 232 (DOI: /ISSCC ).
[3] J. Sparso and S. Furber: Principles of Asynchronous Circuit Design: A Systems Perspective (Springer US, New York, 2001).
[4] K. M. Fant: Logically Determined Design: Clockless System Design with Null Convention Logic (John Wiley & Sons, Hoboken, 2005).
[5] R. B. Reese, et al.: Uncle - An RTL approach to asynchronous design, Proc. 18th ASYNC (2012) 65 (DOI: /ASYNC ).
[6] G. De Micheli: Synchronous logic synthesis: Algorithms for cycle-time minimization, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 10 (1991) 63 (DOI: / ).
[7] P.-H. Ho: Industrial clock synthesis, ISPD (2009) (DOI: / ).
[8] E. G. Friedman: Clock distribution network in synchronous digital integrated circuits, Proc. IEEE 89 (2001) 665 (DOI: / ).
[9] M.-H. Oh, et al.: Architectural design issues in a clockless 32 Bit processor using an asynchronous HDL, ETRI J. 35 (2013) 480 (DOI: /etrij ).
[10] C. J. Myers: Asynchronous Circuit Design (John Wiley & Sons, New York, 2001) 88.
[11] K. Y. Yun: Synthesis of asynchronous controllers for heterogeneous systems, Ph.D. Dissertation, Stanford University (1994).
[12] S. M. Nowick: Automatic synthesis of burst-mode asynchronous controllers, Ph.D. Dissertation, Stanford University (1994).
[13] R. B. Reese: Uncle (Unified NCL Environment) (Mississippi State University, 2011).
[14] A. Kondratyev and K. Lwin: Design of asynchronous circuits using synchronous CAD tools, IEEE Des. Test Comput. 19 (2002) 107 (DOI: /MDT ).
[15] Z. Xia, et al.: An asynchronous FPGA based on dual/single-rail hybrid architecture, Proc. ERSA (2012) 139.
[16] P. A. Beerel, et al.: Proteus: An ASIC flow for GHz asynchronous designs, IEEE Des. Test Comput. 28 (2011) 36 (DOI: /MDT ).
[17] C. A. R. Hoare: Communicating sequential processes, Commun. ACM 21 (1978) 666 (DOI: / ).
[18] R. O. Ozdag and P. A. Beerel: High-speed QDI asynchronous pipelines, Proc. 8th ASYNC (2002) 13 (DOI: /ASYNC ).
[19] Texas Instruments: MSP430x2xx Family User's Guide (Texas Instruments, 2013).
[20] M.-H. Oh, et al.: Design of low-power asynchronous MSP430 processor core using AFSM based controllers, Proc. 23rd ITC-CSCC (2008)
[21] L. Nazhandali, et al.: SenseBench: Toward an accurate evaluation of sensor network processor, Proc. IEEE Int. Symp. Workload Characterization (2005) 197 (DOI: /IISWC ).
1 Introduction
Traditional synchronous circuit designs are limited by their single global clock signal: the clock network consumes a large amount of power, clock skew degrades performance, and multiple clock domains introduce metastability problems [1]. Asynchronous circuits, in contrast, have no global clock and are therefore fundamentally free from these limitations. Instead of a global clock, an asynchronous circuit guarantees functional correctness through distributed handshake control, which provides localized synchronization. Moreover, eliminating the clock network, which accounts for 70% of the entire power consumption [2], reduces the power consumption of a processor; asynchronous circuits therefore have low-power characteristics. Additionally, an asynchronous circuit emits less electromagnetic interference (EMI), because it does not use a global, periodic control signal.

Asynchronous circuit designs employ a handshake protocol to transfer data between their internal modules. The protocol uses a request (Req) signal, which indicates data validity, and an acknowledgement (Ack) signal, which indicates the completion of data delivery. Depending on how the two control signals are used, handshake protocols are classified into two categories: four-phase signaling and two-phase signaling [3]. A four-phase signaling protocol uses only the rising edges of the control signals to synchronize neighboring modules and therefore needs a return-to-zero phase. A two-phase signaling protocol uses both the rising and falling edges of the control signals. Because it needs no return-to-zero phase, it can in principle achieve higher performance than a four-phase protocol, at the cost of greater design complexity.

Asynchronous circuits are further categorized by delay model, such as the bounded delay model, the speed-independent model, and the delay-insensitive model [3]. The operation of the bounded delay model is similar to that of a synchronous circuit. To implement it, the delays of gates and wires must be calculated and the worst-case scenarios analyzed. This model can therefore reuse the data path circuits of a synchronous design without modification. The speed-independent model does not need to consider wire delays and models gate delays as unbounded, so it exhibits more asynchrony than the bounded delay model. Nonetheless, it has a limitation: when multiple input changes occur, the proper control inputs must be selected, which increases design complexity. The delay-insensitive model is the ideal model for asynchronous circuits, since it assumes unbounded delays for both wires and gates. This model employs a multi-bit data encoding scheme to detect the completion of data-dependent operations; however, this scheme causes area overhead in an implementation. In return, the delay-insensitive model provides operational stability under variations of process, voltage, and temperature, and the operating time of the circuit depends on the applied data.

The NULL Convention Logic (NCL) [4] has been introduced as a technique for implementing circuits under this model. An NCL-based circuit, which uses a four-phase signaling protocol, inserts a NULL state between transferred data as a spacer; hence the name NULL convention logic. To represent the NULL state and data (logic high "1" or logic low "0"), the NCL uses dual-rail or quad-rail schemes instead of a single-rail scheme. In such a circuit, the Req signal is encoded and embedded into the data lines by the dual-rail scheme, and the Ack signal is received through the Ack network when the request has been processed. Additionally, the primitive NCL (threshold) gates introduced in [4] are universal, so any Boolean equation can be expressed with them. Designers can therefore implement delay-insensitive circuits for any Boolean equation using the set of primitive NCL gates.
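As a minimal behavioral sketch of the dual-rail encoding and the threshold gates described above, the following Python model may help. It assumes the common convention that NULL is (0, 0) and that the second rail carries logic 1; the names `encode`, `decode`, and `ThresholdGate` are illustrative and are not taken from [4] or from the Uncle tool set.

```python
from typing import Optional, Tuple

NULL = (0, 0)  # dual-rail spacer: neither rail asserted (assumed convention)


def encode(bit: Optional[int]) -> Tuple[int, int]:
    """Return (rail0, rail1) for a bit, or NULL when bit is None."""
    if bit is None:
        return NULL
    return (0, 1) if bit else (1, 0)


def decode(rails: Tuple[int, int]) -> Optional[int]:
    """Return 0 or 1 for DATA and None for NULL; (1, 1) is an illegal code."""
    if rails == (1, 1):
        raise ValueError("both rails asserted: illegal dual-rail code")
    if rails == NULL:
        return None
    return 1 if rails[1] else 0


class ThresholdGate:
    """THmn gate with hysteresis: set at >= m asserted inputs, reset at all-zero."""

    def __init__(self, m: int, n: int) -> None:
        self.m, self.n, self.out = m, n, 0

    def update(self, *inputs: int) -> int:
        if len(inputs) != self.n:
            raise ValueError("wrong number of inputs")
        count = sum(inputs)
        if count >= self.m:
            self.out = 1
        elif count == 0:
            self.out = 0
        # Otherwise the gate holds its previous output (hysteresis).
        return self.out
```

For example, a `ThresholdGate(2, 2)` driven with (1, 0), (1, 1), (0, 1), (0, 0) stays low, rises, holds high, and then falls, which is the behavior that lets a NULL wavefront cleanly separate two DATA wavefronts.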

The Unified NULL Convention Logic Environment (Uncle) has been introduced as an open-source tool set that supports the implementation of NCL circuits [5]. In this paper, we propose a new design flow that combines a bounded delay model and a delay-insensitive model. Further, we designed an MSP430 core using the proposed design flow. Finally, the performance of the designed core is evaluated and compared to that of a synchronous counterpart.

This paper is organized as follows. Section 2 presents the conventional asynchronous circuit design methodologies. In Section 3, we propose a new design flow using a mixed delay model. Section 4 describes the design and implementation of an MSP430 core using the proposed design flow. Section 5 describes the simulation environment and analyzes the simulation results. Finally, Section 6 presents the conclusion and future work.

2 Related work
The design of a synchronous circuit focuses mainly on optimizing the sequential logic with respect to the clock cycle time and on optimizing the clock network [6, 7, 8], whereas the design of an asynchronous circuit uses graph theory to optimize the control logic. Furthermore, since an asynchronous circuit has no global control signal, hazards and race conditions must be considered at the architectural level [9].

An Asynchronous Finite-State Machine (AFSM) [10] is similar to a Mealy-type FSM, whose output values depend on both the current state and the current inputs. The AFSM defines state changes according to the input values rather than the signal transitions. The AFSM therefore has a restriction: it must settle into the new state before the next input change. The AFSM can be used for the bounded delay model. To support the synthesis of AFSM designs, the 3D [11] and MINIMALIST [12] tools have been developed. The 3D tool supports conditional branches and directed don't-care states to relax the design constraints [11].
As a tool for the delay-insensitive model, Uncle [13] has been developed as an open-source tool set for the design of NCL circuits, and it guarantees self-timed operation of the circuits derived from the model. In [14], Kondratyev introduced the design of asynchronous circuits using synchronous CAD tools based on the NCL. However, [14] provides neither automated synthesis of the Ack signal generation nor a simulation methodology for the generated netlists. Moreover, the tool of [14] is currently unavailable.

Xia et al. [15] suggested a hybrid architecture for interfacing between single-rail and delay-insensitive dual-rail circuits, applying the dual-rail encoding to the critical path. However, their work neither uses an NCL-based dual-rail scheme nor applies a handshake protocol to the single-rail data path.

The Proteus project was also introduced as a design flow for a delay-insensitive model [16]. Proteus provides a high-level language interface, which is a translator for Communicating Sequential Processes (CSP) [17]. Proteus focuses on dual-rail domino logic based on pre-charged half-buffer (PCHB) custom cells in order to obtain high performance [18]. Their approach is thus suited to full-custom design. Because it requires full-custom cells and the tool is not open to the public, it is limited in design flexibility.

The Uncle also provides an automatic mapping function, which translates a register transfer level (RTL) design into an NCL gate netlist. The concept of the Uncle design flow is shown in Fig. 1. When a designer inputs RTL code into the Uncle, it invokes a conventional synthesis CAD tool to translate the RTL code into a single-rail and-or-not netlist. Next, the Uncle expands the single-rail netlist to its dual-rail version; the resulting dual-rail and-or-not netlist, which is composed of the predefined primitive gates of the Uncle, is then mapped to NCL gates. This predefined primitive gate library is called andor2.db. Since the Uncle supports only andor2.db, which is dedicated to NCL designs, the library includes only the small number of primitive gates required for synthesizing NCL circuits. Owing to this limitation, using the library can restrict design flexibility and might degrade performance in other types of circuit design. Subsequently, the Uncle generates the Ack networks and verifies their validity. Finally, the Uncle runs a simulation using a dedicated simulator called Uncle_sim.

Fig. 1. The Uncle NCL design flow

After finalizing the mapping process, the Uncle adds registers on the input and output sides to guarantee the delay-insensitivity of the generated NCL netlist. Because of this step, a designer has to insert global clock and reset signals deliberately before the Uncle mapping process. If the original RTL design does not include both the global clock and reset signals, the Uncle cannot generate the NCL netlist.
The netlist generated by the Uncle can be applied only to the data path, since the Uncle does not consider the interaction with the control path of an existing system; this is one disadvantage of the Uncle. Another drawback is that it translates only one module at a time. If a designer intends to translate a design composed of multiple sub-modules, each sub-module must be mapped one by one and the interconnections between the translated modules must be made manually. Consequently, the Uncle offers no solution for compatibility between the existing data paths of a single-rail design and an NCL-based dual-rail design. Furthermore, it is not easy to verify the consistency of the circuit functionality at each design step with the Uncle_sim simulator, because it supports only the simulation of the circuit netlist produced at the final step of design.
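The single-rail to dual-rail expansion step mentioned in the Uncle flow can be illustrated with the textbook dual-rail forms of AND, OR, and NOT. The sketch below is an assumption-laden illustration rather than the Uncle's actual mapping: it uses plain Boolean operators, so it ignores the hysteresis and input-completeness properties that the final NCL threshold-gate netlist provides, and all function names are hypothetical.

```python
from typing import Tuple

Rail = Tuple[int, int]  # (rail0, rail1); (0, 0) is NULL (assumed convention)


def enc(bit: int) -> Rail:
    """Encode a single-rail bit as a dual-rail DATA value."""
    return (0, 1) if bit else (1, 0)


def dec(r: Rail) -> int:
    """Decode a dual-rail DATA value; NULL and (1, 1) are rejected."""
    if r not in ((1, 0), (0, 1)):
        raise ValueError("not a DATA value")
    return r[1]


def dr_not(a: Rail) -> Rail:
    # Inversion is free in dual-rail: just swap the two rails.
    return (a[1], a[0])


def dr_and(a: Rail, b: Rail) -> Rail:
    # Output is 1 only if both inputs are 1; it is 0 if either input is 0.
    return (a[0] | b[0], a[1] & b[1])


def dr_or(a: Rail, b: Rail) -> Rail:
    # Output is 0 only if both inputs are 0; it is 1 if either input is 1.
    return (a[0] & b[0], a[1] | b[1])


# With all inputs at NULL, every expanded gate also outputs NULL.
assert dr_and((0, 0), (0, 0)) == (0, 0)
assert dr_or((0, 0), (0, 0)) == (0, 0)
```

Note that `dr_and(enc(0), (0, 0))` already produces a DATA0 output while one input is still NULL; avoiding such early outputs is one reason the real flow maps to threshold gates and adds an Ack network.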

In this paper, in order to mitigate the aforementioned disadvantages of the Uncle, we propose a design flow for the asynchronous mixed delay model with three new features:
1. Support for a mixed delay model: In our proposed flow, the AFSM design methodology is employed to support a bounded delay model for the single-rail control circuit design, while the NCL-based Uncle flow is used to support a delay-insensitive model. We forward the control signal of the NCL Ack network to the four-phase AFSM handshake protocol, and we use C-elements to synchronize the communication between the delay-insensitive data path and the control path.
2. Data path interfacing: To support the communication between the data path and the control path, we design translation logic for the data path to ensure compatibility with the existing system. This translation logic provides the interface between the NCL-based dual-rail data path and a single-rail data path.
3. Timing simulation environment and verification method: We modify the command script of the Uncle to generate a Standard Delay Format (SDF) file, and we write an SDF-annotated Verilog simulation model for timing simulation with conventional CAD tools.

3 Suggested design flow
3.1 Control path design
The Uncle aims to support a data-driven style of design. Therefore, the output of the Uncle does not consider the interaction with the control path of an existing control-driven design. In this paper, we suggest combining an NCL-based delay-insensitive data path with an AFSM-based control path, as shown in Fig. 2(a). When the control path is combined with the NCL-based delay-insensitive data path, the matched delay cells in the control path are no longer required, and they should be removed to preserve the self-timed characteristics of the NCL-based data path.
Further, the Ack signal from the NCL Ack network should be connected to the AFSM-based handshake protocol to enable communication between the NCL-based data path and the AFSM-based control path. To keep the control signal stable, we insert a C-element, as shown in Fig. 2(b).

Fig. 2. Control path structure: (a) AFSM control path (b) AFSM control path and NCL based data path signaling method
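The role of the C-element can be made concrete with a small behavioral model. The sketch below assumes a standard two-input Muller C-element; the input names `afsm_req` and `ncl_ack` are hypothetical, since the exact signals wired to the C-element appear only in Fig. 2(b).

```python
class CElement:
    """Two-input Muller C-element: follows the inputs when they agree, else holds."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, a: int, b: int) -> int:
        if a == b:
            self.out = a
        return self.out


# Hypothetical wiring: the output rises only after both the control-path
# request and the data-path acknowledgement have risen, and falls only
# after both have returned to zero.
c = CElement()
trace = []
for afsm_req, ncl_ack in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    trace.append(c.update(afsm_req, ncl_ack))
print(trace)
```

Because the output changes only when both inputs agree, a glitch-free handshake signal is obtained even when the two inputs arrive at data-dependent times.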

3.2 Data path translation
The Uncle uses a dual-rail data path design to implement the delay-insensitive model. However, existing data paths are mostly designed using a single-rail scheme. Moreover, the Uncle does not support the automatic mapping of multiple modules simultaneously. Owing to this limitation, translation logic between the single-rail to dual-rail (STD) and dual-rail to single-rail (DTS) schemes is needed to compose the heterogeneous circuit styles. Fig. 3 shows the designed translation logic at an abstract level, i.e., a single-rail to dual-rail translation logic (SDTL) and a dual-rail to single-rail translation logic (DSTL). Additionally, since an NCL-based system has NULL states, the SDTL and the DSTL must include circuits for generating and capturing a NULL state when encoding the data. The integration of the overall method is presented in Fig. 4 with the detailed circuit structures of the SDTL and DSTL.

Fig. 3. Data path translation structure
Fig. 4. Mixed signaling with SDTL and DSTL

3.3 Functional simulation methodology
The Uncle_sim is used for the functional simulation of the NCL-based netlist. However, it does not support a timing simulation environment at each step of design refinement; it supports only the simulation of the netlists produced at the final step. Therefore, if a designer encounters a functional error in the final step, it is not possible to simulate the intermediate netlist from each design step, and a simulation methodology using conventional CAD tools is required. The command script of the Uncle is modified to generate the SDF file so as to ensure compatibility between the conventional CAD tools and the Uncle. Subsequently, the SDF file is annotated into the Verilog simulation model.
The functional simulation flow of an NCL-based circuit using a conventional CAD tool is as follows. When the Uncle produces the NCL-based netlist, the designer re-synthesizes the netlist with the andor2.db gate library, which is provided by the Uncle, to translate the NCL gates into andor2.db-based gates. The designer thereby obtains the SDF file for the NCL gates implemented with the andor2.db library. Finally, the SDF annotation of the synthesized netlist is performed by writing a Verilog simulation model, and the simulation is run using the conventional CAD tools.
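Returning to the translation logic of Section 3.2, its intended behavior can be sketched at the word level. This is only a behavioral illustration under assumed conventions (NULL is (0, 0), a request of 0 forces the NULL spacer, and completion means every bit carries DATA); the function names `sdtl` and `dstl` mirror the block names, but the circuits of Fig. 4 are not reproduced here.

```python
from typing import List, Optional, Tuple

Rail = Tuple[int, int]  # (rail0, rail1); (0, 0) is NULL (assumed convention)


def sdtl(word: int, req: int, width: int = 16) -> List[Rail]:
    """Single-rail to dual-rail: emit DATA while req is high, else NULL."""
    if not req:
        return [(0, 0)] * width
    return [(0, 1) if (word >> i) & 1 else (1, 0) for i in range(width)]


def dstl(rails: List[Rail]) -> Tuple[Optional[int], int]:
    """Dual-rail to single-rail: return (word, done).

    done is 1 only when every bit carries DATA; during NULL or a partial
    wavefront the word is not yet valid and done stays 0.
    """
    if any(r == (1, 1) for r in rails):
        raise ValueError("illegal dual-rail code")
    if all(r != (0, 0) for r in rails):
        word = sum(r[1] << i for i, r in enumerate(rails))
        return word, 1
    return None, 0
```

A round trip such as `dstl(sdtl(0xBEEF, 1))` recovers the original word together with a completion flag, while `dstl(sdtl(0xBEEF, 0))` reports that only the NULL spacer is present.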

To summarize, Fig. 5 presents the suggested design flow for the asynchronous mixed delay model, which integrates the design flow of the Uncle with the three new features described above. In Section 4, we describe the design of a 16-bit processor core using the proposed design flow.

Fig. 5. Proposed asynchronous mixed delay model design flow

4 Processor architecture & design
4.1 Overview of TI MSP430
The MSP430 [19] is a 16-bit processor that is used as a low-power microcontroller (MCU) in fields such as the Internet of Things (IoT). The MSP430 provides a relatively simple instruction set architecture (ISA) and low-power characteristics, together with an open compiling environment. The MSP430 core executes 27 reduced instruction set computer (RISC)-type instructions and supports 7 addressing modes. The 27 instructions can be categorized by the number of operands they use: dual-operand (Instruction Group II, 2 operands), single-operand (Instruction Group I, 1 operand), and jumps (Instruction Group III, 0 operands). In principle, every instruction can use all the addressing modes without restriction; therefore, the code size for a given function can be smaller than on other MCUs.

4.2 Architecture for MSP430
A complex instruction set computer (CISC) architecture is suitable for the MSP430 core [20], because it must support various addressing modes and various opcode sizes for each instruction. The CISC-based MSP430 architecture can use the data path flexibly depending on the given instruction and addressing mode. Fig. 6 shows the suggested control flow of the MSP430 core. When an instruction is loaded into the core, it is decoded. One of three different paths is then selected according to the opcode and addressing mode, and the MSP430 core executes the instruction along the selected path. After the execution, the result is written back to the register and the instruction is finalized.

Fig. 6. Control flow of the MSP430 core
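The decode step that selects among the three instruction categories can be sketched as follows. The bit fields follow the publicly documented MSP430 instruction encoding as recalled here (jumps start with 001, single-operand instructions with 000100, and dual-operand instructions have a top nibble of 4 or above); the function `classify` and its dictionary keys are hypothetical and do not describe the IFID module of the designed core.

```python
def classify(word: int) -> dict:
    """Split a 16-bit MSP430 instruction word into its category and fields."""
    word &= 0xFFFF
    if (word >> 13) == 0b001:
        # Jump: 3-bit condition and a signed 10-bit word offset.
        offset = word & 0x3FF
        if offset & 0x200:
            offset -= 0x400
        return {"group": "jump", "cond": (word >> 10) & 0x7, "offset": offset}
    if (word >> 10) == 0b000100:
        # Single-operand: opcode, byte/word flag, addressing mode, register.
        return {
            "group": "single-operand",
            "opcode": (word >> 7) & 0x7,
            "byte": (word >> 6) & 0x1,
            "as": (word >> 4) & 0x3,
            "reg": word & 0xF,
        }
    if (word >> 12) >= 0x4:
        # Dual-operand: opcode, source and destination registers and modes.
        return {
            "group": "dual-operand",
            "opcode": word >> 12,
            "src": (word >> 8) & 0xF,
            "ad": (word >> 7) & 0x1,
            "byte": (word >> 6) & 0x1,
            "as": (word >> 4) & 0x3,
            "dst": word & 0xF,
        }
    raise ValueError(f"unrecognized instruction word 0x{word:04X}")
```

For instance, `classify(0x4506)` yields a dual-operand instruction with source register 5 and destination register 6, and `classify(0x3FFF)` yields a jump with offset -1.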

The control flow is composed of five steps: instruction fetch and decode (IFID), source fetch (OF1), destination fetch (OF2), zero-operand instruction group (Jump), and execute and write back (EXWB). The suggested block diagram of the data path for the MSP430 core is shown in Fig. 7; it is grouped in accordance with the control flow presented in Fig. 6. When an instruction is given to the IFID module, it is decoded into the arithmetic and logical unit (ALU) opcode, SRC_index, DST_index, jump offset, and addressing mode. These are used to generate the control signal of each multiplexer, and the indexes indicate where data come from and where they are stored. The data are then sent to the OF1, OF2, Jump, and EXWB modules. Subsequently, the input data and the instruction are processed through various data paths according to the instruction and addressing mode.

Fig. 7. The MSP430 data path

4.3 Implementation
In this paper, we designed the MSP430 core in three different ways:
1. The proposed NCL and AFSM based mixed delay model asynchronous core using the suggested design flow shown in Fig. 5 (NCL+AFSM)
2. AFSM-based bounded delay model asynchronous core (AFSM)
3. Synchronous core (SYNC)
For a fair comparison, all three cores are synthesized to the gate level using the andor2.db gate library provided by the Uncle. To improve the performance of the NCL+AFSM version, the ALU is designed in a delay-insensitive style with the Uncle, since the ALU is one of the data path modules with the most data-dependent processing time in the MSP430 core. The delay-insensitive implementation of the ALU yields average-case performance. The control path, on the other hand, is designed using the AFSM with a four-phase handshake protocol to handle communication with the data path. To eliminate excessive restrictions on concurrent operations in the bounded delay model based AFSM control path, the 3D tool is used for the logic synthesis.
Except for the self-timed ALU, the handshake control signals of the AFSM-based control path enforce the worst-case timing of the corresponding data path sub-modules by means of matched delay cells. These delay cells provide the design margin for safe operation. Further, to meet the setup and hold time constraints of the latches, the matched delay cells are excluded from optimization during synthesis. A C-element is used between the AFSM control path and the NCL Ack network to achieve the average-case, data-dependent computation time of the self-timed ALU. In the data path, all parts except the ALU follow the synchronous design methodology; hence, they are designed as a bounded delay model.
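The interplay of the four-phase handshake and a matched delay can be sketched as an event trace. This is an idealized model under stated assumptions: the acknowledgement rises exactly one matched delay after the request, and the return-to-zero phase is treated as instantaneous; the function name and the signal labels `req` and `ack` are illustrative.

```python
from typing import List, Tuple

Event = Tuple[float, str, int]  # (time in ns, signal name, new level)


def four_phase_cycle(matched_delay_ns: float, start_ns: float = 0.0) -> List[Event]:
    """Return the four signal events of one return-to-zero handshake."""
    t = start_ns
    events: List[Event] = [(t, "req", 1)]  # data valid, request raised
    t += matched_delay_ns                   # data path assumed settled by now
    events.append((t, "ack", 1))            # completion signalled
    events.append((t, "req", 0))            # return-to-zero (idealized, no delay)
    events.append((t, "ack", 0))
    return events


# Example: a bounded-delay module with a 15 ns matched delay.
for time_ns, signal, level in four_phase_cycle(15.0):
    print(f"{time_ns:6.2f} ns  {signal} -> {level}")
```

In the self-timed ALU, the fixed `matched_delay_ns` would be replaced by a data-dependent completion time reported through the Ack network, which is where the average-case gain comes from.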

Fig. 8 shows the architecture of the designed NCL+AFSM version of the core. To provide the interface between the self-timed ALU, which is based on a dual-rail scheme, and the other parts of the data path, which are based on a single-rail scheme, the SDTL and DSTL described in Section 3.2 are inserted at the boundary between the single-rail circuits and the dual-rail circuits of the self-timed ALU.

Fig. 8. Designed core architecture: Asynchronous mixed delay model

The AFSM version of the core uses the same control path as the NCL+AFSM version, while its data path, including the ALU, is designed entirely in a single-rail scheme. We inserted matched delay cells into the control path to manage the timing of the handshake signals. For the IFID module, the matched delay is estimated by summing the worst-case delays of the IF part, the ID part, and the PC control part, plus a design margin for reliable circuit operation. The matched delays for OF1, OF2, and EXWB are calculated in the same manner. The SYNC version of the core shares the data path of the AFSM version and uses a global clock to drive the FSM of the control path. The clock cycle is determined by the delay of OF2, which is the worst-case module along the entire data path, plus a design margin.

The results of synthesizing the three cores are as follows. The core designed with the proposed mixed delay model occupies approximately 80% more cell area than the synchronous and AFSM cores, because it employs the SDTL and DSTL to interface between the single-rail and dual-rail data paths. Further, we did not optimize the synthesis when translating the NCL netlist to the andor2.db-based netlist, in order to preserve the delay-insensitive characteristics obtained from the Uncle and to provide a fair comparison between the three cores.
5 Simulation
5.1 Simulation environment
Three different versions of the MSP430 core were modeled in Verilog HDL and synthesized at the gate level using the library provided by the Uncle [13] to obtain equivalent simulation behavior for each version. As the basic synthesis tool, we used the Synopsys Design Compiler. It was used for both the data path and the control path of the SYNC version, for the data path only of the AFSM version, and for the data path except the ALU of the NCL+AFSM version. The 3D tool was used to synthesize the control path of the AFSM version and the NCL+AFSM version. The ALU for the NCL+AFSM version was generated by the Uncle as described in Clause

Then, we annotated the SDF files to the Verilog HDL gate-level netlists of each version of the core as described in Section 3.3 and performed the timing simulation using Cadence NC-Verilog. To confirm the functionality and to evaluate the performance of each version, we applied benchmark programs [21] that are frequently used in IoT services such as networking and sensor data processing.

Fig. 9. The waveform from three cores: self-timed feature. Each panel reports the time from the rise of EXWBExecution to the rise of EXWBExecutionDone.
(a) ALUopcode: 4, AFSM: 15 ns, AFSM+NCL: 4.2 ns, SYNC clock cycle: 30.60 ns
(b) ALUopcode: 2, AFSM: 15 ns, AFSM+NCL: 7.38 ns, SYNC clock cycle: 30.60 ns
(c) ALUopcode: 5, AFSM: 15 ns, AFSM+NCL: 14.72 ns, SYNC clock cycle: 30.60 ns

5.2 Simulation result
Fig. 9 shows the captured waveforms of the three core designs, focusing on the completion time of the EXWB module (see Fig. 7). In the AFSM version, the execution time of the ALU, measured as the delay from the rising edge of the EXWBExecution (Req) signal to the rising edge of the EXWBExecutionDone (Ack) signal (see Fig. 8), is fixed at 15 ns regardless of the operation. The SYNC version likewise has an optimized fixed clock cycle of 30.6 ns, which is determined by the worst-case timing of the OF2 module, because OF2 has the worst-case delay of the entire data path. In the NCL+AFSM version, by contrast, the delay varies with the given instruction (4.2 ns in Fig. 9(a), 7.38 ns in Fig. 9(b)). The maximum delay of the NCL+AFSM version was measured to be 14.72 ns, as shown in Fig. 9(c), which is almost the same as the delay of the AFSM version. However, it is still lower than that of the AFSM version once the AFSM design margin is considered. Accordingly, it is confirmed that the NCL+AFSM based ALU has a flexible, data-dependent delay for a given instruction and data.
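The data-dependent gain visible in Fig. 9 can be quantified with a short calculation. The sketch below uses only the Req-to-Ack times reported in the three panels and compares the NCL+AFSM ALU against the fixed 15 ns of the AFSM version; the helper name `reduction_percent` is illustrative, and the figures concern the EXWB stage only, not whole-program performance.

```python
AFSM_NS = 15.0  # fixed Req-to-Ack time of the AFSM version (Fig. 9)
NCL_AFSM_NS = {4: 4.2, 2: 7.38, 5: 14.72}  # ALU opcode -> measured time in ns


def reduction_percent(baseline_ns: float, measured_ns: float) -> float:
    """Relative latency reduction of measured_ns with respect to baseline_ns."""
    return 100.0 * (baseline_ns - measured_ns) / baseline_ns


for opcode, t_ns in sorted(NCL_AFSM_NS.items()):
    gain = reduction_percent(AFSM_NS, t_ns)
    print(f"ALU opcode {opcode}: {t_ns:5.2f} ns, {gain:4.1f}% shorter than AFSM")
```

The reductions come out at about 72%, 51%, and 2% for opcodes 4, 2, and 5, respectively, which illustrates why the overall benefit depends on the instruction mix of the program.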

Fig. 10 presents the completion time of each version during the execution of the four benchmark programs. Compared to the SYNC version, the NCL+AFSM version shows a performance improvement of at least 27.02% (THOLD benchmark) and at most 34.4% (BUF_CRC benchmark). As shown in Fig. 11, arithmetic operations (AR_OP) and special operations (SP_OP) of the ALU make up over 80% of the BUF_CRC program. This result shows that the more a benchmark program uses the self-timed ALU, the more the performance of the entire system improves.

Fig. 10. Benchmark program simulation results
Fig. 11. Benchmark program instruction set analysis

6 Conclusion and future works
In this paper, we proposed a new design flow for the asynchronous mixed delay model: an AFSM for the bounded delay model and the Uncle for the delay-insensitive model. We then designed an MSP430 core using the proposed design flow, targeting IoT applications. The proposed design flow supports seamless interfacing between the dual-rail and single-rail encoded data paths and provides communication between the data-driven data path and the control path. Additionally, it preserves the self-timed characteristics obtained from the delay-insensitive model. We verified the self-timed performance through timing simulation and observed that the designed core exhibits a performance improvement of 30.34% over the synchronous core. In the near future, we will perform static timing analysis in order to check the timing constraints with layout synthesis. In addition, we will implement our design on an FPGA and verify its performance and functionality in a real working chip.

Acknowledgments
This work was supported by the ICT R&D program of MSIP/IITP [B , Low-power and High-density Micro Server System Development for Cloud Infrastructure] and Basic Science Research Program through the National Research Foundation (2015R1D1A3A ).


More information

Design of 8 bit Pipelined Adder using Xilinx ISE

Design of 8 bit Pipelined Adder using Xilinx ISE Design of 8 bit Pipelined Adder using Xilinx ISE 1 Jayesh Diwan, 2 Rutul Patel Assistant Professor EEE Department, Indus University, Ahmedabad, India Abstract An asynchronous circuit, or self-timed circuit,

More information

TIMA Lab. Research Reports

TIMA Lab. Research Reports ISSN 1292-862 TIMA Lab. Research Reports TIMA Laboratory, 46 avenue Félix Viallet, 38000 Grenoble France Session 1.2 - Hop Topics for SoC Design Asynchronous System Design Prof. Marc RENAUDIN TIMA, Grenoble,

More information

Embedded Soc using High Performance Arm Core Processor D.sridhar raja Assistant professor, Dept. of E&I, Bharath university, Chennai

Embedded Soc using High Performance Arm Core Processor D.sridhar raja Assistant professor, Dept. of E&I, Bharath university, Chennai Embedded Soc using High Performance Arm Core Processor D.sridhar raja Assistant professor, Dept. of E&I, Bharath university, Chennai Abstract: ARM is one of the most licensed and thus widespread processor

More information

Efficient VLSI Huffman encoder implementation and its application in high rate serial data encoding

Efficient VLSI Huffman encoder implementation and its application in high rate serial data encoding LETTER IEICE Electronics Express, Vol.14, No.21, 1 11 Efficient VLSI Huffman encoder implementation and its application in high rate serial data encoding Rongshan Wei a) and Xingang Zhang College of Physics

More information

Synchronization In Digital Systems

Synchronization In Digital Systems 2011 International Conference on Information and Network Technology IPCSIT vol.4 (2011) (2011) IACSIT Press, Singapore Synchronization In Digital Systems Ranjani.M. Narasimhamurthy Lecturer, Dr. Ambedkar

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Scanline-based rendering of 2D vector graphics

Scanline-based rendering of 2D vector graphics Scanline-based rendering of 2D vector graphics Sang-Woo Seo 1, Yong-Luo Shen 1,2, Kwan-Young Kim 3, and Hyeong-Cheol Oh 4a) 1 Dept. of Elec. & Info. Eng., Graduate School, Korea Univ., Seoul 136 701, Korea

More information

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding N.Rajagopala krishnan, k.sivasuparamanyan, G.Ramadoss Abstract Field Programmable Gate Arrays (FPGAs) are widely

More information

Designing NULL Convention Combinational Circuits to Fully Utilize Gate-Level Pipelining for Maximum Throughput

Designing NULL Convention Combinational Circuits to Fully Utilize Gate-Level Pipelining for Maximum Throughput Designing NULL Convention Combinational Circuits to Fully Utilize Gate-Level Pipelining for Maximum Throughput Scott C. Smith University of Missouri Rolla, Department of Electrical and Computer Engineering

More information

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University The Processor: Datapath and Control Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Introduction CPU performance factors Instruction count Determined

More information

Design of a Pipelined 32 Bit MIPS Processor with Floating Point Unit

Design of a Pipelined 32 Bit MIPS Processor with Floating Point Unit Design of a Pipelined 32 Bit MIPS Processor with Floating Point Unit P Ajith Kumar 1, M Vijaya Lakshmi 2 P.G. Student, Department of Electronics and Communication Engineering, St.Martin s Engineering College,

More information

FPGA for Software Engineers

FPGA for Software Engineers FPGA for Software Engineers Course Description This course closes the gap between hardware and software engineers by providing the software engineer all the necessary FPGA concepts and terms. The course

More information

Area-Efficient Design of Asynchronous Circuits Based on Balsa Framework for Synchronous FPGAs

Area-Efficient Design of Asynchronous Circuits Based on Balsa Framework for Synchronous FPGAs Area-Efficient Design of Asynchronous ircuits Based on Balsa Framework for Synchronous FPGAs ERSA 12 Distinguished Paper Yoshiya Komatsu, Masanori Hariyama, and Michitaka Kameyama Graduate School of Information

More information

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING QUESTION BANK NAME OF THE SUBJECT: EE 2255 DIGITAL LOGIC CIRCUITS

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING QUESTION BANK NAME OF THE SUBJECT: EE 2255 DIGITAL LOGIC CIRCUITS KINGS COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING QUESTION BANK NAME OF THE SUBJECT: EE 2255 DIGITAL LOGIC CIRCUITS YEAR / SEM: II / IV UNIT I BOOLEAN ALGEBRA AND COMBINATIONAL

More information

Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path

Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path Michalis D. Galanis, Gregory Dimitroulakos, and Costas E. Goutis VLSI Design Laboratory, Electrical and Computer Engineering

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware 4.1 Introduction We will examine two MIPS implementations

More information

A Synthesizable RTL Design of Asynchronous FIFO Interfaced with SRAM

A Synthesizable RTL Design of Asynchronous FIFO Interfaced with SRAM A Synthesizable RTL Design of Asynchronous FIFO Interfaced with SRAM Mansi Jhamb, Sugam Kapoor USIT, GGSIPU Sector 16-C, Dwarka, New Delhi-110078, India Abstract This paper demonstrates an asynchronous

More information

MODELING LANGUAGES AND ABSTRACT MODELS. Giovanni De Micheli Stanford University. Chapter 3 in book, please read it.

MODELING LANGUAGES AND ABSTRACT MODELS. Giovanni De Micheli Stanford University. Chapter 3 in book, please read it. MODELING LANGUAGES AND ABSTRACT MODELS Giovanni De Micheli Stanford University Chapter 3 in book, please read it. Outline Hardware modeling issues: Representations and models. Issues in hardware languages.

More information

Area Delay Power Efficient Carry-Select Adder

Area Delay Power Efficient Carry-Select Adder Area Delay Power Efficient Carry-Select Adder Pooja Vasant Tayade Electronics and Telecommunication, S.N.D COE and Research Centre, Maharashtra, India ---------------------------------------------------------------------***---------------------------------------------------------------------

More information

Investigation and Comparison of Thermal Distribution in Synchronous and Asynchronous 3D ICs Abstract -This paper presents an analysis and comparison

Investigation and Comparison of Thermal Distribution in Synchronous and Asynchronous 3D ICs Abstract -This paper presents an analysis and comparison Investigation and Comparison of Thermal Distribution in Synchronous and Asynchronous 3D ICs Brent Hollosi 1, Tao Zhang 2, Ravi S. P. Nair 3, Yuan Xie 2, Jia Di 1, and Scott Smith 3 1 Computer Science &

More information

Overview. Design flow. Principles of logic synthesis. Logic Synthesis with the common tools. Conclusions

Overview. Design flow. Principles of logic synthesis. Logic Synthesis with the common tools. Conclusions Logic Synthesis Overview Design flow Principles of logic synthesis Logic Synthesis with the common tools Conclusions 2 System Design Flow Electronic System Level (ESL) flow System C TLM, Verification,

More information

VLSI DESIGN OF REDUCED INSTRUCTION SET COMPUTER PROCESSOR CORE USING VHDL

VLSI DESIGN OF REDUCED INSTRUCTION SET COMPUTER PROCESSOR CORE USING VHDL International Journal of Electronics, Communication & Instrumentation Engineering Research and Development (IJECIERD) ISSN 2249-684X Vol.2, Issue 3 (Spl.) Sep 2012 42-47 TJPRC Pvt. Ltd., VLSI DESIGN OF

More information

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions 04/15/14 1 Introduction: Low Power Technology Process Hardware Architecture Software Multi VTH Low-power circuits Parallelism

More information

EE 3170 Microcontroller Applications

EE 3170 Microcontroller Applications EE 3170 Microcontroller Applications Lecture 4 : Processors, Computers, and Controllers - 1.2 (reading assignment), 1.3-1.5 Based on slides for ECE3170 by Profs. Kieckhafer, Davis, Tan, and Cischke Outline

More information

Novel Design of Dual Core RISC Architecture Implementation

Novel Design of Dual Core RISC Architecture Implementation Journal From the SelectedWorks of Kirat Pal Singh Spring May 18, 2015 Novel Design of Dual Core RISC Architecture Implementation Akshatha Rai K, VTU University, MITE, Moodbidri, Karnataka Basavaraj H J,

More information

Low Power GALS Interface Implementation with Stretchable Clocking Scheme

Low Power GALS Interface Implementation with Stretchable Clocking Scheme www.ijcsi.org 209 Low Power GALS Interface Implementation with Stretchable Clocking Scheme Anju C and Kirti S Pande Department of ECE, Amrita Vishwa Vidyapeetham, Amrita School of Engineering Bangalore,

More information

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis

More information

Advanced FPGA Design Methodologies with Xilinx Vivado

Advanced FPGA Design Methodologies with Xilinx Vivado Advanced FPGA Design Methodologies with Xilinx Vivado Alexander Jäger Computer Architecture Group Heidelberg University, Germany Abstract With shrinking feature sizes in the ASIC manufacturing technology,

More information

The Design of MCU's Communication Interface

The Design of MCU's Communication Interface X International Symposium on Industrial Electronics INDEL 2014, Banja Luka, November 0608, 2014 The Design of MCU's Communication Interface Borisav Jovanović, Dejan Mirković and Milunka Damnjanović University

More information

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER Bhuvaneswaran.M 1, Elamathi.K 2 Assistant Professor, Muthayammal Engineering college, Rasipuram, Tamil Nadu, India 1 Assistant Professor, Muthayammal

More information

NISC Application and Advantages

NISC Application and Advantages NISC Application and Advantages Daniel D. Gajski Mehrdad Reshadi Center for Embedded Computer Systems University of California, Irvine Irvine, CA 92697-3425, USA {gajski, reshadi}@cecs.uci.edu CECS Technical

More information

In the previous lecture, we examined how to analyse a FSM using state table, state diagram and waveforms. In this lecture we will learn how to design

In the previous lecture, we examined how to analyse a FSM using state table, state diagram and waveforms. In this lecture we will learn how to design 1 In the previous lecture, we examined how to analyse a FSM using state table, state diagram and waveforms. In this lecture we will learn how to design a fininte state machine in order to produce the desired

More information

In the previous lecture, we examined how to analyse a FSM using state table, state diagram and waveforms. In this lecture we will learn how to design

In the previous lecture, we examined how to analyse a FSM using state table, state diagram and waveforms. In this lecture we will learn how to design In the previous lecture, we examined how to analyse a FSM using state table, state diagram and waveforms. In this lecture we will learn how to design a fininte state machine in order to produce the desired

More information

Simultaneous Slack Matching, Gate Sizing and Repeater Insertion for Asynchronous Circuits

Simultaneous Slack Matching, Gate Sizing and Repeater Insertion for Asynchronous Circuits Simultaneous Slack Matching, Gate Sizing and Repeater Insertion for Asynchronous Circuits Gang Wu and Chris Chu Department of Electrical and Computer Engineering, Iowa State University, IA Email: {gangwu,

More information

Architecture of an Asynchronous FPGA for Handshake-Component-Based Design

Architecture of an Asynchronous FPGA for Handshake-Component-Based Design 1632 PAPER Special Section on Reconfigurable Systems Architecture of an Asynchronous FPGA for Handshake-Component-Based Design Yoshiya KOMATSU a), Nonmember, Masanori HARIYAMA, Member, and Michitaka KAMEYAMA,

More information

Feedback Techniques for Dual-rail Self-timed Circuits

Feedback Techniques for Dual-rail Self-timed Circuits This document is an author-formatted work. The definitive version for citation appears as: R. F. DeMara, A. Kejriwal, and J. R. Seeber, Feedback Techniques for Dual-Rail Self-Timed Circuits, in Proceedings

More information

Design Guidelines for Optimal Results in High-Density FPGAs

Design Guidelines for Optimal Results in High-Density FPGAs White Paper Introduction Design Guidelines for Optimal Results in High-Density FPGAs Today s FPGA applications are approaching the complexity and performance requirements of ASICs. In some cases, FPGAs

More information

Real-time processing for intelligent-surveillance applications

Real-time processing for intelligent-surveillance applications LETTER IEICE Electronics Express, Vol.14, No.8, 1 12 Real-time processing for intelligent-surveillance applications Sungju Lee, Heegon Kim, Jaewon Sa, Byungkwan Park, and Yongwha Chung a) Dept. of Computer

More information

Design of Parallel Self-Timed Adder

Design of Parallel Self-Timed Adder Design of Parallel Self-Timed Adder P.S.PAWAR 1, K.N.KASAT 2 1PG, Dept of EEE, PRMCEAM, Badnera, Amravati, MS, India. 2Assistant Professor, Dept of EXTC, PRMCEAM, Badnera, Amravati, MS, India. ---------------------------------------------------------------------***---------------------------------------------------------------------

More information

Basic Processing Unit: Some Fundamental Concepts, Execution of a. Complete Instruction, Multiple Bus Organization, Hard-wired Control,

Basic Processing Unit: Some Fundamental Concepts, Execution of a. Complete Instruction, Multiple Bus Organization, Hard-wired Control, UNIT - 7 Basic Processing Unit: Some Fundamental Concepts, Execution of a Complete Instruction, Multiple Bus Organization, Hard-wired Control, Microprogrammed Control Page 178 UNIT - 7 BASIC PROCESSING

More information

FSM-based Digital Design using Veriiog HDL

FSM-based Digital Design using Veriiog HDL FSM-based Digital Design using Veriiog HDL Peter Minns lan Elliott Northumbria University, UK John Wiley & Sons, Ltd Contents Preface Acknowledgements xi xv 1 Introduction to Finite-State Machines and

More information

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University The Processor (1) Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

Advances in Designing Clockless Digital Systems

Advances in Designing Clockless Digital Systems Advances in Designing Clockless Digital Systems Prof. Steven M. Nowick nowick@cs.columbia columbia.edu Department of Computer Science (and Elect. Eng.) Columbia University New York, NY, USA Introduction

More information

New Approach for Affine Combination of A New Architecture of RISC cum CISC Processor

New Approach for Affine Combination of A New Architecture of RISC cum CISC Processor Volume 2 Issue 1 March 2014 ISSN: 2320-9984 (Online) International Journal of Modern Engineering & Management Research Website: www.ijmemr.org New Approach for Affine Combination of A New Architecture

More information

Introduction to asynchronous circuit design. Motivation

Introduction to asynchronous circuit design. Motivation Introduction to asynchronous circuit design Using slides from: Jordi Cortadella, Universitat Politècnica de Catalunya, Spain Michael Kishinevsky, Intel Corporation, USA Alex Kondratyev, Theseus Logic,

More information

Sunburst Design - Comprehensive SystemVerilog Design & Synthesis by Recognized Verilog & SystemVerilog Guru, Cliff Cummings of Sunburst Design, Inc.

Sunburst Design - Comprehensive SystemVerilog Design & Synthesis by Recognized Verilog & SystemVerilog Guru, Cliff Cummings of Sunburst Design, Inc. World Class SystemVerilog & UVM Training Sunburst Design - Comprehensive SystemVerilog Design & Synthesis by Recognized Verilog & SystemVerilog Guru, Cliff Cummings of Sunburst Design, Inc. Cliff Cummings

More information

Lecture 3 Introduction to VHDL

Lecture 3 Introduction to VHDL CPE 487: Digital System Design Spring 2018 Lecture 3 Introduction to VHDL Bryan Ackland Department of Electrical and Computer Engineering Stevens Institute of Technology Hoboken, NJ 07030 1 Managing Design

More information

High Performance Interconnect and NoC Router Design

High Performance Interconnect and NoC Router Design High Performance Interconnect and NoC Router Design Brinda M M.E Student, Dept. of ECE (VLSI Design) K.Ramakrishnan College of Technology Samayapuram, Trichy 621 112 brinda18th@gmail.com Devipoonguzhali

More information

(ii) Simplify and implement the following SOP function using NOR gates:

(ii) Simplify and implement the following SOP function using NOR gates: DHANALAKSHMI COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING EE6301 DIGITAL LOGIC CIRCUITS UNIT I NUMBER SYSTEMS AND DIGITAL LOGIC FAMILIES PART A 1. How can an OR gate be

More information

Modified Micropipline Architecture for Synthesizable Asynchronous FIR Filter Design

Modified Micropipline Architecture for Synthesizable Asynchronous FIR Filter Design Modified Micropipline Architecture for Synthesizable Asynchronous FIR Filter Design Basel Halak and Hsien-Chih Chiu, ECS, Southampton University, Southampton, SO17 1BJ, United Kingdom Email: {bh9, hc13g09}

More information

FPGA Implementation of Efficient Carry-Select Adder Using Verilog HDL

FPGA Implementation of Efficient Carry-Select Adder Using Verilog HDL FPGA Implementation of Efficient Carry-Select Adder Using Verilog HDL Abstract: Lingappagari Raju M.Tech, VLSI & Embedded Systems, SR International Institute of Technology. Carry Select Adder (CSLA) is

More information

Reliable Physical Unclonable Function based on Asynchronous Circuits

Reliable Physical Unclonable Function based on Asynchronous Circuits Reliable Physical Unclonable Function based on Asynchronous Circuits Kyung Ki Kim Department of Electronic Engineering, Daegu University, Gyeongbuk, 38453, South Korea. E-mail: kkkim@daegu.ac.kr Abstract

More information

Jung-Lin Yang. Ph.D. and M.S. degree in the Dept. of Electrical and Computer Engineering University of Utah expected spring 2003

Jung-Lin Yang. Ph.D. and M.S. degree in the Dept. of Electrical and Computer Engineering University of Utah expected spring 2003 Jung-Lin Yang Business Address: 50 South Campus Drive, RM 3280 Salt Lake City, UT 84112 (801) 581-8378 Home Address: 1115 Medical Plaza Salt Lake City, UT 84112 (801) 583-0596 (801) 949-8263 http://www.cs.utah.edu/~jyang

More information

ISSN Vol.08,Issue.07, July-2016, Pages:

ISSN Vol.08,Issue.07, July-2016, Pages: ISSN 2348 2370 Vol.08,Issue.07, July-2016, Pages:1312-1317 www.ijatir.org Low Power Asynchronous Domino Logic Pipeline Strategy Using Synchronization Logic Gates H. NASEEMA BEGUM PG Scholar, Dept of ECE,

More information

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier Design and Implementation of VLSI 8 Bit Systolic Array Multiplier Khumanthem Devjit Singh, K. Jyothi MTech student (VLSI & ES), GIET, Rajahmundry, AP, India Associate Professor, Dept. of ECE, GIET, Rajahmundry,

More information

Optimization of Robust Asynchronous Circuits by Local Input Completeness Relaxation. Computer Science Department Columbia University

Optimization of Robust Asynchronous Circuits by Local Input Completeness Relaxation. Computer Science Department Columbia University Optimization of Robust Asynchronous ircuits by Local Input ompleteness Relaxation heoljoo Jeong Steven M. Nowick omputer Science Department olumbia University Outline 1. Introduction 2. Background: Hazard

More information

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations Chapter 4 The Processor Part I Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations

More information

POWER ANALYSIS OF CRITICAL PATH DELAY DESIGN USING DOMINO LOGIC

POWER ANALYSIS OF CRITICAL PATH DELAY DESIGN USING DOMINO LOGIC 181 POWER ANALYSIS OF CRITICAL PATH DELAY DESIGN USING DOMINO LOGIC R.Yamini, V.Kavitha, S.Sarmila, Anila Ramachandran,, Assistant Professor, ECE Dept, M.E Student, M.E. Student, M.E. Student Sri Eshwar

More information

Multi Cycle Implementation Scheme for 8 bit Microprocessor by VHDL

Multi Cycle Implementation Scheme for 8 bit Microprocessor by VHDL Multi Cycle Implementation Scheme for 8 bit Microprocessor by VHDL Sharmin Abdullah, Nusrat Sharmin, Nafisha Alam Department of Electrical & Electronic Engineering Ahsanullah University of Science & Technology

More information

EECS150 - Digital Design Lecture 5 - Verilog Logic Synthesis

EECS150 - Digital Design Lecture 5 - Verilog Logic Synthesis EECS150 - Digital Design Lecture 5 - Verilog Logic Synthesis Jan 31, 2012 John Wawrzynek Spring 2012 EECS150 - Lec05-verilog_synth Page 1 Outline Quick review of essentials of state elements Finite State

More information

RIZALAFANDE CHE ISMAIL TKT. 3, BLOK A, PPK MIKRO-e KOMPLEKS PENGAJIAN KUKUM. SYNTHESIS OF COMBINATIONAL LOGIC (Chapter 8)

RIZALAFANDE CHE ISMAIL TKT. 3, BLOK A, PPK MIKRO-e KOMPLEKS PENGAJIAN KUKUM. SYNTHESIS OF COMBINATIONAL LOGIC (Chapter 8) RIZALAFANDE CHE ISMAIL TKT. 3, BLOK A, PPK MIKRO-e KOMPLEKS PENGAJIAN KUKUM SYNTHESIS OF COMBINATIONAL LOGIC (Chapter 8) HDL-BASED SYNTHESIS Modern ASIC design use HDL together with synthesis tool to create

More information

RTL Coding General Concepts

RTL Coding General Concepts RTL Coding General Concepts Typical Digital System 2 Components of a Digital System Printed circuit board (PCB) Embedded d software microprocessor microcontroller digital signal processor (DSP) ASIC Programmable

More information

Chapter 4. The Processor Designing the datapath

Chapter 4. The Processor Designing the datapath Chapter 4 The Processor Designing the datapath Introduction CPU performance determined by Instruction Count Clock Cycles per Instruction (CPI) and Cycle time Determined by Instruction Set Architecure (ISA)

More information

High-Level Design for Asynchronous Logic

High-Level Design for Asynchronous Logic High-Level Design for Asynchronous Logic Ross Smith, Michiel Ligthart Theseus Logic {ross.smith, michiel.ligthart}@theseus.com Abstract Asynchronous, self-timed, logic is often eschewed in digital design

More information

Register Transfer Level in Verilog: Part I

Register Transfer Level in Verilog: Part I Source: M. Morris Mano and Michael D. Ciletti, Digital Design, 4rd Edition, 2007, Prentice Hall. Register Transfer Level in Verilog: Part I Lan-Da Van ( 范倫達 ), Ph. D. Department of Computer Science National

More information

Lecture #1: Introduction

Lecture #1: Introduction Lecture #1: Introduction Kunle Olukotun Stanford EE183 January 8, 20023 What is EE183? EE183 is continuation of EE121 Digital Logic Design is a a minute to learn, a lifetime to master Programmable logic

More information

Formulation for Performing Multi Bit Binary Addition using Parallel, Single-Rail Self-Timed Adder without Any Carry Chain Propagation

Formulation for Performing Multi Bit Binary Addition using Parallel, Single-Rail Self-Timed Adder without Any Carry Chain Propagation Formulation for Performing Multi Bit Binary Addition using Parallel, Single-Rail Self-Timed Adder without Any Carry Chain Propagation Y. Gouthami PG Scholar, Department of ECE, MJR College of Engineering

More information

A Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique

A Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique A Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique P. Durga Prasad, M. Tech Scholar, C. Ravi Shankar Reddy, Lecturer, V. Sumalatha, Associate Professor Department

More information

Parallel, Single-Rail Self-Timed Adder. Formulation for Performing Multi Bit Binary Addition. Without Any Carry Chain Propagation

Parallel, Single-Rail Self-Timed Adder. Formulation for Performing Multi Bit Binary Addition. Without Any Carry Chain Propagation Parallel, Single-Rail Self-Timed Adder. Formulation for Performing Multi Bit Binary Addition. Without Any Carry Chain Propagation Y.Gowthami PG Scholar, Dept of ECE, MJR College of Engineering & Technology,

More information

Department of Computer Science and Engineering

Department of Computer Science and Engineering Department of Computer Science and Engineering UNIT-III PROCESSOR AND CONTROL UNIT PART A 1. Define MIPS. MIPS:One alternative to time as the metric is MIPS(Million Instruction Per Second) MIPS=Instruction

More information

P2FS: supporting atomic writes for reliable file system design in PCM storage

P2FS: supporting atomic writes for reliable file system design in PCM storage LETTER IEICE Electronics Express, Vol.11, No.13, 1 6 P2FS: supporting atomic writes for reliable file system design in PCM storage Eunji Lee 1, Kern Koh 2, and Hyokyung Bahn 2a) 1 Department of Software,

More information

An Overview of Standard Cell Based Digital VLSI Design

An Overview of Standard Cell Based Digital VLSI Design An Overview of Standard Cell Based Digital VLSI Design With examples taken from the implementation of the 36-core AsAP1 chip and the 1000-core KiloCore chip Zhiyi Yu, Tinoosh Mohsenin, Aaron Stillmaker,

More information

Recent Advances in Designing Clockless Digital Systems

Recent Advances in Designing Clockless Digital Systems Recent Advances in Designing Clockless Digital Systems Prof. Steven M. Nowick nowick@cs.columbia columbia.edu Chair, Computer Engineering Program Department of Computer Science (and Elect. Eng.) Columbia

More information

101-1 Under-Graduate Project Digital IC Design Flow

101-1 Under-Graduate Project Digital IC Design Flow 101-1 Under-Graduate Project Digital IC Design Flow Speaker: Ming-Chun Hsiao Adviser: Prof. An-Yeu Wu Date: 2012/9/25 ACCESS IC LAB Outline Introduction to Integrated Circuit IC Design Flow Verilog HDL

More information

Design of Low Power Asynchronous Parallel Adder Benedicta Roseline. R 1 Kamatchi. S 2

Design of Low Power Asynchronous Parallel Adder Benedicta Roseline. R 1 Kamatchi. S 2 IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 04, 2015 ISSN (online): 2321-0613 Design of Low Power Asynchronous Parallel Adder Benedicta Roseline. R 1 Kamatchi. S 2

More information

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016 NEW VLSI ARCHITECTURE FOR EXPLOITING CARRY- SAVE ARITHMETIC USING VERILOG HDL B.Anusha 1 Ch.Ramesh 2 shivajeehul@gmail.com 1 chintala12271@rediffmail.com 2 1 PG Scholar, Dept of ECE, Ganapathy Engineering

More information

VLSI Testing. Virendra Singh. Bangalore E0 286: Test & Verification of SoC Design Lecture - 7. Jan 27,

VLSI Testing. Virendra Singh. Bangalore E0 286: Test & Verification of SoC Design Lecture - 7. Jan 27, VLSI Testing Fault Simulation Virendra Singh Indian Institute t of Science Bangalore virendra@computer.org E 286: Test & Verification of SoC Design Lecture - 7 Jan 27, 2 E-286@SERC Fault Simulation Jan

More information

ACTUAL-DELAY CIRCUITS ON FPGA: TRADING-OFF LUTS FOR SPEED. Evangelia Kassapaki, Pavlos M. Mattheakis and Christos P. Sotiriou

ACTUAL-DELAY CIRCUITS ON FPGA: TRADING-OFF LUTS FOR SPEED. Evangelia Kassapaki, Pavlos M. Mattheakis and Christos P. Sotiriou ACTUAL-DELAY CIRCUITS ON FPGA: TRADING-OFF LUTS FOR SPEED Evangelia Kassapaki, Pavlos M. Mattheakis and Christos P. Sotiriou Institute of Computer Science, FORTH, Crete, Greece. email: kassapak@ics.forth.gr,

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION CHAPTER 1 INTRODUCTION Rapid advances in integrated circuit technology have made it possible to fabricate digital circuits with large number of devices on a single chip. The advantages of integrated circuits






















