CAD Technology of the SX-9


KONNO Yoshihiro, IKAWA Yasuhiro, SAWANO Tomoki, KANAMARU Keisuke, ONO Koki, KUMAZAKI Masahito

NEC TECHNICAL JOURNAL Vol.3 No.4/2008, Special Issue: Supercomputer SX-9

Abstract

This paper outlines the design techniques and CAD technology used with the SX-9. The LSI and package design of the SX-9 was subject to various requirements, including: support for larger scale; higher speed; power saving; further miniaturization; methods of verification; reduction of design lead time; improvement of design efficiency; and improvement of design quality. These requirements were met with the SX-9 through advanced integration of design techniques and CAD technology.

Keywords

CAD, LSI design, package design

1. Introduction

The Supercomputer SX-9 is composed of: (1) high-speed, large-scale, high-reliability LSIs based on CMOS LSI technology with the 65nm Cu process; and (2) the package that accommodates the LSIs in high density at a high speed. In particular, the LSI features a single-chip implementation of the vector processor (vector unit and scalar unit). Its core operates at an ultra-high speed of 3.2GHz, and its scale is as large as a few mega-instances. To realize such high-performance supercomputers, design technology and CAD technology that can handle large scale, high speed, power saving and miniaturization are indispensable. Verification technology that prevents rework caused by design mistakes is also critical. In the following sections, we describe the design technology and CAD technology supporting the LSI and package designs of the SX-9.

2. Logic Verification Technology

In the design of multifunction, high-performance systems, it is almost unavoidable that logical mistakes (logical bugs) will enter the design process. These logical bugs should be eliminated completely during the design period so that the system functions as designed. With the SX-9, we adopted logic verification techniques using assertions to improve the completeness of verification and detect logical bugs efficiently.
2.1 Logic Verification Using Assertion

Logic design (1) develops hardware functional specifications from the design concept and (2) refines them into descriptions in a hardware description language (HDL) that can be logic-synthesized. As the LSI design mainly uses HDL as input, it is necessary to verify that the HDL is designed as specified. However, because functional specifications are often written ambiguously using the Japanese language, diagrams and charts, they cannot be compared with the HDL by automated processes as they are. An assertion, in contrast, describes operations and restrictions in a dedicated language and can eliminate the ambiguity of functional specifications. When an assertion describing the functional specifications is used, formal verification and simulation can be performed directly on the HDL to detect violations of the specifications. This allows the designer to verify the HDL against the functional specifications at an early stage. Reusing the assertions written in the design stage for verification also makes it possible to ensure consistency and improve the entire design process.

2.2 Formal Verification Using Assertion

Formal verification is a technology for proving, by mathematical techniques, that the HDL fully conforms to the specifications described in assertions, without preparing a test bench as simulation requires. It can also analyze the structure of the HDL and automatically extract assertions that detect, for example, deadlocks, or flip-flop (F/F) problems due to the absence of reset signal input. This makes preparation for verification unnecessary, and therefore makes efficient verification possible at an early stage. In addition, with the SX-9, we also developed a technology for partially automatic generation of functional assertions from the truth tables described in the functional specifications. This technology reduces the designer's burden of assertion development and improves efficiency.

While formal verification can detect logical bugs efficiently and without omission in the initial stage of verification, its nature as an exhaustive proof also causes the amount of analysis to grow explosively as the scale and complexity of the verification target increase, which eventually makes verification impossible. To deal with this problem, we applied formal verification with the verification scale in mind, for example by dividing the design function by function at the block level.

2.3 Logic Simulation Using Assertion

Since formal verification cannot be used to verify large-scale models, we conducted logic verification at the system level of the SX-9, particularly of the large-scale LSIs, separately from formal verification, by introducing assertions into the logic simulations and running them exhaustively and efficiently. Since a large-scale model includes a wide range of verification targets and involves much complexity, identifying logical bugs becomes very difficult when the expected results are not obtained in simulation. Therefore, we embedded assertions at various points in the model as checkers for detecting violations of the specifications. Because the assertions of the functions close to a logical bug detect the violation early, we used them to identify the function in which the illegal operation occurred.
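The checker role of embedded assertions described above can be illustrated with a small sketch. It is not the SX-9 environment: the signal names, the two rules, and the trace are hypothetical, and a real flow would write the assertions in a dedicated assertion language and run them inside the HDL simulator. The point is only that the earliest violation points at the function closest to the bug.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Cycle = Dict[str, int]  # signal name -> value during one clock cycle


@dataclass
class Assertion:
    name: str       # hypothetical checker name
    function: str   # functional block the checker is embedded in (hypothetical)
    check: Callable[[Cycle], bool]  # returns True while the rule holds


def run_checkers(trace: List[Cycle],
                 assertions: List[Assertion]) -> List[Tuple[int, str, str]]:
    """Evaluate every assertion in every cycle; return violations in time order."""
    violations = []
    for t, cycle in enumerate(trace):
        for a in assertions:
            if not a.check(cycle):
                violations.append((t, a.name, a.function))
    return violations


def first_violation(violations: List[Tuple[int, str, str]]) -> Optional[Tuple[int, str, str]]:
    """The earliest violation is the best hint at where the bug is located."""
    return violations[0] if violations else None


# Hypothetical rules: a FIFO must not be written while full,
# and the output stage must never raise its error flag.
ASSERTIONS = [
    Assertion("fifo_no_overflow", "fifo",
              lambda c: not (c["wr"] == 1 and c["full"] == 1)),
    Assertion("no_output_error", "output_stage",
              lambda c: c["out_err"] == 0),
]

# Hypothetical simulation trace: an illegal write in cycle 2
# surfaces as an output error only two cycles later.
TRACE = [
    {"wr": 1, "full": 0, "out_err": 0},
    {"wr": 1, "full": 0, "out_err": 0},
    {"wr": 1, "full": 1, "out_err": 0},
    {"wr": 0, "full": 1, "out_err": 0},
    {"wr": 0, "full": 0, "out_err": 1},
]

VIOLATIONS = run_checkers(TRACE, ASSERTIONS)
print(first_violation(VIOLATIONS))
```

Without the FIFO checker, the only symptom would be the output error in cycle 4. With it, the first reported violation names the FIFO in cycle 2, which is where the analysis should start.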
Embedding assertions in this way has improved the analysis of illegal operations even in the large-scale, complicated model. We also used the embedded assertions as monitors to track which functions had been evaluated in simulation and which had not. We used the results to create test benches for the functions corresponding to the non-evaluated assertions, eliminating non-evaluated functions and improving the exhaustiveness of verification.

3. Large-scale Design Technology

3.1 Hierarchy Design Technology

To design the large-scale, high-performance LSIs used in the SX-9, we inherited and improved the hierarchical layout technique used in the SX-8. The top-down hierarchical design technique used with the SX-8 (1) divided the chip layout into the top layer (TOP) and several lower layers (UNIT); (2) limited the amount of data handled at once to reduce the TAT (turnaround time) of individual processing; and (3) ran the multiple lower-layer layouts in parallel to shorten the overall design flow period. We adopted a similar idea with the SX-9, but additionally (1) combined bottom-up and top-down techniques in order to handle the multiple layers; (2) improved the budgeting technique that creates delay constraints for each UNIT in order to reduce the losses that accompany hierarchical division; and (3) improved the delay accuracy of physical synthesis in the UNIT design in order to improve design efficiency.

Furthermore, we not only divided the design into TOP and UNIT and processed them separately during the initial layout and wiring operations, but also developed a technology for building them up at the chip level after these operations. With this technology, when the layout changes in the final stage, a designated area (a part, or even the majority, of the chip) can be extracted for modification, and the modifications can be applied back at the chip level.
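As a rough illustration of the area-based extraction described above, the sketch below selects instances purely by placement coordinates, modifies them, and merges them back into the chip-level data. The data model, instance paths, and cell names are hypothetical; the actual CAD database and modification steps are not described in this paper.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass(frozen=True)
class Instance:
    path: str   # full hierarchical name (hypothetical), e.g. "TOP/UNIT_A/u1"
    cell: str   # library cell name (hypothetical)
    x: float    # placement coordinates
    y: float


def extract_region(chip: Dict[str, Instance], area: Rect) -> Dict[str, Instance]:
    """Pick every instance placed inside `area`, ignoring the logical hierarchy."""
    x0, y0, x1, y1 = area
    return {p: i for p, i in chip.items()
            if x0 <= i.x <= x1 and y0 <= i.y <= y1}


def apply_region(chip: Dict[str, Instance],
                 modified: Dict[str, Instance]) -> Dict[str, Instance]:
    """Write the modified instances back into a copy of the chip-level data."""
    merged = dict(chip)
    merged.update(modified)
    return merged


# Hypothetical chip data: the selected area cuts across UNIT_A and UNIT_B.
CHIP = {
    "TOP/UNIT_A/u1": Instance("TOP/UNIT_A/u1", "BUF_X1", 10.0, 10.0),
    "TOP/UNIT_A/u2": Instance("TOP/UNIT_A/u2", "BUF_X1", 90.0, 90.0),
    "TOP/UNIT_B/u1": Instance("TOP/UNIT_B/u1", "BUF_X1", 12.0, 15.0),
    "TOP/g1": Instance("TOP/g1", "NAND2_X1", 50.0, 50.0),
}

REGION = extract_region(CHIP, (0.0, 0.0, 20.0, 20.0))
# Example modification: swap the extracted buffers for a stronger drive strength.
MODIFIED = {p: replace(i, cell="BUF_X4") for p, i in REGION.items()}
NEW_CHIP = apply_region(CHIP, MODIFIED)
print(sorted(REGION))
```

Only the instances inside the rectangle are touched, so the amount of data handled per modification depends on the size of the area rather than on the size of the units it overlaps.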
This method differs from the previous one in that it isolates a specific area from the chip layout data regardless of the logical hierarchy, modifies it, and applies the result to the chip data. The previous method, in contrast, first isolates each UNIT from TOP at a boundary matching the RTL (Register Transfer Level) hierarchy, with attention to consistency, and performs the layout, including optimization, on a per-unit basis. The new method has made it possible to reduce the TAT to a fraction of that required when purely physical processing (e.g. dummy metal generation) is performed on a per-chip basis.

3.2 Clock Distribution and Repair

Since power consumption poses a major problem with large-scale LSIs, we applied clock gating (see Section 4.2 for details) to reduce power consumption. As a measure at the layout level, we extended the low-skew clock distribution technology used with the SX-8 and developed a technology that automatically configures the clock distribution buffers to which clock gating is applied, according to the F/F layout, together with techniques for their distribution and wiring. The SX-9 also employs multi-step clock distribution of global and local clocks (a technique for skew reduction according to the current load adjustment) similar to the SX-8, and the clocks are distributed from the mesh structure to the F/Fs and macros using distribution buffers. We reduced power consumption by controlling the distribution buffers with clock control signals applied to their inputs. At the time of net list synthesis, the clock control signals are distributed only according to logical consistency, without considering the layout. After the layout, the distribution is reconfigured (mainly by dividing the distribution buffers) to take physical positions into account and to shorten the wiring of the clocks.

Preservation of signal integrity poses a big problem with high-speed circuitry such as that of the SX-9. For error repair in such circuits, we chose not to repair each type of error individually, but to repair multiple types of errors together, including antenna violations, electromigration errors, crosstalk errors, waveform distortion limit errors and fan-out limit errors (as well as corrections of some setup/hold errors in the delay system), in order to reduce the TAT of the repair flow. Since a single repair technique, such as simple insertion of a buffer, is insufficient, we had to combine several techniques, such as improving the layout according to the error, changing the gate type, inserting new gates and correcting the wiring. Estimating whether overlapping errors can be corrected with a single technique has made it possible to resolve a large number of problems with fewer modifications.

4. Power-Saving Design Technology

Reduction of the TCO (Total Cost of Ownership) is indispensable for a large-scale system like the SX-9, and reduction of the LSI power consumption is an especially important topic. On the other hand, since the power supply of the entire system is designed in parallel with the LSI design, it is necessary to prevent rework, such as discovering insufficient power supply capability after completion of the design, by determining power consumption with high accuracy in the initial stage of design.

4.1 Power Estimation Technology

The power estimation technology is roughly divided into two technologies: leak power estimation and dynamic power estimation. The leak power can be calculated by defining the leak power of each cell and summing the leak powers of the cells used in a circuit. For the dynamic power, we first simulated the operation of the circuitry and obtained the rate at which each F/F toggles (toggle rate). Based on the obtained toggle rate, we used a power estimation tool to calculate the power consumption of each cell from the charge/discharge current of its load capacitance, and summed the calculated values.

4.2 Power Reduction Technology

Among the power reduction technologies adopted in the design of the SX-9, this section deals with the main ones: clock gating and Multi-Vth. Clock gating is applied to reduce dynamic power during operation. Since the clock signals sent to the F/Fs are usually permanently active, and the clock circuitry for the F/Fs operates even when the F/F data does not change, power is wasted in this period. The clock gating technology stops the clock signals to F/Fs whose data does not change, to reduce the power consumption of the F/F clock circuitry (Fig. 1).

Fig. 1 Clock gating.

On the other hand, the Multi-Vth technology reduces the leak power of the transistors in the circuitry. Transistor leak power has been increasing with the advancing miniaturization of semiconductor processes. The leak power can be reduced by increasing the Vth (threshold voltage) of the transistors, but this lowers their operation speed. Therefore, we used low-Vth cells with high operation speeds in the areas subject to severe delay requirements, and high-Vth cells with low operation speeds but low leak power in the areas with some headroom in the delay requirements. This has made it possible to improve both leak power and delay performance, and we have named it Multi-Vth technology.

5. Delay Analysis Technology

Since the SX-9 utilizes high-speed micro-processes, manufacturing variance has a significant effect on achieving the target delay. Therefore, in the delay analysis for the SX-9, we eliminated excessive verification margin in the setup/hold verifications, and succeeded in achieving skew analysis accuracy on the order of picoseconds (1/1,000,000,000,000 sec.).

5.1 High-accuracy Clock Mesh Delay Calculation Technology

In a design that adopts a mesh clock architecture, in which the outputs of multiple clock drivers are wired together, the delay in the mesh area cannot be calculated accurately with traditional cell-based (gate-level) static timing analysis (STA) tools. In addition, in the GHz-order clock frequency domain used with the SX-9, the effects of inductance (L) cannot be ignored. Therefore, we developed a hybrid delay verification flow, which (1) extracts LRC, including the inductance component, from the PLL to the clock mesh area; (2) obtains the delay value from the results of transient analysis using SPICE simulation; (3) obtains the delay of other data lines using the STA tool; and (4) integrates the two delay values obtained above.
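The integration in step (4) can be sketched as follows. The paper does not give its formulas, so this uses the standard setup-slack relation as an assumption: clock arrival times at each F/F come from the SPICE-based analysis, data path delays come from STA, and the two are combined per path. All instance names and numbers are hypothetical, except the clock period, which follows from the 3.2GHz core frequency.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, str, float]  # (launch F/F, capture F/F, data delay in ps)


def setup_slacks(clock_latency_ps: Dict[str, float],
                 data_paths: List[Path],
                 period_ps: float,
                 setup_ps: float) -> Dict[Tuple[str, str], float]:
    """Combine SPICE-derived clock latencies with STA-derived data delays.

    slack = period + (capture latency - launch latency) - data delay - setup time
    A negative slack means the path misses its setup requirement.
    """
    slacks = {}
    for launch, capture, delay in data_paths:
        skew = clock_latency_ps[capture] - clock_latency_ps[launch]
        slacks[(launch, capture)] = period_ps + skew - delay - setup_ps
    return slacks


PERIOD_PS = 312.5  # one cycle at 3.2 GHz

# Hypothetical clock arrival times (PLL to F/F clock pin) from transient analysis.
SPICE_LATENCY_PS = {"ff_a": 100.0, "ff_b": 103.0, "ff_c": 98.0}

# Hypothetical data path delays reported by the STA tool.
STA_PATHS = [("ff_a", "ff_b", 280.0), ("ff_b", "ff_c", 290.0)]

SLACKS = setup_slacks(SPICE_LATENCY_PS, STA_PATHS, PERIOD_PS, setup_ps=20.0)
for path, slack in SLACKS.items():
    print(path, slack)
```

In this sketch a few picoseconds of clock skew decide whether a path passes, which is why the accuracy of the clock mesh delay matters at this frequency.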
This flow minimizes the portion processed by SPICE simulation, which tends to increase the TAT, and provides a delay calculation environment that is well balanced between clock delay accuracy and TAT.

5.2 Delay Analysis Technology Considering Random Manufacturing Variance

With recent microfabrication processes, delay analysis using worst/best-case libraries that cover lot-to-lot or wafer-to-wafer variation in the transistor/wiring manufacturing process is becoming insufficient to guarantee actual chip operation, because of random manufacturing variance inside the chips. Therefore, in the delay analysis of the SX-9 LSIs, we determined the on-chip delay fluctuation constant according to the spatial extent of the clock driver placement positions, and reflected the result in the clock latency in order to achieve skew calculations that consider random manufacturing variance. As a result, on a transfer bus between closely placed F/Fs, the delay fluctuations are small, because the TX/RX clock drivers are laid out in a small area and their variances are low. Conversely, the delay fluctuations should be considered large on a bus connecting distant F/Fs (Fig. 2).

Fig. 2 Delay fluctuations due to random fabrication variance.

6. Package Design Technology

With the SX-9, several control LSIs and a large number of memory modules are mounted on a single PWB to transmit signals at high speeds. This makes it important to use technology for verifying the logical connections of wiring, and design technology for carrying out layout and wiring design efficiently.

6.1 Early Design Verification Technology

To ensure correct connections of the many wirings between the control LSIs and the large number of memory modules, we developed a tool for generating the wiring connection rules and a tool for verifying that the connection information in the net list complies with those rules. These tools have made efficient connection verification possible, and have also improved design quality by enhancing the exhaustiveness of verification.

6.2 Real-time Verification

A large variety of components, including capacitors and registers as well as control LSIs and memory modules, are mounted in high density on a PWB. To improve layout design efficiency, we developed a design flow that can verify, in real time during design, whether the spacing between these components is sufficient for production and servicing.

The large number of high-speed signals transferred between the control LSIs and memory modules are wired as differential pairs. Various restrictions related to wiring length, wiring spacing, etc., are set on the two signal wirings in each pair, and restrictions on wiring length, etc., are also imposed on many signals other than these high-speed signals. We developed a design flow that can verify in real time during design whether these restrictions are observed, thereby making it possible to efficiently achieve a design that observes wiring-related restrictions on differential operation, equal lengths, etc.

7. Conclusion

In the above, we introduced the features of the design techniques and the CAD technology used with the SX-9. Since supercomputers are expected to continue to face requirements for performance improvement, power consumption and reliability in the future, technology for designing the LSIs and packages that meet these requirements, efficiently and with high quality, will increase in importance. To cope with this trend, we are determined to develop even better design technology and CAD technology so that we can contribute to the competitiveness of our supercomputers in the future.

Authors' Profiles

KONNO Yoshihiro
IKAWA Yasuhiro
SAWANO Tomoki
KANAMARU Keisuke
ONO Koki
KUMAZAKI Masahito