Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China

Size: px
Start display at page:

Download "Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China"

Transcription

1 CMOS Crossbar Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China

2 OUTLINE Motivations Problems of Designing Large Crossbar Our Approach - Pipelined MUX Core Interface Link and Clocking Design Conclusions

3 Motivations Advances in fiber optic link technology and WDM have made raw bandwidth in abundance Switches/Routers are replacing the transmission link as the bottleneck of the network Switches/Routers with high speed (OC-192, 10Gb/s) and large number of I/O ports (128*128 or 256*256) are becoming a necessity Key issues for designing high speed router: Switching Fabric Interconnects Queuing Schemes Arbiters Multicast

4 Fabric Interconnects: Crossbar Crossbar (Crosspoint) Fabric is becoming the preferable interconnect fabric for high speed and scalable switching It has been proven that crossbar (even input-queued) can have as high throughput as any switch. A crossbar inherently supports multicast efficiently. QoS can be implemented reasonably easy. The key challenge is the scalability for high line rate and large number of ports CMOS technology can achieve high density and low cost

5 Architecture of the CMOS Crossbar Switch Crossbar Switch Core fulfills the switch/router function Controller configures the crossbar core switching High speed data link communicates between switch fabric and line card PLL provides on-chip precise clock High Speed Data Link High Speed Data Link Controller 1GHz 256*256 Crossbar Switch Core High Speed Data Link P L L High Speed Data Link

6 Two Approaches to Build the Core X-Y Based Crossbar MUX Based Crossbar INPUT Control word INPUT OUTPUT 2:1 MUX OUTPUT Control word Scalability : N 2 Speed : limited by Cap at input and output lines Control: N 2 bits Scalability : N 2 Speed : limited by Cap only at input line Control: N*Log 2 N bits

7 Problems of Designing Large Crossbar Switch The switch core scales to N 2 Design complexity increases The performance requirement increases much faster than that can be achieved through CMOS technology scaling The throughput can be satisfied by using multiple bit-slices (e.g. 8) of the core, however, the core size increases by 8 times Wire delay is also substantial in high performance chip

8 Our Approach Pipelined MUX Crossbar Digital Core Digital MUX tree based design technique guarantees the high performance as well as the low design complexity In order to integrate large crossbar switch, only 2 bit-slices are embedded in the digital core instead of 8 (60%+ area saving) 1GHz digital core is demanded for the 2 Gb/s interface, the MUX tree can be pipelined to fulfill the requirement Additional pipeline stage is added to drive long wire

9 SDFF embedded with MUX Vdd embedded 4- to-1 MUX P1 N1 X S NAND I3 P2 I5 Q QB Sel I4 N5 N2 N6 I6 D Clk N3 N4 I1 I2 High performance Semi-Dynamic Flip-Flop (SDFF) is used [Klass98, Stojanovic et.al. 99] One of the fastest Flip-Flops due to negative setup time Little overhead for embedding with MUX function

10 Pipeline Stages Partition Static SDFF embedded Static SDFF embedded 1st stage 2nd stage (a) Static 2-to-1 SDFF embedded Static 8-to-1 SDFF embedded 1st stage 2nd stage The pipeline of the 256-to-1 MUX can be partitioned as Natural: 16-to-1 MUX in 1 st stage; 16-to-1 MUX in 2 nd stage Balanced: 8-to-1 MUX in 1 st stage; 32-to-1 MUX 2 nd stage (b)

11 Driving Long Wire Adding Repeater cannot Satisfy the 1GHz Requirement The 1 st stage is critical due to the large capacitor at the input line Distributed R-C wire model is employed Repeater can be inserted to reduce the wire delay For 256 ports, even inserting the optimal size and number of repeater, the delay is still larger than 1ns D CLK0 Q0 Qb0 driv1 Delay time (ps) Driver delay Rwire Cwire W Static 2to1 MUX in each cell CLK1 DD Wire delay Repeater cell cell cell cell SDFF_em 4to1 MUX Q1 Port No.= 64 Port No.= 128 Port No.= 256 Repeater number

12 Adding One Pipeline Stage to Drive Long Wire Add one more stage for driving the long wire by D CLK0 inserting a Flip-Flop The whole 256*256 crossbar is divided into 4 128* sub-crossbar, so that the input line only need to drive 128 cells instead of 256 For 128 ports, sub-ns delay time is achievable Q0 Qb0 driv1 Delay time (ps) Driver delay Rwire Cwire W Static 2to1 MUX Wire delay in each cell CLK1 DD Flip-Flop cell cell cell cell FF CLK1 SDFF_em 4to1 MUX Repeater number Q1 Port No.= 64 Port No.= 128 Port No.= 256

13 3-stages Pipelined MUX Crossbar Floor-Planning The 256*256 crossbar consists of 4 sub-crossbars (128*128) running at 1GHz frequency 2 pipeline stages in each sub-crossbar 2 bit-slices are embedded matching with 2Gb/s data link IN [0:63] OUT [0:63] Legend R R OUT [192:255] IN [64:127] OUT [64:127] R R R R 2:1 2:1 R R R R R 0 ~ 3 : Sub-Crossbar 0~3; IN [192:255] 2:1 2:1 2:1: SDFF embedded with 2-to-1 Mux; R 2 R R: Register c R 128*128 Sub-Crossbar OUT [128:191] IN [128:191]

14 3-stages Pipelined MUX Crossbar Timing Diagram In sub-crossbar 0, inputs [0:127] are switching in the 1 st and 2 nd stages, while in sub-crossbar 3, inputs [128:255] are switching in the 2 nd and 3 rd stages Finally, the two groups of outputs are fed into SDFF_embedded with 2-to-1 MUX to complete the 256-to-1 MUX action IN [0:63] IN [64:127] Sub-Crossbar0 Static 2-to-1 1st stage SDFF Static 2nd stage Static 3rd stage IN [128:191] IN [192:255] Sub-Crossbar3 1st stage Static 2-to-1 2nd stage SDFF Static 3rd stage Static SDFF 2-to-1 Mux OUT [0:63] OUT [192:255]

15 The Sub-Crossbar Circuits Simulation Results Technology Layout size Sub-crossbar No. Post layout simulation 1st stage 2 nd stage 3 rd stage TSMC 0.25µm SCN5M Deep 5.9 mm * 3.1 mm Sub-crossbar0 Sub-crossbar3 959 ps 845 ps 866 ps 959 ps 236 ps 866 ps

16 Control Circuits Control bits are used to configure the corresponding MUX in the crossbar pipeline in the correct timing stage. For saving the pin counts, the control inputs are embedded within the data inputs, each incoming frame packet includes one byte control word and 64 bits of data The timing constraints can be satisfied by careful pipelining the control path

17 Control Circuits (cont d) Bang-Bang PD: samples the 2Gb/s inputs, converts to 2bits, each at 1Gb/s Re-synchronization: synchronizes each input signal to the main clock 2Gb/s Interface clock Bang - Bang P D 2 1Gb/s 2 Counter Pulse Gen Main clock Counter Pulse Gen D M U X to Data - Path DMUX: demultiplexs signal to data & control bits Counter: counts 4/36 and controls the DMUX Control DMUX 8 Resynchronization subcrossbar to Control - Path to Crossbar

18 Full Crossbar Core Layout and Specification Full 256*256 crossbar core with 2 bit-slices Technology: TSMC 0.25µm SCN5M Deep, 5 Layer Metal Layout size: 14 mm*8 mm Transistor counts: 2000k Supply voltage: 2.5v Clock frequency: 1GHz Power: 40W

19 Interface Link and Clocking Design The dual loop delay locked loop () design technique is adopted in the data link for data and clock recovery The main analog generates multiple clock phases for the interpolation in the full digital periphery loop A half rate bang-bang phase detector is used in the periphery loop to sample the 2Gb/s incoming signal by using 1GHz clock A 3 rd loop, an analog PLL, provides the 1GHz on-chip clock D L L Peri - Loops P L L

20 Interface Link and Clocking Design System CLK The system clock is at 250MHz, PLL provides the precise 1GHz clock for the whole chip Several periphery loops share one analog D 2Gb/s D 2Gb/s Bang-Bang 4 Peri - Loop Bang-Bang 4 2 1Gb/s F S M Phase Selector & Interpolator 2 1Gb/s F S M Phase Selector & Interpolator PFD 1/4 VCDL P L L CLK Distribution Network P D CP+LP CP+LF VCO Peri - Loop D L L......

21 Conclusions A 2Gb/s 256*256 CMOS Crossbar Switch Core is achievable with current process technology Significant area saving is obtained by using only 2 bit-slices in the crossbar switch core 3-stages pipelined MUX circuit is proposed to decrease the cycle time to less than 1ns Post layout simulation results show that each stage can run at a clock rate higher than 1GHz Full 256*256 crossbar core has been laid out to demonstrate the design PLL & dual circuits have been designed for the clocking and high speed link in the whole chip

Chapter - 7. Multiplexing and circuit switches

Chapter - 7. Multiplexing and circuit switches Chapter - 7 Multiplexing and circuit switches Multiplexing Multiplexing is used to combine multiple communication links into a single stream. The aim is to share an expensive resource. For example several

More information

Problem Formulation. Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets.

Problem Formulation. Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets. Clock Routing Problem Formulation Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets. Better to develop specialized routers for these nets.

More information

Introduction to Field Programmable Gate Arrays

Introduction to Field Programmable Gate Arrays Introduction to Field Programmable Gate Arrays Lecture 1/3 CERN Accelerator School on Digital Signal Processing Sigtuna, Sweden, 31 May 9 June 2007 Javier Serrano, CERN AB-CO-HT Outline Historical introduction.

More information

The Design of the KiloCore Chip

The Design of the KiloCore Chip The Design of the KiloCore Chip Aaron Stillmaker*, Brent Bohnenstiehl, Bevan Baas DAC 2017: Design Challenges of New Processor Architectures University of California, Davis VLSI Computation Laboratory

More information

IV. PACKET SWITCH ARCHITECTURES

IV. PACKET SWITCH ARCHITECTURES IV. PACKET SWITCH ARCHITECTURES (a) General Concept - as packet arrives at switch, destination (and possibly source) field in packet header is used as index into routing tables specifying next switch in

More information

High-speed Serial Interface

High-speed Serial Interface High-speed Serial Interface Lect. 16 Clock and Data Recovery 3 1 CDR Design Example ( 권대현 ) Clock and Data Recovery Circuits Transceiver PLL vs. CDR High-speed CDR Phase Detector Charge Pump Voltage Controlled

More information

A Four-Terabit Single-Stage Packet Switch with Large. Round-Trip Time Support. F. Abel, C. Minkenberg, R. Luijten, M. Gusat, and I.

A Four-Terabit Single-Stage Packet Switch with Large. Round-Trip Time Support. F. Abel, C. Minkenberg, R. Luijten, M. Gusat, and I. A Four-Terabit Single-Stage Packet Switch with Large Round-Trip Time Support F. Abel, C. Minkenberg, R. Luijten, M. Gusat, and I. Iliadis IBM Research, Zurich Research Laboratory, CH-8803 Ruschlikon, Switzerland

More information

High Capacity and High Performance 20nm FPGAs. Steve Young, Dinesh Gaitonde August Copyright 2014 Xilinx

High Capacity and High Performance 20nm FPGAs. Steve Young, Dinesh Gaitonde August Copyright 2014 Xilinx High Capacity and High Performance 20nm FPGAs Steve Young, Dinesh Gaitonde August 2014 Not a Complete Product Overview Page 2 Outline Page 3 Petabytes per month Increasing Bandwidth Global IP Traffic Growth

More information

Reconfigurable PLL for Digital System

Reconfigurable PLL for Digital System International Journal of Engineering Research and Technology. ISSN 0974-3154 Volume 6, Number 3 (2013), pp. 285-291 International Research Publication House http://www.irphouse.com Reconfigurable PLL for

More information

Spiral 2-8. Cell Layout

Spiral 2-8. Cell Layout 2-8.1 Spiral 2-8 Cell Layout 2-8.2 Learning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as geometric

More information

Package level Interconnect Options

Package level Interconnect Options Package level Interconnect Options J.Balachandran,S.Brebels,G.Carchon, W.De Raedt, B.Nauwelaers,E.Beyne imec 2005 SLIP 2005 April 2 3 Sanfrancisco,USA Challenges in Nanometer Era Integration capacity F

More information

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor

More information

A 400Gbps Multi-Core Network Processor

A 400Gbps Multi-Core Network Processor A 400Gbps Multi-Core Network Processor James Markevitch, Srinivasa Malladi Cisco Systems August 22, 2017 Legal THE INFORMATION HEREIN IS PROVIDED ON AN AS IS BASIS, WITHOUT ANY WARRANTIES OR REPRESENTATIONS,

More information

File: 'ReportV37P-CT89533DanSuo.doc' CMPEN 411, Spring 2013, Homework Project 9 chip, 'Tiny Chip' fabricated through MOSIS program

File: 'ReportV37P-CT89533DanSuo.doc' CMPEN 411, Spring 2013, Homework Project 9 chip, 'Tiny Chip' fabricated through MOSIS program MOSIS Chip Test Report Dan Suo File: 'ReportV37P-CT89533DanSuo.doc' CMPEN 411, Spring 2013, Homework Project 9 chip, 'Tiny Chip' fabricated through MOSIS program Technology: 0.5um CMOS, ON Semiconductor

More information

Timing for Optical Transmission Network (OTN) Equipment. Slobodan Milijevic Maamoun Seido

Timing for Optical Transmission Network (OTN) Equipment. Slobodan Milijevic Maamoun Seido Timing for Optical Transmission Network (OTN) Equipment Slobodan Milijevic Maamoun Seido Agenda Overview of timing in OTN De-synchronizer (PLL) Requirements of OTN Phase Gain Loop Bandwidth Frequency Conversion

More information

High-speed Serial Interface

High-speed Serial Interface High-speed Serial Interface Lect. 8 SERES 1 High-Speed Circuits and Systems Lab., Yonsei University 2013-1 Block diagram Where are we today? Serializer Tx river Channel Rx Equalizer Sampler eserializer

More information

Future of Interconnect Fabric A Contrarian View. Shekhar Borkar June 13, 2010 Intel Corp. 1

Future of Interconnect Fabric A Contrarian View. Shekhar Borkar June 13, 2010 Intel Corp. 1 Future of Interconnect Fabric A ontrarian View Shekhar Borkar June 13, 2010 Intel orp. 1 Outline Evolution of interconnect fabric On die network challenges Some simple contrarian proposals Evaluation and

More information

Multi-gigabit Switching and Routing

Multi-gigabit Switching and Routing Multi-gigabit Switching and Routing Gignet 97 Europe: June 12, 1997. Nick McKeown Assistant Professor of Electrical Engineering and Computer Science nickm@ee.stanford.edu http://ee.stanford.edu/~nickm

More information

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Presenter: Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Outline Overview of M.I.T. Raw processor

More information

A 2 Gb/s Asymmetric Serial Link for High-Bandwidth Packet Switches

A 2 Gb/s Asymmetric Serial Link for High-Bandwidth Packet Switches A 2 Gb/s Asymmetric Serial Link for High-Bandwidth Packet Switches Ken K. -Y. Chang, William Ellersick, Shang-Tse Chuang, Stefanos Sidiropoulos, Mark Horowitz, Nick McKeown: Computer System Laboratory,

More information

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function. FPGA Logic block of an FPGA can be configured in such a way that it can provide functionality as simple as that of transistor or as complex as that of a microprocessor. It can used to implement different

More information

FABRICATION TECHNOLOGIES

FABRICATION TECHNOLOGIES FABRICATION TECHNOLOGIES DSP Processor Design Approaches Full custom Standard cell** higher performance lower energy (power) lower per-part cost Gate array* FPGA* Programmable DSP Programmable general

More information

Networks for Multi-core Chips A A Contrarian View. Shekhar Borkar Aug 27, 2007 Intel Corp.

Networks for Multi-core Chips A A Contrarian View. Shekhar Borkar Aug 27, 2007 Intel Corp. Networks for Multi-core hips A A ontrarian View Shekhar Borkar Aug 27, 2007 Intel orp. 1 Outline Multi-core system outlook On die network challenges A simple contrarian proposal Benefits Summary 2 A Sample

More information

CAP+ CAP. Loop Filter

CAP+ CAP. Loop Filter STS-12/STS-3 Multirate Clock and Data Recovery Unit FEATURES Performs clock and data recovery for 622.08 Mbps (STS-12/OC-12/STM-4) or 155.52 Mbps (STS-3/OC-3/STM-1) NRZ data 19.44 MHz reference frequency

More information

An Overview of Standard Cell Based Digital VLSI Design

An Overview of Standard Cell Based Digital VLSI Design An Overview of Standard Cell Based Digital VLSI Design With examples taken from the implementation of the 36-core AsAP1 chip and the 1000-core KiloCore chip Zhiyi Yu, Tinoosh Mohsenin, Aaron Stillmaker,

More information

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer

Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, Dennis Abts Sr. Principal Engineer Challenges for Future Interconnection Networks Hot Interconnects Panel August 24, 2006 Sr. Principal Engineer Panel Questions How do we build scalable networks that balance power, reliability and performance

More information

CSE 123A Computer Networks

CSE 123A Computer Networks CSE 123A Computer Networks Winter 2005 Lecture 8: IP Router Design Many portions courtesy Nick McKeown Overview Router basics Interconnection architecture Input Queuing Output Queuing Virtual output Queuing

More information

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on on-chip Architecture Avinash Kodi, Ashwini Sarathy * and Ahmed Louri * Department of Electrical Engineering and

More information

AMchip architecture & design

AMchip architecture & design Sezione di Milano AMchip architecture & design Alberto Stabile - INFN Milano AMchip theoretical principle Associative Memory chip: AMchip Dedicated VLSI device - maximum parallelism Each pattern with private

More information

Streaming Encryption for a Secure Wavelength and Time Domain Hopped Optical Network

Streaming Encryption for a Secure Wavelength and Time Domain Hopped Optical Network treaming Encryption for a ecure Wavelength and Time Domain Hopped Optical Network Herwin Chan, Alireza Hodjat, Jun hi, Richard Wesel, Ingrid Verbauwhede {herwin, ahodjat, junshi, wesel, ingrid} @ ee.ucla.edu

More information

An Introduction to Programmable Logic

An Introduction to Programmable Logic Outline An Introduction to Programmable Logic 3 November 24 Transistors Logic Gates CPLD Architectures FPGA Architectures Device Considerations Soft Core Processors Design Example Quiz Semiconductors Semiconductor

More information

ASNT1016-PQA 16:1 MUX-CMU

ASNT1016-PQA 16:1 MUX-CMU 16:1 MUX-CMU 16 to 1 multiplexer (MUX) with integrated CMU (clock multiplication unit). PLL-based architecture featuring both counter and forward clocking modes. Supports multiple data rates in the 9.8-12.5Gb/s

More information

EECS150 - Digital Design Lecture 17 Memory 2

EECS150 - Digital Design Lecture 17 Memory 2 EECS150 - Digital Design Lecture 17 Memory 2 October 22, 2002 John Wawrzynek Fall 2002 EECS150 Lec17-mem2 Page 1 SDRAM Recap General Characteristics Optimized for high density and therefore low cost/bit

More information

Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles

Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Zhiyi Yu, Bevan Baas VLSI Computation Lab, ECE Department University of California, Davis, USA Outline Introduction Timing issues

More information

10 Gbit/s Transmitter MUX with Re-timing GD16585/GD16589 (FEC)

10 Gbit/s Transmitter MUX with Re-timing GD16585/GD16589 (FEC) 10 Gbit/s Transmitter MUX with Re-timing GD16585/GD16589 (FEC) Preliminary General Description GD16585 and GD16589 are transmitter chips used in SDH STM-64 and SONET OC-192 optical communication systems.

More information

A mm 2 780mW Fully Synthesizable PLL with a Current Output DAC and an Interpolative Phase-Coupled Oscillator using Edge Injection Technique

A mm 2 780mW Fully Synthesizable PLL with a Current Output DAC and an Interpolative Phase-Coupled Oscillator using Edge Injection Technique A 0.0066mm 2 780mW Fully Synthesizable PLL with a Current Output DAC and an Interpolative Phase-Coupled Oscillator using Edge Injection Technique Wei Deng, Dongsheng Yang, Tomohiro Ueno, Teerachot Siriburanon,

More information

PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO. IRIS Lab National Chiao Tung University

PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO. IRIS Lab National Chiao Tung University PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO IRIS Lab National Chiao Tung University Outline Introduction Problem Formulation Algorithm -

More information

16th Microelectronics Workshop Oct 22-24, 2003 (MEWS-16) Tsukuba Space Center JAXA

16th Microelectronics Workshop Oct 22-24, 2003 (MEWS-16) Tsukuba Space Center JAXA 16th Microelectronics Workshop Oct 22-24, 2003 (MEWS-16) Tsukuba Space Center JAXA 1 The proposed presentation explores the use of commercial processes, including deep-sub micron process technology, package

More information

Packet Switch Architectures Part 2

Packet Switch Architectures Part 2 Packet Switch Architectures Part Adopted from: Sigcomm 99 Tutorial, by Nick McKeown and Balaji Prabhakar, Stanford University Slides used with permission from authors. 999-000. All rights reserved by authors.

More information

The Memory Hierarchy 1

The Memory Hierarchy 1 The Memory Hierarchy 1 What is a cache? 2 What problem do caches solve? 3 Memory CPU Abstraction: Big array of bytes Memory memory 4 Performance vs 1980 Processor vs Memory Performance Memory is very slow

More information

A Single Chip Shared Memory Switch with Twelve 10Gb Ethernet Ports

A Single Chip Shared Memory Switch with Twelve 10Gb Ethernet Ports A Single Chip Shared Memory Switch with Twelve 10Gb Ethernet Ports Takeshi Shimizu, Yukihiro Nakagawa, Sridhar Pathi, Yasushi Umezawa, Takashi Miyoshi, Yoichi Koyanagi, Takeshi Horie, Akira Hattori Hot

More information

RHiNET-3/SW: an 80-Gbit/s high-speed network switch for distributed parallel computing

RHiNET-3/SW: an 80-Gbit/s high-speed network switch for distributed parallel computing RHiNET-3/SW: an 0-Gbit/s high-speed network switch for distributed parallel computing S. Nishimura 1, T. Kudoh 2, H. Nishi 2, J. Yamamoto 2, R. Ueno 3, K. Harasawa 4, S. Fukuda 4, Y. Shikichi 4, S. Akutsu

More information

Nevis ADC Design. Jaroslav Bán. Columbia University. June 4, LAr ADC Review. LAr ADC Review. Jaroslav Bán

Nevis ADC Design. Jaroslav Bán. Columbia University. June 4, LAr ADC Review. LAr ADC Review. Jaroslav Bán Nevis ADC Design Columbia University June 4, 2014 Outline The goals of the project Introductory remarks The road toward the design Components developed in Nevis09, Nevis10 and Nevis12 Nevis13 chip Architecture

More information

CS310 Embedded Computer Systems. Maeng

CS310 Embedded Computer Systems. Maeng 1 INTRODUCTION (PART II) Maeng Three key embedded system technologies 2 Technology A manner of accomplishing a task, especially using technical processes, methods, or knowledge Three key technologies for

More information

CHAPTER 4 DUAL LOOP SELF BIASED PLL

CHAPTER 4 DUAL LOOP SELF BIASED PLL 52 CHAPTER 4 DUAL LOOP SELF BIASED PLL The traditional self biased PLL is modified into a dual loop architecture based on the principle widely applied in clock and data recovery circuits proposed by Seema

More information

CCS Technical Documentation NHL-2NA Series Transceivers. Camera Module

CCS Technical Documentation NHL-2NA Series Transceivers. Camera Module CCS Technical Documentation NHL-2NA Series Transceivers Issue 1 07/02 Nokia Corporation NHL-2NA CCS Technical Documentation [This page left intentionally blank] Page 2 Nokia Corporation Issue 1 07/02 CCS

More information

Development and research of different architectures of I 2 C bus controller. E. Vasiliev, MIET

Development and research of different architectures of I 2 C bus controller. E. Vasiliev, MIET Development and research of different architectures of I 2 C bus controller E. Vasiliev, MIET I2C and its alternatives I²C (Inter-Integrated Circuit) is a multi-master serial computer bus invented by Philips

More information

A Configurable Radiation Tolerant Dual-Ported Static RAM macro, designed in a 0.25 µm CMOS technology for applications in the LHC environment.

A Configurable Radiation Tolerant Dual-Ported Static RAM macro, designed in a 0.25 µm CMOS technology for applications in the LHC environment. A Configurable Radiation Tolerant Dual-Ported Static RAM macro, designed in a 0.25 µm CMOS technology for applications in the LHC environment. 8th Workshop on Electronics for LHC Experiments 9-13 Sept.

More information

TECHNOLOGY BRIEF. Double Data Rate SDRAM: Fast Performance at an Economical Price EXECUTIVE SUMMARY C ONTENTS

TECHNOLOGY BRIEF. Double Data Rate SDRAM: Fast Performance at an Economical Price EXECUTIVE SUMMARY C ONTENTS TECHNOLOGY BRIEF June 2002 Compaq Computer Corporation Prepared by ISS Technology Communications C ONTENTS Executive Summary 1 Notice 2 Introduction 3 SDRAM Operation 3 How CAS Latency Affects System Performance

More information

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 133 CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 6.1 INTRODUCTION As the era of a billion transistors on a one chip approaches, a lot of Processing Elements (PEs) could be located

More information

Memory. Outline. ECEN454 Digital Integrated Circuit Design. Memory Arrays. SRAM Architecture DRAM. Serial Access Memories ROM

Memory. Outline. ECEN454 Digital Integrated Circuit Design. Memory Arrays. SRAM Architecture DRAM. Serial Access Memories ROM ECEN454 Digital Integrated Circuit Design Memory ECEN 454 Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports DRAM Outline Serial Access Memories ROM ECEN 454 12.2 1 Memory

More information

Axcelerator Family FPGAs

Axcelerator Family FPGAs Product Brief Axcelerator Family FPGAs u e Leading-Edge Performance 350+ MHz System Performance 500+ MHz Internal Performance High-Performance Embedded s 700 Mb/s LVDS Capable I/Os Specifications Up to

More information

Interconnection Network Project EE482 Advanced Computer Organization May 28, 1999

Interconnection Network Project EE482 Advanced Computer Organization May 28, 1999 Interconnection Network Project EE482 Advanced Computer Organization May 28, 1999 Group Members: Overview Tom Fountain (fountain@cs.stanford.edu) T.J. Giuli (giuli@cs.stanford.edu) Paul Lassa (lassa@relgyro.stanford.edu)

More information

10 Gbit/s Transmitter MUX with Re-timing GD16585/GD16589 (FEC)

10 Gbit/s Transmitter MUX with Re-timing GD16585/GD16589 (FEC) an Intel company 10 Gbit/s Transmitter MUX with Re-timing GD16585/GD16589 (FEC) Preliminary General Description GD16585 and GD16589 are transmitter chips used in SDH STM-64 and SONET OC-192 optical communication

More information

Designing High-Speed ATM Switch Fabrics by Using Actel FPGAs

Designing High-Speed ATM Switch Fabrics by Using Actel FPGAs pplication Note C105 esigning High-Speed TM Switch Fabrics by Using ctel FPGs The recent upsurge of interest in synchronous Transfer Mode (TM) is based on the recognition that it represents a new level

More information

PHOTONIC ATM FRONT-END PROCESSOR

PHOTONIC ATM FRONT-END PROCESSOR PHOTONIC ATM FRONT-END PROCESSOR OBJECTIVES: To build a photonic ATM front-end processor including the functions of virtual channel identifier (VCI) over-write and cell synchronization for future photonic

More information

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications Ju-Ho Sohn, Jeong-Ho Woo, Min-Wuk Lee, Hye-Jung Kim, Ramchan Woo, Hoi-Jun Yoo Semiconductor System

More information

The RM9150 and the Fast Device Bus High Speed Interconnect

The RM9150 and the Fast Device Bus High Speed Interconnect The RM9150 and the Fast Device High Speed Interconnect John R. Kinsel Principal Engineer www.pmc -sierra.com 1 August 2004 Agenda CPU-based SOC Design Challenges Fast Device (FDB) Overview Generic Device

More information

ECE 485/585 Microprocessor System Design

ECE 485/585 Microprocessor System Design Microprocessor System Design Lecture 4: Memory Hierarchy Memory Taxonomy SRAM Basics Memory Organization DRAM Basics Zeshan Chishti Electrical and Computer Engineering Dept Maseeh College of Engineering

More information

Design and Implementation of High Performance Application Specific Memory

Design and Implementation of High Performance Application Specific Memory Design and Implementation of High Performance Application Specific Memory - 고성능 Application Specific Memory 의설계와구현 - M.S. Thesis Sungdae Choi Dec. 20th, 2002 Outline Introduction Memory for Mobile 3D Graphics

More information

Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture

Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture 1 Physical Implementation of the DSPI etwork-on-chip in the FAUST Architecture Ivan Miro-Panades 1,2,3, Fabien Clermidy 3, Pascal Vivet 3, Alain Greiner 1 1 The University of Pierre et Marie Curie, Paris,

More information

Microcomputers. Outline. Number Systems and Digital Logic Review

Microcomputers. Outline. Number Systems and Digital Logic Review Microcomputers Number Systems and Digital Logic Review Lecture 1-1 Outline Number systems and formats Common number systems Base Conversion Integer representation Signed integer representation Binary coded

More information

EECS 150 Homework 7 Solutions Fall (a) 4.3 The functions for the 7 segment display decoder given in Section 4.3 are:

EECS 150 Homework 7 Solutions Fall (a) 4.3 The functions for the 7 segment display decoder given in Section 4.3 are: Problem 1: CLD2 Problems. (a) 4.3 The functions for the 7 segment display decoder given in Section 4.3 are: C 0 = A + BD + C + BD C 1 = A + CD + CD + B C 2 = A + B + C + D C 3 = BD + CD + BCD + BC C 4

More information

3. The high voltage level of a digital signal in positive logic is : a) 1 b) 0 c) either 1 or 0

3. The high voltage level of a digital signal in positive logic is : a) 1 b) 0 c) either 1 or 0 1. The number of level in a digital signal is: a) one b) two c) four d) ten 2. A pure sine wave is : a) a digital signal b) analog signal c) can be digital or analog signal d) neither digital nor analog

More information

2. Link and Memory Architectures and Technologies

2. Link and Memory Architectures and Technologies 2. Link and Memory Architectures and Technologies 2.1 Links, Thruput/Buffering, Multi-Access Ovrhds 2.2 Memories: On-chip / Off-chip SRAM, DRAM 2.A Appendix: Elastic Buffers for Cross-Clock Commun. Manolis

More information

EECS150 - Digital Design Lecture 16 Memory 1

EECS150 - Digital Design Lecture 16 Memory 1 EECS150 - Digital Design Lecture 16 Memory 1 March 13, 2003 John Wawrzynek Spring 2003 EECS150 - Lec16-mem1 Page 1 Memory Basics Uses: Whenever a large collection of state elements is required. data &

More information

Hybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University

Hybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University Hybrid On-chip Data Networks Gilbert Hendry Keren Bergman Lightwave Research Lab Columbia University Chip-Scale Interconnection Networks Chip multi-processors create need for high performance interconnects

More information

Digital Integrated Circuits A Design Perspective. Jan M. Rabaey

Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Digital Integrated Circuits A Design Perspective Jan M. Rabaey Outline (approximate) Introduction and Motivation The VLSI Design Process Details of the MOS Transistor Device Fabrication Design Rules CMOS

More information

6. Latches and Memories

6. Latches and Memories 6 Latches and Memories This chapter . RS Latch The RS Latch, also called Set-Reset Flip Flop (SR FF), transforms a pulse into a continuous state. The RS latch can be made up of two interconnected

More information

Field Programmable Gate Array (FPGA)

Field Programmable Gate Array (FPGA) Field Programmable Gate Array (FPGA) Lecturer: Krébesz, Tamas 1 FPGA in general Reprogrammable Si chip Invented in 1985 by Ross Freeman (Xilinx inc.) Combines the advantages of ASIC and uc-based systems

More information

1. NoCs: What s the point?

1. NoCs: What s the point? 1. Nos: What s the point? What is the role of networks-on-chip in future many-core systems? What topologies are most promising for performance? What about for energy scaling? How heavily utilized are Nos

More information

Design of Optical Burst Switches based on Dual Shuffle-exchange Network and Deflection Routing

Design of Optical Burst Switches based on Dual Shuffle-exchange Network and Deflection Routing Design of Optical Burst Switches based on Dual Shuffle-exchange Network and Deflection Routing Man-Ting Choy Department of Information Engineering, The Chinese University of Hong Kong mtchoy1@ie.cuhk.edu.hk

More information

AN-12. Use Quickswitch Bus Switches to Make Large, Fast Dual Port RAMs. Dual Port RAM. Figure 1. Dual Port RAM in Dual Microprocessor System

AN-12. Use Quickswitch Bus Switches to Make Large, Fast Dual Port RAMs. Dual Port RAM. Figure 1. Dual Port RAM in Dual Microprocessor System Q QUALITY SEMICONDUCTOR, INC. AN-12 Use Quickswitch Bus Switches to Make Large, Fast Dual Port s Application Note AN-12 Abstract Dual port s are effective devices for highspeed communication between microprocessors.

More information

Lab 2. Standard Cell layout.

Lab 2. Standard Cell layout. Lab 2. Standard Cell layout. The purpose of this lab is to demonstrate CMOS-standard cell design. Use the lab instructions and the cadence manual (http://www.es.lth.se/ugradcourses/cadsys/cadence.html)

More information

Ultra-Low Latency, Bit-Parallel Message Exchange in Optical Packet Switched Interconnection Networks

Ultra-Low Latency, Bit-Parallel Message Exchange in Optical Packet Switched Interconnection Networks Ultra-Low Latency, Bit-Parallel Message Exchange in Optical Packet Switched Interconnection Networks O. Liboiron-Ladouceur 1, C. Gray 2, D. Keezer 2 and K. Bergman 1 1 Department of Electrical Engineering,

More information

Router Architectures

Router Architectures Router Architectures Venkat Padmanabhan Microsoft Research 13 April 2001 Venkat Padmanabhan 1 Outline Router architecture overview 50 Gbps multi-gigabit router (Partridge et al.) Technology trends Venkat

More information

High-Speed NAND Flash

High-Speed NAND Flash High-Speed NAND Flash Design Considerations to Maximize Performance Presented by: Robert Pierce Sr. Director, NAND Flash Denali Software, Inc. History of NAND Bandwidth Trend MB/s 20 60 80 100 200 The

More information

EE382C Lecture 1. Bill Dally 3/29/11. EE 382C - S11 - Lecture 1 1

EE382C Lecture 1. Bill Dally 3/29/11. EE 382C - S11 - Lecture 1 1 EE382C Lecture 1 Bill Dally 3/29/11 EE 382C - S11 - Lecture 1 1 Logistics Handouts Course policy sheet Course schedule Assignments Homework Research Paper Project Midterm EE 382C - S11 - Lecture 1 2 What

More information

Chapter 8 Memory Basics

Chapter 8 Memory Basics Logic and Computer Design Fundamentals Chapter 8 Memory Basics Charles Kime & Thomas Kaminski 2008 Pearson Education, Inc. (Hyperlinks are active in View Show mode) Overview Memory definitions Random Access

More information

Chapter 6. CMOS Functional Cells

Chapter 6. CMOS Functional Cells Chapter 6 CMOS Functional Cells In the previous chapter we discussed methods of designing layout of logic gates and building blocks like transmission gates, multiplexers and tri-state inverters. In this

More information

An overview of standard cell based digital VLSI design

An overview of standard cell based digital VLSI design An overview of standard cell based digital VLSI design Implementation of the first generation AsAP processor Zhiyi Yu and Tinoosh Mohsenin VCL Laboratory UC Davis Outline Overview of standard cellbased

More information

Versatile RRAM Technology and Applications

Versatile RRAM Technology and Applications Versatile RRAM Technology and Applications Hagop Nazarian Co-Founder and VP of Engineering, Crossbar Inc. Santa Clara, CA 1 Agenda Overview of RRAM Technology RRAM for Embedded Memory Mass Storage Memory

More information

CS150 Fall 2012 Solutions to Homework 6

CS150 Fall 2012 Solutions to Homework 6 CS150 Fall 2012 Solutions to Homework 6 October 6, 2012 Problem 1 a.) Answer: 0.09 ns This delay is given in Table 65 as T ILO, specifically An Dn LUT address to A. b.) Answer: 0.41 ns In Table 65, this

More information

SEE Tolerant Self-Calibrating Simple Fractional-N PLL

SEE Tolerant Self-Calibrating Simple Fractional-N PLL SEE Tolerant Self-Calibrating Simple Fractional-N PLL Robert L. Shuler, Avionic Systems Division, NASA Johnson Space Center, Houston, TX 77058 Li Chen, Department of Electrical Engineering, University

More information

HEAT (Hardware enabled Algorithmic tester) for 2.5D HBM Solution

HEAT (Hardware enabled Algorithmic tester) for 2.5D HBM Solution HEAT (Hardware enabled Algorithmic tester) for 2.5D HBM Solution Kunal Varshney, Open-Silicon Ganesh Venkatkrishnan, Open-Silicon Pankaj Prajapati, Open-Silicon May 9, 9, 2016 1 Agenda High Bandwidth Memory

More information

CS152 Computer Architecture and Engineering Lecture 16: Memory System

CS152 Computer Architecture and Engineering Lecture 16: Memory System CS152 Computer Architecture and Engineering Lecture 16: System March 15, 1995 Dave Patterson (patterson@cs) and Shing Kong (shing.kong@eng.sun.com) Slides available on http://http.cs.berkeley.edu/~patterson

More information

MTJ-Based Nonvolatile Logic-in-Memory Architecture

MTJ-Based Nonvolatile Logic-in-Memory Architecture 2011 Spintronics Workshop on LSI @ Kyoto, Japan, June 13, 2011 MTJ-Based Nonvolatile Logic-in-Memory Architecture Takahiro Hanyu Center for Spintronics Integrated Systems, Tohoku University, JAPAN Laboratory

More information

Synchronous Optical Networks (SONET) Advanced Computer Networks

Synchronous Optical Networks (SONET) Advanced Computer Networks Synchronous Optical Networks (SONET) Advanced Computer Networks SONET Outline Brief History SONET Overview SONET Rates SONET Ring Architecture Add/Drop Multiplexor (ADM) Section, Line and Path Virtual

More information

DI-SSD: Desymmetrized Interconnection Architecture and Dynamic Timing Calibration for Solid-State Drives

DI-SSD: Desymmetrized Interconnection Architecture and Dynamic Timing Calibration for Solid-State Drives DI-SSD: Desymmetrized Interconnection Architecture and Dynamic Timing Calibration for Solid-State Drives Ren-Shuo Liu and Jian-Hao Huang System and Storage Design Lab Department of Electrical Engineering

More information

The Future of Electrical I/O for Microprocessors. Frank O Mahony Intel Labs, Hillsboro, OR USA

The Future of Electrical I/O for Microprocessors. Frank O Mahony Intel Labs, Hillsboro, OR USA The Future of Electrical I/O for Microprocessors Frank O Mahony frank.omahony@intel.com Intel Labs, Hillsboro, OR USA 1 Outline 1TByte/s I/O: motivation and challenges Circuit Directions Channel Directions

More information

ESE 570 Cadence Lab Assignment 2: Introduction to Spectre, Manual Layout Drawing and Post Layout Simulation (PLS)

ESE 570 Cadence Lab Assignment 2: Introduction to Spectre, Manual Layout Drawing and Post Layout Simulation (PLS) ESE 570 Cadence Lab Assignment 2: Introduction to Spectre, Manual Layout Drawing and Post Layout Simulation (PLS) Objective Part A: To become acquainted with Spectre (or HSpice) by simulating an inverter,

More information

Marching Memory マーチングメモリ. UCAS-6 6 > Stanford > Imperial > Verify 中村維男 Based on Patent Application by Tadao Nakamura and Michael J.

Marching Memory マーチングメモリ. UCAS-6 6 > Stanford > Imperial > Verify 中村維男 Based on Patent Application by Tadao Nakamura and Michael J. UCAS-6 6 > Stanford > Imperial > Verify 2011 Marching Memory マーチングメモリ Tadao Nakamura 中村維男 Based on Patent Application by Tadao Nakamura and Michael J. Flynn 1 Copyright 2010 Tadao Nakamura C-M-C Computer

More information

Synchronous Optical Networks SONET. Computer Networks: SONET

Synchronous Optical Networks SONET. Computer Networks: SONET Synchronous Optical Networks SONET 1 Telephone Networks {Brief History} Digital carrier systems The hierarchy of digital signals that the telephone network uses. Trunks and access links organized in DS

More information

Toward a unified architecture for LAN/WAN/WLAN/SAN switches and routers

Toward a unified architecture for LAN/WAN/WLAN/SAN switches and routers Toward a unified architecture for LAN/WAN/WLAN/SAN switches and routers Silvano Gai 1 The sellable HPSR Seamless LAN/WLAN/SAN/WAN Network as a platform System-wide network intelligence as platform for

More information

ASIC Physical Design Top-Level Chip Layout

ASIC Physical Design Top-Level Chip Layout ASIC Physical Design Top-Level Chip Layout References: M. Smith, Application Specific Integrated Circuits, Chap. 16 Cadence Virtuoso User Manual Top-level IC design process Typically done before individual

More information

3. Implementing Logic in CMOS

3. Implementing Logic in CMOS 3. Implementing Logic in CMOS 3. Implementing Logic in CMOS Jacob Abraham Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 27 September, 27 ECE Department,

More information

ASNT7120-KMA 10GS/s, 4-bit Flash Analog-to-Digital Converter

ASNT7120-KMA 10GS/s, 4-bit Flash Analog-to-Digital Converter 10GS/s, 4-bit Flash Analog-to-Digital Converter 18GHz analog input bandwidth. Selectable clocking mode: external high-speed clock or internal PLL with external lowspeed reference clock. Broadband operation

More information

A 20 GSa/s 8b ADC with a 1 MB Memory in 0.18 µm CMOS

A 20 GSa/s 8b ADC with a 1 MB Memory in 0.18 µm CMOS A 20 GSa/s 8b ADC with a 1 MB Memory in 0.18 µm CMOS Ken Poulton, Robert Neff, Brian Setterberg, Bernd Wuppermann, Tom Kopley, Robert Jewett, Jorge Pernillo, Charles Tan, Allen Montijo 1 Agilent Laboratories,

More information

A 32nm, 0.9V Supply-Noise Sensitivity Tracking PLL for Improved Clock Data Compensation Featuring a Deep Trench Capacitor Based Loop Filter

A 32nm, 0.9V Supply-Noise Sensitivity Tracking PLL for Improved Clock Data Compensation Featuring a Deep Trench Capacitor Based Loop Filter A 32nm, 0.9V Supply-Noise Sensitivity Tracking PLL for Improved Clock Data Compensation Featuring a Deep Trench Capacitor Based Loop Filter Bongjin Kim, Weichao Xu, and Chris H. Kim University of Minnesota,

More information

Token Bit Manager for the CMS Pixel Readout

Token Bit Manager for the CMS Pixel Readout Token Bit Manager for the CMS Pixel Readout Edward Bartz Rutgers University Pixel 2002 International Workshop September 9, 2002 slide 1 TBM Overview Orchestrate the Readout of Several Pixel Chips on a

More information