Memory Intensive Architectures for DSP and Data Communication Pronita Mehrotra, Paul Franzon


1 Memory Intensive Architectures for DSP and Data Communication Pronita Mehrotra, Paul Franzon Department of Electrical and Computer Engineering North Carolina State University

2 Outline. Objectives; Approach; Signal Integrity and Routability; Algorithms and DRAM Architecture; Memory Mapping Scheme; Twiddle Factor Generation Scheme; Analysis of FFT Architecture and Performance; Forwarding Schemes for Router; Routing Scheme based on Compaction; Binary Search Based Routing Scheme. SHOCC High Density Packaging technology increases the potential performance of a 1 GB DSP system by a factor of about 20. Lower memory requirements for the router.

3 Motivation & Approach. Radar processor for future UAVs: large problem size (1 GB, 1 M-point FFTs); high performance, low volume. Leverage High Density Packaging: utilize SHOCC (Seamless High Off Chip Connectivity), which allows 128 parallel 16-bit-wide memory channels; the number of channels is limited by signal integrity and routability. Designed a 2,048-bit, 250 MHz memory bus. Determine the architecture that maximizes the potential of this memory bandwidth.

4 Physical Design. High Density Substrate (8 cm x 8 cm). Edge-mounted commercial DDR DRAMs: approx. 2 mm pitch (=> ~120 pins => up to x36 memory); 2 sets of 64x64 Mbit; organized as multiple independent banks; Mbps per pin; better availability than RAMBUS, plus more certain SI issues. SHOCC-mounted identical Accelerator ICs: µm solder bump pitch (today); bare die (approx. 1 sq. cm.) with 64 Multiplier-Accumulators; interconnected by a 2 GHz, 128-bit bus.

5 Substrate Stack-up. [Figure: layer stack on a Si substrate, with signal layers S1 (2µ), S2 (2µ, local ground), S3 (2µ), and Gnd/Pow planes (2µ) separated by BCB dielectric layers (5µ each, one of 10µ).] S2 acting as the local ground reduces the coupling between S1 and S3. The Maxwell Q-3D (Ansoft) parameter extractor was used to determine R, L, and C.

6 Routing Approach. 2-stage breakout routing approach: 13 µm breakout pitch (2 layers); 26 µm intermediate pitch (1 layer); 36 µm final routing. [Figure: parallel routing on S1 and S2 with ground, and X-Y routing.] Pitch is decided by crosstalk limitations.

7 SI Issues for High Density Wiring. 0.25 µm CMOS technology: DC NM = 1.04 V; our design uses an upper limit of 0.7 V. Noise sources: crosstalk (especially in the breakout region); SSN; reflection noise (a potential issue for long, wide memory wiring).

8 Equivalent Circuit (SHOCC Line). [Figure: driver, on-chip parasitics (R_oc, L_oc, C_oc), bump parasitics (R_bump, L_bump, C_bump), SHOCC line model, then bump and on-chip parasitics again, ending at the receiver.] Input signal: 2 ns pulse with a rise time of 80 ps. Driver: 5-stage driver with a stage ratio of 3.

9 SHOCC line model (Crosstalk). [Figure: coupled ladder model of alternating top and bottom signal lines. Each of the n segments has series R/n and L/n with shunt C/2n at both ends; neighboring lines are coupled through C_mtt, L_mtt and C_mtb, L_mtb.]

10 Crosstalk Noise in Different Regions. [Figure: crosstalk noise (mV) versus length (cm) for the bottom (S3) and top (S1) layers, for (a) 13µ initial breakout pitch, (b) 26µ XY routing, (c) 36µ routing (under DRAMs).] Trace width in all cases = 10µ.

11 Reflection Noise. [Figure: reflection noise (mV) versus length (cm) for the bottom (S3) and top (S1) layers, 36µ routing.] Reflection noise constitutes a fairly small percentage of the total noise.

12 Delays in Different SHOCC regions. [Figure: worst-case delay (ns) versus length (cm) for the bottom (S3) and top (S1) layers, for (a) 13µ initial breakout pitch, (b) 26µ XY routing, (c) 36µ routing (under DRAMs).] Trace width in all cases = 10µ.

13 Noise and Timing Analysis. For a 1 cm x 1 cm chip (with 2500 I/O pins), the escape lengths in the two regions are 1 cm and 0.8 cm. Summed over the routing regions (0.19 V and 0.2 V for the first two), the crosstalk noise is 0.56 V. The reflection noise is approximately 0.03 V. The total RSS noise is 0.6 V (with an SSN of 0.2 V), which is within the noise margin of 0.7 V for a 0.25µ technology. The worst-case off-chip skew on an 8 cm x 8 cm substrate is around 0.2 ns. After adding factors for on-chip skew and jitter, a cycle time of 2 ns is achievable. This gives an I/O bandwidth > 100 GByte/s.
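The noise budget and bandwidth figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes the three noise sources combine as a plain root-sum-square and that the bus is the 2,048-bit bus from the Motivation slide running at the 2 ns cycle time; the variable names are illustrative only.

```python
import math

# Values quoted on the slide, in volts.
crosstalk_v = 0.56
reflection_v = 0.03
ssn_v = 0.2
noise_margin_v = 0.7

# Root-sum-square combination of the three noise sources (assumed model).
rss_v = math.sqrt(crosstalk_v**2 + reflection_v**2 + ssn_v**2)

# Peak I/O bandwidth of a 2,048-bit bus with a 2 ns cycle time.
bus_bits = 2048
cycle_ns = 2.0
bandwidth_gbyte_s = (bus_bits / 8) / cycle_ns  # bytes per ns equals GByte/s

print(f"RSS noise: {rss_v:.3f} V (noise margin {noise_margin_v} V)")
print(f"Peak bandwidth: {bandwidth_gbyte_s:.0f} GByte/s")
```

Under these assumptions the RSS value rounds to the 0.6 V quoted above, and the peak bandwidth comes out above the 100 GByte/s claimed.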

14 DRAM Timing Issues. DRAM is organized in banks and rows. [Figure: bank address, row address, sense amps, column address, data word.] A random access takes 60 ns; a new bank can be accessed every 15 ns; a different entry within the most recently accessed row can be read or written in 4 ns. However, in the FFT described next we can sustain 98% of peak bandwidth: SRAM performance at DRAM prices.

15 FFT Architectural Issues. A conventional FFT implementation would spend most of its time in only one memory channel: developed a staggered channel algorithm. Need to maximize page-mode access in DRAMs: developed a novel memory map scheme for data. A conventional FFT stores twiddle factors in main memory: instead, we regenerate them on the fly in the datapath during otherwise dead cycles.

16 Micro-accelerator IC. [Figure: 1 cm x 1 cm die with MEM interface blocks, 32-bit FP arithmetic units (X), and SRAM.] For a 0.25µ technology, a 32-bit multiplier and adder would take up an area < 1 mm². A 1 cm² chip area can hold enough hardware to make a fully parallel 16-point FFT. Micro-accelerator: control reconfigures the IC and manages the MEM interface; 64 multipliers and adders per chip; bit mem interface units; 0.5 KB SRAM to store twiddle factors. Four chips work together to give a radix-64 FFT engine.

17 Performance of the FFT Engine. A 32-bit multiply-accumulate unit in 0.25µ technology takes < 2 ns to execute. A 64-point FFT (including the twiddling) can be done in < 32 ns. By pipelining the FFT into two stages, a result can be obtained every 20 ns. [Figure: pipeline timeline of read, two stages of 8-point units, and write in 20 ns slots; the read and write slots carry an added penalty for new page access.]

18 Address-Mapping Algorithm. Key to success when using DRAM: maximize page-mode accesses at each stage. The result set of each 64-point FFT is written to different DRAMs according to the relation DRAM# = (FFT# + Index) % 64, where FFT# = Index/64 (integer division). Resulting performance: most of the new-page penalty is hidden by bank operations; 1.31 ms for a 4-stage million-point FFT; within 1.6% of perfect SRAM performance.
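A minimal sketch of the mapping relation follows, to show why it spreads traffic across all channels. It assumes FFT# is the integer quotient Index/64 and that the next stage reads its inputs at a stride of 64; the function name `dram_for` is hypothetical.

```python
NUM_DRAMS = 64  # one DRAM channel per output of a 64-point FFT

def dram_for(index: int) -> int:
    """DRAM# = (FFT# + index) % 64, with FFT# = index // 64."""
    fft_number = index // NUM_DRAMS
    return (fft_number + index) % NUM_DRAMS

# Writes: the 64 results of one FFT (consecutive indices) go to 64 distinct DRAMs.
write_group = [dram_for(i) for i in range(128, 192)]

# Reads for the next stage (assumed stride of 64) also touch 64 distinct DRAMs.
read_group = [dram_for(5 + 64 * j) for j in range(64)]

print(len(set(write_group)), len(set(read_group)))
```

Both groups cover every channel exactly once, which is the property that lets all memory channels work in parallel on reads as well as writes.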

19 ...Addressing Scheme. Example: the indices and memory layout of the data after the end of the first stage are shown for one row in each of the DRAMs as an illustrative example; the other stages are the same. [Figure: row contents of DRAM 0, DRAM 1, ..., DRAM 63.] The inputs for the next stage are now arranged in different DRAMs, allowing full exploitation of the memory bandwidth. This shuffling of data after reading, for the next stage, is easily implemented using shift registers.

20 ...Addressing Scheme. [Figure: bank placement of entries in DRAM 0, DRAM 1, ..., DRAM 63; for example, indices 0 and 63 fall in bank B0, 4096, 4097, and 4159 in B1, and 8192, 8193, and 8255 in B2.] Reads: Row # = FFT#/4, Bank # = FFT# % 4. Writes: Row # = FFT#/256, Bank # = (FFT#/64) % 4, where FFT# = index/64.
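The row and bank relations can be written out directly. This is a sketch that assumes every division on the slide is an integer division; the function names are illustrative and not taken from the original design.

```python
def read_location(index: int) -> tuple[int, int]:
    """(row, bank) used when reading: Row# = FFT#/4, Bank# = FFT# % 4."""
    fft_number = index // 64
    return fft_number // 4, fft_number % 4

def write_location(index: int) -> tuple[int, int]:
    """(row, bank) used when writing: Row# = FFT#/256, Bank# = (FFT#/64) % 4."""
    fft_number = index // 64
    return fft_number // 256, (fft_number // 64) % 4

# Indices taken from the figure on this slide.
for index in (0, 63, 4096, 4159, 8192, 8255):
    row, bank = write_location(index)
    print(f"index {index}: row {row}, bank B{bank}")
```

With integer division, the write relation reproduces the bank labels that survive in the figure (for instance 4096 in B1 and 8255 in B2), which suggests the figure shows the write layout.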

21 Twiddle Factor Generation. A one-dimensional input array (N) can be manipulated as a two-dimensional array (L x M): $X(s, r) = \sum_{m=0}^{M-1} W_N^{Lmr}\, W_N^{ms} \sum_{l=0}^{L-1} x(l, m)\, W_N^{Msl}$. For a radix-64 FFT, L = 64. The results of each 64-point FFT need to be multiplied by the twiddle factors $W_N^{ms}$, where $W_N^{ms} = e^{-j(2\pi/N)ms}$.

22 Twiddle Factor Generation. For all stages, s varies from 0 to 63. By storing an initial set of twiddle factors (m = 1), subsequent twiddle factors in the same stage can be generated by multiplying the current factors by the initial factors: $W_N^{ms} = W_N^{(m-1)s} \cdot W_N^{s}$. Whenever m reaches 64, an initial twiddle factor set can be generated for the next stage: $W_N^{64s} = W_{N/64}^{s}$.
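The recurrence can be illustrated in software. The sketch below assumes the usual forward-DFT sign convention, W_N = exp(-j 2 pi / N), and uses N = 4096 purely as an example size; the function names are hypothetical and the hardware schedule is not modeled.

```python
import cmath

def initial_set(n_points: int, s_count: int = 64) -> list[complex]:
    """The stored m = 1 factors: W_N^s for s = 0 .. s_count - 1."""
    return [cmath.exp(-2j * cmath.pi * s / n_points) for s in range(s_count)]

def generate_twiddles(n_points: int, m_max: int, s_count: int = 64):
    """Yield (m, [W_N^(m*s)]) using one complex multiply per factor per step."""
    initial = initial_set(n_points, s_count)
    current = [1 + 0j] * s_count  # m = 0: every factor equals 1
    for m in range(m_max):
        yield m, current
        current = [c * w for c, w in zip(current, initial)]

N = 4096  # example size only
max_error = 0.0
for m, factors in generate_twiddles(N, m_max=64):
    for s, value in enumerate(factors):
        exact = cmath.exp(-2j * cmath.pi * m * s / N)
        max_error = max(max_error, abs(value - exact))

print(f"largest deviation from direct evaluation: {max_error:.2e}")
```

Repeated multiplication accumulates a small rounding error compared with direct evaluation. For 64 steps in double precision it stays far below typical 32-bit floating-point resolution, though a fixed-point or single-precision datapath would need its own analysis.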

23 Scheduling FFT Operations. The first 2 stages of an 8-point FFT do not involve any multiplications. The free multipliers can be used for generating twiddle factors needed later. [Figure: timeline from 2 ns to 12 ns showing the first two stages of the 8-point FFT alongside generation of twiddle factors, followed by the 3rd stage and twiddling of the final results.]

24 FFT Performance Discussion. 1,048,576-point FFT in 1.31 ms; 892 FFT/s; 1.44 x FLOPS; 127 GBps sustained memory performance. Commercial comparisons: BOPS Inc. system (80 sq. cm. of PCB, four 32-bit memory channels, 4 PEs with each PE having 5 FP units): 21.5 ms to perform one million-point FFT. Motorola's AltiVec (128-bit vector execution unit with 4 parallel executions, simultaneous load of 4 IEEE floats): 511 ms for a million-point FFT.
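The 1.31 ms figure can be related to the pipeline rate quoted earlier with a short calculation. This sketch assumes each of the 4 stages processes N/64 64-point FFTs at one result every 20 ns and ignores new-page penalties; the comparison ratios simply divide the times quoted on this slide.

```python
points = 1_048_576
stages = 4
ns_per_fft = 20                 # one 64-point FFT result every 20 ns
ffts_per_stage = points // 64   # 64-point FFTs needed per stage (assumed)

total_ms = stages * ffts_per_stage * ns_per_fft / 1e6
print(f"ideal time for one transform: {total_ms:.2f} ms")

# Ratios against the commercial systems quoted on the slide.
quoted_ms = 1.31
for name, other_ms in (("BOPS", 21.5), ("AltiVec", 511.0)):
    print(f"{name}: {other_ms / quoted_ms:.0f}x slower")
```

The ideal time lands at the quoted 1.31 ms to two decimal places, consistent with the claim that most of the new-page penalty is hidden.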

25 Optical Burst Switching (OBS). Need to decouple the transmission/switching from forwarding/routing: one control channel goes through O/E/O conversion; data cuts through nodes without any conversion. Just-in-Time signaling protocol for burst transmission: transmit the packet after some delay without waiting for confirmation. [Figure: signaling sequence between calling host, calling switch, called switch, and called host, showing SETUP, CALL PROC, processing delay, crossconnect configured, optical burst, and CONNECT.]

26 OBS Node Architecture. [Figure: input fibers #1 to #N pass through demuxes to control channels ICC #1 to #N and data channels IDC #1 to #N. The control path consists of input modules, router, buffer and scheduler, and output modules. The data path passes through FDLs and the switching fabric to ODC #1 to #N, then through muxes to output fibers #1 to #N. Blocks are marked as optical or electrical.]

27 Message Engine. [Figure: blocks on a data bus: message generator, message parsing and header verification, TTL and CRC update, route lookup, scheduler, switch control, and exception handler, with a hard path, a soft path to software, and SRAM/DRAM attached.]

28 Forwarding Engine. The bottleneck of the forwarding engine is the route lookup. Speed: reduce the number of lookups, especially in main memory; number of memory accesses is 2-9 (IPv4); partition data to ease hardware pipelining; existing schemes take ns (average time) for address lookup. Scalability: reduce the amount of memory required to store data; direct/indirect lookup schemes use memory inefficiently; tree-based schemes are better.

29 Trie Vs. Tree. [Figure: a binary trie branching on bits 0/1 next to a binary tree branching on < and > comparisons.] Memory accesses: binary trie: number of address bits (32 for IPv4); binary tree: log2(N) (16 for 64K entries). *Nick McKeown, Balaji Prabhakar, High Performance Switches and Routers: Theory and Practice, Hot Interconnects Tutorial Slides (
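To make the trie side of the comparison concrete, here is a minimal sketch of longest-prefix matching on a binary trie, where each address bit costs one node visit. The class and function names are hypothetical; the two prefixes and next hops (10* to 7, 01* to 5) are borrowed from the sample prefix set later in the deck, and the address is made up.

```python
import math

class TrieNode:
    def __init__(self):
        self.children = [None, None]  # child for bit 0, child for bit 1
        self.next_hop = None          # set only where a prefix ends

def insert(root: TrieNode, prefix_bits: str, next_hop: int) -> None:
    node = root
    for bit in prefix_bits:
        b = int(bit)
        if node.children[b] is None:
            node.children[b] = TrieNode()
        node = node.children[b]
    node.next_hop = next_hop

def lookup(root: TrieNode, address_bits: str):
    """Return (next hop of the longest matching prefix, nodes visited)."""
    node, best, visited = root, root.next_hop, 0
    for bit in address_bits:
        node = node.children[int(bit)]
        if node is None:
            break
        visited += 1
        if node.next_hop is not None:
            best = node.next_hop
    return best, visited

root = TrieNode()
insert(root, "10", 7)
insert(root, "01", 5)

print(lookup(root, "01011000"))
# A balanced binary tree over 64K entries needs this many comparisons:
print(math.log2(64 * 1024))
```

In the worst case the trie walk visits one node per address bit (32 for IPv4), whereas the comparison count of a balanced tree grows with the logarithm of the table size.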

30 Trie Based Schemes: Direct Lookup. An entry for each address: inefficient use of memory; very poor scalability. Trie of depth = 1 and degree = 2^B for a B-bit address. Lookup time = 1 cycle (60 ns). [Figure: required memory size versus address bits, on a logarithmic scale, with a marker at 1,000 DRAM chips.]

31 Trie Based Schemes: Indirect Lookup. Address split in 2 or more parts (B1, B2)*: somewhat better use of memory; poor scalability. Lookup time = N cycles (N = number of segments in the address). Memory requirement depends on the routing table; memory usage can be reduced by using a variable offset length. *P. Gupta, S. Lin, N. McKeown, Routing Lookups in hardware at memory access speeds, in Proc. IEEE Infocom 98, Session 10B-1, San Francisco, CA, pp

32 Trie Based Schemes: Trie Optimizations. Memory usage optimal. Lookup time = H cycles (H = depth of tree). [Figure: binary tree, path-compressed (Patricia) tree with skip values, and level-compressed (LC) tree; nodes with no prefix are marked.] *S. Nilsson, G. Karlsson, IP-Address Lookup Using LC-Tries, IEEE Journal on Selected Areas in Communications, Vol. 17, No. 6, June 1999, pp

33 Trie or Tree? Issues with trie-based schemes: extra nodes with no data add to the depth of the tree, so more memory accesses are needed; search time is proportional to the size of the address (a binary trie for IPv4 can take up to 32 cycles; for IPv6 the worst case could be 128 cycles). Issues with tree-based schemes: binary search works for exact matching; backtracking or wrong paths; unbalanced approaches; pre-processing overhead is higher.

34 Tree Based Schemes: Binary Search. Encoding prefixes as ranges. Multiway search to reduce search time from log2(N) to log(k+1)(N). Pre-computed table of best matching prefixes for the first Y bits. Worst-case lookup time = 490 ns (>32,000 entries). [Table: worst-case search (ns) and worst case relative to Patricia for Patricia, binary, 16 bit + binary, and 16 bit + 6 way.] *B. Lampson, V. Srinivasan, G. Varghese, IP Lookups using Multiway and Multicolumn Search, Infocom 98, Vol. 3, 1998, pp

35 Lookups using Hash Tables. Hash tables organized by prefix lengths (hash collisions?). Lookup time = log2(address bits). Improve performance by binary search of hash tables, using markers in tables corresponding to shorter lengths to point to prefixes of greater lengths. *M. Waldvogel, G. Varghese, J. Turner, B. Plattner, Scalable High Speed IP Routing Lookups, ACM Comput. Commun. Rev., Vol. 27, Oct. 1997, pp

36 Proposed Scheme Using Compaction. Store path information in a smaller (about 250x smaller than the forwarding table), faster, wide (about 1000 bits) on-chip SRAM. Few SRAM lookups and one DRAM lookup. Store a table containing the number of 1's in each level. Additionally, for each row of SRAM, the first few bits store the number of 1's up to the previous row in that level. The first 16 bits or so can be direct mapped. A lookup can be done every 60-65 ns (14-15 million lookups per second).
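The per-row counts of 1's described above amount to a rank computation over a bitmap. The sketch below is one plausible reading and not the authors' implementation: it assumes that a set bit marks a stored entry and that the rank of that bit gives the entry's position in the off-chip table. The 8-bit row width, the bitmap, and all names are illustrative.

```python
ROW_BITS = 8  # illustrative; the slides describe rows of roughly 1000 bits

def build_rows(bitmap: str) -> list[tuple[int, str]]:
    """Split a level's bitmap into rows, each prefixed with the count of
    1's in all earlier rows of that level."""
    rows, ones_so_far = [], 0
    for start in range(0, len(bitmap), ROW_BITS):
        bits = bitmap[start:start + ROW_BITS]
        rows.append((ones_so_far, bits))
        ones_so_far += bits.count("1")
    return rows

def rank(rows: list[tuple[int, str]], position: int) -> int:
    """Number of 1's at bit positions 0..position, using a single row read."""
    ones_before_row, bits = rows[position // ROW_BITS]
    return ones_before_row + bits[: position % ROW_BITS + 1].count("1")

bitmap = "0110100011010010"
rows = build_rows(bitmap)

position = 11
# Assumed convention: the k-th set bit maps to entry k - 1 in the DRAM table.
dram_index = rank(rows, position) - 1 if bitmap[position] == "1" else None
print(dram_index)
```

Because the running count is stored with each row, a single wide row read is enough to compute the index, which fits the slide's claim of a few SRAM lookups followed by one DRAM lookup.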

37 Proposed Scheme Using Compaction. On-chip SRAM and off-chip DRAM: >1000-bit-wide on-chip SRAM; for 40,000 prefixes in the routing table, the required SRAM size is less than 5 kB; 2 sets of these memories can be used to hide the update operations. Pipelined SRAM and DRAM operation: only 1 DRAM lookup in all cases; one lookup can be done every 60-65 ns (14-15 million lookups per second).

38 Binary Search Based Proposed Scheme. Sorting prefixes: given two prefixes A = a1 a2 ... an and B = b1 b2 ... bm: if n = m, compare by numerical value; if n ≠ m, chop the longer prefix to the length of the shorter one and compare; if the chopped prefixes are equal, the shorter prefix is considered larger. After sorting: 00010*, 0001*, *, *, *, *, 01011*, 01*, *, *, *, *, 1011*, 10*, 110*. [Table: sample prefix set* listing each prefix with its next hop, for example 10* with next hop 7 and 01* with next hop 5.] *N. Yazdani, P.S. Min, Fast and Scalable Schemes for IP Address Lookup Problem, Proc. IEEE Conference on High Performance Switching and Routing, pp ,
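The comparison rule can be sketched as a comparator over prefixes written as bit strings without the trailing `*`. The function name is hypothetical, and the test list contains only the prefixes that are still legible in the sorted sequence on this slide.

```python
from functools import cmp_to_key

def compare_prefixes(a: str, b: str) -> int:
    """Negative if a sorts before b, positive if after, 0 if equal."""
    if len(a) == len(b):
        # Equal lengths: compare by numerical value (same as string order).
        return (a > b) - (a < b)
    common = min(len(a), len(b))
    head_a, head_b = a[:common], b[:common]
    if head_a != head_b:
        # Chop the longer prefix and compare the equal-length parts.
        return (head_a > head_b) - (head_a < head_b)
    # Chopped prefixes are equal: the shorter prefix is considered larger.
    return 1 if len(a) < len(b) else -1

prefixes = ["10", "01", "110", "1011", "0001", "01011", "00010"]
ordered = sorted(prefixes, key=cmp_to_key(compare_prefixes))
print([p + "*" for p in ordered])
```

The resulting order places every prefix after all of its longer extensions, which matches the relative order of the legible entries in the sorted list above.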

39 Binary Search Based Proposed Scheme. Sorting gives a depth-first search of the corresponding binary trie. The binary trie is constructed as follows: if A is a prefix of B, then B is the child of A; if A < B, then A lies on the left of B. [Figure: trie of the sample prefixes rooted at Root(*).]

40 Modified Prefix Table. Store information about all parents in another field. Pre-processing requires another step. Update process is O(N) (same as Lampson's scheme). Memory requirement is 2x lower. [Table: prefix, next hop, and parent info for the sample prefix set; in the worked example the match between the address and the stored prefix extends to 4 bits, and the best matching prefix is 01*.]

41 Conclusions. 2,048-bit memory system buildable in high-density packaging technologies: limit determined by signal integrity issues; modeled and simulated with Ansoft and HSPICE. FFT architecture optimized to maximize available memory bandwidth: memory map perfectly matched to DDR DRAM architecture; on-the-fly twiddle factor calculation; verified in Verilog model. Result: 20x faster than is achievable with conventional packaging.

42 Conclusions. Trie-based routing scheme using compaction suggested for smaller address sizes: SRAM size is almost 250x smaller than DRAM size; one DRAM access only. Binary search scheme for larger address sizes: number of memory accesses = log2(N); memory requirement 2x lower than existing schemes; update process is O(N), the same as existing schemes.

43 Future Work. FFT: complete verification (Verilog); submit journal papers (T.VLSI, CPMT) (conference paper published: EPEP). Forwarding engine: verify routing schemes (high-level Verilog); evaluate pre-processing overheads; evaluate performance against standard routing tables; submit journal paper. Conducting scaling studies to support OBS.


More information

Multiway Range Trees: Scalable IP Lookup with Fast Updates

Multiway Range Trees: Scalable IP Lookup with Fast Updates Multiway Range Trees: Scalable IP Lookup with Fast Updates Subhash Suri George Varghese Priyank Ramesh Warkhede Department of Computer Science Washington University St. Louis, MO 63130. Abstract In this

More information

High-Performance Memory Interfaces Made Easy

High-Performance Memory Interfaces Made Easy High-Performance Memory Interfaces Made Easy Xilinx 90nm Design Seminar Series: Part IV Xilinx - #1 in 90 nm We Asked Our Customers: What are your challenges? Shorter design time, faster obsolescence More

More information

Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China

Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China CMOS Crossbar Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China OUTLINE Motivations Problems of Designing Large Crossbar Our Approach - Pipelined MUX

More information

Computer Networks CS 552

Computer Networks CS 552 Computer Networks CS 552 Routers Badri Nath Rutgers University badri@cs.rutgers.edu. High Speed Routers 2. Route lookups Cisco 26: 8 Gbps Cisco 246: 32 Gbps Cisco 286: 28 Gbps Power: 4.2 KW Cost: $5K Juniper

More information

Technical Data Sheet Photolink- Fiber Optic Receiver

Technical Data Sheet Photolink- Fiber Optic Receiver Technical Data Sheet Photolink- Fiber Optic Receiver Features 1. High PD sensitivity optimized for red light 2. Data : NRZ signal 3. Low power consumption for extended battery life 4. Built-in threshold

More information

DIRECT Rambus DRAM has a high-speed interface of

DIRECT Rambus DRAM has a high-speed interface of 1600 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 11, NOVEMBER 1999 A 1.6-GByte/s DRAM with Flexible Mapping Redundancy Technique and Additional Refresh Scheme Satoru Takase and Natsuki Kushiyama

More information

Advancing high performance heterogeneous integration through die stacking

Advancing high performance heterogeneous integration through die stacking Advancing high performance heterogeneous integration through die stacking Suresh Ramalingam Senior Director, Advanced Packaging European 3D TSV Summit Jan 22 23, 2013 The First Wave of 3D ICs Perfecting

More information

EECS 122: Introduction to Computer Networks Switch and Router Architectures. Today s Lecture

EECS 122: Introduction to Computer Networks Switch and Router Architectures. Today s Lecture EECS : Introduction to Computer Networks Switch and Router Architectures Computer Science Division Department of Electrical Engineering and Computer Sciences University of California, Berkeley Berkeley,

More information

An Overview of Standard Cell Based Digital VLSI Design

An Overview of Standard Cell Based Digital VLSI Design An Overview of Standard Cell Based Digital VLSI Design With examples taken from the implementation of the 36-core AsAP1 chip and the 1000-core KiloCore chip Zhiyi Yu, Tinoosh Mohsenin, Aaron Stillmaker,

More information

Tree, Segment Table, and Route Bucket: A Multistage Algorithm for IPv6 Routing Table Lookup

Tree, Segment Table, and Route Bucket: A Multistage Algorithm for IPv6 Routing Table Lookup Tree, Segment Table, and Route Bucket: A Multistage Algorithm for IPv6 Routing Table Lookup Zhenqiang LI Dongqu ZHENG Yan MA School of Computer Science and Technology Network Information Center Beijing

More information

A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup

A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup Yan Sun and Min Sik Kim School of Electrical Engineering and Computer Science Washington State University Pullman, Washington

More information

Introduction. Router Architectures. Introduction. Introduction. Recent advances in routing architecture including

Introduction. Router Architectures. Introduction. Introduction. Recent advances in routing architecture including Introduction Router Architectures Recent advances in routing architecture including specialized hardware switching fabrics efficient and faster lookup algorithms have created routers that are capable of

More information

Optical SerDes Test Interface for High-Speed and Parallel Testing

Optical SerDes Test Interface for High-Speed and Parallel Testing June 7-10, 2009 San Diego, CA SerDes Test Interface for High-Speed and Parallel Testing Sanghoon Lee, Ph. D Sejang Oh, Kyeongseon Shin, Wuisoo Lee Memory Division, SAMSUNG ELECTRONICS Why Interface? High

More information

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation Mainstream Computer System Components CPU Core 2 GHz - 3.0 GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation One core or multi-core (2-4) per chip Multiple FP, integer

More information

PUSHING THE LIMITS, A PERSPECTIVE ON ROUTER ARCHITECTURE CHALLENGES

PUSHING THE LIMITS, A PERSPECTIVE ON ROUTER ARCHITECTURE CHALLENGES PUSHING THE LIMITS, A PERSPECTIVE ON ROUTER ARCHITECTURE CHALLENGES Greg Hankins APRICOT 2012 2012 Brocade Communications Systems, Inc. 2012/02/28 Lookup Capacity and Forwarding

More information

3D Technologies For Low Power Integrated Circuits

3D Technologies For Low Power Integrated Circuits 3D Technologies For Low Power Integrated Circuits Paul Franzon North Carolina State University Raleigh, NC paulf@ncsu.edu 919.515.7351 Outline 3DIC Technology Set Approaches to 3D Specific Power Minimization

More information

CS 552 Computer Networks

CS 552 Computer Networks CS 55 Computer Networks IP forwarding Fall 00 Rich Martin (Slides from D. Culler and N. McKeown) Position Paper Goals: Practice writing to convince others Research an interesting topic related to networking.

More information

VLSI AppNote: VSx053 Simple DSP Board

VLSI AppNote: VSx053 Simple DSP Board : VSx053 Simple DSP Board Description This document describes the VS1053 / VS8053 Simple DPS Board and the VSx053 Simple DSP Host Board. Schematics, layouts and pinouts of both cards are included. The

More information

Message Switch. Processor(s) 0* 1 100* 6 1* 2 Forwarding Table

Message Switch. Processor(s) 0* 1 100* 6 1* 2 Forwarding Table Recent Results in Best Matching Prex George Varghese October 16, 2001 Router Model InputLink i 100100 B2 Message Switch B3 OutputLink 6 100100 Processor(s) B1 Prefix Output Link 0* 1 100* 6 1* 2 Forwarding

More information

CS 268: Route Lookup and Packet Classification

CS 268: Route Lookup and Packet Classification Overview CS 268: Route Lookup and Packet Classification Packet Lookup Packet Classification Ion Stoica March 3, 24 istoica@cs.berkeley.edu 2 Lookup Problem Identify the output interface to forward an incoming

More information

Optimum Placement of Decoupling Capacitors on Packages and Printed Circuit Boards Under the Guidance of Electromagnetic Field Simulation

Optimum Placement of Decoupling Capacitors on Packages and Printed Circuit Boards Under the Guidance of Electromagnetic Field Simulation Optimum Placement of Decoupling Capacitors on Packages and Printed Circuit Boards Under the Guidance of Electromagnetic Field Simulation Yuzhe Chen, Zhaoqing Chen and Jiayuan Fang Department of Electrical

More information

Main Memory. EECC551 - Shaaban. Memory latency: Affects cache miss penalty. Measured by:

Main Memory. EECC551 - Shaaban. Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row (~every 8 msec). Static RAM may be

More information

100 GBE AND BEYOND. Diagram courtesy of the CFP MSA Brocade Communications Systems, Inc. v /11/21

100 GBE AND BEYOND. Diagram courtesy of the CFP MSA Brocade Communications Systems, Inc. v /11/21 100 GBE AND BEYOND 2011 Brocade Communications Systems, Inc. Diagram courtesy of the CFP MSA. v1.4 2011/11/21 Current State of the Industry 10 Electrical Fundamental 1 st generation technology constraints

More information

Stacked Silicon Interconnect Technology (SSIT)

Stacked Silicon Interconnect Technology (SSIT) Stacked Silicon Interconnect Technology (SSIT) Suresh Ramalingam Xilinx Inc. MEPTEC, January 12, 2011 Agenda Background and Motivation Stacked Silicon Interconnect Technology Summary Background and Motivation

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Organization Part II Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,

More information

Memory latency: Affects cache miss penalty. Measured by:

Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory

More information

Processor and DRAM Integration by TSV- Based 3-D Stacking for Power-Aware SOCs

Processor and DRAM Integration by TSV- Based 3-D Stacking for Power-Aware SOCs Processor and DRAM Integration by TSV- Based 3-D Stacking for Power-Aware SOCs Shin-Shiun Chen, Chun-Kai Hsu, Hsiu-Chuan Shih, and Cheng-Wen Wu Department of Electrical Engineering National Tsing Hua University

More information

ASNT1011. ASNT1011-PQA DC-to-16Gbps Digital Multiplexer 16:1 / Serializer

ASNT1011. ASNT1011-PQA DC-to-16Gbps Digital Multiplexer 16:1 / Serializer Advaed Sciee And Novel Technology Company, I. 27 Via Porto Grande, Raho Palos Verdes, CA 90275 ASNT1011-PQA DC-to-16Gbps Digital Multiplexer 16:1 / Serializer Broadband digital serializer 16-to-1. LVDS

More information

Introduction. Introduction. Router Architectures. Introduction. Recent advances in routing architecture including

Introduction. Introduction. Router Architectures. Introduction. Recent advances in routing architecture including Router Architectures By the end of this lecture, you should be able to. Explain the different generations of router architectures Describe the route lookup process Explain the operation of PATRICIA algorithm

More information

Towards Performance Modeling of 3D Memory Integrated FPGA Architectures

Towards Performance Modeling of 3D Memory Integrated FPGA Architectures Towards Performance Modeling of 3D Memory Integrated FPGA Architectures Shreyas G. Singapura, Anand Panangadan and Viktor K. Prasanna University of Southern California, Los Angeles CA 90089, USA, {singapur,

More information

PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor

PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor Taeho Kgil, Shaun D Souza, Ali Saidi, Nathan Binkert, Ronald Dreslinski, Steve Reinhardt, Krisztian Flautner,

More information

ASNT1011A-KMA DC-to-17Gbps Digital Multiplexer 16:1 / Serializer

ASNT1011A-KMA DC-to-17Gbps Digital Multiplexer 16:1 / Serializer ASNT1011A-KMA DC-to-17Gbps Digital Multiplexer 16:1 / Serializer Broadband digital serializer 16 to 1 operating seamlessly from DC to 17Gbps LVDS compliant input data buffers Full-rate clock output Clock

More information

High-speed, high-bandwidth DRAM memory bus with Crosstalk Transfer Logic (XTL) interface. Outline

High-speed, high-bandwidth DRAM memory bus with Crosstalk Transfer Logic (XTL) interface. Outline High-speed, high-bandwidth DRAM memory bus with Crosstalk Transfer Logic (XTL) interface Hideki Osaka Hitachi Ltd., Kanagawa, Japan oosaka@sdl.hitachi.co.jp Toyohiko Komatsu Hitachi Ltd., Kanagawa, Japan

More information

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology 1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823

More information

DRAM Memory Modules Overview & Future Outlook. Bill Gervasi Vice President, DRAM Technology SimpleTech

DRAM Memory Modules Overview & Future Outlook. Bill Gervasi Vice President, DRAM Technology SimpleTech DRAM Memory Modules Overview & Future Outlook Bill Gervasi Vice President, DRAM Technology SimpleTech bilge@simpletech.com Many Applications, Many Configurations 2 Module Configurations DDR1 DDR2 Registered

More information

Computer System Components

Computer System Components Computer System Components CPU Core 1 GHz - 3.2 GHz 4-way Superscaler RISC or RISC-core (x86): Deep Instruction Pipelines Dynamic scheduling Multiple FP, integer FUs Dynamic branch prediction Hardware

More information

Professor Yashar Ganjali Department of Computer Science University of Toronto.

Professor Yashar Ganjali Department of Computer Science University of Toronto. Professor Yashar Ganjali Department of Computer Science University of Toronto yganjali@cs.toronto.edu http://www.cs.toronto.edu/~yganjali Today Outline What this course is about Logistics Course structure,

More information

15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 19: Main Memory Prof. Onur Mutlu Carnegie Mellon University Last Time Multi-core issues in caching OS-based cache partitioning (using page coloring) Handling

More information

A Novel Level-based IPv6 Routing Lookup Algorithm

A Novel Level-based IPv6 Routing Lookup Algorithm A Novel Level-based IPv6 Routing Lookup Algorithm Xiaohong Huang 1 Xiaoyu Zhao 2 Guofeng Zhao 1 Wenjian Jiang 2 Dongqu Zheng 1 Qiong Sun 1 Yan Ma 1,3 1. School of Computer Science and Technology, Beijing

More information

Z-RAM Ultra-Dense Memory for 90nm and Below. Hot Chips David E. Fisch, Anant Singh, Greg Popov Innovative Silicon Inc.

Z-RAM Ultra-Dense Memory for 90nm and Below. Hot Chips David E. Fisch, Anant Singh, Greg Popov Innovative Silicon Inc. Z-RAM Ultra-Dense Memory for 90nm and Below Hot Chips 2006 David E. Fisch, Anant Singh, Greg Popov Innovative Silicon Inc. Outline Device Overview Operation Architecture Features Challenges Z-RAM Performance

More information

Counter Braids: A novel counter architecture

Counter Braids: A novel counter architecture Counter Braids: A novel counter architecture Balaji Prabhakar Balaji Prabhakar Stanford University Joint work with: Yi Lu, Andrea Montanari, Sarang Dharmapurikar and Abdul Kabbani Overview Counter Braids

More information

Mainstream Computer System Components

Mainstream Computer System Components Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-12800 (DDR3-1600) 200 MHz (internal base chip clock) 8-way interleaved

More information

CONTACT: ,

CONTACT: , S.N0 Project Title Year of publication of IEEE base paper 1 Design of a high security Sha-3 keccak algorithm 2012 2 Error correcting unordered codes for asynchronous communication 2012 3 Low power multipliers

More information

RHiNET-3/SW: an 80-Gbit/s high-speed network switch for distributed parallel computing

RHiNET-3/SW: an 80-Gbit/s high-speed network switch for distributed parallel computing RHiNET-3/SW: an 0-Gbit/s high-speed network switch for distributed parallel computing S. Nishimura 1, T. Kudoh 2, H. Nishi 2, J. Yamamoto 2, R. Ueno 3, K. Harasawa 4, S. Fukuda 4, Y. Shikichi 4, S. Akutsu

More information

ASNT1011-KMA DC-to-17Gbps Digital Multiplexer 16:1 / Serializer

ASNT1011-KMA DC-to-17Gbps Digital Multiplexer 16:1 / Serializer ASNT1011-KMA DC-to-17Gbps Digital Multiplexer 16:1 / Serializer Broadband digital serializer 16 to 1 LVDS compliant input data buffers Full-rate clock output Clock-divided-by-16 LVDS output buffer with

More information

KiloCore: A 32 nm 1000-Processor Array

KiloCore: A 32 nm 1000-Processor Array KiloCore: A 32 nm 1000-Processor Array Brent Bohnenstiehl, Aaron Stillmaker, Jon Pimentel, Timothy Andreas, Bin Liu, Anh Tran, Emmanuel Adeagbo, Bevan Baas University of California, Davis VLSI Computation

More information

The Xilinx XC6200 chip, the software tools and the board development tools

The Xilinx XC6200 chip, the software tools and the board development tools The Xilinx XC6200 chip, the software tools and the board development tools What is an FPGA? Field Programmable Gate Array Fully programmable alternative to a customized chip Used to implement functions

More information

Indian Silicon Technologies 2013

Indian Silicon Technologies 2013 SI.No Topics IEEE YEAR 1. An RFID Based Solution for Real-Time Patient Surveillance and data Processing Bio- Metric System using FPGA 2. Real-time Binary Shape Matching System Based on FPGA 3. An Optimized

More information

Unleashing the Power of Embedded DRAM

Unleashing the Power of Embedded DRAM Copyright 2005 Design And Reuse S.A. All rights reserved. Unleashing the Power of Embedded DRAM by Peter Gillingham, MOSAID Technologies Incorporated Ottawa, Canada Abstract Embedded DRAM technology offers

More information