Intro to SKARAB for programmers

1 Intro to SKARAB for programmers (and how to use HMC!) Jason Manley 2017 CASPER workshop

2 Hardware

3 Hardware Virtex 7, 690T FPGA. 4 mezzanine sites per SKARAB: 2 in front, 2 in back. 16 SERDES links per site. Designed to the early PowerMX standard. Fans are over-provisioned and normally run at around 20% to 30% of rated speed.

4 Hardware Mezzanine cards allow memory to be traded off against IO capacity. Four cards per SKARAB. Only one type of off-chip memory is currently available on SKARAB: HMC. HMC replaces the QDR/SRAM and DRAM found on previous CASPER boards. The 40G mezzanine card offers 4x 40G QSFP Ethernet ports and can drive optics or copper. No more complicated, flaky PHY chips that need firmware loaded to function properly. An ADC is now also available, with other cards to follow.

5 Hardware: HMC Mezzanine card 1x HMC device per card. HMC is 2GiB or 4GiB. Two independent interfaces per card: 2x half-width (8-lane) links at 10Gbps per lane. Each link is bi-directional. Up to 160Gbps throughput per card.

6 Hardware: QSFP 40G mezzanine card Quad 40G QSFP Ethernet card. PHY-less (purely passive), although it does have a small microprocessor for SFP management (power, temperature etc.). Able to drive optics directly. Tested with passive cables up to 7m. AOC (Active Optical Cables) are recommended for anything 5m and over. Does not currently work in breakout mode with spider/octopus cables (turning one 40G port into 4x 10G ports).

7 Compared to existing CASPER hardware

                        ibob      ROACH     ROACH-2    SKARAB
Logic cells             53K       94K       476K       693K
DSP slices              [...]     [...]     [...]      [...]
BRAM capacity           4.2Mb     8.8Mb     38Mb       53Mb
SRAM capacity           2x18Mb    2x36Mb    4x144Mb    (HMC)
SRAM bandwidth          9Gbps     43Gbps    200Gbps    (HMC)
DDR capacity (max)      -         1x8Gb     1x16Gb     (HMC)
DDR bandwidth (total)   -         38Gbps    50Gbps     (HMC)
Ethernet ports          2x 10G    4x10G     8x10G      < 16x40G

SKARAB uses HMC in place of SRAM and DDR: < 8x 32Gib capacity, 8x 30Gbps R+W bandwidth.

8 Hardware Uses the JASPER flow, not the traditional CASPER flow. Python now forms the backend for managing busses and yellowblocks. The FPGA backend is Xilinx VIVADO, not ISE (hard break at Virtex-6/ROACH-2; no overlapping tool support). (Recall Wesley's JASPER/VIVADO talk on Monday.) SKARAB incorporates all the lessons learnt from SKA-SA's sizable deployments of ibob/BEE2, ROACH-1 and ROACH-2. After compiling a bitstream, interacting with a SKARAB from a network-attached control computer using any of the standard tools is the same as working with any previous CASPER hardware. But it is quite different under the hood...

9 Remotely controlling SKARABs Previous CASPER boards (ibobs, BEE2s, ROACH1s, ROACH2s) all had out-of-band management ports (100Mbps or 1G Ethernet ports separate from the 10G data ports). SKARAB can do everything in-band: data, management and (re)programming. Eventually this will work over any network interface, but currently only over the 1G port or the first 40G port. Work in progress! SKARAB does not have a separate management processor. It uses a lightweight on-FPGA softcore MicroBlaze. The MicroBlaze is reloaded whenever the FPGA is reprogrammed, so the process must be robust, and managed carefully, to avoid losing comms to boards. Simpler setup and maintenance: just a power cable and a network cable to each SKARAB. Network appliance: no need to manage boot servers, Linux filesystems etc. The entire platform can be managed remotely, including upgrading all firmware over the network. Designed for large-scale deployments (MeerKAT, with an eye on SKA).

10 SKARAB startup sequencing Onboard flash memory ships with two bitstreams pre-loaded (there is space for up to four): a golden image and a multiboot image. Both are exactly the same bitstream. The board tries to boot the multiboot image quickly; if that fails, it falls back to the golden image more slowly. You can load your own images here if you want, but that's not the idea. Most large CASPER deployments have a control computer on the network to configure the FPGA boards, and SKARAB is designed to work in this environment. The host computer stores your various bitstreams. So, when a SKARAB boots, it loads the flash image and asks for a DHCP lease. The server then knows about the new SKARAB board on the network, and can load whichever DSP gateware image is wanted, configure registers and set it to work. The default is DHCP on all network ports at startup. (SKARAB wants a DHCP server. Hard-coding IP addresses in your bitstreams is no longer so easy.) Hostname support, for example, skarab... LLDP support (boards announce themselves to switches). MAC addresses are based on serial number and network port: the first 40G port has hostname skarab..., with MAC 06:50:02:03:02:01. After loading a DSP bitstream, network interfaces flap and a new DHCP transaction ensues. Depending on your DHCP server and network (switch), it can take a few seconds to bring the link back up.

11 What's working?

Working:
- Basic JASPER toolflow
- Polling sensors (power, temp, fans etc)
- HMC Mezzanine cards
- First 40G ethernet port
- 1G ethernet port
- Remote reprogramming and control
- Remote updates (flash firmware)
- DHCP, LLDP, ARP, PING and other network services
- Python casperfpga interfaces (mostly; WIP)

Not (yet) working:
- Legacy CASPER toolflow (and never will)
- Automatic fan speed control
- Retrieval of logs for hardware errors
- Arbitrary combinations of Ethernet and HMC cards
- Onboard USB JTAG bridge
- Fast (~1 second) remote reloading of FPGA gateware
- Large wishbone bus (timing implications; WIP)
- Comprehensive DRC during compile

12 Tips for designs Keep to the UDP port compiled into your yellowblock for all your high-speed traffic. Otherwise you can overwhelm the MicroBlaze with traffic, which is especially problematic while trying to reprogram. The yellowblock default is port 7148 (SPEAD default at SKA-SA). Never use port 7778 decimal (0x1e62), which is for controlling the MicroBlaze, or port 0x7148 (hexadecimal, so not the same as the decimal default 7148), which is used for reprogramming. In the event of a network failure at startup, SKARAB will try indefinitely to get a DHCP lease. LEDs on the front panel indicate DHCP success on the golden image (useful for basic visual debugging). Check for updates regularly: development is very fluid at the moment, and nothing is stable yet. Current bus architecture limitations prevent very large numbers of attachments (~50 slaves is OK). The good news is that the V7 seems to have much better routing resources, especially when building large BRAMs. Timing is much easier for large FFTs and snapshot blocks than on V6, and large designs easily meet timing at 240MHz. You'll get to play with all this stuff during Adam's SKARAB tutorials.
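The port advice on this slide can be turned into a small sanity check on the control computer. The following is only a sketch: the port numbers come from the slide, while the helper name `check_data_port` and the dictionary layout are hypothetical and not part of any CASPER tool.

```python
# Port numbers are from the slide; the names below are illustrative assumptions.
DEFAULT_DATA_PORT = 7148  # yellowblock default (SPEAD default at SKA-SA), decimal

RESERVED_PORTS = {
    0x1E62: "MicroBlaze control",  # 7778 decimal
    0x7148: "reprogramming",       # hexadecimal; differs from decimal 7148
}


def check_data_port(port: int) -> None:
    """Raise ValueError if `port` is invalid or reserved for the MicroBlaze."""
    if not 0 < port < 65536:
        raise ValueError(f"{port} is not a valid UDP port")
    if port in RESERVED_PORTS:
        raise ValueError(
            f"port {port} ({port:#06x}) is reserved for {RESERVED_PORTS[port]}"
        )


check_data_port(DEFAULT_DATA_PORT)  # the default data port passes silently
```

Writing the reserved ports in hexadecimal, as the slide does, makes it harder to confuse the reprogramming port 0x7148 with the decimal data port 7148.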

13 HMC memory What is Hybrid Memory Cube? Stacked DRAM on a chip, with a built-in management layer. Designed and optimised for very high throughput, not low latency. Perfect for radio astronomy instrumentation! HMC takes care of itself, including error detection on memory cells and IO operations. You don't have to deal with refreshes, bank management etc. in the FPGA controller anymore. HMC contains smarts: it has buffers and a small ALU (you can build an accumulator inside the memory!). The external interface is high-speed serial (SERDES) links. HMC supports up to 4 sets of bidirectional 16-lane links, with each lane operating at up to 15Gbps. That's up to 1.9Tbps. It's FAST! Micron is already on 3rd-generation HMC. SKARAB uses 2nd generation at lower speeds.
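The headline figures can be reproduced with a few lines of arithmetic. This is a sketch of raw SERDES line rate only, ignoring protocol overhead. It assumes the 1.9Tbps figure counts both directions of each bidirectional link, while the 160Gbps per-card figure quoted for the SKARAB mezzanine counts one direction; the slides do not state this explicitly, and the function name is hypothetical.

```python
def aggregate_gbps(links, lanes_per_link, gbps_per_lane, both_directions=False):
    """Raw SERDES line rate in Gbps (no protocol overhead accounted for)."""
    one_way = links * lanes_per_link * gbps_per_lane
    return 2 * one_way if both_directions else one_way


# Full HMC device: 4 links x 16 lanes x 15 Gbps, summed over both directions
# (assumption made here to match the quoted 1.9Tbps).
hmc_max = aggregate_gbps(4, 16, 15, both_directions=True)

# SKARAB HMC mezzanine card: 2 half-width (8-lane) links at 10 Gbps per lane,
# counted in one direction (assumption made here to match the quoted 160Gbps).
skarab_card = aggregate_gbps(2, 8, 10)

print(hmc_max, skarab_card)
```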

14 Accessing HMC memory The yellowblock packages your instructions (read/write) into flits. A flit is a packet containing a header (instruction) and data (see the HMC datasheet for details). Fortunately, all of this is abstracted away for the user: the yellowblock makes HMC look like a conventional memory interface. Each HMC yellowblock offers two dual-ported interfaces. Simultaneous read and write operations are combined into a single flit. Memory is organised into Vaults, Banks and DRAMs. The controller allows you to map these arbitrarily onto your address bits. By default, SKARAB's implementation optimises for linear reads and writes:

Address bits   a26 ... a8   a7 a6 a5 a4   a3 a2 a1 a0
Mapped to      D19 ... D0   B3 B2 B1 B0   V3 V2 V1 V0

The yellowblock accesses 256 bits at a time, and presents a 256-bit bus. One clock cycle per read and/or write request. No need for burst reads or writes: truly random access is possible.
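To make the default mapping concrete, here is a sketch that splits a 27-bit word address into the three fields shown on the slide: vault in a3..a0, bank in a7..a4, DRAM in the bits above a7. The names `HmcLocation` and `decode_default` are illustrative assumptions, not part of the yellowblock. Note that the slide labels the DRAM field D19...D0 although a26...a8 spans 19 bits; the sketch simply takes whatever lies above a7.

```python
from typing import NamedTuple

ADDR_BITS = 27  # a26 ... a0; each address selects one 256-bit word


class HmcLocation(NamedTuple):
    vault: int  # address bits a3..a0
    bank: int   # address bits a7..a4
    dram: int   # address bits a26..a8


def decode_default(addr: int) -> HmcLocation:
    """Split a word address according to the default mapping on the slide."""
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address out of range")
    return HmcLocation(vault=addr & 0xF, bank=(addr >> 4) & 0xF, dram=addr >> 8)


# Sixteen consecutive addresses fall into sixteen different vaults.
vaults = [decode_default(a).vault for a in range(16)]
print(vaults)
```

Because the vault bits are the lowest address bits, a linear sweep spreads consecutive accesses across all vaults, which is consistent with the statement that the default mapping optimises for linear reads and writes.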

16 HMC vaults and links There are 16 vaults per HMC device. Four are co-located with each link (collection of SERDES lanes). They are interconnected on-chip using a switched network, so any link can access any vault. Naturally, accessing co-located memory is faster than hopping through the switches to get to memory located on other links. Mapping is as you'd expect: Link 1: vaults 0-3; Link 2: vaults 4-7; Link 3: vaults 8-11; Link 4: vaults 12-15. SKARAB has links 2 and 3 connected. Thus, half the memory can be accessed locally, incurring minimum latency. Accessing remote vaults (0-3 and 12-15) will incur additional latency, but the switching network is a full crossbar (no reduction in bandwidth).
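The vault-to-link mapping is simple enough to express directly. In this sketch the function names are hypothetical, and a vault is treated as "local" if it is co-located with either of the two links SKARAB connects, which is how the slide counts half the memory as local.

```python
VAULTS_PER_LINK = 4
NUM_VAULTS = 16
SKARAB_CONNECTED_LINKS = (2, 3)  # from the slide; links are numbered 1..4


def link_for_vault(vault: int) -> int:
    """Return the link (1..4) that a vault (0..15) is co-located with."""
    if not 0 <= vault < NUM_VAULTS:
        raise ValueError("vault must be in 0..15")
    return vault // VAULTS_PER_LINK + 1


def is_local(vault: int, connected=SKARAB_CONNECTED_LINKS) -> bool:
    """True if the vault sits next to one of the connected links."""
    return link_for_vault(vault) in connected


local_vaults = [v for v in range(NUM_VAULTS) if is_local(v)]
remote_vaults = [v for v in range(NUM_VAULTS) if not is_local(v)]
print(local_vaults, remote_vaults)
```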

17 HMC: More on vaults To increase throughput, data must be striped over multiple vaults. Maximum throughput requires you to use all vaults. Each vault has a buffer for transactions. If you keep accessing the same vault continuously, operations will queue and performance will degrade. This is especially important for matrix-transpose (corner-turner) designs. Vaults operate semi-autonomously, and respond as quickly as they can. Latency, throughput and order of operations are thus not guaranteed. You can issue a request to vault 1 and then another to vault 2, and get the response from vault 2 first and the reply from vault 1 some time later. Performance depends heavily on your access patterns. To keep track of your read requests, you issue a 9-bit tag with each read request. Responses contain your tags so you can sort them out again. This can complicate things enormously. Data is also cached in the HMC, so if you issue the same read request twice, you get the second response back very quickly, possibly before many earlier read requests. Typical latency: ~80 FPGA clock cycles (230MHz) in VACC applications. Typical out-of-order depth: ranges from 0 to ~230, depending on access patterns and speeds.
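The tag mechanism described above can be modelled in software as a reorder buffer. This is a behavioural sketch only, not the gateware reorder block: the class and method names are hypothetical, and the only detail taken from the slide is the 9-bit tag (512 reads in flight at most).

```python
from collections import deque

TAG_BITS = 9
NUM_TAGS = 1 << TAG_BITS  # 512 distinct tags


class ReorderBuffer:
    """Hands out tags for reads and releases responses in issue order."""

    def __init__(self):
        self._free = deque(range(NUM_TAGS))  # tags available for new reads
        self._order = deque()                # outstanding tags, oldest first
        self._done = {}                      # tag -> data for early arrivals

    def issue(self) -> int:
        """Allocate a tag for a new read request."""
        if not self._free:
            raise RuntimeError("all tags in flight: apply backpressure")
        tag = self._free.popleft()
        self._order.append(tag)
        return tag

    def respond(self, tag: int, data) -> list:
        """Store a response; return any data now releasable in issue order."""
        if tag in self._done or tag not in self._order:
            raise KeyError(f"unexpected tag {tag}")
        self._done[tag] = data
        released = []
        while self._order and self._order[0] in self._done:
            head = self._order.popleft()
            released.append(self._done.pop(head))
            self._free.append(head)  # tag can be reused
        return released


rob = ReorderBuffer()
t1, t2 = rob.issue(), rob.issue()
first = rob.respond(t2, "B")   # t1 is still outstanding, so nothing is released
second = rob.respond(t1, "A")  # both responses are released, oldest first
print(first, second)
```

The buffer depth needed in practice depends on how far out of order responses arrive, which the slide puts at anywhere from 0 to about 230 requests.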

18 HMC yellowblock The HMC controller automatically performs POST upon startup. After POST, the HMC monitors itself. A 6-bit error code is reported in the event of a failure during operation.

Checks include:
- flit (SERDES comms) errors
- ECC in DRAM core
- Buffer overruns
- Internal logic errors

For best performance: linear access, simultaneous read and write flits.

Higher-level HMC blocks available in the DSP library:
- Wideband, programmable delay line
- Corner-Turner (matrix transpose)
- Vector-accumulator (buffered, with backpressure)

19 HMC conclusions & considerations Latency through the chip is not guaranteed. Throughput is not guaranteed, and depends on access patterns. No SKARAB support yet for special instructions (just basic read and write). Most applications will need a reorder block after the HMC to deal with out-of-order responses. If you're doing reads and writes, issue these instructions simultaneously.

20 40G ethernet core, forty_gbe The yellowblock interface is exactly like that of the 10G ethernet core, but with 256b interfaces instead of 64b interfaces. The 40G core now does proper RX CRC checking (this uses a lot of HW resources, though). It is no longer managed by tcpborphserver and the tgtap software process on the PPC: the MicroBlaze softcore manages all network services. Features already in place: DHCP with auto-renew and hostname support based on serial number; LLDP reporting and discovery; ARP; ping; multicast TX and RX, including subscription to multiple sequential addresses; IGMPv2 signalling. As with the 10G core, multicast RX uses a bitmask arrangement, so you can only subscribe to contiguous chunks of 2^N addresses. Current status, limitations and work in progress: at the moment, the 40G yellowblock is hard-coded for the first QSFP port on the third mezzanine site. The 40G yellowblock currently pulls in the MicroBlaze infrastructure, so all designs must contain a 40G core, even if you're not using it!
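The 2^N restriction follows from matching the destination address against a base address under a bitmask. The sketch below illustrates that idea in software using the standard `ipaddress` module. It is not the core's register interface: the function names are hypothetical, and the requirement that the base address be aligned to the block size is an assumption that follows from using a mask.

```python
import ipaddress
from typing import Tuple


def multicast_mask(base: str, count: int) -> Tuple[int, int]:
    """Return (base, mask) as integers covering `count` sequential addresses."""
    if count < 1 or count & (count - 1):
        raise ValueError("count must be a power of two")
    addr = ipaddress.IPv4Address(base)
    if not addr.is_multicast:
        raise ValueError("base must be an IPv4 multicast address")
    base_int = int(addr)
    if base_int % count:
        # Assumption: a bitmask can only describe an aligned block.
        raise ValueError("base must be aligned to a multiple of count")
    mask = 0xFFFFFFFF & ~(count - 1)
    return base_int, mask


def matches(dest: str, base_int: int, mask: int) -> bool:
    """True if `dest` falls inside the subscribed block."""
    return (int(ipaddress.IPv4Address(dest)) & mask) == base_int


base_int, mask = multicast_mask("239.2.0.64", 4)
inside = matches("239.2.0.67", base_int, mask)
outside = matches("239.2.0.68", base_int, mask)
print(hex(mask), inside, outside)
```

The example addresses are arbitrary. A request for, say, three addresses has no single mask that covers exactly those three, which is why the subscription must be a power-of-two block.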

21 40G Ethernet and HMC resources Hardware resources for 40G ethernet and HMC cores:

          Total available   Per 40G port    Per HMC mezzanine card
Slices    [...]             [...] (3.1%)    [...] (13.1%)
BRAM      [...]             [...] (1.7%)    116 (7.9%)
DSP       [...]             [...] (0%)      4 (0.1%)

22 Questions & Comments Jason Manley

JASPER and the SKARAB. Wesley New 2017 CASPER workshop

JASPER and the SKARAB. Wesley New 2017 CASPER workshop JASPER and the SKARAB Wesley New 2017 CASPER workshop Hardware Hardware: SKARAB Motherboard Peralex in conjunction with SKA-SA have designed the SKARAB. Based on the Virtex 7, 690T FPGA 53Mb BRAM 3600

More information

BittWare s XUPP3R is a 3/4-length PCIe x16 card based on the

BittWare s XUPP3R is a 3/4-length PCIe x16 card based on the FPGA PLATFORMS Board Platforms Custom Solutions Technology Partners Integrated Platforms XUPP3R Xilinx UltraScale+ 3/4-Length PCIe Board with Quad QSFP and 512 GBytes DDR4 Xilinx Virtex UltraScale+ VU7P/VU9P/VU11P

More information

Jason Manley. Internal presentation: Operation overview and drill-down October 2007

Jason Manley. Internal presentation: Operation overview and drill-down October 2007 Jason Manley Internal presentation: Operation overview and drill-down October 2007 System overview Achievements to date ibob F Engine in detail BEE2 X Engine in detail Backend System in detail Future developments

More information

Spring 2018 :: CSE 502. Main Memory & DRAM. Nima Honarmand

Spring 2018 :: CSE 502. Main Memory & DRAM. Nima Honarmand Main Memory & DRAM Nima Honarmand Main Memory Big Picture 1) Last-level cache sends its memory requests to a Memory Controller Over a system bus of other types of interconnect 2) Memory controller translates

More information

VXS-621 FPGA & PowerPC VXS Multiprocessor

VXS-621 FPGA & PowerPC VXS Multiprocessor VXS-621 FPGA & PowerPC VXS Multiprocessor Xilinx Virtex -5 FPGA for high performance processing On-board PowerPC CPU for standalone operation, communications management and user applications Two PMC/XMC

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Memory / DRAM SRAM = Static RAM SRAM vs. DRAM As long as power is present, data is retained DRAM = Dynamic RAM If you don t do anything, you lose the data SRAM: 6T per bit

More information

Hybrid Memory Cube (HMC)

Hybrid Memory Cube (HMC) 23 Hybrid Memory Cube (HMC) J. Thomas Pawlowski, Fellow Chief Technologist, Architecture Development Group, Micron jpawlowski@micron.com 2011 Micron Technology, I nc. All rights reserved. Products are

More information

Compute Node Design for DAQ and Trigger Subsystem in Giessen. Justus Liebig University in Giessen

Compute Node Design for DAQ and Trigger Subsystem in Giessen. Justus Liebig University in Giessen Compute Node Design for DAQ and Trigger Subsystem in Giessen Justus Liebig University in Giessen Outline Design goals Current work in Giessen Hardware Software Future work Justus Liebig University in Giessen,

More information

UCT Software-Defined Radio Research Group

UCT Software-Defined Radio Research Group UCT Software-Defined Radio Research Group UCT SDRRG Team UCT Faculty: Alan Langman Mike Inggs Simon Winberg PhD Students: Brandon Hamilton MSc Students: Bruce Raw Gordon Inggs Simon Scott Joseph Wamicha

More information

Implementing Ultra Low Latency Data Center Services with Programmable Logic

Implementing Ultra Low Latency Data Center Services with Programmable Logic Implementing Ultra Low Latency Data Center Services with Programmable Logic John W. Lockwood, CEO: Algo-Logic Systems, Inc. http://algo-logic.com Solutions@Algo-Logic.com (408) 707-3740 2255-D Martin Ave.,

More information

Enabling success from the center of technology. Interfacing FPGAs to Memory

Enabling success from the center of technology. Interfacing FPGAs to Memory Interfacing FPGAs to Memory Goals 2 Understand the FPGA/memory interface Available memory technologies Available memory interface IP & tools from Xilinx Compare Performance Cost Resources Demonstrate a

More information

VXS-610 Dual FPGA and PowerPC VXS Multiprocessor

VXS-610 Dual FPGA and PowerPC VXS Multiprocessor VXS-610 Dual FPGA and PowerPC VXS Multiprocessor Two Xilinx Virtex -5 FPGAs for high performance processing On-board PowerPC CPU for standalone operation, communications management and user applications

More information

High Bandwidth Electronics

High Bandwidth Electronics DOE BES Neutron & Photon Detectors Workshop, August 1-3, 2012 Ryan Herbst System Overview What are the standard components in a detector system? Detector/Amplifier & ADC Digital front end - Configure and

More information

CSE502: Computer Architecture CSE 502: Computer Architecture

CSE502: Computer Architecture CSE 502: Computer Architecture CSE 502: Computer Architecture Memory / DRAM SRAM = Static RAM SRAM vs. DRAM As long as power is present, data is retained DRAM = Dynamic RAM If you don t do anything, you lose the data SRAM: 6T per bit

More information

FPGA APPLICATIONS FOR SINGLE DISH ACTIVITY AT MEDICINA RADIOTELESCOPES

FPGA APPLICATIONS FOR SINGLE DISH ACTIVITY AT MEDICINA RADIOTELESCOPES MARCO BARTOLINI - BARTOLINI@IRA.INAF.IT TORINO 18 MAY 2016 WORKSHOP: FPGA APPLICATION IN ASTROPHYSICS FPGA APPLICATIONS FOR SINGLE DISH ACTIVITY AT MEDICINA RADIOTELESCOPES TORINO, 18 MAY 2016, INAF FPGA

More information

Intelop. *As new IP blocks become available, please contact the factory for the latest updated info.

Intelop. *As new IP blocks become available, please contact the factory for the latest updated info. A FPGA based development platform as part of an EDK is available to target intelop provided IPs or other standard IPs. The platform with Virtex-4 FX12 Evaluation Kit provides a complete hardware environment

More information

srio SERIAL BUFFER FLOW-CONTROL DEVICE

srio SERIAL BUFFER FLOW-CONTROL DEVICE SERIAL BUFFER FLOW-CONTROL DEVICE 80KSBR201 Device Overview The IDT80KSBR201 is a high speed Buffer (SerB) that can connect up to two high-speed RapidIO interfaces. This device is built to work with any

More information

5051 & 5052 PCIe Card Overview

5051 & 5052 PCIe Card Overview 5051 & 5052 PCIe Card Overview About New Wave New Wave DV provides high performance network interface cards, system level products, FPGA IP cores, and custom engineering for: High-bandwidth low-latency

More information

ibob ADC Tutorial CASPER Reference Design

ibob ADC Tutorial CASPER Reference Design ibob ADC Tutorial Author: Griffin Foster April 14, 2009 (v1.0) Hardware Platforms Used: ibob, iadc FPGA Clock Rate: 100 MHz Sampling Rate: 400 MHz Software Environment: TinySH This tutorial walks through

More information

SMT943 APPLICATION NOTE 1 APPLICATION NOTE 1. Application Note - SMT372T and SMT943.doc SMT943 SUNDANCE MULTIPROCESSOR TECHNOLOGY LTD.

SMT943 APPLICATION NOTE 1 APPLICATION NOTE 1. Application Note - SMT372T and SMT943.doc SMT943 SUNDANCE MULTIPROCESSOR TECHNOLOGY LTD. APPLICATION NOTE 1 Application Note - SMT372T + SMT943 SMT943 SUNDANCE MULTIPROCESSOR TECHNOLOGY LTD. Date Comments / Changes Author Revision 07/07/10 Original Document completed CHG 1 Date 13/05/2010

More information

NetFPGA Hardware Architecture

NetFPGA Hardware Architecture NetFPGA Hardware Architecture Jeffrey Shafer Some slides adapted from Stanford NetFPGA tutorials NetFPGA http://netfpga.org 2 NetFPGA Components Virtex-II Pro 5 FPGA 53,136 logic cells 4,176 Kbit block

More information

InfiniBand SDR, DDR, and QDR Technology Guide

InfiniBand SDR, DDR, and QDR Technology Guide White Paper InfiniBand SDR, DDR, and QDR Technology Guide The InfiniBand standard supports single, double, and quadruple data rate that enables an InfiniBand link to transmit more data. This paper discusses

More information

Interface electronics

Interface electronics Peter Göttlicher, DESY-FEB, June 11th 2008 1 Interface electronics Links to backend/control implications to mechanical design, to effort in FPGA's Peter Göttlicher, DESY-FEB specifications of signals at

More information

Simplify System Complexity

Simplify System Complexity 1 2 Simplify System Complexity With the new high-performance CompactRIO controller Arun Veeramani Senior Program Manager National Instruments NI CompactRIO The Worlds Only Software Designed Controller

More information

Spartan-6 and Virtex-6 FPGA Embedded Kit FAQ

Spartan-6 and Virtex-6 FPGA Embedded Kit FAQ Spartan-6 and Virtex-6 FPGA FAQ February 5, 2009 Getting Started 1. Where can I purchase an Embedded kit? A: You can purchase your Spartan-6 and Virtex-6 FPGA Embedded kits online at: Spartan-6 FPGA :

More information

An Efficient Architecture for Ultra Long FFTs in FPGAs and ASICs

An Efficient Architecture for Ultra Long FFTs in FPGAs and ASICs An Efficient Architecture for Ultra Long FFTs in FPGAs and ASICs Architecture optimized for Fast Ultra Long FFTs Parallel FFT structure reduces external memory bandwidth requirements Lengths from 32K to

More information

Computer Systems Architecture I. CSE 560M Lecture 18 Guest Lecturer: Shakir James

Computer Systems Architecture I. CSE 560M Lecture 18 Guest Lecturer: Shakir James Computer Systems Architecture I CSE 560M Lecture 18 Guest Lecturer: Shakir James Plan for Today Announcements No class meeting on Monday, meet in project groups Project demos < 2 weeks, Nov 23 rd Questions

More information

Microcontroller Systems. ELET 3232 Topic 11: General Memory Interfacing

Microcontroller Systems. ELET 3232 Topic 11: General Memory Interfacing Microcontroller Systems ELET 3232 Topic 11: General Memory Interfacing 1 Objectives To become familiar with the concepts of memory expansion and the data and address bus To design embedded systems circuits

More information

Simplify System Complexity

Simplify System Complexity Simplify System Complexity With the new high-performance CompactRIO controller Fanie Coetzer Field Sales Engineer Northern South Africa 2 3 New control system CompactPCI MMI/Sequencing/Logging FieldPoint

More information

Full Linux on FPGA. Sven Gregori

Full Linux on FPGA. Sven Gregori Full Linux on FPGA Sven Gregori Enclustra GmbH FPGA Design Center Founded in 2004 7 engineers Located in the Technopark of Zurich FPGA-Vendor independent Covering all topics

More information

FPGA Solutions: Modular Architecture for Peak Performance

FPGA Solutions: Modular Architecture for Peak Performance FPGA Solutions: Modular Architecture for Peak Performance Real Time & Embedded Computing Conference Houston, TX June 17, 2004 Andy Reddig President & CTO andyr@tekmicro.com Agenda Company Overview FPGA

More information

A 400Gbps Multi-Core Network Processor

A 400Gbps Multi-Core Network Processor A 400Gbps Multi-Core Network Processor James Markevitch, Srinivasa Malladi Cisco Systems August 22, 2017 Legal THE INFORMATION HEREIN IS PROVIDED ON AN AS IS BASIS, WITHOUT ANY WARRANTIES OR REPRESENTATIONS,

More information

SMT9091 SMT148-FX-SMT351T/SMT391

SMT9091 SMT148-FX-SMT351T/SMT391 Unit / Module Description: Unit / Module Number: Document Issue Number: Issue Date: Original Author: This Document provides an overview of the developed system key features. SMT148-FX-SMT351T/SMT391 E.Puillet

More information

An Intelligent NIC Design Xin Song

An Intelligent NIC Design Xin Song 2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) An Intelligent NIC Design Xin Song School of Electronic and Information Engineering Tianjin Vocational

More information

RFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015

RFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015 RFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015 Outline Motivation Current situation Goal RFNoC Basic concepts Architecture overview Summary No Demo! See our booth,

More information

BlazePPS (Blaze Packet Processing System) CSEE W4840 Project Design

BlazePPS (Blaze Packet Processing System) CSEE W4840 Project Design BlazePPS (Blaze Packet Processing System) CSEE W4840 Project Design Valeh Valiollahpour Amiri (vv2252) Christopher Campbell (cc3769) Yuanpei Zhang (yz2727) Sheng Qian ( sq2168) March 26, 2015 I) Hardware

More information

PowerPC on NetFPGA CSE 237B. Erik Rubow

PowerPC on NetFPGA CSE 237B. Erik Rubow PowerPC on NetFPGA CSE 237B Erik Rubow NetFPGA PCI card + FPGA + 4 GbE ports FPGA (Virtex II Pro) has 2 PowerPC hard cores Untapped resource within NetFPGA community Goals Evaluate performance of on chip

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Organization Part II Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,

More information

Lecture 18: DRAM Technologies

Lecture 18: DRAM Technologies Lecture 18: DRAM Technologies Last Time: Cache and Virtual Memory Review Today DRAM organization or, why is DRAM so slow??? Lecture 18 1 Main Memory = DRAM Lecture 18 2 Basic DRAM Architecture Lecture

More information

The WINLAB Cognitive Radio Platform

The WINLAB Cognitive Radio Platform The WINLAB Cognitive Radio Platform IAB Meeting, Fall 2007 Rutgers, The State University of New Jersey Ivan Seskar Software Defined Radio/ Cognitive Radio Terminology Software Defined Radio (SDR) is any

More information

IGLOO2 Evaluation Kit Webinar

IGLOO2 Evaluation Kit Webinar Power Matters. IGLOO2 Evaluation Kit Webinar Jamie Freed jamie.freed@microsemi.com August 29, 2013 Overview M2GL010T- FG484 $99* LPDDR 10/100/1G Ethernet SERDES SMAs USB UART Available Demos Small Form

More information

Outline of Presentation Field Programmable Gate Arrays (FPGAs(

Outline of Presentation Field Programmable Gate Arrays (FPGAs( FPGA Architectures and Operation for Tolerating SEUs Chuck Stroud Electrical and Computer Engineering Auburn University Outline of Presentation Field Programmable Gate Arrays (FPGAs( FPGAs) How Programmable

More information

Application Note for EVP

Application Note for EVP Sundance Multiprocessor Technology Limited Application Note Form : QCF32 Date : 11 Februay 2009 Unit / Module Description: SMT111-SMT372T-SMT946 Unit / Module Number: Document Issue Number: 1.0 Issue Date:

More information

William Stallings Computer Organization and Architecture 6th Edition. Chapter 5 Internal Memory

William Stallings Computer Organization and Architecture 6th Edition. Chapter 5 Internal Memory William Stallings Computer Organization and Architecture 6th Edition Chapter 5 Internal Memory Semiconductor Memory Types Semiconductor Memory RAM Misnamed as all semiconductor memory is random access

More information

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow Abstract: High-level synthesis (HLS) of data-parallel input languages, such as the Compute Unified Device Architecture

More information

Sophon SC1 White Paper

Sophon SC1 White Paper Sophon SC1 White Paper V10 Copyright 2017 BITMAIN TECHNOLOGIES LIMITED All rights reserved Version Update Content Release Date V10-2017/10/25 Copyright 2017 BITMAIN TECHNOLOGIES LIMITED All rights reserved

More information

Addendum to Efficiently Enabling Conventional Block Sizes for Very Large Die-stacked DRAM Caches

Addendum to Efficiently Enabling Conventional Block Sizes for Very Large Die-stacked DRAM Caches Addendum to Efficiently Enabling Conventional Block Sizes for Very Large Die-stacked DRAM Caches Gabriel H. Loh Mark D. Hill AMD Research Department of Computer Sciences Advanced Micro Devices, Inc. gabe.loh@amd.com

More information

Multi-core microcontroller design with Cortex-M processors and CoreSight SoC

Multi-core microcontroller design with Cortex-M processors and CoreSight SoC Multi-core microcontroller design with Cortex-M processors and CoreSight SoC Joseph Yiu, ARM Ian Johnson, ARM January 2013 Abstract: While the majority of Cortex -M processor-based microcontrollers are

More information

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality

More information

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information

Organization. 5.1 Semiconductor Main Memory. William Stallings Computer Organization and Architecture 6th Edition

Organization. 5.1 Semiconductor Main Memory. William Stallings Computer Organization and Architecture 6th Edition William Stallings Computer Organization and Architecture 6th Edition Chapter 5 Internal Memory 5.1 Semiconductor Main Memory 5.2 Error Correction 5.3 Advanced DRAM Organization 5.1 Semiconductor Main Memory

More information
