Intro to SKARAB for programmers
- Mervyn Webb
Transcription
1 Intro to SKARAB for programmers (and how to use HMC!) Jason Manley 2017 CASPER workshop
2 Hardware
3 Hardware
- Virtex 7, 690T FPGA.
- 4 mezzanine sites per SKARAB: 2 in front, 2 in back.
- 16 SERDES links per site.
- Designed to the early PowerMX standard.
- Fans are over-provisioned and normally run at around 20% to 30% of rated speed.
4 Hardware
- Mezzanine cards allow trading off memory against IO capacity. Four cards per SKARAB.
- Only one type of off-chip memory is currently available on SKARAB: HMC. It replaces the QDR/SRAM and the DRAM found on previous CASPER boards.
- The 40G mezzanine card offers 4x 40G QSFP Ethernet ports and can drive optics or copper.
- No more complicated, flaky PHY chips that need firmware loaded to function properly.
- An ADC is now also available, with other cards to follow.
5 Hardware: HMC Mezzanine card
- 1x HMC device per card.
- HMC is 2GiB or 4GiB.
- Two independent interfaces per card: 2x half-width (8 lane) links at 10Gbps per lane.
- Each link is bi-directional.
- Up to 160Gbps throughput per card (2 links x 8 lanes x 10Gbps).
6 Hardware: QSFP 40G mezzanine card
- Quad 40G QSFP Ethernet card.
- PHY-less (purely passive). It does have a small microprocessor for SFP management (power, temperature etc).
- Able to drive optics directly.
- Tested with passive cables up to 7m. AOC (Active Optical Cables) are recommended for anything 5m and over.
- Does not currently work in breakout mode with spider/octopus cables (turning one 40G port into 4x 10G ports).
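7 Compared to existing CASPER hardware
Values are listed in the order ibob / ROACH / ROACH-2 / SKARAB.
- Logic cells: 53K / 94K / 476K / 693K
- DSP slices: (values missing from the transcription)
- BRAM capacity: 4.2Mb / 8.8Mb / 38Mb / 53Mb
- SRAM capacity: 2x18Mb / 2x36Mb / 4x144Mb / (see HMC)
- SRAM bandwidth: 9Gbps / 43Gbps / 200Gbps / (see HMC)
- DDR capacity (max): - / 1x8Gb / 1x16Gb / (see HMC)
- DDR bandwidth (total): - / 38Gbps / 50Gbps / (see HMC)
- HMC (SKARAB only): < 8x 32Gib, 8x 30Gbps R+W
- Ethernet ports: 2x 10G / 4x10G / 8x10G / < 16x40G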
8 Hardware
- Uses the JASPER flow, not the traditional CASPER flow.
- Python now forms the backend for managing busses and yellowblocks.
- The vendor backend is Xilinx Vivado, not ISE (hard break at Virtex-6/ROACH-2; no overlapping tool support). Recall Wesley's JASPER/Vivado talk on Monday.
- SKARAB incorporates all the lessons learnt from SKA-SA's sizable deployments of ibob/BEE2, ROACH-1 and ROACH-2.
- After compiling a bitstream, interacting with a SKARAB from a network-attached control computer using any of the standard tools is the same as working with any previous CASPER hardware.
- But it is quite different under the hood...
9 Remotely controlling SKARABs
- Previous CASPER boards (ibobs, BEE2s, ROACH1s, ROACH2s) all had out-of-band management ports (100Mbps or 1G Ethernet ports, separate from the 10G data ports).
- SKARAB can do everything in-band: data, management and (re)programming. Eventually this will work over any network interface, but currently only over the 1G port or the first 40G port. Work in progress!
- SKARAB does not have a separate management processor. It uses a lightweight on-FPGA softcore MicroBlaze.
- The MicroBlaze is reloaded whenever the FPGA is reprogrammed. The process must be robust, and managed carefully, to avoid losing comms to boards.
- Simpler setup and maintenance: just a power cable and a network cable to each SKARAB.
- Network appliance: no need to manage boot servers, Linux filesystems etc.
- The entire platform can be managed remotely, including upgrading all firmware over the network.
- Designed for large-scale deployments (MeerKAT, with an eye on SKA).
10 SKARAB startup sequencing
- Onboard flash memory ships with two bitstreams pre-loaded (space for up to four): the Golden Image and the Multiboot Image.
- Both are exactly the same bitstream. The board tries to boot the multiboot image quickly. If that fails, it falls back to the golden image more slowly.
- You can load your own images here if you want, but that's not the idea.
- Most large CASPER deployments have a control computer on the network to configure the FPGA boards. SKARAB is designed to work in this environment. The host computer stores your various bitstreams.
- So, when a SKARAB boots, it loads the flash image and asks for DHCP. The server then knows about the new SKARAB board on the network, and can load whichever DSP gateware image is wanted, configure registers and set it to work.
- Default is DHCP on all network ports on startup. (SKARAB wants a DHCP server. Hard-coding IP addresses in your bitstreams is no longer so easy.)
- Hostname support, for example, skarab[...]
- LLDP support (boards announce themselves to switches).
- MAC addresses are based on serial number and network port. The first 40G port has hostname skarab[...], with MAC 06:50:02:03:02:01.
- After loading a DSP bitstream, the network interfaces flap and a new DHCP transaction ensues. Depending on your DHCP server and network (switch), it can take a few seconds to bring the link back up.
11 What's working?
Working:
- Basic JASPER toolflow
- Polling sensors (power, temp, fans etc)
- HMC Mezzanine cards
- First 40G ethernet port
- 1G ethernet port
- Remote reprogramming and control
- Remote updates (flash firmware)
- DHCP, LLDP, ARP, PING and other network services
- Python casperfpga interfaces (mostly; WIP)
Not (yet) working:
- Legacy CASPER toolflow (and never will)
- Automatic fan speed control
- Retrieval of logs for hardware errors
- Arbitrary combinations of Ethernet and HMC cards
- Onboard USB JTAG bridge
- Fast (~1 second) remote reloading of FPGA gateware
- Large wishbone bus (timing implications; WIP)
- Comprehensive DRC during compile
12 Tips for designs
- Keep to the UDP port compiled in to your yellowblock for all your high-speed traffic. Otherwise you can overwhelm the MicroBlaze with traffic, which is especially problematic while trying to reprogram.
- Yellowblock default is to use 7148 (SPEAD default at SKA-SA).
- Don't ever use 7778 decimal (0x1e62), which is for controlling the MicroBlaze, or 29000 decimal (0x7148), which is used for reprogramming.
- In the event of a network failure at startup, SKARAB will try indefinitely to get a DHCP lease.
- LEDs on the front panel indicate DHCP success on the golden image (useful for basic/visual debugging).
- Check for updates regularly. Development is very fluid at the moment, and nothing is stable yet.
- Current bus architecture limitations prevent very large numbers of attachments (~50 slaves ok).
- Good news is that V7 seems to have much better routing resources, especially when building large BRAMs. Timing is much easier for large FFTs and snapshot blocks than on V6. Large designs easily meet timing at 240MHz.
- You'll get to play with all this stuff during Adam's SKARAB tutorials.
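As an illustration of the port advice on this slide, a design script could refuse reserved ports before they reach the hardware. This is a minimal sketch: `check_data_port` and the constant names are hypothetical and are not part of casperfpga or the yellowblock. Only the port numbers themselves come from the slide.

```python
# UDP ports the slide names as reserved by the SKARAB firmware.
RESERVED_UDP_PORTS = {
    0x1E62: "MicroBlaze control",  # 7778 decimal
    0x7148: "reprogramming",       # 29000 decimal
}

# Yellowblock default from the slide (decimal, not hex).
YELLOWBLOCK_DEFAULT_PORT = 7148


def check_data_port(port: int) -> None:
    """Raise ValueError if `port` is out of range or reserved (hypothetical helper)."""
    if not 0 < port <= 0xFFFF:
        raise ValueError(f"UDP port {port} is out of range")
    if port in RESERVED_UDP_PORTS:
        purpose = RESERVED_UDP_PORTS[port]
        raise ValueError(f"UDP port {port} (0x{port:04x}) is reserved for {purpose}")


# The default data port passes the check silently.
check_data_port(YELLOWBLOCK_DEFAULT_PORT)
```

Note that the default data port 7148 is decimal, while the reprogramming port is 0x7148 in hexadecimal. The two are easy to confuse.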
13 HMC memory
- What is Hybrid Memory Cube? Stacked DRAM on a chip, with a built-in management layer.
- Designed and optimised for very high throughput, not low latency. Perfect for RA instrumentation!
- HMC takes care of itself, including error detection on memory cells and IO operations. You don't have to deal with refreshes, bank management etc in the FPGA controller anymore.
- HMC contains smarts: it has buffers and a small ALU (you can build an accumulator inside the memory!).
- External interface is high-speed serial (SERDES) links.
- HMC supports up to 4 sets of bidirectional 16-lane links, with each lane operating at up to 15Gbps. That's up to 1.9Tbps. It's FAST!
- Micron is already on 3rd generation HMC. SKARAB uses the 2nd generation at lower speeds.
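The headline bandwidth figures can be checked with simple arithmetic. The sketch below assumes the 1.9Tbps figure counts both directions of every lane, which is an inference from the numbers rather than something the slide states. The 160Gbps per-card figure from the HMC mezzanine slide matches when a single direction is counted. The function name is illustrative.

```python
def aggregate_gbps(links: int, lanes_per_link: int, gbps_per_lane: float,
                   directions: int = 2) -> float:
    """Raw SERDES bandwidth in Gbps, before protocol overhead."""
    return links * lanes_per_link * gbps_per_lane * directions


# Full HMC device: 4 links x 16 lanes x 15 Gbps, both directions.
hmc_peak = aggregate_gbps(links=4, lanes_per_link=16, gbps_per_lane=15)

# SKARAB HMC card: 2 half-width links x 8 lanes x 10 Gbps, one direction.
skarab_card = aggregate_gbps(links=2, lanes_per_link=8, gbps_per_lane=10,
                             directions=1)

print(f"HMC peak: {hmc_peak / 1000:.2f} Tbps")   # HMC peak: 1.92 Tbps
print(f"SKARAB card: {skarab_card:.0f} Gbps")    # SKARAB card: 160 Gbps
```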
14 Accessing HMC memory
- The yellowblock packages your instructions (read/write) into flits. A flit is a packet containing a header (instruction) and data (see the HMC datasheet for details).
- Fortunately, all of this is abstracted away for the user: the yellowblock makes HMC look like a conventional memory interface.
- Each HMC yellowblock offers two dual-ported interfaces.
- Simultaneous read and write operations are combined into a single flit.
- Memory is organised into Vaults, Banks and DRAMs. The controller allows you to map these arbitrarily onto your address bits.
- By default, SKARAB's implementation optimises for linear reads and writes:
  Address bits: a26 ... a8 | a7 a6 a5 a4 | a3 a2 a1 a0
  Mapped to:    D19 ... D0 | B3 B2 B1 B0 | V3 V2 V1 V0
- The yellowblock accesses 256 bits at a time, and presents a 256 bit bus.
- One clock cycle per read and/or write request.
- No need for burst reads or writes: truly random access is possible.
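The default address mapping on this slide can be sketched in a few lines. The example assumes the lowest four address bits select the vault and the next four select the bank, as the slide's bit listing shows, and it treats every bit above a7 as the DRAM address. `decode_address` is an illustrative name and not part of the yellowblock.

```python
VAULT_BITS = 4  # a3..a0 -> V3..V0
BANK_BITS = 4   # a7..a4 -> B3..B0


def decode_address(addr: int) -> tuple[int, int, int]:
    """Split a word address (one word = 256 bits) into (vault, bank, dram)."""
    vault = addr & ((1 << VAULT_BITS) - 1)
    bank = (addr >> VAULT_BITS) & ((1 << BANK_BITS) - 1)
    dram = addr >> (VAULT_BITS + BANK_BITS)
    return vault, bank, dram


# Consecutive addresses visit all 16 vaults before a vault is reused,
# which is why linear access spreads the load across the device.
for addr in (0, 1, 15, 16, 17, 256):
    print(addr, decode_address(addr))
```

With this mapping, a stride of 16 words would hit the same vault on every access. That is the pattern to avoid for throughput.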
16 HMC vaults and links
- There are 16 vaults per HMC device. Four are co-located with each link (collection of SERDES lanes).
- They are interconnected on-chip using a switched network, so any link can access any vault.
- Naturally, accessing co-located memory is faster than hopping through the switches to get to memory located on other links.
- Mapping is as you'd expect:
  Link 1: vaults 0,1,2,3
  Link 2: vaults 4,5,6,7
  Link 3: vaults 8,9,10,11
  Link 4: vaults 12,13,14,15
- SKARAB has links 2 and 3 connected. Thus, half the memory can be accessed locally, incurring minimum latency.
- Accessing remote vaults (0-3 and 12-15) will incur additional latency, but the switching network is a full crossbar (no reduction in bandwidth).
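The vault-to-link mapping above reduces to integer division. A minimal sketch, with illustrative function names, of how a design script might classify vaults as local or remote on SKARAB:

```python
VAULTS_PER_LINK = 4
SKARAB_CONNECTED_LINKS = (2, 3)  # links wired to the FPGA, per the slide


def link_for_vault(vault: int) -> int:
    """Return the 1-based link number a vault is co-located with."""
    if not 0 <= vault <= 15:
        raise ValueError(f"vault {vault} is out of range 0..15")
    return vault // VAULTS_PER_LINK + 1


def is_local(vault: int) -> bool:
    """True if the vault sits behind a link that SKARAB has connected."""
    return link_for_vault(vault) in SKARAB_CONNECTED_LINKS


local_vaults = [v for v in range(16) if is_local(v)]
remote_vaults = [v for v in range(16) if not is_local(v)]
print("local: ", local_vaults)   # local:  [4, 5, 6, 7, 8, 9, 10, 11]
print("remote:", remote_vaults)  # remote: [0, 1, 2, 3, 12, 13, 14, 15]
```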
17 HMC: More on vaults
- To increase throughput, data must be striped over multiple vaults. Maximum throughput requires you to use all vaults.
- Each vault has a buffer for transactions. If you keep accessing the same vault continuously, operations will queue and performance will degrade. This matters especially for matrix-transpose (corner-turner) designs.
- Vaults operate semi-autonomously, and respond as quickly as they can. Latency, throughput and order of operations are thus not guaranteed.
- You can issue a request to vault 1 and then another to vault 2, and get the response from vault 2 first and the reply from vault 1 some time later.
- Performance is heavily dependent upon your access patterns.
- To keep track of your read requests, you issue a 9-bit tag with each one. Responses contain your tags so you can sort them out again. This can complicate things enormously.
- Data is also cached in the HMC, so if you issue the same read request twice, you get the second response back very quickly, possibly before many earlier read requests.
- Typical latency: ~80 FPGA clock cycles (230MHz) in VACC applications.
- Typical out-of-order: ranges from 0 to ~230, depending on access patterns and speeds.
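The tag mechanism described above is what makes reordering possible. The following software model is a sketch only: on SKARAB the reorder logic lives in gateware, and the class and method names here are invented for illustration. It assumes tags are issued sequentially and wrap at 512 (9 bits).

```python
TAG_BITS = 9
NUM_TAGS = 1 << TAG_BITS  # 512 distinct tags


class ReorderBuffer:
    """Releases read responses in the order the requests were issued."""

    def __init__(self) -> None:
        self._next_issue = 0    # tag for the next request
        self._next_release = 0  # tag of the oldest unreleased request
        self._outstanding = 0   # requests issued but not yet released
        self._pending = {}      # tag -> data received out of order

    def issue(self) -> int:
        """Allocate a tag for a new read request."""
        if self._outstanding >= NUM_TAGS:
            raise RuntimeError("all tags are in flight")
        tag = self._next_issue
        self._next_issue = (tag + 1) % NUM_TAGS
        self._outstanding += 1
        return tag

    def receive(self, tag: int, data) -> list:
        """Store a response and return any data now releasable in order."""
        self._pending[tag] = data
        released = []
        while self._next_release in self._pending:
            released.append(self._pending.pop(self._next_release))
            self._next_release = (self._next_release + 1) % NUM_TAGS
            self._outstanding -= 1
        return released


rob = ReorderBuffer()
tags = [rob.issue() for _ in range(3)]  # tags 0, 1, 2
print(rob.receive(1, "b"))  # []          (still waiting for tag 0)
print(rob.receive(0, "a"))  # ['a', 'b']
print(rob.receive(2, "c"))  # ['c']
```

With out-of-order depths of up to about 230 quoted on the slide, 512 tags leave headroom, provided the design does not keep more than 512 reads in flight.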
18 HMC yellowblock
- The HMC controller automatically performs POST upon startup.
- After POST, HMC monitors itself. A 6-bit error code is reported in the event of failure during operation.
- Checks include: flit (SERDES comms) errors; ECC in the DRAM core; buffer overruns; internal logic errors.
- For best performance: linear access, simultaneous read and write flits.
- Higher-level HMC blocks available in the DSP library: wideband, programmable delay line; corner-turner (matrix transpose); vector-accumulator (buffered, with backpressure).
19 HMC conclusions & considerations
- Latency through the chip is not guaranteed.
- Throughput is not guaranteed, and depends on access patterns.
- No SKARAB support yet for special instructions (just basic read and write).
- Most applications will need a reorder block after the HMC to deal with out-of-order responses.
- If you're doing reads and writes, issue these instructions simultaneously.
20 40G ethernet core, forty_gbe
- Yellowblock interface is exactly like the 10G ethernet core, but with 256b interfaces instead of 64b interfaces.
- The 40G core now does proper RX CRC checking (uses a lot of HW resources, though).
- No longer managed by the tcpborphserver and tgtap software processes on the PPC. The MicroBlaze softcore manages all network services.
- Features in place already: DHCP with auto-renew and hostname support based on serial number; LLDP reporting and discovery; ARP; Ping; multicast TX and RX, including subscription to multiple sequential addresses; IGMPv2 signalling.
- As with the 10G core, multicast RX uses a bitmask arrangement. You can only subscribe to contiguous chunks of 2^N addresses.
- Current status, limitations and work in progress:
- At the moment, the 40G yellowblock is hard-coded for the first QSFP port on the third mezzanine site.
- The 40G yellowblock currently pulls in the MicroBlaze infrastructure, so all designs must contain a 40G core, even if you're not using it!
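The 2^N restriction on multicast subscriptions follows from matching with a bitmask. The sketch below assumes a packet is accepted when its destination address agrees with the base address on every bit the mask sets. That semantics is an assumption about the core, and the function names and example addresses are illustrative only.

```python
import ipaddress


def multicast_mask(n_addresses: int) -> int:
    """32-bit mask covering a contiguous block of `n_addresses` (a power of two)."""
    if n_addresses < 1 or n_addresses & (n_addresses - 1):
        raise ValueError("block size must be a power of two")
    return 0xFFFFFFFF & ~(n_addresses - 1)


def is_subscribed(dest: str, base: str, mask: int) -> bool:
    """True if `dest` matches `base` on all bits set in `mask`."""
    d = int(ipaddress.IPv4Address(dest))
    b = int(ipaddress.IPv4Address(base))
    return (d & mask) == (b & mask)


mask = multicast_mask(4)  # 0xFFFFFFFC: four sequential addresses
base = "239.2.0.64"       # example base, aligned to the block size
for dest in ("239.2.0.64", "239.2.0.67", "239.2.0.68"):
    print(dest, is_subscribed(dest, base, mask))
```

Because the low N bits are simply ignored, a block always starts on a multiple of its size. A base address that is not aligned still selects the aligned block that contains it.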
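21 40G Ethernet and HMC resources
Hardware resources for the 40G ethernet and HMC cores (absolute counts are given where the transcription preserves them; percentages are of the total available):
- Slices: per 40G port (3.1%); per HMC mezzanine card (13.1%)
- BRAM: per 40G port (1.7%); per HMC mezzanine card 116 (7.9%)
- DSP: per 40G port (0%); per HMC mezzanine card 4 (0.1%)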
22 Questions & Comments Jason Manley
More information100% PACKET CAPTURE. Intelligent FPGA-based Host CPU Offload NIC s & Scalable Platforms. Up to 200Gbps
100% PACKET CAPTURE Intelligent FPGA-based Host CPU Offload NIC s & Scalable Platforms Up to 200Gbps Dual Port 100 GigE ANIC-200KFlex (QSFP28) The ANIC-200KFlex FPGA-based PCIe adapter/nic features dual
More information08 - Address Generator Unit (AGU)
October 2, 2014 Todays lecture Memory subsystem Address Generator Unit (AGU) Schedule change A new lecture has been entered into the schedule (to compensate for the lost lecture last week) Memory subsystem
More informationJakub Cabal et al. CESNET
CONFIGURABLE FPGA PACKET PARSER FOR TERABIT NETWORKS WITH GUARANTEED WIRE- SPEED THROUGHPUT Jakub Cabal et al. CESNET 2018/02/27 FPGA, Monterey, USA Packet parsing INTRODUCTION It is among basic operations
More informationAll Programmable: from Silicon to System
All Programmable: from Silicon to System Ivo Bolsens, Senior Vice President & CTO Page 1 Moore s Law: The Technology Pipeline Page 2 Industry Debates Variability Page 3 Industry Debates on Cost Page 4
More informationExtreme TCP Speed on GbE
TOE1G-IP Introduction (Xilinx) Ver1.1E Extreme TCP Speed on GbE Design Gateway Page 1 Agenda Advantage and Disadvantage of TCP on GbE TOE1G-IP core overview TOE1G-IP core description Initialization High-speed
More informationQuiXilica V5 Architecture
QuiXilica V5 Architecture: The High Performance Sensor I/O Processing Solution for the Latest Generation and Beyond Andrew Reddig President, CTO TEK Microsystems, Inc. Military sensor data processing applications
More informationSystem-on Solution from Altera and Xilinx
System-on on-a-programmable-chip Solution from Altera and Xilinx Xun Yang VLSI CAD Lab, Computer Science Department, UCLA FPGAs with Embedded Microprocessors Combination of embedded processors and programmable
More informationComputer Organization. 8th Edition. Chapter 5 Internal Memory
William Stallings Computer Organization and Architecture 8th Edition Chapter 5 Internal Memory Semiconductor Memory Types Memory Type Category Erasure Write Mechanism Volatility Random-access memory (RAM)
More informationHigh Performance Embedded Applications. Raja Pillai Applications Engineering Specialist
High Performance Embedded Applications Raja Pillai Applications Engineering Specialist Agenda What is High Performance Embedded? NI s History in HPE FlexRIO Overview System architecture Adapter modules
More informationS2C K7 Prodigy Logic Module Series
S2C K7 Prodigy Logic Module Series Low-Cost Fifth Generation Rapid FPGA-based Prototyping Hardware The S2C K7 Prodigy Logic Module is equipped with one Xilinx Kintex-7 XC7K410T or XC7K325T FPGA device
More informationVersal: AI Engine & Programming Environment
Engineering Director, Xilinx Silicon Architecture Group Versal: Engine & Programming Environment Presented By Ambrose Finnerty Xilinx DSP Technical Marketing Manager October 16, 2018 MEMORY MEMORY MEMORY
More informationUnderstanding the TOP Server ControlLogix Ethernet Driver
Understanding the TOP Server ControlLogix Ethernet Driver Page 2 of 23 Table of Contents INTRODUCTION 3 UPDATE RATES AND TAG REQUESTS 4 CHANNEL AND DEVICE CONFIGURATION 7 PROTOCOL OPTIONS 9 TAG GENERATION
More informationXMC Products. High-Performance XMC FPGAs, XMC 10gB Ethernet, and XMC Carrier Cards. XMC FPGAs. FPGA Extension I/O Modules.
E M B E D D E D C O M P U T I N G & I / O S O L U T I O N S XMC Products XMC FPGAs FPGA Extension I/O Modules XMC 10gB Ethernet XMC Carrier Cards XMC Software Support High-Performance XMC FPGAs, XMC 10gB
More informationArchitectural Options for LPDDR4 Implementation in Your Next Chip Design
Architectural Options for LPDDR4 Implementation in Your Next Chip Design Marc Greenberg, Director, DDR Product Marketing, Synopsys JEDEC Mobile & IOT Forum Copyright 2016 Synopsys, Inc. Introduction /
More informationChapter 8 Memory Basics
Logic and Computer Design Fundamentals Chapter 8 Memory Basics Charles Kime & Thomas Kaminski 2008 Pearson Education, Inc. (Hyperlinks are active in View Show mode) Overview Memory definitions Random Access
More informationCS 43: Computer Networks The Link Layer. Kevin Webb Swarthmore College November 28, 2017
CS 43: Computer Networks The Link Layer Kevin Webb Swarthmore College November 28, 2017 TCP/IP Protocol Stack host host HTTP Application Layer HTTP TCP Transport Layer TCP router router IP IP Network Layer
More informationDesign of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture
Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on on-chip Architecture Avinash Kodi, Ashwini Sarathy * and Ahmed Louri * Department of Electrical Engineering and
More informationMemory. From Chapter 3 of High Performance Computing. c R. Leduc
Memory From Chapter 3 of High Performance Computing c 2002-2004 R. Leduc Memory Even if CPU is infinitely fast, still need to read/write data to memory. Speed of memory increasing much slower than processor
More information25G Ethernet CFI. Final Draft Brad Booth, Microsoft. Photo courtesy of Hugh Barrass
25G Ethernet CFI Final Draft Brad Booth, Microsoft Photo courtesy of Hugh Barrass Objectives To gauge the interest in starting a study group to investigate a 25 Gigabit Ethernet project Don t need to:
More informationAdapted from instructor s supplementary material from Computer. Patterson & Hennessy, 2008, MK]
Lecture 17 Adapted from instructor s supplementary material from Computer Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] SRAM / / Flash / RRAM / HDD SRAM / / Flash / RRAM/ HDD SRAM
More informationCISCO CATALYST 4500-X SERIES FIXED 10 GIGABIT ETHERNET AGGREGATION SWITCH DATA SHEET
CISCO CATALYST 4500-X SERIES FIXED 10 GIGABIT ETHERNET AGGREGATION SWITCH DATA SHEET ROUTER-SWITCH.COM Leading Network Hardware Supplier CONTENT Overview...2 Appearance... 2 Key Features and Benefits...2
More informationCS/EE 3710 Computer Architecture Lab Checkpoint #2 Datapath Infrastructure
CS/EE 3710 Computer Architecture Lab Checkpoint #2 Datapath Infrastructure Overview In order to complete the datapath for your insert-name-here machine, the register file and ALU that you designed in checkpoint
More informationP51: High Performance Networking
P51: High Performance Networking Lecture 6: Programmable network devices Dr Noa Zilberman noa.zilberman@cl.cam.ac.uk Lent 2017/18 High Throughput Interfaces Performance Limitations So far we discussed
More informationFlash Controller Solutions in Programmable Technology
Flash Controller Solutions in Programmable Technology David McIntyre Senior Business Unit Manager Computer and Storage Business Unit Altera Corp. dmcintyr@altera.com Flash Memory Summit 2012 Santa Clara,
More information