Offloading Collective Operations to Programmable Logic
1 Offloading Collective Operations to Programmable Logic
Martin Swany (with thanks to Omer Arap)
Center for Research in Extreme Scale Computing (CREST), Department of Intelligent Systems Engineering
Indiana University, Bloomington, IN
2 FPGAs Should Be Used As MPI Accelerators
"I know MPI accelerators. We have the best MPI accelerators." - DT
3 Overview
- We have developed an architecture and implementation for offloading collective operations to programmable logic in the communication substrate
- Not the first to use FPGAs for collectives
- It is clear that there is a tradeoff between performance and generality
  - FPGAs are at one end of that spectrum
  - Some folks live at the other end
- Programmable logic is burgeoning
  - Xilinx Zynq
  - Intel Xeon+Altera (HARP)
  - Mellanox ConnectX-4 + FPGA
  - FPGAs on other NICs and in switches
4 Programming FPGAs
- The programmable logic provided by FPGAs is a powerful option for creating task-specific functionality for applications
- Programming with FPGAs is not easy
- Many agree: accelerate common functionality
- Others want to make it easy to use general-purpose toolchains to deploy arbitrary kernels
  - Jeff and OpenACC
  - Franck's workshop
  - New NSF call for ways to make it easy
- It may not ever be easy
5 Things that go bump in the wire
- We have been exploring a (relatively) generic approach for scenarios in which there is programmable logic in the communication pipeline
  - Addresses the problem of how to get data to and from the device
  - A "bump in the wire"
  - Rich Graham clearly got this phrase from us
- In teaching how to use FPGAs, we have found this model to be quite useful
  - Students who haven't done Verilog can get to this in a semester
  - Important topic in Indiana U.'s new engineering program
- This also maps to a growing use case
6 Collective Offload
- Offloading collective logic is a common technique on various platforms to improve performance
- Mellanox's CORE-Direct is one such state-of-the-art collective operation offload framework
  - Defines primitive tasks: send, receive, wait, binary calculations
  - A collective algorithm defines a list of tasks, and the NIC performs them without CPU involvement
- Collective offload systems have been built many times, but ours is easy to use and is a starting point
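The task-list idea on this slide can be sketched in a few lines. This is an illustration only and not the CORE-Direct API: the `Task` type and `recursive_doubling_tasks` are hypothetical names, and recursive doubling is picked here simply as a familiar allreduce pattern (the slide names no algorithm). It assumes the communicator size is a power of two.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    """One primitive handed to the NIC (hypothetical type, for illustration)."""
    kind: str        # "send", "recv", "wait", or "calc"
    peer: int = -1   # partner rank for send/recv, -1 when not applicable


def recursive_doubling_tasks(rank: int, size: int) -> List[Task]:
    """Build the task list one rank would post for an allreduce.

    Assumption: size is a power of two, so every round has a partner.
    """
    if size < 1 or size & (size - 1):
        raise ValueError("size must be a power of two")
    tasks: List[Task] = []
    dist = 1
    while dist < size:
        peer = rank ^ dist           # partner for this round
        tasks += [
            Task("send", peer),      # ship the current partial result
            Task("recv", peer),      # post a receive for the partner's data
            Task("wait"),            # block the list until the receive completes
            Task("calc"),            # combine local and received data
        ]
        dist <<= 1
    return tasks


if __name__ == "__main__":
    for task in recursive_doubling_tasks(rank=5, size=8):
        print(task)
```

Once such a list is posted, the host does not need to take part in the individual rounds, which is the point the slide makes about CPU involvement.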
7 Goals of FPGA-based Offload
- Reduce collective operation latency and variance
- Provide a selection of collective algorithms that are implemented in hardware
- Implement collective algorithms independent of end-to-end communication protocols
- Utilize hardware-level, protocol-independent multicasting to optimize and, when possible, redesign major collective operation algorithms
- Support collective operation offload for different classes of collective operations
8 NetFPGA
- A line-rate, flexible, open networking platform for teaching and research
[Figure: NetFPGA SUME board and block diagram]
9 Xilinx Zynq
[Figure: Zynq block diagram. Labels recovered from the diagram:]
- Processor cores and caches: 2x ARM Cortex-A9 MPCore, each with a NEON DSP/FPU engine and 32/32 KB I/D caches; 512 KB L2 cache; Snoop Control Unit; 256 KB on-chip memory; General Interrupt Controller; DMA; timers; watchdog timer; ARM CoreSight multi-core debug and trace
- Memory interfaces: flash controller (NOR, NAND, SRAM, Quad SPI); multiport DRAM controller (DDR3, DDR3L, DDR2)
- Processor I/O mux and EMIO: 2x SPI, 2x I2C, 2x CAN, 2x UART, GPIO, 2x SDIO with DMA, 2x USB with DMA, 2x GigE with DMA
- Interconnect and ports: AMBA interconnect; general-purpose AXI ports; high-performance AXI ports; ACP
- Programmable Logic (system gates, DSP, RAM)
- Other blocks: configuration; security (AES, SHA, RSA); XADC (2x ADC, mux, thermal sensor); PCIe Gen2 (1-8 lanes); multi-standard I/Os (3.3V and high-speed 1.8V); multi-gigabit transceivers
10 Zedboard & Ethernet FMC
11 Collective Offload Design
- Ease implementation with the bump-in-the-wire model
- Host offloads a collective to the Programmable Logic (PL) by sending a specially crafted UDP packet
  - Simplest solution
- Host blocks until the PL generates a release message, which is also a UDP packet
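From the host's point of view, the exchange on this slide is one datagram out and one blocking receive. The sketch below is a minimal illustration of that pattern, not the authors' host library: `offload_collective` and `open_offload_socket` are hypothetical names, and the address, port, and buffer size are assumptions.

```python
import socket


def open_offload_socket(pl_host: str, pl_port: int, timeout_s: float = 1.0) -> socket.socket:
    """Create a UDP socket whose default peer is the PL (address is an assumption)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout_s)         # do not hang forever if the release is lost
    sock.connect((pl_host, pl_port))   # for UDP this only records the default peer
    return sock


def offload_collective(sock: socket.socket, request: bytes, max_release: int = 2048) -> bytes:
    """Send one offload request, then block until the release datagram arrives."""
    sock.send(request)
    return sock.recv(max_release)


# Hypothetical usage (address and port are placeholders, not from the slides):
#   sock = open_offload_socket("192.0.2.10", 5000)
#   release = offload_collective(sock, request_bytes)
```

Because UDP gives no delivery guarantee, the timeout is there so a dropped request or release surfaces as an error instead of a permanent block.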
12 Collective Offload Packet
- Ethernet: dst_mac, src_mac_1, src_mac_2, type
- IPv4: ver, IHL, Diff_Serv, Total_Length, Identification, flags, fragment_offset, TTL, Protocol, Hdr_Cksum, src_ip, dst_ip_1, dst_ip_2
- UDP: UDP_Source_Port, UDP_Dest_Port, Length, UDP_checksum
- Collective offload header: coll_id, comm_size, coll_type, algo_type, node_type, msg_type, rank, local_rank_count, operation, data_type, count
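The collective-specific fields that follow the UDP header can be packed and parsed as a fixed-layout record. The field names below come from the slide, but the slide does not give field widths, so this sketch assumes every field is an unsigned 16-bit integer in network byte order. That layout, and the helper names, are assumptions made only for illustration.

```python
import struct
from collections import namedtuple

# Field names as listed on the slide, in the order shown.
OffloadHeader = namedtuple(
    "OffloadHeader",
    "coll_id comm_size coll_type algo_type node_type msg_type "
    "rank local_rank_count operation data_type count",
)

# ASSUMPTION: 11 unsigned 16-bit fields, big-endian. The real widths are not given.
_LAYOUT = struct.Struct("!11H")


def pack_header(hdr: OffloadHeader) -> bytes:
    """Serialize the header into the first bytes of the UDP payload."""
    return _LAYOUT.pack(*hdr)


def unpack_header(payload: bytes) -> OffloadHeader:
    """Parse the header from the front of a UDP payload; trailing data is ignored."""
    return OffloadHeader(*_LAYOUT.unpack_from(payload))


if __name__ == "__main__":
    hdr = OffloadHeader(coll_id=7, comm_size=8, coll_type=1, algo_type=0,
                        node_type=0, msg_type=0, rank=3, local_rank_count=1,
                        operation=2, data_type=1, count=4)
    raw = pack_header(hdr)
    print(len(raw), unpack_header(raw))
```

The numeric codes used for `coll_type`, `operation`, and `data_type` in the usage lines are placeholders; the slides do not define them.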
13 AXI-Based Platforms
- Advanced eXtensible Interface (AXI) standards define inter-module communication in programmable logic
- Widely utilized in recent Xilinx FPGA designs
  - Recent designs depend on AXI-based protocols
- AXIS (AXI-Stream) is for streaming data between modules
  - We employ the AXIS protocol in our design
- Support for multiple ranks per host
  - Provides insight into how our design scales
14 Implementation
- Modules inherited from the NetFPGA package: Input Arbiter, Output Queues
- Modules from the Xilinx IP catalog: AXI Ethernet, AXI DMA, AXI Width Converter, FPU, AXIS Data FIFO
- New modules: Stream Separator, Stream Aggregator, Collective Forwarding Engine (CFE), Collective Instruction Processing Engine (CIPE)
- Implements MPI_Allreduce, MPI_Reduce, MPI_Barrier, MPI_Scan, MPI_Alltoall
15 Data Path
[Figure: data path block diagram. Stages recovered from the diagram:]
- Five ports (1x AXI DMA, 4x AXI Ethernet), each with a 32-bit control stream and a 32-bit data stream
- Ingress, per port: Stream Aggregator (32-bit) -> Width Converter (32-bit to 64-bit) -> Input Arbiter
- Shared 64-bit core: Input Arbiter -> CFE (Collective Forwarding Engine) -> CIPE (Collective Instruction Processing Engine) -> Output Queues
- Egress, per port: Output Queues -> Width Converter (64-bit to 32-bit) -> Stream Separator (32-bit control and data streams) -> AXI DMA or AXI Ethernet
16 Collective Forwarding Engine (CFE)
- Heart of the design
- Responsible for:
  - Assessing the offload request header fields
  - Detecting the collective operation and algorithm to run
  - Performing state transitions and saving the state for algorithms
  - Making forwarding decisions based on the state
  - Determining the instructions that the CIPE will apply
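The CFE's responsibilities amount to a per-collective state machine: read the header, update saved state, and emit a decision. The sketch below is a deliberately simplified software model and not the authors' hardware logic. It takes a single-device view in which every rank's request reaches the same engine, and the class name and the two decision labels are hypothetical.

```python
class ForwardingEngine:
    """Toy model of CFE-style state tracking (illustrative only)."""

    def __init__(self) -> None:
        # Saved state: collective id -> set of ranks whose request has arrived.
        self._arrived = {}

    def on_request(self, coll_id: int, rank: int, comm_size: int) -> str:
        """Update the state for one request and return a decision label."""
        seen = self._arrived.setdefault(coll_id, set())
        seen.add(rank)
        if len(seen) < comm_size:
            # Still waiting for other ranks: keep this contribution around.
            return "BUFFER"
        # Last contribution arrived: finish the collective and clear the state.
        del self._arrived[coll_id]
        return "REDUCE_AND_RELEASE"


if __name__ == "__main__":
    cfe = ForwardingEngine()
    for rank in range(4):
        print(rank, cfe.on_request(coll_id=1, rank=rank, comm_size=4))
```

In the actual design the state is kept per algorithm and the decision is expressed as an instruction for the CIPE; here a string label stands in for that instruction.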
17 Collective Instruction Processing Engine (CIPE)
- CFE attaches a 64-bit instruction header
- CIPE processes the instruction
- Example instructions:
  - Forward packet, do nothing
  - Buffer packet in specified buffer
  - Perform reduction operations on incoming packet and specified buffers
    - Buffer or do not buffer the result
  - Perform reduction operations on specified buffers and ignore incoming packet
    - Buffer or do not buffer the result
  - Ignore incoming packet and forward data from specified buffer
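The example instructions on this slide can be modeled as a small interpreter over a packet and a set of buffers. This is a software sketch under stated assumptions, not the 64-bit hardware encoding: the `Opcode` names, the `Instruction` fields, and the choice of integer SUM as the reduction are all hypothetical. It also assumes that a result which is buffered is not forwarded.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List, Optional


class Opcode(Enum):
    FORWARD = auto()          # forward packet, do nothing
    BUFFER = auto()           # store packet in a buffer
    REDUCE_PACKET = auto()    # reduce incoming packet with a buffer
    REDUCE_BUFFERS = auto()   # reduce two buffers, ignore incoming packet
    SEND_BUFFER = auto()      # ignore incoming packet, forward a buffer


@dataclass
class Instruction:
    opcode: Opcode
    src: int = 0                     # buffer index used by most opcodes
    other: int = 0                   # second buffer for REDUCE_BUFFERS
    store_to: Optional[int] = None   # buffer to hold the result, if any


def execute(instr: Instruction, packet: List[int],
            buffers: Dict[int, List[int]]) -> Optional[List[int]]:
    """Apply one instruction; return the data to forward, or None."""
    op = instr.opcode
    if op is Opcode.FORWARD:
        return packet
    if op is Opcode.BUFFER:
        buffers[instr.src] = list(packet)
        return None
    if op is Opcode.SEND_BUFFER:
        return list(buffers[instr.src])
    # Both reduce variants: choose the left operand, then combine elementwise.
    lhs = packet if op is Opcode.REDUCE_PACKET else buffers[instr.other]
    result = [a + b for a, b in zip(lhs, buffers[instr.src])]  # integer SUM
    if instr.store_to is not None:
        buffers[instr.store_to] = result   # ASSUMPTION: buffered results are not forwarded
        return None
    return result
```

A short sequence such as BUFFER, then REDUCE_PACKET, then SEND_BUFFER is enough to express one step of a reduction with this instruction set.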
18 Collective Instruction Processing Engine (CIPE)
- Interfaces with FPU and FIFO cores
  - FPU cores for streaming and applying floating-point arithmetic
  - FIFO cores for streaming intermediate results of single-cycle arithmetic operations: MIN, MAX, BOR, LOR, BAND, LAND, BXOR, LXOR, integer SUM
- Internal BRAM buffers
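The single-cycle operations listed here map to simple elementwise functions. The table below is a software sketch whose semantics follow the MPI reduction operators of the same names (B = bitwise, L = logical). The dictionary and function names are hypothetical, and integers stand in for the hardware data types.

```python
import operator

# Operation name -> binary function on two integers.
REDUCTIONS = {
    "MIN": min,
    "MAX": max,
    "SUM": operator.add,      # integer SUM
    "BAND": operator.and_,    # bitwise AND
    "BOR": operator.or_,      # bitwise OR
    "BXOR": operator.xor,     # bitwise XOR
    "LAND": lambda a, b: int(bool(a) and bool(b)),   # logical AND
    "LOR": lambda a, b: int(bool(a) or bool(b)),     # logical OR
    "LXOR": lambda a, b: int(bool(a) != bool(b)),    # logical XOR
}


def reduce_elementwise(op_name: str, lhs: list, rhs: list) -> list:
    """Combine two equal-length operand vectors with the named operation."""
    op = REDUCTIONS[op_name]
    return [op(a, b) for a, b in zip(lhs, rhs)]


if __name__ == "__main__":
    print(reduce_elementwise("MIN", [1, 5], [3, 2]))
    print(reduce_elementwise("BXOR", [6, 3], [3, 3]))
```

The difference between the bitwise and logical variants only shows up for values other than 0 and 1, which is why both families appear in the list.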
19 Collective Instruction Processing Engine (CIPE)
[Figure: CIPE block diagram. Blocks shown: CFE, Instruction Processor, operand pairs Op1_1/Op1_2 through Op4_1/Op4_2, FPU 1-4, FIFO 1-4, results RSLT1-RSLT4, Buffer 1-4, Output Queues]
20 Collective Instruction Processing Engine (CIPE)
- Fully pipelined design
- FPU cores can be configured to produce results in 1 cycle
  - Due to observed instabilities, we increased the first-result latency to 11 cycles (the maximum)
- Each extra FPU core introduces an extra cycle
- Each FIFO core employed introduces an extra cycle
21 Evaluation
- Experimental setup
  - 8 Zynq Zedboards with EthernetFMC adapters
  - Directly connected
- Come see it at SC
22 Zynq Cluster MPI_Allreduce
[Figure: three panels, "ARM Host vs PL Offloaded", for message sizes of 8, 128, and 1024 bytes. Each panel plots time (usecs, 0-12000) and speedup (0-12) against communicator size (8, 16, 32, 64, 128). Series: Speedup Ave, Speedup Min, ARM Ave, ARM Min, PL Ave, PL Min.]
23 Zynq Cluster MPI_Allreduce
[Figure: three panels, "Ave, Min Difference: ARM vs PL", for message sizes of 8 B, 128 B, and 1024 B. Each panel plots time (usecs, 0-1800) and percentage (0-30) against communicator size (8, 16, 32, 64, 128). Series: ARM % Diff, PL % Diff, ARM Diff, PL Diff.]
24 Zynq Cluster MILC
[Figure: two panels, "MILC - Clover Dynamical" (percentage axis 0-20) and "MILC - Schroed_cl_inv" (percentage axis 0-45). Each panel plots % speedup (min results) and % speedup (average results) against communicator size (8, 16, 32, 64).]
25 Conclusions and Future Work
- Empirical validation of the efficacy of collective offload
  - Promising performance
- Increasing availability of programmable logic stands to increase impact
- The use of UDP allowed a straightforward implementation, but is limiting
- Clearly, this is the right way to use FPGAs
More informationRISC-V based core as a soft processor in FPGAs Chowdhary Musunuri Sr. Director, Solutions & Applications Microsemi
Power Matters. TM RISC-V based core as a soft processor in FPGAs Chowdhary Musunuri Sr. Director, Solutions & Applications Microsemi chowdhary.musunuri@microsemi.com RIC217 1 Agenda A brief introduction
More informationFreescale i.mx6 Architecture
Freescale i.mx6 Architecture Course Description Freescale i.mx6 architecture is a 3 days Freescale official course. The course goes into great depth and provides all necessary know-how to develop software
More informationProduct Series SoC Solutions Product Series 2016
Product Series Why SPI? or We will discuss why Serial Flash chips are used in many products. What are the advantages and some of the disadvantages. We will explore how SoC Solutions SPI and QSPI IP Cores
More informationProduct Technical Brief S3C2412 Rev 2.2, Apr. 2006
Product Technical Brief S3C2412 Rev 2.2, Apr. 2006 Overview SAMSUNG's S3C2412 is a Derivative product of S3C2410A. S3C2412 is designed to provide hand-held devices and general applications with cost-effective,
More informationBuilding High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye
Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink Robert Kaye 1 Agenda Once upon a time ARM designed systems Compute trends Bringing it all together with CoreLink 400
More informationMYD-JA5D2X Development Board
MYD-JA5D2X Development Board MYC-JA5D2X CPU Module as Controller Board 500MHz Atmel SAMA5D26/27 ARM Cortex-A5 Processor 256MB DDR3 SDRAM, 256MB Nand Flash, 4MB Data FLASH, 64KB EEPROM Serial ports, USB,
More informationMapping applications into MPSoC
Mapping applications into MPSoC concurrency & communication Jos van Eijndhoven jos@vectorfabrics.com March 12, 2011 MPSoC mapping: exploiting concurrency 2 March 12, 2012 Computation on general purpose
More informationHow to Efficiently Implement Flexible and Full-Featured Digital Radio Solutions Using All Programmable SoCs
Delivering a Generation Ahead How to Efficiently Implement Flexible and Full-Featured Digital Radio Solutions Using All Programmable SoCs Agenda Introduction to Mobile Network Introduction to Xilinx Solution
More informationHello, and welcome to this presentation of the STM32L4 System Configuration Controller.
Hello, and welcome to this presentation of the STM32L4 System Configuration Controller. 1 Please note that this presentation has been written for STM32L47x/48x devices. The key differences with other devices
More informationOverview. Implementing Gigabit Routers with NetFPGA. Basic Architectural Components of an IP Router. Per-packet processing in an IP Router
Overview Implementing Gigabit Routers with NetFPGA Prof. Sasu Tarkoma The NetFPGA is a low-cost platform for teaching networking hardware and router design, and a tool for networking researchers. The NetFPGA
More informationThe ARM Cortex-A9 Processors
The ARM Cortex-A9 Processors This whitepaper describes the details of the latest high performance processor design within the common ARM Cortex applications profile ARM Cortex-A9 MPCore processor: A multicore
More informationIntroduction to Embedded System Design using Zynq
Introduction to Embedded System Design using Zynq Zynq Vivado 2015.2 Version This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able
More informationIBM "Broadway" 512Mb GDDR3 Qimonda
ffl Wii architecture ffl Wii components ffl Cracking Open" the Wii 20 1 CMPE112 Spring 2008 A. Di Blas 112 Spring 2008 CMPE Wii Nintendo ffl Architecture very similar to that of the ffl Fully backwards
More information