Martin Dubois, ing. Contents

Similar documents
HP WORKSTATIONS GRAPHICS CARD OPTIONS

Video capture using GigE Vision with MIL. What is GigE Vision

Maya , MotionBuilder & Mudbox 2018.x August 2018

ATS-GPU Real Time Signal Processing Software

Videocard Benchmarks Over 800,000 Video Cards Benchmarked

Evaluation and Exploration of Next Generation Systems for Applicability and Performance Volodymyr Kindratenko Guochun Shi

HP Z Workstations graphics card options

Optimizing Performance: Intel Network Adapters User Guide

Accelerating Data Centers Using NVMe and CUDA

Videocard Benchmarks Over 800,000 Video Cards Benchmarked

Laptop Requirement: Technical Specifications and Guidelines. Frequently Asked Questions

TESLA DRIVER VERSION (LINUX)/411.82(WINDOWS)

7 DAYS AND 8 NIGHTS WITH THE CARMA DEV KIT

TESLA DRIVER VERSION (LINUX)/411.98(WINDOWS)

GpuWrapper: A Portable API for Heterogeneous Programming at CGG

Intel Xeon Phi Coprocessors

GPGPU, 4th Meeting Mordechai Butrashvily, CEO GASS Company for Advanced Supercomputing Solutions

NVIDIA s Compute Unified Device Architecture (CUDA)

NVIDIA s Compute Unified Device Architecture (CUDA)

Pedraforca: a First ARM + GPU Cluster for HPC

NAMD GPU Performance Benchmark. March 2011

NLVMUG 16 maart Display protocols in Horizon

Building NVLink for Developers

X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management

Finite Element Integration and Assembly on Modern Multi and Many-core Processors

physics on screen Supported platforms and minimum system requirements

A Real Time Controller for E-ELT

A Real Time Controller for E-ELT

Catapult: A Reconfigurable Fabric for Petaflop Computing in the Cloud

CUDA and OpenCL Implementations of 3D CT Reconstruction for Biomedical Imaging

EMBEDDED MACHINE VISION

Building the Most Efficient Machine Learning System

Embarquez votre Intelligence Artificielle (IA) sur CPU, GPU et FPGA

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup

INTRODUCTION TO OPENCL TM A Beginner s Tutorial. Udeepta Bordoloi AMD

CME 213 S PRING Eric Darve

p2pmem: Enabling PCIe Peer-2-Peer in Linux Stephen Bates, PhD Raithlin Consulting

Addressing Heterogeneity in Manycore Applications

Computing Infrastructure for Online Monitoring and Control of High-throughput DAQ Electronics

FPGA Augmented ASICs: The Time Has Come

A Next Generation Home Access Point and Router

The rcuda middleware and applications

[PST, GMT -8] Network Testing and Emulation Solutions

GPUs have enormous power that is enormously difficult to use

SoC Systeme ultra-schnell entwickeln mit Vivado und Visual System Integrator

WHAT S NEW IN CUDA 8. Siddharth Sharma, Oct 2016

The Rise of Open Programming Frameworks. JC BARATAULT IWOCL May 2015

MemcachedGPU: Scaling-up Scale-out Key-value Stores

Link-Layer Layer Broadcast Protocol for SpaceWire

Introduction to GPU hardware and to CUDA

Transport Layer TCP & UDP Week 7. Module : Computer Networks Lecturers : Lucy White Office : 324

AMD CORPORATE TEMPLATE AMD Radeon Open Compute Platform Felix Kuehling

Porting CPU-based Multiprocessing Algorithms to GPU for Distributed Acoustic Sensing

Overcoming the Memory System Challenge in Dataflow Processing. Darren Jones, Wave Computing Drew Wingard, Sonics

Enabling the Next Generation of Computational Graphics with NVIDIA Nsight Visual Studio Edition. Jeff Kiel Director, Graphics Developer Tools

GPU Architecture. Alan Gray EPCC The University of Edinburgh

CUDA. Matthew Joyner, Jeremy Williams

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

Realtime Signal Processing on Embedded GPUs

NVIDIA QUADRO QUICK START GUIDE

Exploiting Full Potential of GPU Clusters with InfiniBand using MVAPICH2-GDR

Distributed Systems. 29. Firewalls. Paul Krzyzanowski. Rutgers University. Fall 2015

Evaluation of Asynchronous Offloading Capabilities of Accelerator Programming Models for Multiple Devices

Tesla GPU Computing A Revolution in High Performance Computing

Building Real-Time Professional Visualization Solutions on GPUs. Kristof Denolf Samuel Maroy Ronny Dewaele

46PaQ. Dimitris Miras, Saleem Bhatti, Peter Kirstein Networks Research Group Computer Science UCL. 46PaQ AHM 2005 UKLIGHT Workshop, 19 Sep

Solros: A Data-Centric Operating System Architecture for Heterogeneous Computing

Heterogenous Computing

Performance Evaluation of Tcpdump

To hear the audio, please be sure to dial in: ID#

Building the Most Efficient Machine Learning System

M100 GigE Series. Multi-Camera Vision Controller. Easy cabling with PoE. Multiple inspections available thanks to 6 GigE Vision ports and 4 USB3 ports

Di Zhao Ohio State University MVAPICH User Group (MUG) Meeting, August , Columbus Ohio

Specifications are based on current hardware, firmware and software revisions, and are subject to change without notice.

Point-to-Point Network Switching. Computer Networks Term B10

Gateware Defined Networking (GDN) for Ultra Low Latency Trading and Compliance

An FPGA-Based Optical IOH Architecture for Embedded System

Oncilla - a Managed GAS Runtime for Accelerating Data Warehousing Queries

Multipredicate Join Algorithms for Accelerating Relational Graph Processing on GPUs

CT LANforge ICE 48-port WAN Emulator

PEARL. Programmable Virtual Router Platform Enabling Future Internet Innovation

NVIDIA CAPTURE SDK 7.1 (WINDOWS)

Recommended OS (tested)

OPEN MPI WITH RDMA SUPPORT AND CUDA. Rolf vandevaart, NVIDIA

MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES. Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015

High performance, multiple expansion Born for machine vision applications. ABOX-E7 Series

TCP Wirespeed Verify TCP Throughput up to 10G Using the T-BERD /MTS-6000A Multi- Services Application Module (MSAM)

Analyzing Performance and Power of Applications on GPUs with Dell 12G Platforms. Dr. Jeffrey Layton Enterprise Technologist HPC

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC

Expose NVIDIA s performance counters to the userspace for NV50/Tesla

Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package

5051 & 5052 PCIe Card Overview

AES Cryptosystem Acceleration Using Graphics Processing Units. Ethan Willoner Supervisors: Dr. Ramon Lawrence, Scott Fazackerley

Parallel Programming Libraries and implementations

NVIDIA GPU CLOUD DEEP LEARNING FRAMEWORKS

GPU ACCELERATED DATABASE MANAGEMENT SYSTEMS

PNY Technologies, Inc. 299 Webro Rd. Parsippany, NJ Tel: Fax:

QuickSpecs. NVIDIA Quadro K4200 4GB Graphics INTRODUCTION. NVIDIA Quadro K4200 4GB Graphics. Technical Specifications

LegUp: Accelerating Memcached on Cloud FPGAs

Fast Hardware For AI

Transcription:

Martin Dubois, ing Contents Without OpenNet vs With OpenNet Technical information Possible applications Artificial Intelligence Deep Packet Inspection Image and Video processing Network equipment development and validation 1

Without OpenNet 1. The network adapter receives the data packets 2. The network adapter writes the data packets into the main memory through the PCIe link 3. The network stack process the data packets (possibly copying them) 4. The graphic card read the data from the main memory into its own memory 5. The GPU process the data With OpenNet 1. The network adapter receive the data packets 2. The network adapter write the data packet into the graphic card s memory through the PCIe link 3. The GPU process the data Patent pending 2

Receive and Send Once the GPU processed a data packet, and possibly modify it, it can Forward it to the operating system network stack Forward it using one of network adapter connected to the OpenNet system (including the one that received the data packet) Drop it! When the packet is forwarded using a network adapter, the network adapter read it directly from the graphic card s memory! How it work? 1. The application uses the OpenNet library to configure the OpenNet system. 2. The application also interact directly with the OpenCL TM or CUDA environment. 3. The OpenNet library controls the execution of kernels processing data packets 4. The OpenCL TM or CUDA environment controls the GPU. 5. The OpenNet library configures and controls the special device drivers controlling the network adapters. 3

How it work? 6. The device driver controls the network adapters and instructs them to write received data directly into the graphic card s memory. 7. After the network adapter received the data, the driver writes the packet size into the graphic card s memory. When the buffer is full, the kernel processing it is launch. 8. After the GPU processed the buffer, the driver verify if the packets must be transmitted. If so, the sending network adapter is instructed to do it. Little bit more details 1. The network adapter A receives data packets and write them into a buffer. 2. When the buffer is full, OpenNet launch the OpenCL TM or CUDA kernel. 3. The kernel process all the data packets at the same time, possibly modifies them and indicate if the packets must be retransmitted. The kernel can uses other data buffers or interacts with other kernels. 4. Network adapter B (or any other network adapter in the OpenNet system) retransmit the data packets as requested. 4

Technical information Operating systems Windows 10 64 bits Linux Soon! 64 bits Ubuntu 18.04.1 5

GPU programming language Windows OpenCL TM Linux Soon! CUDA Graphic cards Windows All cards supporting the AMD DirectGMA TM technology Radeon TM Pro WX5100, WX7100, WX8200 and WX9100 Linux Soon! All cards supporting the NVIDIA GPUDirect technology Quadro TM Tesla TM 6

Network adapters Windows Intel 1 Gb/s 82576 10 Gb/s 82599 Soon! Linux Soon! Intel 1 Gb/s 82576 10 Gb/s - 82599 Possible applications 7

Artificial Intelligence Feed training data or data to process Share large amount of data between graphic cards installed in different computers Share large amount of data between graphic cards When buffer is ready to be filled, the GPU in computer A runs a kernel that simply break shared data in packets and indicate which packets must be sent. When a buffer is filled with received data, the GPU in computer B runs a kernel reconstructing the shared data. The communication protocol is completely implemented in OpenCL TM or CUDA. It can operate on top of: Ethernet for efficiency IP to be routable UDP to be router friendly 8

Deep Packet Inspection Deep Packet Inspection Filtering and selecting Filtering and selecting The OpenCL TM or CUDA code inspects each data packet and decide which equipment or computer will do the final inspection or processing. The OpenCL TM or CUDA code can access the full data packet. The next computer in the chain can also use OpenNet and GPU. 9

Image and Video processing GigE Vision receiver Video streaming GigE Vision Receiver 10

Video streaming Network equipment development and validation Test of OpenCL TM code aimed to FPGA Traffic verification Traffic generation 11

Other possible applications Service implementation TCP or other protocols offload Service implementation DNS 12

The end! Many other applications are possible! Any question? Thanks! http://www.kms-quebec.com/en/opennet.htm 13