Embedded Computing without Compromise. Evolution of the Rugged GPGPU Computer. Session: SIL7127. Dan Mor, PLM, Aitech Systems. GTC Israel 2017


Agenda
- Current GPGPU systems
- NVIDIA Jetson TX1 and TX2 evaluation
- Conclusions
- New products

GPGPU Product Line

Current GPGPU Products

A191 Block Diagram (figure): 18-36 V input power and power supply; C873 4th Gen. Core i7 SBC; C530 GPGPU board; SATA and PCIe x8 links; on-board SSD and optional 2.5" SSD; frame grabber mezzanine; connectors J1, J2, and J3 carrying Gigabit Ethernet, USB, serial, RGBHV, DVI/HDMI, SD-SDI, and composite video.

We need a SWaP (size, weight, and power) system

Jetson TX1
- SFF: 50 x 87 mm SoM with Linux support
- Good for SWaP systems
- Supercomputing performance
- Quad-core ARM Cortex-A57 CPU
- GPU: NVIDIA Maxwell, 1 TFLOP/s with 256 CUDA cores

- 400-pin board-to-board connector; pin-out will be backward-compatible with future versions
- Draws as little as 1 W or less while idle
- 8-10 W under typical CUDA load
- Up to 15 W TDP when the module is fully utilized
- Automatic scaling of CPU, GPU, and memory
- 1 TFLOPS (GTX 770M is 1.36 TFLOPS)
- HW encoder (H.264/H.265) and decoder
- 4K video processing
- MIPI CSI x4 cameras or six CSI x2 cameras

Jetson TX1 Evaluation - Non-Graphical Benchmark
Lower numbers mean faster CUDA computation on the GPU. "TX1 Max" is the Jetson TX1 running at maximum GPU frequency. The C873 & C530 pair, which draws about 120 W, is only 1.8x faster than the Jetson TX1, which draws only 15 W.
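The comparison above can be turned into a rough performance-per-watt figure. The sketch below uses only the numbers quoted on the slide (1.8x speed-up, about 120 W, 15 W). The function name `perf_per_watt_ratio` is illustrative, and treating the quoted wattages as the power actually drawn during the benchmark is an assumption.

```python
def perf_per_watt_ratio(speedup_b_over_a, power_a_w, power_b_w):
    """Return how many times better system A's performance per watt is than B's.

    speedup_b_over_a: how many times faster system B is than system A.
    power_a_w, power_b_w: power draw of each system in watts (assumed values).
    """
    perf_per_watt_a = 1.0 / power_a_w               # A's performance normalized to 1
    perf_per_watt_b = speedup_b_over_a / power_b_w  # B is speedup_b_over_a times faster
    return perf_per_watt_a / perf_per_watt_b


# A = Jetson TX1 (15 W), B = C873 & C530 (about 120 W, 1.8x faster)
ratio = perf_per_watt_ratio(speedup_b_over_a=1.8, power_a_w=15.0, power_b_w=120.0)
print(f"TX1 performance-per-watt advantage: {ratio:.1f}x")
```

Under these assumptions the TX1 delivers roughly four times more CUDA throughput per watt, even though the larger system is faster in absolute terms.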

Jetson TX1 Evaluation - Conclusions
- The Jetson TX1 delivers a real boost in rendering and CUDA computing power
- CUDA performance, TX1 vs TK1: the TX1 is 2x to 4x faster
- CUDA performance, TX1 vs C873 & C530 (770M): the C873 & C530 is only 1.8x faster
- If Linux is not an obstacle for our customers, a Jetson TX1-based product will be a success

Comparison table: TX2 vs TX1

| | Jetson TX2 | Jetson TX1 |
|---|---|---|
| GPU | NVIDIA Pascal, 256 CUDA cores | NVIDIA Maxwell, 256 CUDA cores |
| CPU | HMP Dual Denver 2/2 MB L2 + Quad ARM A57/2 MB L2 | Quad ARM A57/2 MB L2 |
| Memory | 8 GB 128-bit LPDDR4, 58.3 GB/s | 4 GB 64-bit LPDDR4, 25.6 GB/s |
| Display | 2x DSI, 2x DP 1.2 / HDMI 2.0 / eDP 1.4 | 2x DSI, 1x eDP 1.4 / DP 1.2 / HDMI |
| PCIe | Gen 2, 1x4 + 1x1 or 2x1 + 1x2 | Gen 2, 1x4 + 1x1 |
| Data storage | 32 GB eMMC, SDIO, SATA | 16 GB eMMC, SDIO, SATA |
| Other | CAN, UART, SPI, I2C, I2S, GPIOs | UART, SPI, I2C, I2S, GPIOs |

Common to both modules:
- USB: USB 3.0 + USB 2.0
- Connectivity: 1 Gigabit Ethernet, 802.11ac WLAN, Bluetooth
- Mechanical: 50 mm x 87 mm (400-pin compatible board-to-board connector)
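The memory row can be checked with simple arithmetic: peak bandwidth is roughly the bus width in bytes times the transfer rate. The sketch below works backwards from the table's figures. The function name is illustrative, and it assumes that GB/s means 10^9 bytes per second and that the quoted bandwidths are theoretical peaks.

```python
def implied_transfer_rate_gtps(bandwidth_gb_s, bus_width_bits):
    """Return the transfer rate (GT/s) implied by a peak bandwidth and a bus width."""
    bytes_per_transfer = bus_width_bits / 8
    return bandwidth_gb_s / bytes_per_transfer


tx2_rate = implied_transfer_rate_gtps(58.3, 128)  # Jetson TX2: 128-bit LPDDR4
tx1_rate = implied_transfer_rate_gtps(25.6, 64)   # Jetson TX1: 64-bit LPDDR4
bandwidth_gain = 58.3 / 25.6

print(f"TX2 implied rate: {tx2_rate:.2f} GT/s")
print(f"TX1 implied rate: {tx1_rate:.2f} GT/s")
print(f"TX2 bandwidth gain over TX1: {bandwidth_gain:.2f}x")
```

Most of the TX2's bandwidth gain comes from the doubled bus width; the implied per-pin transfer rate rises only modestly.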

Dual Operating Modes

Non-graphical benchmark (CUDA algorithms): time for 10 iterations [ms], lower is better

| n-body number | TX1 | TX2 MAXQ | TX2 MAXN | TX2 MAXQ vs TX1 | TX2 MAXN vs TX1 |
|---|---|---|---|---|---|
| 4096 | 22.533 | 68.4 | 16.421 | -67% | 27% |
| 8192 | 81.491 | 272.97 | 65.24 | -70% | 20% |
| 16384 | 206.799 | 527.47 | 154 | -61% | 25.5% |

The TX2 performs better than the TX1 when using MAXN power mode.
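The slide does not say how the two "vs TX1" columns were computed. One formula that reproduces the listed percentages is the time difference divided by the slower of the two times; the sketch below applies it to the GPU timings. The formula and the name `relative_change_pct` are inferred for illustration, not taken from the presentation.

```python
def relative_change_pct(tx1_ms, tx2_ms):
    """Signed change of TX2 relative to TX1, in percent.

    Positive means the TX2 is faster. The difference is divided by the slower
    (larger) time; this is an inferred formula that matches the slide's values.
    """
    return 100.0 * (tx1_ms - tx2_ms) / max(tx1_ms, tx2_ms)


# n-body size -> (TX1, TX2 MAXQ, TX2 MAXN) times for 10 iterations, in ms
gpu_times_ms = {
    4096: (22.533, 68.4, 16.421),
    8192: (81.491, 272.97, 65.24),
    16384: (206.799, 527.47, 154.0),
}

gpu_changes = {}
for n, (tx1, maxq, maxn) in gpu_times_ms.items():
    gpu_changes[n] = (relative_change_pct(tx1, maxq), relative_change_pct(tx1, maxn))
    print(f"n={n}: MAXQ {gpu_changes[n][0]:.1f}%, MAXN {gpu_changes[n][1]:.1f}%")
```

Note that with this definition the negative and positive columns are not symmetric: a run that takes three times as long shows as about -67%, not -200%.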

CPU benchmark (n-body algorithm running on the CPU): time for 10 iterations [ms], lower is better

| n-body number | TX1 | TX2 MAXQ | TX2 MAXN | TX2 MAXQ vs TX1 | TX2 MAXN vs TX1 |
|---|---|---|---|---|---|
| 4096 | 30492.172 | 57837.430 | 7169.735 | -47% | 76.5% |
| 8192 | 121315.578 | 232723.719 | 11340.421 | -48% | 90% |

The TX2 has better CPU performance than the TX1 when using MAXN power mode.

Conclusions
- The TX2 gets a boost in GPU CUDA computing power when using MAXN power mode
  - MAXN power mode: about 24% higher performance (max power consumption 15 W)
  - MAXQ power mode: about 66% lower performance (max power consumption 7.5 W)
- The TX2 gets a boost in CPU computing power when using MAXN power mode
  - MAXN power mode: about 83% higher performance (max power consumption 15 W)
  - MAXQ power mode: about 47% lower performance (max power consumption 7.5 W)
- The SW release is a "Developer Preview Release", so I hope for many improvements and optimizations in the near future
- As seen above, half the power comes with about half the performance. Full power comes with a boost for both the GPU (CUDA, 24%) and the CPU (83%).
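The "about 24%", "about 66%", "about 83%", and "about 47%" figures appear to be plain averages of the per-size percentages in the two benchmark tables. The sketch below checks that reading; the presentation does not state how the summary values were derived, so the averaging is an assumption.

```python
from statistics import mean

# Percent changes vs TX1 as listed in the GPU and CPU benchmark tables
changes_pct = {
    "GPU MAXN": [27.0, 20.0, 25.5],
    "GPU MAXQ": [-67.0, -70.0, -61.0],
    "CPU MAXN": [76.5, 90.0],
    "CPU MAXQ": [-47.0, -48.0],
}

# Average each column (assumed to be how the summary figures were obtained)
averages = {name: mean(values) for name, values in changes_pct.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:+.1f}% on average")
```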


Special Features

Technical Features: A176 Cyclone, GPGPU Fanless Small FF RediBuilt Supercomputer

A176 Cyclone
- Based on NVIDIA Jetson TX1/TX2
- Pinout will be backward-compatible with future versions
- Draws as little as 1 W or less while idle
- Automatic scaling of CPU, GPU, and memory
- 1 TFLOPS
- Hardware encoder (H.264/H.265) and decoder
- 8-10 W under typical CUDA load
- Up to 17 W when the CPU/GPU are fully utilized
- Ultra small form factor: 129 mm [5.1"] square, 840 g [1.85 lbs.]

A176 Block Diagram (figure): NVIDIA Jetson TX1 System on Module (NVIDIA GPU, quad-core ARM CPU, 4 GB LPDDR4 RAM, 16 GB eMMC 5.1 flash); I2C and PCIe links; ETR; two optional expansion modules; mini SATA SSD; isolated power supply with line filter; front panel connectors carrying Gigabit Ethernet, USB 2.0, UART, discrete I/O, DVI/HDMI output, and optional I/O (8 x composite inputs, 1 x SDI input).

A176 Highlights
- SWaP-optimized rugged HPEC
- Ultra small form factor: 129 mm [5.1"] square, < 1 kg [2.2 lbs.]
- NVIDIA Jetson TX1 System on Module
  - NVIDIA Maxwell architecture GPU with 256 CUDA cores
  - ARM Cortex-A57 quad-core CPU
  - 1 TFLOPS
  - H.264/H.265 HW encoder
  - Best available performance per watt: 60 GFLOPS/W
- SATA SSD with Quick Erase & Secure Erase
- 4 GB LPDDR4
- Video capture
  - SDI (SD/HD) with dedicated H.264 encoder
  - Composite (RS-170A [NTSC]/PAL), 8 channels available simultaneously
- I/O: Gigabit Ethernet, UART serial, USB 2.0, discretes, DVI/HDMI output, composite input, SDI input
- CUDA, OpenGL, OpenGL ES, EGL
- Low power consumption
- Development platforms available
- Additional expansions:
  1. Dual-channel 1553
  2. ARINC 429
  3. Camera Link frame grabber
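The 1 TFLOPS and 60 GFLOPS/W figures can be cross-checked against each other. The sketch below derives the power draw they imply, assuming 1 TFLOPS = 1000 GFLOPS and that both figures refer to the same operating point; the function name is illustrative.

```python
def implied_power_w(peak_gflops, gflops_per_watt):
    """Return the power draw (W) implied by a peak throughput and an efficiency figure."""
    return peak_gflops / gflops_per_watt


power_w = implied_power_w(peak_gflops=1000.0, gflops_per_watt=60.0)
print(f"Implied power at peak throughput: {power_w:.1f} W")
```

The result, a little under 17 W, is in line with the "up to 17 Watts" full-utilization figure quoted on the A176 Cyclone slide.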

Technical Features: C535 Typhoon, GPGPU 3U VPX Supercomputer Board

C535 Typhoon Highlights
- Rugged 3U VPX HPEC board: SBC with on-board GPGPU
- NVIDIA Jetson TX1 System on Module
  - NVIDIA Maxwell architecture GPU with 256 CUDA cores
  - ARM Cortex-A57 quad-core CPU
  - 1 TFLOPS
  - H.264/H.265 HW encoder
  - Best available performance per watt: 60 GFLOPS/W
- SATA SSD with Quick Erase & Secure Erase
- 4 GB LPDDR4
- Video capture
  - SDI (SD/HD) with dedicated H.264 encoder
  - Composite (RS-170A [NTSC]/PAL), 8 channels available simultaneously
- I/O: Gigabit Ethernet, UART serial, USB 2.0, discretes, DVI/HDMI output, composite input, SDI input
- CUDA, OpenGL, OpenGL ES, EGL
- Low power consumption
- Development platforms available

C535 Block Diagram (figure): NVIDIA Jetson TX1 System on Module (NVIDIA GPU, quad-core ARM CPU, 4 GB LPDDR4 RAM, 16 GB eMMC 5.1 flash); I2C and PCIe links; PCIe switch with PCIe x4 links; ETR; SD; two optional expansion modules; mini SATA SSD; PSU; front panel connectors carrying Gigabit Ethernet, USB 2.0, UART, discrete I/O, DVI/HDMI output, and optional I/O (8 x composite inputs, 1 x SDI input).

Special Features: A176/C535 Interface Expansions
Currently available:
- FG: simultaneously captures 8 composite PAL/NTSC inputs
- FG: HD/SD-SDI with dedicated H.264 encoder (streaming)
Available upon request:
- FG: CameraLink input
- ARINC-429: 6 channels
- 1553: 2 channels

Technical Features: EV176 Development System for A176/C535

EV176 Development System for A176 Cyclone
Start SW development right now!

Applications
- GPU rendering (navigation, maps, etc.)
- CUDA-based algorithms
- Image processing (CUDA accelerated)
- Radars
- Flight simulators
- Video recorders/streaming
- Surveillance
- Autonomous vehicles/drones
- Smart cities
- GPGPU extensions to existing systems

Thank you!