Evolution of the Rugged GPGPU Computer
Session: SIL7127
Dan Mor
PLM - Systems
GTC Israel 2017
Agenda
- Current GPGPU systems
- NVIDIA Jetson TX1 and TX2 evaluation
- Conclusions
- New Products
GPGPU Product Line
Current GPGPU Products
A191 Block Diagram
[Block diagram. Main blocks: C873 4th Gen. Core i7 SBC, C530 GPGPU board, frame grabber mezzanine, on-board SSD, optional 2.5" SSD, power supply. Internal links: SATA, PCIe x8. External interfaces on connectors J1, J2 and J3: 18-36 V input power, Gigabit Ethernet, USB, serial, RGBHV, DVI/HDMI, SD-SDI, composite video.]
We need a SWaP (size, weight and power) system
Jetson TX1
- Small form factor (SFF): 50 x 87 mm system-on-module (SoM) with Linux support
- Good for SWaP systems
- Supercomputing performance
- CPU: quad-core ARM Cortex-A57
- GPU: NVIDIA Maxwell, 1 TFLOP/s with 256 CUDA cores
- 400-pin board-to-board connector; the pin-out will be backward-compatible with future versions
- Draws as little as 1 W or less while idle
- 8-10 W under typical CUDA load
- Up to 15 W TDP when the module is fully utilized
- Automatic scaling of CPU, GPU and memory
- 1 TFLOPS (for comparison, the GTX 770M is 1.36 TFLOPS)
- Hardware encoder (H.264/H.265) and decoder
- 4K video processing
- MIPI CSI: x4 cameras or six CSI x2 cameras
Jetson TX1 Evaluation - Non-Graphical Benchmark
- The lower the number, the faster the CUDA calculation on the GPU.
- "TX1 Max" is the Jetson TX1 running at maximum GPU frequency.
- The C873 & C530 system, at about 120 W, is only 1.8x faster than the Jetson TX1, which draws only 15 W.
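The power trade-off on this slide can be made explicit with a little arithmetic. The sketch below uses only the two figures quoted above (1.8x the speed at about 120 W, against 1x at 15 W). The function name and the choice of the TX1 as the unit of performance are illustrative assumptions, and the wattages are the nominal values from the slide, not measurements.

```python
def perf_per_watt(relative_speed, watts):
    """Relative CUDA throughput delivered per watt of power budget."""
    return relative_speed / watts

# TX1 is taken as the baseline (speed 1.0) at its 15 W ceiling.
tx1 = perf_per_watt(1.0, 15.0)
# C873 & C530 (GTX 770M) is quoted as 1.8x faster at about 120 W.
c873_c530 = perf_per_watt(1.8, 120.0)

advantage = tx1 / c873_c530
print(f"TX1 performance-per-watt advantage: {advantage:.1f}x")
```

Under these assumptions the TX1 delivers roughly 4.4 times more CUDA performance per watt, which is the SWaP argument behind the slide.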
Jetson TX1 Evaluation - Conclusions
- The Jetson TX1 gets a real boost in rendering and CUDA calculation power.
- CUDA calculation performance:
  - TX1 vs TK1: x2 to x4 in favor of the TX1
  - TX1 vs C873 & C530 (770M): only x1.8 in favor of the C873 & C530 (770M)
- If Linux is not an obstacle for our customers, a Jetson TX1 based product will be a success.
Comparison table: TX2 vs TX1
Feature | Jetson TX2 | Jetson TX1
GPU | NVIDIA Pascal, 256 CUDA cores | NVIDIA Maxwell, 256 CUDA cores
CPU | HMP Dual Denver 2/2 MB L2 + Quad ARM A57/2 MB L2 | Quad ARM A57/2 MB L2
Memory | 8 GB 128 bit LPDDR4, 58.3 GB/s | 4 GB 64 bit LPDDR4, 25.6 GB/s
Display | 2x DSI, 2x DP 1.2 / HDMI 2.0 / eDP 1.4 | 2x DSI, 1x eDP 1.4 / DP 1.2 / HDMI
PCIe | Gen 2, 1x4 + 1x1 OR 2x1 + 1x2 | Gen 2, 1x4 + 1x1
Data Storage | 32 GB eMMC, SDIO, SATA | 16 GB eMMC, SDIO, SATA
Other | CAN, UART, SPI, I2C, I2S, GPIOs | UART, SPI, I2C, I2S, GPIOs
USB | USB 3.0 + USB 2.0 (both modules)
Connectivity | 1 Gigabit Ethernet, 802.11ac WLAN, Bluetooth (both modules)
Mechanical | 50 mm x 87 mm, 400-pin compatible board-to-board connector (both modules)
Dual Operating Modes
Non-graphical benchmark (CUDA algorithms) - time for 10 iterations [ms], lower is better
n-body number | TX1 | TX2 MAXQ | TX2 MAXN | TX2 MAXQ vs TX1 | TX2 MAXN vs TX1
4096 | 22.533 | 68.4 | 16.421 | -67% | 27%
8192 | 81.491 | 272.97 | 65.24 | -70% | 20%
16384 | 206.799 | 527.47 | 154 | -61% | 25.5%
TX2 has better performance when using MAXN power mode.
CPU benchmark (n-body algorithm running on the CPU) - time for 10 iterations [ms], lower is better
n-body number | TX1 | TX2 MAXQ | TX2 MAXN | TX2 MAXQ vs TX1 | TX2 MAXN vs TX1
4096 | 30492.172 | 57837.430 | 7169.735 | -47% | 76.5%
8192 | 121315.578 | 232723.719 | 11340.421 | -48% | 90%
TX2 has better CPU performance when using MAXN power mode.
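The slides do not say how the "vs TX1" percentages were derived. The reported values appear consistent with dividing the time difference by the slower of the two runs, so a slowdown and a speedup are both expressed against the larger time. The sketch below reproduces the columns under that assumption; `relative_change` is a hypothetical helper, not something taken from the benchmark code.

```python
def relative_change(t_tx1_ms, t_tx2_ms):
    """Signed TX2-vs-TX1 change, normalised by the slower of the two runs.

    Positive means the TX2 is faster, negative means it is slower.
    This normalisation is inferred from the slide values, not documented.
    """
    return (t_tx1_ms - t_tx2_ms) / max(t_tx1_ms, t_tx2_ms)

# (TX1, TX2 MAXQ, TX2 MAXN) times in ms for 10 iterations, from the slides.
gpu_times = {
    4096: (22.533, 68.4, 16.421),
    8192: (81.491, 272.97, 65.24),
    16384: (206.799, 527.47, 154.0),
}
cpu_times = {
    4096: (30492.172, 57837.430, 7169.735),
    8192: (121315.578, 232723.719, 11340.421),
}

for label, table in (("GPU", gpu_times), ("CPU", cpu_times)):
    for n, (tx1, maxq, maxn) in table.items():
        print(f"{label} n={n}: "
              f"MAXQ {relative_change(tx1, maxq):+.1%}, "
              f"MAXN {relative_change(tx1, maxn):+.1%}")
```

With this formula every entry matches the slides after rounding except the CPU MAXN figure for 8192 bodies, which comes out at about 90.7% against the 90% shown.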
Conclusions
- The TX2 gets a boost in GPU CUDA calculation power when using MAXN power mode:
  - MAXN power mode: about 24% higher performance (max power consumption 15 W)
  - MAXQ power mode: about 66% lower performance (max power consumption 7.5 W)
- The TX2 gets a boost in CPU calculation power when using MAXN power mode:
  - MAXN power mode: about 83% higher performance (max power consumption 15 W)
  - MAXQ power mode: about 47% lower performance (max power consumption 7.5 W)
- The software release is a "Developer Preview Release", so I hope for many improvements and optimizations in the near future.
- As seen above, half the power comes with about half the performance. Full power comes with a boost for both the GPU (CUDA, 24%) and the CPU (83%).
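The summary percentages look like plain averages of the per-size percentages on the two benchmark slides. That is an assumption, since the slide does not state how they were obtained. A short check, with illustrative variable names:

```python
from statistics import mean

# Per-size "vs TX1" percentages as reported on the benchmark slides.
reported = {
    "GPU MAXN": [27, 20, 25.5],
    "GPU MAXQ": [-67, -70, -61],
    "CPU MAXN": [76.5, 90],
    "CPU MAXQ": [-47, -48],
}

# Simple arithmetic mean per power mode.
summary = {name: mean(values) for name, values in reported.items()}

for name, value in summary.items():
    print(f"{name}: {value:+.1f}%")
```

The averages land within a percentage point of the "about 24%", "66%", "83%" and "47%" quoted in the conclusions.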
Special Features
Technical Features
A176 Cyclone
GPGPU Fanless Small FF RediBuilt Supercomputer
A176 Cyclone
- Based on NVIDIA Jetson TX1/TX2
- Pin-out will be backward-compatible with future versions
- Draws as little as 1 W or less while idle
- Automatic scaling of CPU, GPU and memory
- 1 TFLOPS
- Hardware encoder (H.264/H.265) and decoder
- 8-10 W under typical CUDA load
- Up to 17 W when the CPU and GPU are fully utilized
- Ultra small form factor: 129 mm [5.1"] square, 840 g [1.85 lbs.]
A176 Block Diagram
[Block diagram. NVIDIA Jetson TX1 System on Module: NVIDIA GPU, quad-core ARM CPU, 4 GB LPDDR4 RAM, 16 GB eMMC 5.1 flash. Internal blocks and links: I2C, PCIe, ETR, two optional expansion modules, mini SATA SSD, isolated power supply, line filter. Front panel connectors: Gigabit Ethernet, USB 2.0, UART, discrete I/O, DVI/HDMI output, and optional I/O (8 x composite inputs, 1 x SDI input).]
A176 Highlights
- SWaP-optimized rugged HPEC
- Ultra small form factor: 129 mm [5.1"] square, < 1 kg [2.2 lbs.]
- NVIDIA Jetson TX1 System on Module
  - NVIDIA Maxwell architecture GPU with 256 CUDA cores
  - ARM Cortex-A57 quad-core CPU
  - 1 TFLOPS
  - H.264/H.265 hardware encoder
- Best available performance per watt: 60 GFLOPS/W
- SATA SSD with Quick Erase & Secure Erase
- 4 GB LPDDR4
- Video capture
  - SDI (SD/HD) with dedicated H.264 encoder
  - Composite (RS-170A [NTSC]/PAL), 8 channels available simultaneously
- I/O: Gigabit Ethernet, UART serial, USB 2.0, discretes, DVI/HDMI output, composite input, SDI input
- CUDA, OpenGL, OpenGL ES, EGL
- Low power consumption
- Development platforms available
- Additional expansions:
  1. Dual-channel 1553
  2. ARINC 429
  3. Camera Link frame grabber
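The 60 GFLOPS/W figure can be cross-checked against the 1 TFLOPS peak quoted for the module. The sketch below assumes that both numbers refer to the same peak operating point, which the slides do not state; the variable names are illustrative.

```python
# Figures quoted on the A176 slides.
peak_gflops = 1000.0          # 1 TFLOPS
efficiency_gflops_per_w = 60.0

# Power draw implied by the two figures taken together.
implied_watts = peak_gflops / efficiency_gflops_per_w
print(f"Implied power at peak: {implied_watts:.1f} W")
```

The result, a little under 17 W, is in line with the "up to 17 W" full-load figure given for the A176 Cyclone.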
Technical Features
C535 Typhoon
GPGPU 3U VPX Supercomputer Board
C535 Typhoon Highlights
- Rugged 3U VPX HPEC board
- SBC with on-board GPGPU
- NVIDIA Jetson TX1 System on Module
  - NVIDIA Maxwell architecture GPU with 256 CUDA cores
  - ARM Cortex-A57 quad-core CPU
  - 1 TFLOPS
  - H.264/H.265 hardware encoder
- Best available performance per watt: 60 GFLOPS/W
- SATA SSD with Quick Erase & Secure Erase
- 4 GB LPDDR4
- Video capture
  - SDI (SD/HD) with dedicated H.264 encoder
  - Composite (RS-170A [NTSC]/PAL), 8 channels available simultaneously
- I/O: Gigabit Ethernet, UART serial, USB 2.0, discretes, DVI/HDMI output, composite input, SDI input
- CUDA, OpenGL, OpenGL ES, EGL
- Low power consumption
- Development platforms available
C535 Block Diagram
[Block diagram. NVIDIA Jetson TX1 System on Module: NVIDIA GPU, quad-core ARM CPU, 4 GB LPDDR4 RAM, 16 GB eMMC 5.1 flash. Internal blocks and links: I2C, PCIe, PCIe x4, PCIe switch, ETR, SD, two optional expansion modules, mini SATA SSD, PSU. Front panel connectors: Gigabit Ethernet, USB 2.0, UART, discrete I/O, DVI/HDMI output, PCIe x4, and optional I/O (8 x composite inputs, 1 x SDI input).]
Special Features
A176/C535 Interface Expansions
Currently available:
- Frame grabber (FG): simultaneously captures 8 composite PAL/NTSC inputs
- FG: HD/SD-SDI with dedicated H.264 encoder (streaming)
Available upon request:
- FG: CameraLink input
- ARINC-429, 6 channels
- 1553, 2 channels
Technical Features
EV176
Development System for A176/C535
EV176 Development System for A176 Cyclone
Start SW development right now!
Applications
- GPU rendering (navigation, maps, etc.)
- CUDA-based algorithms
- Image processing (CUDA accelerated)
- Radars
- Flight simulators
- Video recorders/streaming
- Surveillance
- Autonomous vehicles/drones
- Smart cities
- GPGPU extensions to existing systems
Thank you!