To hear the audio, please be sure to dial in: ID#

Similar documents
Dr. Yassine Hariri CMC Microsystems

HPP-Heterogeneous Processing Platform Programming Models, Performance and Power Profiling for the HPP

Catapult: A Reconfigurable Fabric for Petaflop Computing in the Cloud

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE

Part Number Class Category Title UPC Code Image Download Country of Origin Cost MSRP. Short Description. Long Description. Main Specifications

Part Number Class Category Title UPC Code Image Download Country of Origin Cost MSRP. Short Description. Long Description. Main Specifications

Hybrid KAUST Many Cores and OpenACC. Alain Clo - KAUST Research Computing Saber Feki KAUST Supercomputing Lab Florent Lebeau - CAPS

Suggested use: infrastructure applications, collaboration/ , web, and virtualized desktops in a workgroup or distributed environments.

CME 213 S PRING Eric Darve

Pedraforca: a First ARM + GPU Cluster for HPC

Altos T310 F3 Specifications

IBM Power AC922 Server

OpenFOAM Performance Testing and Profiling. October 2017

Altos R320 F3 Specifications. Product overview. Product views. Internal view

Trends and Challenges in Multicore Programming

GPUs and Emerging Architectures

Intel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2

Accelerating HPC. (Nash) Dr. Avinash Palaniswamy High Performance Computing Data Center Group Marketing

System Configuration Guide

HP ProLiant DL360 G7 Server - Overview

Intel Xeon E v4, Windows Server 2016 Standard, 16GB Memory, 1TB SAS Hard Drive and a 3 Year Warranty

ThinkServer RD630 Photography - 3.5" Disk Bay*

There s STILL plenty of room at the bottom! Andreas Olofsson

Ultimate Workstation Performance

Page 1 of 18 Dave Pimm Avid Technology Dec 6th, 2013 Rev B

Avid Configuration Guidelines HP Z640 Dual 8-Core / Dual 10-Core / Dual 12-Core CPU Workstation

Avid Configuration Guidelines HP Z440 Single 6-core / Single 8-Core CPU Workstation

Page 1 of 20 David Pimm Avid Technology Dec 7th, 2012 Rev - B

Intel Xeon E v4, Optional Operating System, 8GB Memory, 2TB SAS H330 Hard Drive and a 3 Year Warranty

Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems. Ed Hinkel Senior Sales Engineer

Page 1 of 20 David Pimm Avid Technology 10 December 2013 Rev - D

Avid Configuration Guidelines HP Z420 Six-Core CPU Workstation Media Composer 6.x Symphony 6.x NewsCutter 10.x and later

Precision T3620MT Precision T3620MT Precision T3420 SFF DEL-SNST36M003 DEL-SNST36M004 DEL-SNST34SF03

Avid Configuration Guidelines Dell T5610 Dual 6-Core, Dual 8-Core & Dual 12-Core CPU Media Composer Symphony NewsCutter 10.5.

THE LEADER IN VISUAL COMPUTING

Cisco MCS 7825-I1 Unified CallManager Appliance

Avid Configuration Guidelines Dell Precision T Core, CPU Workstation Media Composer Symphony NewsCutter 10.5.

ANNEX II + III : TECHNICAL SPECIFICATIONS + TECHNICAL OFFER

FiPS and M2DC: Novel Architectures for Reconfigurable Hyperscale Servers

Next Generation Computing Architectures for Cloud Scale Applications

Avid Configuration Guidelines LENOVO ThinkStation S30 Six-Core CPU Workstation Media Composer 6.x Symphony 6.x NewsCutter 10.

GPU Architecture. Alan Gray EPCC The University of Edinburgh

HostEngine 4U Host Computer User Guide

HostEngine 5URP24 Computer User Guide

Matrox Supersight Solo

Avid Configuration Guidelines HP Z640 Dual CPU Workstation Dual 8, 10, 12, 14 and 16 core

Page 1 of 20 David Pimm Avid Technology September 15th, 2012 Rev - A

Avid Configuration Guidelines Dell 7820 Tower workstation Dual 8 to 28 Core CPU System

HostEngine 3U Host Computer User Guide

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D.

Intel Parallel Studio XE 2015

Smarter Clusters from the Supercomputer Experts

Avid Configuration Guidelines HP Z820 Dual Six-Core / Dual Eight-Core CPU Workstation

CISCO MEDIA CONVERGENCE SERVER 7815-I1

Building supercomputers from embedded technologies

The knight makes his play for the crown Phi & Omni-Path Glenn Rosenberg Computer Insights UK 2016

VXS-610 Dual FPGA and PowerPC VXS Multiprocessor

HostEngine 4U Host Computer User Guide

Avid Configuration Guidelines Dell 7920 Tower workstation Dual 8 to 28 Core CPU System

Cisco MCS 7815-I2 Unified CallManager Appliance

COMPUTING ELEMENT EVOLUTION AND ITS IMPACT ON SIMULATION CODES

Avid Configuration Guidelines Dell R7920 Rack workstation Dual 8 to 28 Core CPU System

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014

Pactron FPGA Accelerated Computing Solutions

Acer AT110 F2 Specifications

NEC Express5800/R120h-2M System Configuration Guide

Avid Configuration Guidelines Dell T7600 Dual Six-Core / Dual Eight-Core CPU Workstation Media Composer 6.x Symphony 6.x NewsCutter 10.

TOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT

Avid Configuration Guidelines Dell R7920 Rack workstation Dual 8 to 28 Core CPU System

High Performance Computing with Accelerators

FAST FORWARD TO YOUR <NEXT> CREATION

Mercury Computer Systems & The Cell Broadband Engine

ECE 571 Advanced Microprocessor-Based Design Lecture 20

Revolutionizing Open. Cecilia Carniel IBM Power Systems Scale Out sales

Cisco MCS 7828-I5 Unified Communications Manager Business Edition 5000 Appliance

HostEngine 4U Host Computer User Guide

Cisco MCS 7815-I1 Unified CallManager Appliance

IBM Power Systems HPC Cluster

ABySS Performance Benchmark and Profiling. May 2010

An NVMe-based Offload Engine for Storage Acceleration Sean Gibb, Eideticom Stephen Bates, Raithlin

INDIAN INSTITUTE OF SCIENCE EDUCATION AND RESEARCH PUNE

Cisco MCS 7815-I2. Serviceable SATA Disk Drives

Heterogeneous Computing and OpenCL

NEC Express5800/R120e-1M System Configuration Guide

Advances of parallel computing. Kirill Bogachev May 2016

Intel Select Solutions for Professional Visualization with Advantech Servers & Appliances

CISCO MEDIA CONVERGENCE SERVER 7825-I1

Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?

ENDURING DIFFERENTIATION Timothy Lanfear

ENDURING DIFFERENTIATION. Timothy Lanfear

Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package

Big Data Systems on Future Hardware. Bingsheng He NUS Computing

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

NVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU

Getting Started with Intel SDK for OpenCL Applications

Avid Configuration Guidelines HP Z440 Single 6-core / Single 8-Core CPU Workstation

Cisco HyperFlex HX220c M4 Node

Cisco HyperFlex HX220c M4 and HX220c M4 All Flash Nodes

ECE 8823: GPU Architectures. Objectives

Features. MIC-7700Q: 3 x independent displays (third display output via optional cable) MIC-7700H: 2 x independent displays Controller

Transcription:

Introduction to the HPP-Heterogeneous Processing Platform A combination of Multi-core, GPUs, FPGAs and Many-core accelerators To hear the audio, please be sure to dial in: 1-866-440-4486 ID# 4503739 Yassine Hariri Senior Engineer, Platform Design Hariri@cmc.ca 15-10-2013 Hugh W. Pollitt-Smith Senior System Design Engineer Pollitt-smith@cmc.ca

Agenda Introduction emsyscan Development Systems Update Processor Eras : Historical background HPP-Heterogenous Processing Platform Hardware architecture Software architecture Research topics enabled Getting Started with the HPP Live demo HPP Roadmap Discussion 2

PROCESSOR ERAS : HISTORICAL BACKGROUND 3

Multicore processors Further growth of established markets Multiprocessors are used everywhere Automotive Mobiles Desktops HPC 4

35 YEARS OF MICROPROCESSOR TREND DATA Single Core Era 2 - Multi issue 1 - Scale Frequency 3 - SIMD Multi and Many-Core Era Frequency Power Memory Original data collected and plotted by : M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond and C. Batten Dotted line extrapolations by C. Moor 5

Three Eras of Processor Performance Single-Core Era Enabled by - Moore s Law - Voltage Scaling - Microarchitecture Constrained by X Power X Complexity Multi-Many Core Heterogeneous Systems Era Era Enabled by Enabled by - Moore s Law - Moore s Law - Desire for throughput - Abundant data parallelism - Desire for performance - Power efficient GPUs Constrained by Constrained by X Power X Power X Parallel SW availability X Programming models X Scalability X Communication overheads Assembly>C>C++,Java Pthreads>OpenMP/TBB Shader>Cuda>OpenCL Source : The Salishan Conference on High Speed Computing, DATA PROCESSING IN EXASCALE-CLASS COMPUTER SYSTEMS Chuck Moore AMD Corporate Fellow & Technology Group CTO 6

Heterogeneous environment Software applications Programming models APIs Extended OS API Legacy API Application Specific API Software stack Tools and Compilers Parallelizers Compilers Debug Trace OSes, Runtimes, Drivers Real-time GP OS Special OS Hardware Multi-core GPU FPGA ASIC 7

A heterogeneous landscape Integration of different type of compute units: Big CPUs, Little CPUs GPU FPGA 1000 s of RISC processors Examples: ARM : Big Little Xilinx : Zynq Qualcomm : Snap dragon BIG CPU HW-IP1 HW-IP2 HW-IP2 GPU BIG CPU Little CPU Little CPU 1000 s of RISCs Serious software challenge: How to program a heterogeneous computing fabric? FPGA ANALOG 8

Heterogeneous Systems Software Stack Reuse of legacy software, Write new Software Tools for developing highly optimized parallel software Use standard parallel programming models or create new ones OS : heterogeneity and Real-time processing capability Power efficiency, Performance, Reliability for safety, flexibility, scalability, reconfigurability Applications Medical nursing Automotive Computer vision Tools Parallelizing tools System analysis Profiling Debug/Trace Programming models APIs Extended OS API Legacy API Application Specific API OS / Hypervisor Real-time GP OS Special OS Heterogeneous Processing Platform 9

HPP HARDWARE ARCHITECTURE 10

HPP main components The HPP workstation integrates the following main components: Dual core Intel Xeon E5-2620 V3 NVidia GPU (Tesla K20) FPGA board (Nallatech P385-A72). Xeon Phi 7120A Key Platform Benefits Customizability: Select the right mix of accelerators for your application Greater flexibility for HW/SW exploration Scalability: Create one node and scale up by adding more nodes Fast automated setup and configuration Technical support and training from CMC Microsystems 11

HPP fully installed system 12

HPP wallpaper 13

HPP workstation Processor/Cache CPU 2 x Intel Xeon Processor E5-2620 Series, v3, (6 cores, 85w) CPU Cache 15MB / Processor System Memory System Memory 128GB ECC, 2133MHz DDR4 RDIMM Expansion Slots 2 PCI-E 3.0 x16 double-wide form factor PCI-Express 2 PCI-E 3.0 x8 half-height, half-length single-width Network and Video Network Intel i217 Digabit Ethernet Controllers Video Nvidia k620 2 GB Power Supply 1300W power supply, 120 VAC 60Hz UL/CSA power source Storage 500 GB SATA 1st Solid state (SSD) 1TB SATA 7200 2nd HDD DVD-R/W System/CPU cooling Other Features USB Keyboard and Mouse 3-year parts and labour Canada-wide on-site warranty service (9:00am-5:00pm, Monday-Friday) 22 LCD Monitor; 1920 x 1080 resolution connected via DVI or HDMI 14

HPP Accelerators Accelerator Features Host interface Compute performance Power Nallatech 385 Altera Stratix V Mem. 2 banks of 4GB PCIe 3.0 x 8 unavailable Typical application 25W TESLA K20 2496 CUDA cores 5GB 208 GB/s PCIe 2.0 x 16 3.52 TFLOPS (single precision) 1.17 TFLOPs (double precision) 225W Xeon Phi 7120a 61 Cores, 1.33 GHz Mem. 16GB at 352 Gb/sec PCIe 2.0 x 16 Peak Double Precision 1.003 TFLOPs 300W 15

HPP SOFTWARE ARCHITECTURE 16

HPP Pre-Installed Software Components These software components required by the HPP are pre-installed on the workstation: RedHat Enterprise Linux 6.6 (Kernel version: 2.6.32-504.el6.x86_64) Java Runtime Environment (JRE) gcc compiler and toolchain Accelerator Software and tools GPU Tesla K20 NVIDIA CUDA 7 Toolkit Nallatech FPGA P385 Altera Quartus 15.0, Altera SDK for OpenCL 1.0 Nallatech FPGA P385 Board Support Package Xeon Phi 7120a Intel Manycore Platform Software Stack (MPSS) 3.5.1 for Linux Intel Parallel Studio XE 2015, Professional Studio for C++, Linux version o Intel C++/C/FORTRAN compilers, Intel Math Kernel Library o Debuggers, Performance and Correctness Analysis Tools o OpenMP, MPI, OFED messaging infrastructure (Linux only), OpenCL o Programming Models: Offload, Native and mixed Offload+Native 17

Research topics enabled Research Topics targeting parallel embedded systems: Software IPs and applications targeting heterogeneous parallel systems (e.g. imaging, video, and next-generation immersive applications such as computational photography and augmented reality) Software stack Parallel programming models, Compilers, middleware, Runtime, drivers and OSes Heterogeneous parallel architecture Hardware/Software exploration Debug and trace of applications running on a heterogeneous parallel system 18

GETTING STARTED WITH THE HPP 19

HPP Remote Access In order to install the VNC server on the HPP system, enter the following command in a terminal: # yum install tigervnc-server In order to start a VNC session on the HPP, enter the following command in a terminal: # vncserver -depth 24 -geometry 1680x1050 Note: The optimal values for the geometry setting depends on your screen size and resolution. The output of this command will include a line similar to the following: New 'HPPPrototype:# (root)' desktop is HPPPrototype:# Starting applications specified in /root/.vnc/xstartup 20

Transferring Files To or From the HPP System FileZilla is an open source, GUI-based client application for Windows and Linux available for free from http://filezilla-project.org/. To connect to the HPP sftp site using FileZilla, perform the following tasks: 1. From the field Host, enter the IP address of your HPP system. 2. From the fields Username and Password, enter your HPP user ID and password, respectively. 3. Select SFTP in the protocol field 4. Leave the Port field blank. 5. Click Connect. 21

Setting up Altera Design Tools License using CADpass CADpass on Linux platforms is required to run Altera tools. In order to use the Altera design tools, you need to launch CADpass from the CMC web portal by performing the following tasks: Right-click on "agclient.jnlp" located at: /CMC/tools. From the menu, select: Open with> IcedTea Web Start. Click Allow in the security window. Enter your CMC subscription: email address and password. Select Altera tool by double clicking Altera icon from the tools list 22

LIVE DEMO

HPP Roadmap Status (12 universities selected 18 systems G1, 7 universities selected 8 G2) o Assembled, cloned and tested 18/18 units. Shipping is in progress User Guides o (1) Quick start guide : Heterogeneous Parallel Platform (HPP) (to be released next week) o (2) User Guide: Performance and Power profiling for the HPP (In progress) Webinars series for the HPP o Introduction to the HPP-Heterogeneous Parallel Platform: A combination of Multicores, GPUs, FPGAs and Many-cores accelerators (August 26 th ) o Programming models, performance and power profiling for the HPP-Heterogeneous Parallel Platform (October) o Computer vision using OpenCV/OpenCL targeting the HPP- Heterogeneous Processing Platform (November) HPP Workshops (November) o User group workshop: presentations/demo/discussion o Training: tools and programming models for the HPP 24

DISCUSSION 25