GPU Computing with Tegra K1


1 GPU COMPUTING WITH TEGRA K1 (COMPUTE WITH TEGRA K1)
馬路徹, Senior Solution Architect, NVIDIA

2 AGENDA
Introducing Tegra K1
Tegra K1 Compute Software Capabilities
  OpenGL GLSL
  OpenCL
  CUDA/Unified Memory
  Google Renderscript

3 Tegra K1 Kepler Architecture: ISA Compatible with GeForce, Quadro, Tesla
[Diagram: Tesla in supercomputers; Quadro in workstations; GeForce in PCs; Mobile Kepler in Tegra. Tegra K1 block: four Cortex-A15 cores plus one low-power (LP) A15, 192 CUDA cores, 64 kB L1 cache and shared memory, 128 kB L2 cache.]

4 TEGRA K1 DEVELOPMENT PLATFORMS
Coming to Android K1 devices soon
JETSON TK1: GigE, USB 3.0, HDMI; running Linux4Tegra
JETSON X3 (TK1 PRO): GigE, USB 3.0, HDMI, 8 x cameras, CAN bus; running Vibrante Linux; automotive grade

5 SOFTWARE FOR COMPUTE
Tegra K1 can accelerate:
Renderscript
OpenGL / OpenGL ES with Compute Shaders
NPP
cuFFT, cuBLAS, cuSPARSE
OpenCV
OpenCL full profile
CUDA
and a whole list of libraries enabling compute on the GPU

6 TEGRA K1 FOR OPENGL/GLSL Kepler Architecture 192 CUDA Cores Cortex-A15 4-Plus-1 Shared Physical Memory 2D Engine / ISP

7 COMPUTE SHADERS
Standard OpenGL API
Execute algorithmically general-purpose GLSL shaders
Operate on buffers, images and textures
Process graphics data in the context of the graphics pipeline
Easier than interoperating with a compute API for graphics apps
Standard part of all OpenGL 4.3+ implementations
And now OpenGL ES 3.1!
Example uses: image processing, AI simulation, ray tracing, wave simulation, global illumination

8 OPENGL COMPUTE SHADERS
[Diagram: OpenGL pipeline including the compute shader stage; arrows indicate data flow.
Graphics path (from application): Vertex Puller, Vertex Shader, Tessellation Control Shader, Tessellation Primitive Generator, Tessellation Evaluation Shader, Geometry Shader, Transform Feedback, Rasterization, Fragment Shader, Per-Fragment Operations, Framebuffer.
Compute path (from application): Dispatch, Compute Shader.
Pixel path (from application): Pixel Assembly, Pixel Operations, Pixel Pack.
Buffer bindings (b): Element Array Buffer, Draw Indirect Buffer, Dispatch Indirect Buffer, Vertex Buffer Object, Atomic Counter, Shader Storage, Transform Feedback Buffer, Uniform Block, Pixel Unpack Buffer, Pixel Pack Buffer.
Texture or buffer bindings (t/b): Image Load / Store, Texture Fetch. Texture binding (t): Texture Image.
Legend: Fixed Function Stage, Programmable Stage, b = Buffer Binding, t = Texture Binding.]

9 TEGRA K1 FOR COMPUTE Kepler Architecture 192 CUDA Cores Cortex-A15 4-Plus-1 Shared Physical Memory 2D Engine / ISP

10 TEGRA K1 FOR OPENCL
OpenCL 1.2 Full profile support (OpenCL and OpenCL Embedded)
True portability from desktop
Higher precision, higher limits
Awesome performance
Related Session: US S Real-Time Facial Motion Capture and Animation on Mobile, Emiliano Gambaretto

11 TEGRA K1 CUDA 6 AND SHARED PHYSICAL MEMORY Kepler Architecture 192 CUDA Cores Cortex-A15 4-Plus-1 Shared Physical Memory 2D Engine / ISP

12 CUDA REQUIRES MEMORY COPY
Programmers are forced to do additional work: allocate memory on both the host and the device, and copy data between them in each direction.
Conventional discrete GPU: Host Memory (CPU memory) <-> PCIe <-> Device Memory (GPU memory)

    __global__ void saxpy(int n, float a, float *x, float *y)
    {
        int i = blockIdx.x*blockDim.x + threadIdx.x;
        if (i < n) y[i] = a*x[i] + y[i];
    }

    // x, y are host arrays; d_x, d_y are device arrays allocated beforehand
    int N = 1<<20;
    // (1) Copy the inputs from host memory to device memory over PCIe
    cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);
    // Perform SAXPY on 1M elements
    saxpy<<<4096,256>>>(N, 2.0f, d_x, d_y);
    // (2) Copy the result from device memory back to host memory
    cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);

13 CUDA UNIFIED MEMORY (FROM CUDA 6)
Developer view today: System Memory and GPU Memory
Developer view with Unified Memory: a single Unified Memory
Dramatically lower developer effort
Faster performance on SoC
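The Unified Memory view on this slide can be illustrated by rewriting the SAXPY example from slide 12 with managed allocations. This is a minimal sketch, not code from the presentation: it assumes CUDA 6 or later and a device that supports managed memory. The helper name `saxpy_managed_max_error`, the initial values (x = 1, y = 2), and the launch configuration are illustrative choices. The point it shows is that the explicit `cudaMemcpy` calls disappear, because the CPU and the GPU use the same pointers.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Same SAXPY kernel as in the explicit-copy version: y = a*x + y
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical helper (not from the slides): runs SAXPY on managed memory
// and returns the largest absolute error, or -1.0f if a CUDA call fails.
float saxpy_managed_max_error(int n, float a)
{
    float *x = nullptr, *y = nullptr;

    // One allocation per array; the pointer is valid on both CPU and GPU.
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) return -1.0f;
    if (cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess) {
        cudaFree(x);
        return -1.0f;
    }

    // The CPU initializes the data in place: no host-to-device copy.
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, a, x, y);

    // Wait for the GPU to finish before the CPU touches managed data again.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    float max_err = -1.0f;
    if (err == cudaSuccess) {
        // The CPU reads the result in place: no device-to-host copy.
        max_err = 0.0f;
        for (int i = 0; i < n; ++i)
            max_err = fmaxf(max_err, fabsf(y[i] - (a * 1.0f + 2.0f)));
    }

    cudaFree(x);
    cudaFree(y);
    return max_err;
}

int main()
{
    float err = saxpy_managed_max_error(1 << 20, 2.0f);
    printf("max error: %f\n", err);
    return err == 0.0f ? 0 : 1;
}
```

Built with something like `nvcc saxpy_managed.cu -o saxpy_managed`, the program exits with status 0 when every element matches the expected value. The call to `cudaDeviceSynchronize()` is the one step the explicit-copy version did not need: there, the blocking device-to-host `cudaMemcpy` also acted as the synchronization point.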

14 WHAT IS HAPPENING IN THE BACKGROUND

15 TEGRA K1 PHYSICALLY SHARED MEMORY
Physically Unified Memory without the need of Data Migration
[Block diagram: CPU (quad Cortex-A15 + shadow LP Cortex-A15), GPU (Kepler, 192 CUDA cores), Audio Processor (ARM7), HD Video Processor, Image Processor, Security Engine, DDR3 controller (64-bit), Unified Memory. I/O: SATA2 x1, USB 2.0 x3, USB 3.0 x2, PCIe G2 x4 + x1, UART x4, I2C x5, SPI x4, SDIO/MMC x4, Display x2, HDMI, eDP/LVDS, CSI x4 + x4, NOR Flash, DAP x5 (I2S/TDM).]
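Whether a given device has the physically shared memory described on this slide can be checked from code by querying the CUDA runtime device properties. The sketch below is an illustration, not part of the presentation. The helper name `device_is_integrated` is hypothetical, and the statement that an SoC GPU such as Tegra K1 reports itself as integrated is an expectation rather than something the slides state.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns 1 if the device is integrated with the host
// (shares its physical memory), 0 if it is discrete, -1 on a CUDA error.
int device_is_integrated(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;
    return prop.integrated ? 1 : 0;
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("no CUDA device found\n");
        return 1;
    }

    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) != cudaSuccess) continue;
        printf("device %d: %s\n", d, prop.name);
        // Nonzero when the GPU shares physical memory with the CPU.
        printf("  integrated:          %d\n", prop.integrated);
        // Nonzero when host allocations can be mapped into the GPU address space.
        printf("  can map host memory: %d\n", prop.canMapHostMemory);
        // Nonzero when cudaMallocManaged is supported (CUDA 6 and later).
        printf("  managed memory:      %d\n", prop.managedMemory);
    }
    return 0;
}
```

On a discrete GPU the `integrated` field is 0 and managed data has to move across PCIe. On an integrated device no such transfer between separate memories is needed, which is the point of the slide.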

16 CUDA UNIFIED MEMORY (FROM CUDA 6)
Developer view today: System Memory and GPU Memory
Developer view with Unified Memory: a single Unified Memory
Dramatically lower developer effort
Faster performance on SoC

17 TEGRA K1 FOR RENDERSCRIPT Kepler Architecture 192 CUDA Cores Cortex-A15 4-Plus-1 Shared Physical Memory 2D Engine / ISP

18 RENDERSCRIPT
C99-based kernel language
Easy programmability with host and device portability
Portable across a wide range of devices, fastest on Tegra K1
GPU (+ CPU) and more
Renderscript API 19 support

19 RENDERSCRIPT ON THE SOC
Acceleration of Renderscript scripts on the GPU
ScriptC, not just ScriptIntrinsics
Huge gains in performance and performance/watt
Runtime capable of scheduling work across units: CPU, GPU, 2D Engine/ISP
Related Session: US S Efficient Parallel Computation on Android, Jason Sams, Tim Murray

20 INTRODUCTION (REF S4885)

21 CONSTRAINTS #1, #2 (Ref S4885)

22 CONSTRAINTS #3 (Ref S4885)

23 GPU OR CPU? (REF S4885)

24 DESKTOP PERFORMANCE TODAY (REF S4885)

25 MOBILE PERFORMANCE TODAY (SHIPPING) (REF S4885)

26 ARCHITECTURAL DIVERSITY? (REF S4885)

27 GOAL OF RENDERSCRIPT (REF S4885)

28 WHAT IS RENDERSCRIPT? (REF S4885)

29 RENDERSCRIPT INTRINSICS (REF S4885)

30 TEGRA K1? (REF S4885)

31 TEGRA K1? (REF S4885)

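32 SUMMARY
The Kepler GPU built into Tegra K1 is a scalable GPU that shares its architecture with Tesla/Quadro/GeForce.
As a result, the software assets and development environments for CUDA, OpenCL, and the OpenGL Shading Language that have matured on Tesla/Quadro/GeForce can be used.
In addition, Renderscript, which targets mobile devices at the opposite end of the spectrum from HPC, workstations, and PCs, also delivers excellent performance by making use of the GPU.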

33 THANK YOU

INTEGRATING COMPUTER VISION SENSOR INNOVATIONS INTO MOBILE DEVICES. Eli Savransky Principal Architect - CTO Office Mobile BU NVIDIA corp.

INTEGRATING COMPUTER VISION SENSOR INNOVATIONS INTO MOBILE DEVICES. Eli Savransky Principal Architect - CTO Office Mobile BU NVIDIA corp. INTEGRATING COMPUTER VISION SENSOR INNOVATIONS INTO MOBILE DEVICES Eli Savransky Principal Architect - CTO Office Mobile BU NVIDIA corp. Computer Vision in Mobile Tegra K1 It s time! AGENDA Use cases categories

More information

THE LEADER IN VISUAL COMPUTING

THE LEADER IN VISUAL COMPUTING MOBILE EMBEDDED THE LEADER IN VISUAL COMPUTING 2 TAKING OUR VISION TO REALITY HPC DESIGN and VISUALIZATION AUTO GAMING 3 BEST DEVELOPER EXPERIENCE Tools for Fast Development Debug and Performance Tuning

More information

TEGRA K1 AND THE AUTOMOTIVE INDUSTRY. Gernot Ziegler, Timo Stich

TEGRA K1 AND THE AUTOMOTIVE INDUSTRY. Gernot Ziegler, Timo Stich TEGRA K1 AND THE AUTOMOTIVE INDUSTRY Gernot Ziegler, Timo Stich Previously: Tegra in Automotive Infotainment / Navigation Digital Instrument Cluster Passenger Entertainment TEGRA K1 with Kepler GPU GPU:

More information

GPU programming CUDA C. GPU programming,ii. COMP528 Multi-Core Programming. Different ways:

GPU programming CUDA C. GPU programming,ii. COMP528 Multi-Core Programming. Different ways: COMP528 Multi-Core Programming GPU programming,ii www.csc.liv.ac.uk/~alexei/comp528 Alexei Lisitsa Dept of computer science University of Liverpool a.lisitsa@.liverpool.ac.uk Different ways: GPU programming

More information

IMAGE AND VISION PROCESSING ON TEGRA K1. Elif Albuz

IMAGE AND VISION PROCESSING ON TEGRA K1. Elif Albuz IMAGE AND VISION PROCESSING ON TEGRA K1 Elif Albuz IMAGE AND VISION USE CASES Driven by using camera as a sensor Computational Photography and Videography Face, Body and Gesture Tracking 3D Scene/Object

More information

Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1

Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1 Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Ecosystem @neilt3d Copyright Khronos Group 2015 - Page 1 Copyright Khronos Group 2015 - Page 2 Khronos Connects Software to Silicon

More information

Introduction to CUDA C/C++ Mark Ebersole, NVIDIA CUDA Educator

Introduction to CUDA C/C++ Mark Ebersole, NVIDIA CUDA Educator Introduction to CUDA C/C++ Mark Ebersole, NVIDIA CUDA Educator What is CUDA? Programming language? Compiler? Classic car? Beer? Coffee? CUDA Parallel Computing Platform www.nvidia.com/getcuda Programming

More information

Dave Shreiner, ARM March 2009

Dave Shreiner, ARM March 2009 4 th Annual Dave Shreiner, ARM March 2009 Copyright Khronos Group, 2009 - Page 1 Motivation - What s OpenGL ES, and what can it do for me? Overview - Lingo decoder - Overview of the OpenGL ES Pipeline

More information

Unofficial Redmine Cooking - QA #782 yaml_db を使った DB のマイグレーションで失敗する

Unofficial Redmine Cooking - QA #782 yaml_db を使った DB のマイグレーションで失敗する Unofficial Redmine Cooking - QA #782 yaml_db を使った DB のマイグレーションで失敗する 2018/03/26 10:04 - Tamura Shinji ステータス : 新規開始日 : 2018/03/26 優先度 : 通常期日 : 担当者 : 進捗率 : 0% カテゴリ : 予定工数 : 0.00 時間 対象バージョン : 作業時間 : 0.00 時間

More information

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský Real - Time Rendering Graphics pipeline Michal Červeňanský Juraj Starinský Overview History of Graphics HW Rendering pipeline Shaders Debugging 2 History of Graphics HW First generation Second generation

More information

Shaders. Slide credit to Prof. Zwicker

Shaders. Slide credit to Prof. Zwicker Shaders Slide credit to Prof. Zwicker 2 Today Shader programming 3 Complete model Blinn model with several light sources i diffuse specular ambient How is this implemented on the graphics processor (GPU)?

More information

GPGPU on ARM. Tom Gall, Gil Pitney, 30 th Oct 2013

GPGPU on ARM. Tom Gall, Gil Pitney, 30 th Oct 2013 GPGPU on ARM Tom Gall, Gil Pitney, 30 th Oct 2013 Session Description This session will discuss the current state of the art of GPGPU technologies on ARM SoC systems. What standards are there? Where are

More information

https://login.microsoftonline.com/ /oauth2 Protected API Your Client App Your Client App Your Client App Microsoft Account v2.0 endpoint Unified AuthN/Z endpoint Outlook.com (https://login.microsoftonline.com/common/oauth2/v2.0)

More information

SIGGRAPH Briefing August 2014

SIGGRAPH Briefing August 2014 Copyright Khronos Group 2014 - Page 1 SIGGRAPH Briefing August 2014 Neil Trevett VP Mobile Ecosystem, NVIDIA President, Khronos Copyright Khronos Group 2014 - Page 2 Significant Khronos API Ecosystem Advances

More information

Introduction to CUDA CME343 / ME May James Balfour [ NVIDIA Research

Introduction to CUDA CME343 / ME May James Balfour [ NVIDIA Research Introduction to CUDA CME343 / ME339 18 May 2011 James Balfour [ jbalfour@nvidia.com] NVIDIA Research CUDA Programing system for machines with GPUs Programming Language Compilers Runtime Environments Drivers

More information

Graphics Hardware. Instructor Stephen J. Guy

Graphics Hardware. Instructor Stephen J. Guy Instructor Stephen J. Guy Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability! Programming Examples Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability!

More information

DynamIQ Processor Solutions (Using Cortex- A75 & Cortex-A55) for 5G Networks & Mobile

DynamIQ Processor Solutions (Using Cortex- A75 & Cortex-A55) for 5G Networks & Mobile DynamIQ Processor Solutions (Using Cortex- A75 & Cortex-A55) for 5G Networks & Mobile Satoshi Nakajima FAE manager Arm Norio Kisumi Sr Manager, Operator Relations Arm 2017 Arm Limited Arm Tech Symposia

More information

Androidプログラミング 2 回目 迫紀徳

Androidプログラミング 2 回目 迫紀徳 Androidプログラミング 2 回目 迫紀徳 前回の復習もかねて BMI 計算アプリを作ってみよう! 2 3 BMI の計算方法 BMI = 体重 [kg] 身長 [m] 2 状態も表示できると GOOD 状態低体重 ( 痩せ型 ) 普通体重肥満 (1 度 ) 肥満 (2 度 ) 肥満 (3 度 ) 肥満 (4 度 ) 指標 18.5 未満 18.5 以上 25 未満 25 以上 30 未満 30

More information

今日の予定 1. 展開図の基礎的な知識 1. 正多面体の共通の展開図. 2. 複数の箱が折れる共通の展開図 :2 時間目 3. Rep-Cube: 最新の話題 4. 正多面体に近い立体と正 4 面体の共通の展開図 5. ペタル型の紙で折るピラミッド型 :2 時間目 ~3 時間目

今日の予定 1. 展開図の基礎的な知識 1. 正多面体の共通の展開図. 2. 複数の箱が折れる共通の展開図 :2 時間目 3. Rep-Cube: 最新の話題 4. 正多面体に近い立体と正 4 面体の共通の展開図 5. ペタル型の紙で折るピラミッド型 :2 時間目 ~3 時間目 今日の予定 このミステリー (?) の中でメイントリックに使われました! 1. 展開図の基礎的な知識 1. 正多面体の共通の展開図 2. 複数の箱が折れる共通の展開図 :2 時間目 3. Rep-Cube: 最新の話題 4. 正多面体に近い立体と正 4 面体の共通の展開図 5. ペタル型の紙で折るピラミッド型 :2 時間目 ~3 時間目 Some nets are available at http://www.jaist.ac.jp/~uehara/etc/origami/nets/index-e.html

More information

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer Real-Time Rendering (Echtzeitgraphik) Michael Wimmer wimmer@cg.tuwien.ac.at Walking down the graphics pipeline Application Geometry Rasterizer What for? Understanding the rendering pipeline is the key

More information

Cloud Connector 徹底解説. 多様な基盤への展開を可能にするための Citrix Cloud のキーコンポーネント A-5 セールスエンジニアリング本部パートナー SE 部リードシステムズエンジニア. 哲司 (Satoshi Komiyama) Citrix

Cloud Connector 徹底解説. 多様な基盤への展開を可能にするための Citrix Cloud のキーコンポーネント A-5 セールスエンジニアリング本部パートナー SE 部リードシステムズエンジニア. 哲司 (Satoshi Komiyama) Citrix 1 2017 Citrix Cloud Connector 徹底解説 多様な基盤への展開を可能にするための Citrix Cloud のキーコンポーネント A-5 セールスエンジニアリング本部パートナー SE 部リードシステムズエンジニア 小宮山 哲司 (Satoshi Komiyama) 2 2017 Citrix このセッションのもくじ Cloud Connector 徹底解説 Cloud Connector

More information

GPU CUDA Programming

GPU CUDA Programming GPU CUDA Programming 이정근 (Jeong-Gun Lee) 한림대학교컴퓨터공학과, 임베디드 SoC 연구실 www.onchip.net Email: Jeonggun.Lee@hallym.ac.kr ALTERA JOINT LAB Introduction 차례 Multicore/Manycore and GPU GPU on Medical Applications

More information

Yamaha Steinberg USB Driver V for Mac Release Notes

Yamaha Steinberg USB Driver V for Mac Release Notes Yamaha Steinberg USB Driver V1.10.2 for Mac Release Notes Contents System Requirements for Software Main Revisions and Enhancements Legacy Updates System Requirements for Software - Note that the system

More information

PERFORMANCE OPTIMIZATIONS FOR AUTOMOTIVE SOFTWARE

PERFORMANCE OPTIMIZATIONS FOR AUTOMOTIVE SOFTWARE April 4-7, 2016 Silicon Valley PERFORMANCE OPTIMIZATIONS FOR AUTOMOTIVE SOFTWARE Pradeep Chandrahasshenoy, Automotive Solutions Architect, NVIDIA Stefan Schoenefeld, ProViz DevTech, NVIDIA 4 th April 2016

More information

CS427 Multicore Architecture and Parallel Computing

CS427 Multicore Architecture and Parallel Computing CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:

More information

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST CS 380 - GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8 Markus Hadwiger, KAUST Reading Assignment #5 (until March 12) Read (required): Programming Massively Parallel Processors book, Chapter

More information

Verify99. Axis Systems

Verify99. Axis Systems Axis Systems Axis Systems Mission Axis Systems, Inc. is a technology leader in the logic design verification market. Founded in 1996, the company offers breakthrough technologies and high-speed simulation

More information

Programming shaders & GPUs Christian Miller CS Fall 2011

Programming shaders & GPUs Christian Miller CS Fall 2011 Programming shaders & GPUs Christian Miller CS 354 - Fall 2011 Fixed-function vs. programmable Up until 2001, graphics cards implemented the whole pipeline for you Fixed functionality but configurable

More information

GPGPU on Mobile Devices

GPGPU on Mobile Devices GPGPU on Mobile Devices Introduction Addressing GPGPU for very mobile devices Tablets Smartphones Introduction Why dedicated GPUs in mobile devices? Gaming Physics simulation for realistic effects 3D-GUI

More information

Introduction to OpenGL ES 3.0

Introduction to OpenGL ES 3.0 Introduction to OpenGL ES 3.0 Eisaku Ohbuchi Digital Media Professionals Inc. 2012 Digital Media Professionals Inc. All rights reserved. 12/Sep/2012 Page 1 Agenda DMP overview (quick!) OpenGL ES 3.0 update

More information

Navigating the Vision API Jungle: Which API Should You Use and Why? Embedded Vision Summit, May 2015

Navigating the Vision API Jungle: Which API Should You Use and Why? Embedded Vision Summit, May 2015 Copyright Khronos Group 2015 - Page 1 Navigating the Vision API Jungle: Which API Should You Use and Why? Embedded Vision Summit, May 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem

More information

CS4621/5621 Fall Computer Graphics Practicum Intro to OpenGL/GLSL

CS4621/5621 Fall Computer Graphics Practicum Intro to OpenGL/GLSL CS4621/5621 Fall 2015 Computer Graphics Practicum Intro to OpenGL/GLSL Professor: Kavita Bala Instructor: Nicolas Savva with slides from Balazs Kovacs, Eston Schweickart, Daniel Schroeder, Jiang Huang

More information

NVIDIA Fermi Architecture

NVIDIA Fermi Architecture Administrivia NVIDIA Fermi Architecture Patrick Cozzi University of Pennsylvania CIS 565 - Spring 2011 Assignment 4 grades returned Project checkpoint on Monday Post an update on your blog beforehand Poster

More information

GTC 2013 March San Jose, CA The Smartest People. The Best Ideas. The Biggest Opportunities. Opportunities for Participation:

GTC 2013 March San Jose, CA The Smartest People. The Best Ideas. The Biggest Opportunities. Opportunities for Participation: GTC 2013 March 18-21 San Jose, CA The Smartest People. The Best Ideas. The Biggest Opportunities. Opportunities for Participation: SPEAK - Showcase your work among the elite of graphics computing - Call

More information

Copyright Khronos Group Page 1

Copyright Khronos Group Page 1 Gaming Market Briefing Overview of APIs GDC March 2016 Neil Trevett Khronos President NVIDIA Vice President Developer Ecosystem ntrevett@nvidia.com @neilt3d Copyright Khronos Group 2016 - Page 1 Copyright

More information

Embedded Computing without Compromise. Evolution of the Rugged GPGPU Computer Session: SIL7127 Dan Mor PLM -Aitech Systems GTC Israel 2017

Embedded Computing without Compromise. Evolution of the Rugged GPGPU Computer Session: SIL7127 Dan Mor PLM -Aitech Systems GTC Israel 2017 Evolution of the Rugged GPGPU Computer Session: SIL7127 Dan Mor PLM - Systems GTC Israel 2017 Agenda Current GPGPU systems NVIDIA Jetson TX1 and TX2 evaluation Conclusions New Products 2 GPGPU Product

More information

DirectX10 Effects and Performance. Bryan Dudash

DirectX10 Effects and Performance. Bryan Dudash DirectX10 Effects and Performance Bryan Dudash Today s sessions Now DX10のエフェクトとパフォーマンスならび使用法 Bryan Dudash NVIDIA 16:50 17:00 BREAK 17:00 18:30 NVIDIA GPUでの物理演算 Simon Green NVIDIA Motivation Direct3D 10

More information

Antonio R. Miele Marco D. Santambrogio

Antonio R. Miele Marco D. Santambrogio Advanced Topics on Heterogeneous System Architectures GPU Politecnico di Milano Seminar Room A. Alario 18 November, 2015 Antonio R. Miele Marco D. Santambrogio Politecnico di Milano 2 Introduction First

More information

J の Lab システムの舞台裏 - パワーポイントはいらない -

J の Lab システムの舞台裏 - パワーポイントはいらない - JAPLA 研究会資料 2011/6/25 J の Lab システムの舞台裏 - パワーポイントはいらない - 西川利男 学会の発表などでは 私は J の Lab を活用している 多くの人が使っているパワーポイントなぞ使う気にはならない J の Lab システムは会場の大きなスクリーンで説明文書が出来ることはもちろんだが システム自身が J の上で動いていることから J のプログラムが即実行出来て

More information

Getting Started with CUDA C/C++ Mark Ebersole, NVIDIA CUDA Educator

Getting Started with CUDA C/C++ Mark Ebersole, NVIDIA CUDA Educator Getting Started with CUDA C/C++ Mark Ebersole, NVIDIA CUDA Educator Heterogeneous Computing CPU GPU Once upon a time Past Massively Parallel Supercomputers Goodyear MPP Thinking Machine MasPar Cray 2 1.31

More information

ECE 574 Cluster Computing Lecture 15

ECE 574 Cluster Computing Lecture 15 ECE 574 Cluster Computing Lecture 15 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 30 March 2017 HW#7 (MPI) posted. Project topics due. Update on the PAPI paper Announcements

More information

Deep Learning: Transforming Engineering and Science The MathWorks, Inc.

Deep Learning: Transforming Engineering and Science The MathWorks, Inc. Deep Learning: Transforming Engineering and Science 1 2015 The MathWorks, Inc. DEEP LEARNING: TRANSFORMING ENGINEERING AND SCIENCE A THE NEW RISE ERA OF OF GPU COMPUTING 3 NVIDIA A IS NEW THE WORLD S ERA

More information

Chapter 1 Videos Lesson 61 Thrillers are scary ~Reading~

Chapter 1 Videos Lesson 61 Thrillers are scary ~Reading~ LESSON GOAL: Can read about movies. 映画に関する文章を読めるようになろう Choose the word to match the underlined word. 下線の単語から考えて どんな映画かを言いましょう 1. The (thriller movie, sports video) I watched yesterday was scary. 2. My

More information

Introduction to Information and Communication Technology (a)

Introduction to Information and Communication Technology (a) Introduction to Information and Communication Technology (a) 6 th week: 1.5 Information security and management Kazumasa Yamamoto Dept. Computer Science & Engineering Introduction to ICT(a) 6th week 1

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

Firefox for mac

Firefox for mac Мобильный портал WAP версия: wap.altmaster.ru Firefox for mac 10.6.8 Download old versions of Firefox for Mac.. Firefox. A multi-platform web browser with open source code. Mozilla Firefox for Mac latest

More information

本書について... 7 本文中の表記について... 7 マークについて... 7 MTCE をインストールする前に... 7 ご注意... 7 推奨 PC 仕様... 8 MTCE をインストールする... 9 MTCE をアンインストールする... 11

本書について... 7 本文中の表記について... 7 マークについて... 7 MTCE をインストールする前に... 7 ご注意... 7 推奨 PC 仕様... 8 MTCE をインストールする... 9 MTCE をアンインストールする... 11 Installation Guide FOR English 2 About this guide... 2 Notations used in this document... 2 Symbols... 2 Before installing MTCE... 2 Notice... 2 Recommended computer specifications... 3 Installing MTCE...

More information

A176 Cyclone. GPGPU Fanless Small FF RediBuilt Supercomputer. IT and Instrumentation for industry. Aitech I/O

A176 Cyclone. GPGPU Fanless Small FF RediBuilt Supercomputer. IT and Instrumentation for industry. Aitech I/O The A176 Cyclone is the smallest and most powerful Rugged-GPGPU, ideally suited for distributed systems. Its 256 CUDA cores reach 1 TFLOPS, and it consumes less than 17W at full load (8-10W at typical

More information

Methods to Detect Malicious MS Document File using File Structure Inspection

Methods to Detect Malicious MS Document File using File Structure Inspection MS 1,a) 2,b) 2 MS Rich Text Compound File Binary MS MS MS 98.4% MS MS Methods to Detect Malicious MS Document File using File Structure Inspection Abstract: Today, the number of targeted attacks is increasing,

More information

April 4-7, 2016 Silicon Valley

April 4-7, 2016 Silicon Valley April 4-7, 2016 Silicon Valley TEGRA PLATFORMS GAMING DRONES ROBOTICS IVA AUTOMOTIVE 2 Compile Debug Profile Trace C/C++ NVTX NVIDIA Tools extension Getting Started CodeWorks JetPack Installers IDE Integration

More information

Manycore and GPU Channelisers. Seth Hall High Performance Computing Lab, AUT

Manycore and GPU Channelisers. Seth Hall High Performance Computing Lab, AUT Manycore and GPU Channelisers Seth Hall High Performance Computing Lab, AUT GPU Accelerated Computing GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a CPU to accelerate

More information

CUDA PROGRAMMING MODEL. Carlo Nardone Sr. Solution Architect, NVIDIA EMEA

CUDA PROGRAMMING MODEL. Carlo Nardone Sr. Solution Architect, NVIDIA EMEA CUDA PROGRAMMING MODEL Carlo Nardone Sr. Solution Architect, NVIDIA EMEA CUDA: COMMON UNIFIED DEVICE ARCHITECTURE Parallel computing architecture and programming model GPU Computing Application Includes

More information

GPU Memory Model Overview

GPU Memory Model Overview GPU Memory Model Overview John Owens University of California, Davis Department of Electrical and Computer Engineering Institute for Data Analysis and Visualization SciDAC Institute for Ultrascale Visualization

More information

Yamaha Steinberg USB Driver V for Windows Release Notes

Yamaha Steinberg USB Driver V for Windows Release Notes Yamaha Steinberg USB Driver V1.10.4 for Windows Release Notes Contents System Requirements for Software Main Revisions and Enhancements Legacy Updates System Requirements for Software - Note that the system

More information

Accelerating Realism with the (NVIDIA Scene Graph)

Accelerating Realism with the (NVIDIA Scene Graph) Accelerating Realism with the (NVIDIA Scene Graph) Holger Kunz Manager, Workstation Middleware Development Phillip Miller Director, Workstation Middleware Product Management NVIDIA application acceleration

More information

Yamaha Steinberg USB Driver V for Windows Release Notes

Yamaha Steinberg USB Driver V for Windows Release Notes Yamaha Steinberg USB Driver V1.9.11 for Windows Release Notes Contents System Requirements for Software Main Revisions and Enhancements Legacy Updates System Requirements for Software - Note that the system

More information

OpenGL BOF Siggraph 2011

OpenGL BOF Siggraph 2011 OpenGL BOF Siggraph 2011 OpenGL BOF Agenda OpenGL 4 update Barthold Lichtenbelt, NVIDIA OpenGL Shading Language Hints/Kinks Bill Licea-Kane, AMD Ecosystem update Jon Leech, Khronos Viewperf 12, a new beginning

More information

サンプル. NI TestStand TM I: Introduction Course Manual

サンプル. NI TestStand TM I: Introduction Course Manual NI TestStand TM I: Introduction Course Manual Course Software Version 4.1 February 2009 Edition Part Number 372771A-01 NI TestStand I: Introduction Course Manual Copyright 2009 National Instruments Corporation.

More information

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1 Lecture 15: Introduction to GPU programming Lecture 15: Introduction to GPU programming p. 1 Overview Hardware features of GPGPU Principles of GPU programming A good reference: David B. Kirk and Wen-mei

More information

S CUDA on Xavier

S CUDA on Xavier S8868 - CUDA on Xavier Anshuman Bhat CUDA Product Manager Saikat Dasadhikari CUDA Engineering 29 th March 2018 1 CUDA ECOSYSTEM 2018 CUDA DOWNLOADS IN 2017 3,500,000 CUDA REGISTERED DEVELOPERS 800,000

More information

Lecture 4 Branch & cut algorithm

Lecture 4 Branch & cut algorithm Lecture 4 Branch & cut algorithm 1.Basic of branch & bound 2.Branch & bound algorithm 3.Implicit enumeration method 4.B&B for mixed integer program 5.Cutting plane method 6.Branch & cut algorithm Slide

More information

LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014)

LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014) A practitioner s view of challenges faced with power and performance on mobile GPU Prashant Sharma Samsung R&D Institute UK LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014) SERI

More information

S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer

S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer 2 100 倍以上速く 本当に可能ですか? 2 DOUGLAS ADAMS BABEL FISH Neural Machine Translation Unit 3 4 OVER 100X FASTER, IS IT REALLY POSSIBLE?

More information

Take GPU Processing Power Beyond Graphics with Mali GPU Computing

Take GPU Processing Power Beyond Graphics with Mali GPU Computing Take GPU Processing Power Beyond Graphics with Mali GPU Computing Roberto Mijat Visual Computing Marketing Manager August 2012 Introduction Modern processor and SoC architectures endorse parallelism as

More information

携帯電話の 吸収率 (SAR) について / Specific Absorption Rate (SAR) of Mobile Phones

携帯電話の 吸収率 (SAR) について / Specific Absorption Rate (SAR) of Mobile Phones 携帯電話の 吸収率 (SAR) について / Specific Absorption Rate (SAR) of Mobile Phones 1. SC-02L の SAR / About SAR of SC-02L ( 本語 ) この機種 SC-02L の携帯電話機は 国が定めた電波の 体吸収に関する技術基準および電波防護の国際ガイドライ ンに適合しています この携帯電話機は 国が定めた電波の 体吸収に関する技術基準

More information

Threading Hardware in G80

Threading Hardware in G80 ing Hardware in G80 1 Sources Slides by ECE 498 AL : Programming Massively Parallel Processors : Wen-Mei Hwu John Nickolls, NVIDIA 2 3D 3D API: API: OpenGL OpenGL or or Direct3D Direct3D GPU Command &

More information

QuantaPlex Series T41S-2U/T41SP-2U

QuantaPlex Series T41S-2U/T41SP-2U QuantaPlex Series T41S-2U/T41SP-2U 2U 4-Node Server Featuring Latest DDR4 Technology User's Guide Version: 2.0.0 Copyright Copyright 2014 Quanta Computer Inc. This publication, including all photographs,

More information

Online Meetings with Zoom

Online Meetings with Zoom Online Meetings with Zoom Electronic Applications の下の部分に Zoom への入り口 What is Zoom? This Web Conferencing service is offered free of charge to eligible officers of technical committees, subcommittees, working

More information

Vehicle Calibration Techniques Established and Substantiated for Motorcycles

Vehicle Calibration Techniques Established and Substantiated for Motorcycles Technical paper Vehicle Calibration Techniques Established and Substantiated for Motorcycles モータサイクルに特化した車両適合手法の確立と実証 Satoru KANNO *1 Koichi TSUNOKAWA *1 Takashi SUDA *1 菅野寛角川浩一須田玄 モータサイクル向け ECU は, 搭載性をよくするため小型化が求められ,

More information

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST CS 380 - GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1 Markus Hadwiger, KAUST Reading Assignment #2 (until Feb. 17) Read (required): GLSL book, chapter 4 (The OpenGL Programmable

More information

Real-time Graphics 9. GPGPU

Real-time Graphics 9. GPGPU 9. GPGPU GPGPU GPU (Graphics Processing Unit) Flexible and powerful processor Programmability, precision, power Parallel processing CPU Increasing number of cores Parallel processing GPGPU general-purpose

More information

PRODUCT DESCRIPTIONS AND METRICS

PRODUCT DESCRIPTIONS AND METRICS PRODUCT DESCRIPTIONS AND METRICS 1. Multiple-User Access. 1.1 If On-Premise Software licensed on a per-user basis is installed on a Computer accessible by more than one User, then the total number of Users

More information
