Torch Internals. Ronan Collobert torch.ch
2 Introduction
3 Machine Learning Library
- Around since [...], [...] versions, implemented in C, C++, Objective C
- Implements main machine learning algorithms: SVMs, Neural Networks, HMMs, GMMs
- Goal: research on large-scale machine learning
4 It is for researchers
- It has to be modular!
  - organized as packages
- It must be easy to extend!
- It must be simple to understand!
  - a high-level language is essential
- It must be fast (large-scale machine learning)
  - JIT compilation or an efficient C interface for core routines
5 Overview
12 Overview
- packages: optim, nn, nngraph, svm, sndfile, image
- glue: cwrap, luaT (C)
- core: torch (Lua), TH (C)
13 Which Programming Language?
- LISP? (Lush)
  - A subset could be compiled
  - Could inline C code
- JIT-compiled languages? (e.g. LuaJIT)
  - Only a subset is actually compiled
- Python? err
- Lua?
  - Looks like pseudo-code (yet very powerful)
  - Simple C API
14 Which Programming Language?
- C++? (Torch3, PLearn, EBLearn)
- Objective C? (Torch4)
- Avoid too many abstractions
- Avoid complicated syntax
- We chose C for Torch7
15 The Core: TH and torch
16 The Core Library (TH+torch)
- ML algorithms manipulate all kinds of data
- Represent data as an n-dimensional Tensor
  - 1D: a bunch of features
  - 2D: a gray image, some audio features
  - 3D: RGB images
  - 4D: videos
- The core lib provides many tensor operations
- Available from C (TH lib) and Lua (torch package)
19 The Core Library (TH+torch)
- Avoid memory copies
- A Tensor is a view of a Storage (memory chunk)
- Storages might have different views (Tensors)
- Storages can be in-memory, or on-disk (mmap!)
- Easy to get different views from another tensor x (new tensors share the same storage):
  - x:narrow(dim, idx, size)
  - x:select(dim, idx)
  - x[idx]
  - x:unfold(dim, kw, dw)
  - x:transpose(dim1, dim2)
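The idea that a Tensor is only a view of a Storage can be sketched in a few lines of C. This is a minimal illustration, not TH's actual data structures: the names Storage, View2D, view_new, view_at, view_narrow and view_transpose are hypothetical, the view is fixed at two dimensions, and indices are 0-based (Torch's Lua interface is 1-based).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; TH's real structs differ. */
typedef struct {
    float *data;   /* the memory chunk */
    size_t size;   /* number of elements */
} Storage;

typedef struct {
    Storage *storage;   /* shared with other views, never copied */
    size_t offset;      /* position of element (0, 0) inside the storage */
    size_t size[2];     /* extent of each dimension */
    size_t stride[2];   /* step in the storage per unit step in each dimension */
} View2D;

/* A contiguous row-major rows x cols view over the whole storage. */
static View2D view_new(Storage *s, size_t rows, size_t cols) {
    View2D v = { s, 0, { rows, cols }, { cols, 1 } };
    assert(rows * cols <= s->size);
    return v;
}

/* Address of element (i, j): pure index arithmetic on the shared storage. */
static float *view_at(const View2D *v, size_t i, size_t j) {
    return v->storage->data + v->offset + i * v->stride[0] + j * v->stride[1];
}

/* narrow: keep `size` entries of dimension `dim`, starting at `first`. */
static View2D view_narrow(View2D v, int dim, size_t first, size_t size) {
    assert(first + size <= v.size[dim]);
    v.offset += first * v.stride[dim];
    v.size[dim] = size;
    return v;
}

/* transpose: swap sizes and strides; no element is moved. */
static View2D view_transpose(View2D v) {
    size_t t = v.size[0];
    v.size[0] = v.size[1];
    v.size[1] = t;
    t = v.stride[0];
    v.stride[0] = v.stride[1];
    v.stride[1] = t;
    return v;
}
```

Because every view only stores an offset, sizes and strides, writing through a narrowed view is visible through the original view and through its transpose.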
20 The Core Library (TH+torch)
- Avoid memory copies
- All C functions are of the form THTensor_foobar(THTensor *dst, THTensor *src1, THTensor *src2)
- The Lua interface supports the destination as an optional argument: torch.foobar([dst], src1, src2) or dst:foobar(src1, src2)
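The destination-first convention, and the optional destination of the Lua interface, might be sketched as follows. This is an assumption-laden illustration on plain float arrays: vec_add is a hypothetical helper, not a TH function, and passing NULL stands in for omitting dst on the Lua side.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of TH.
 * dst comes first, as in THTensor_foobar(dst, src1, src2).
 * If dst is NULL a fresh buffer is allocated (the caller frees it),
 * mimicking the optional destination of the Lua interface. */
static float *vec_add(float *dst, const float *src1, const float *src2, size_t n) {
    if (dst == NULL) {
        dst = malloc(n * sizeof *dst);
        if (dst == NULL)
            return NULL;   /* allocation failed */
    }
    for (size_t i = 0; i < n; ++i)
        dst[i] = src1[i] + src2[i];
    return dst;
}
```

Passing an existing buffer avoids any allocation inside a training loop, and passing one of the sources as dst gives an in-place update.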
21 The Glue: luaT and cwrap
22 C/Lua Glue
- Lua's C API is stack-based
  - In Lua: a = f("how", t.x, 14)
  - In C: the same call is a sequence of explicit stack pushes followed by a call
- How to extend it? (luaT or FFI)
  - Lua provides the userdata type (a GCed C pointer)
  - Need to do proper type checking both in C and Lua
- How to limit the pain? (cwrap or FFI)
26 C/Lua Glue (luaT)
- check arguments on the stack
- push the result
- call TH
29 C/Lua Glue (cwrap)
- for all tensor types
- torch.cmul(dst, src) or dst:cmul(src)
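The "write once, instantiate for all tensor types" idea can be illustrated in plain C with a macro. This is only a sketch of the general technique, not cwrap's generated code: DEFINE_CMUL and the Float_cmul, Double_cmul and Int_cmul names are hypothetical, and the functions work on raw arrays rather than tensors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: stamps out one element-wise in-place multiply
 * per element type, so the loop is written only once. */
#define DEFINE_CMUL(NAME, T)                                   \
    static void NAME##_cmul(T *dst, const T *src, size_t n) {  \
        for (size_t i = 0; i < n; ++i)                         \
            dst[i] *= src[i];                                  \
    }

DEFINE_CMUL(Float, float)    /* defines Float_cmul  */
DEFINE_CMUL(Double, double)  /* defines Double_cmul */
DEFINE_CMUL(Int, int)        /* defines Int_cmul    */
```

Each instantiation plays the role of dst:cmul(src) for one element type; supporting a new type costs one extra line.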
32 C/Lua Glue (cwrap)
- two different C functions
- another default value
34 C/Lua Glue (FFI)
- Cut-and-paste the C header
- Caveats:
  - Need to do proper argument checking (sub-classes)
  - Need to overload methods/functions properly
  - Less robust across different systems (#define)
  - Need also to handle different types (templates)
35 Tensor Types
37 Dynamic Typing (Lua)
- Write generic code with torch.Tensor()
- Specify alias with (e.g.) torch.setdefaulttensortype('torch.FloatTensor')
- return a torch.Tensor()
38 CUDA
- cutorch package defines CudaTensor
  - Most CPU Tensor methods are available
  - Relies on a few custom-made iterators and Thrust
  - x:cuda() or x:float() for GPU <-> CPU copies
- cunn package provides a GPU backend to nn
  - Most popular nn layers have a GPU backend
39 Deep Learning: nn
42 nn package
- Two main classes: Module and Criterion
- Three main methods:
  - updateOutput(input)
  - updateGradInput(input, gradOutput)
  - accGradParameters(input, gradOutput)
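To make the three methods concrete, here is a minimal sketch of a fully connected layer in C. It is not the nn package's implementation: the Linear struct, the fixed sizes N_IN and N_OUT, and the single-sample interface are assumptions made for brevity. Only the three method names and their arguments follow the slide.

```c
#include <assert.h>

enum { N_IN = 2, N_OUT = 2 };   /* fixed sizes keep the sketch short */

/* Hypothetical module struct: parameters, their gradients, and the
 * output/gradInput buffers the module owns and reuses between calls. */
typedef struct {
    float weight[N_OUT][N_IN], bias[N_OUT];
    float gradWeight[N_OUT][N_IN], gradBias[N_OUT];
    float output[N_OUT], gradInput[N_IN];
} Linear;

/* Forward pass: output = weight * input + bias. */
static const float *updateOutput(Linear *m, const float *input) {
    for (int o = 0; o < N_OUT; ++o) {
        m->output[o] = m->bias[o];
        for (int i = 0; i < N_IN; ++i)
            m->output[o] += m->weight[o][i] * input[i];
    }
    return m->output;
}

/* Gradient w.r.t. the input: gradInput = weight^T * gradOutput. */
static const float *updateGradInput(Linear *m, const float *input,
                                    const float *gradOutput) {
    (void)input;   /* not needed for a linear layer */
    for (int i = 0; i < N_IN; ++i) {
        m->gradInput[i] = 0.0f;
        for (int o = 0; o < N_OUT; ++o)
            m->gradInput[i] += m->weight[o][i] * gradOutput[o];
    }
    return m->gradInput;
}

/* Gradient w.r.t. the parameters, accumulated across calls. */
static void accGradParameters(Linear *m, const float *input,
                              const float *gradOutput) {
    for (int o = 0; o < N_OUT; ++o) {
        m->gradBias[o] += gradOutput[o];
        for (int i = 0; i < N_IN; ++i)
            m->gradWeight[o][i] += gradOutput[o] * input[i];
    }
}
```

Keeping the input gradient and the parameter gradient in separate methods lets a caller skip one of them, and accumulating rather than overwriting lets gradients from several samples add up before a parameter update.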
44 Tutorial
- Torch7
- nn package
- itorch
- Installation instructions
- Get the tutorial