Using Metal 2 for Compute
1 Session Graphics and Games #WWDC17: Using Metal 2 for Compute, Session 608. Anna Tikhonova, GPU Software Engineer. © 2017 Apple Inc. All rights reserved. Redistribution or public display not permitted without written permission from Apple.
2 Metal 2 Ecosystem Metal API and language GPU Tools MetalKit Metal Performance Shaders Metal 2
4 Metal Performance Shaders (MPS): GPU-accelerated primitives for Image Processing, Linear Algebra, and Machine Learning Inference. Previously optimized for iOS; now (NEW) optimized for iOS and macOS. See "What's New in Metal, Part 2" (WWDC 2016) and "What's New in Metal, Part 2" (WWDC 2015).
6 Image Processing
7 Image Processing: primitives available in iOS 10: Convolution, Equalization and Specification, Gaussian Blur, Median, Box, Tent, Thresholding, Sobel, Transpose, Morphology, Image Integral, Lanczos Resampling, Color Conversion, Histogram, Gaussian Pyramid
8 Image Processing: new primitives (NEW): Image Keypoints, Bilinear Rescale, Image Statistics, Element-wise Arithmetic Operations (with broadcasting)
9 Linear Algebra
10 Linear Algebra: new primitives (NEW): Matrix-Matrix Multiplication, Matrix-Vector Multiplication, Triangular Matrix Factorization and Linear Solvers
13 Data Representations: MPSVector interprets data in a MTLBuffer as a 1-dimensional array. MPSMatrix interprets data in a MTLBuffer as a rectangular array, in row-major order. MPSTemporaryMatrix is allocated from a MTLHeap; use it for most of your intermediate matrices.
14 MPSVector and MPSMatrix input types: Single Precision Floating-Point, Half Precision Floating-Point, 16-bit Signed Integer, 8-bit Signed Integer
15 MPSVector code example: create a vector of size N

// Create a Metal buffer of length N
let buffer = device.makeBuffer(length: N * MemoryLayout<Float32>.size)!

// Create a vector descriptor
let descriptor = MPSVectorDescriptor(length: N, dataType: .float32)

// Create a vector with the descriptor
let vector = MPSVector(buffer: buffer, descriptor: descriptor)
19 MPSMatrix code example: create a matrix with M rows and N columns

// Get the recommended bytes-per-row value to use for sizing a Metal buffer
let bytesPerRow = MPSMatrixDescriptor.rowBytes(forColumns: N, dataType: .float32)

// Create a Metal buffer with the recommended bytes per row
let buffer = device.makeBuffer(length: M * bytesPerRow)!

// Create a matrix descriptor
let descriptor = MPSMatrixDescriptor(rows: M, columns: N, rowBytes: bytesPerRow, dataType: .float32)

// Create a matrix with the descriptor
let matrix = MPSMatrix(buffer: buffer, descriptor: descriptor)
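The recommended bytes-per-row value can exceed columns × element size, so that every row starts at an address the GPU can access efficiently. A small sketch of row-major addressing with padded rows, in plain Python (the helper name is illustrative, not MPS API):

```python
# Row-major addressing with padded rows, as MPSMatrix uses:
# element (i, j) starts at byte offset i * row_bytes + j * elem_size.
def element_offset(i, j, row_bytes, elem_size=4):
    """Byte offset of element (i, j) in a row-major matrix with padded rows."""
    return i * row_bytes + j * elem_size

# Example: a matrix with 3 Float32 columns (12 data bytes per row) might get
# a recommended row_bytes of 16, leaving a 4-byte pad at the end of each row.
offsets = [element_offset(i, j, row_bytes=16) for i in range(2) for j in range(3)]
print(offsets)  # [0, 4, 8, 16, 20, 24]
```

The padding is why the buffer is sized as M * bytesPerRow rather than M * N * 4.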
23 Primitives: Matrix-Matrix and Matrix-Vector Multiplication, with an API modeled after the standard BLAS GEMM and GEMV interfaces; Triangular Matrix Factorization and Linear Solvers, with an API modeled after the standard LAPACK decomposition and solve interfaces.
24 // Example: Matrix-Matrix Multiply: C = A × B

// Create matrices A, B and C
let A = MPSMatrix(buffer: ABuffer,
                  descriptor: MPSMatrixDescriptor(rows: M, columns: K, rowBytes: ARowBytes, dataType: .float32))
let B = MPSMatrix(buffer: BBuffer,
                  descriptor: MPSMatrixDescriptor(rows: K, columns: N, rowBytes: BRowBytes, dataType: .float32))
let C = MPSMatrix(buffer: CBuffer,
                  descriptor: MPSMatrixDescriptor(rows: M, columns: N, rowBytes: CRowBytes, dataType: .float32))
25 // Example: Matrix-Matrix Multiply: C = A × B

// Perform Metal setup
let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()!
let commandBuffer = commandQueue.makeCommandBuffer()!

// Create a Matrix-Matrix Multiplication kernel
let mmKernel = MPSMatrixMultiplication(device: device,
                                       resultRows: M, resultColumns: N, interiorColumns: K)

// Encode the kernel to the command buffer
mmKernel.encode(commandBuffer: commandBuffer, leftMatrix: A, rightMatrix: B, resultMatrix: C)

// Tell the GPU to start doing the work
commandBuffer.commit()
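The MPSMatrixMultiplication kernel above computes C = A·B, with A of size M×K and B of size K×N. As a plain-Python reference for the math the kernel performs on the GPU (not MPS API):

```python
def matmul(A, B):
    """Reference C = A @ B for row-major nested lists: A is MxK, B is KxN."""
    M, K, N = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
            for i in range(M)]

A = [[1, 2], [3, 4]]         # 2x2
B = [[5, 6], [7, 8]]         # 2x2
print(matmul(A, B))          # [[19, 22], [43, 50]]
```

This is the same contract as BLAS GEMM with alpha = 1 and beta = 0; MPS exposes the operation with the dimensions M, N and the interior (shared) dimension K, matching the kernel's resultRows, resultColumns and interiorColumns parameters.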
29 Sample Code: MPSMatrixMultiplication; Triangular Matrix Factorization and Linear Solvers (coming soon)
30 Machine Learning
31 Machine Learning at Apple: architecture layers, from top to bottom: Applications; Domain-Specific Frameworks (Vision, NLP); ML Framework (Core ML); ML Performance Primitives (Accelerate, MPS)
32 What Is Deep Learning?
33 [Slides 33-37: image-classification examples: a photo classified as "panda", and a scene labeled with objects such as house, ocean, dress, dog, girl, sunset, bicycle, giraffe, horse, ramp, man, plant, skateboard, lights]
38 Training and Inference: training to classify images. During training, many labeled example images (cat, rabbit, dog, giraffe, horse) are fed through the network to produce Trained Parameters. During inference, the trained CNN takes an input image and, using those parameters, outputs a classification (for example, "cat").
44 Agenda: Recap on Convolutional Neural Networks (CNN), see "What's New in Metal, Part 2" (WWDC 2016); Convolutional Neural Networks: New Primitives; Neural Network Graph API; Recurrent Neural Networks (RNN)
47 What Are Convolutional Neural Networks?
50 Convolutional Neural Networks: biologically inspired, resembling the visual cortex. Hierarchical representation: organized into a hierarchy of layers, where higher-level features are derived from lower-level features. Think of a feature as a filter that filters the data for that feature.
51 Convolutional Neural Networks: primitives available in iOS 10: Convolution; Fully-Connected; Pooling (Average, Max); Normalization (Cross-Channel, Local Contrast, Spatial); Softmax; Neuron (Linear, ReLU, Sigmoid, TanH, Absolute)
53 Convolution Core building block Recognizes features in input
54 [Animation, slides 54-59: a single 3 × 3 filter slides over a 1-channel input to produce a 1-channel output]
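The sliding-window operation in the animation above can be sketched in plain Python (a CPU reference for the math, not the MPS kernel):

```python
def conv2d(image, kernel):
    """Valid cross-correlation of a 1-channel image with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)]
            for i in range(oh)]

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]          # identity kernel: picks the window's center pixel
print(conv2d(image, kernel))  # [[6, 7], [10, 11]]
```

Each output value is the weighted sum of one 3 × 3 window of the input, which is why a trained kernel acts as a feature detector: the response is largest where the input locally matches the kernel's pattern.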
60 [Animation, slides 60-63: a 3-channel 40 × 40 input convolved with 16 filters of size 5 × 5 (3 × 16 filter planes in total) produces a 16-channel 40 × 40 output]
65 Convolutional Neural Networks: new primitives (NEW): new Convolution weight types; Binary and XNOR Convolution; Sub-Pixel Convolution; Dilated Convolution; Convolution Transpose; L2Norm Pooling; Dilated Max Pooling; Log Softmax; Resampling (Lanczos, Bilinear Upsampling); Arithmetic Operators (Addition, Subtraction, Multiplication, Division); new Neuron layers (Hard Sigmoid, SoftPlus, SoftSign, ELU)
67 Convolution filter weight types (NEW): Single Precision Floating-Point, plus, to reduce memory footprint and improve performance: Half Precision Floating-Point, 8-bit Integer, and Binary
68 Convolution Primitives NEW Standard Binary and XNOR Dilated Sub-Pixel Transpose
69 Binary and XNOR Convolution: the same operation as regular Convolution, with improved performance and less memory. Binary Convolution uses a full-sized input with binary weights; XNOR Convolution uses a binary input with binary weights.
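The performance and memory win comes from packing ±1 values into single bits: the dot product at the heart of convolution then reduces to XNOR plus a popcount. A sketch of that identity in plain Python (the bit-packing convention here is illustrative; MPS performs this on the GPU):

```python
def xnor_dot(a_bits, b_bits, n):
    """Dot product of two n-element +/-1 vectors packed one bit per element
    (bit set = +1, bit clear = -1), computed with XNOR + popcount."""
    matches = bin(~(a_bits ^ b_bits) & ((1 << n) - 1)).count("1")
    return 2 * matches - n    # each match contributes +1, each mismatch -1

# Vectors [+1, -1, +1, +1] and [+1, +1, -1, +1], packed LSB-first:
print(xnor_dot(0b1101, 0b1011, 4))  # 0
print(xnor_dot(0b1111, 0b1111, 4))  # 4
```

With 1 bit per weight instead of 32, the weights shrink by 32×, and one machine word's worth of multiply-accumulates collapses into a couple of bitwise instructions.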
72 Dilated Convolution: comparison to regular convolution, and how it works. A regular 3 × 3 kernel reads a contiguous 3 × 3 region of the input. With dilationFactorX = 2 and dilationFactorY = 2, the same 3 × 3 kernel reads input samples two pixels apart, so it covers a wider region of the input with the same number of weights.
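The dilated sampling pattern described above can be sketched in plain Python (a CPU reference, not the MPS kernel): with dilation 2, a 3 × 3 kernel spans a 5 × 5 receptive field.

```python
def dilated_conv2d(image, kernel, dilation=2):
    """Valid cross-correlation with a dilated square kernel (pure-Python sketch)."""
    k = len(kernel)
    span = dilation * (k - 1) + 1            # receptive field: 5 for 3x3, dilation 2
    oh = len(image) - span + 1
    ow = len(image[0]) - span + 1
    return [[sum(image[i + di * dilation][j + dj * dilation] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(ow)]
            for i in range(oh)]

image = [[r * 5 + c for c in range(5)] for r in range(5)]   # 5x5 ramp
kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]                  # picks the center tap
print(dilated_conv2d(image, kernel))  # [[12]]
```

Setting dilation back to 1 recovers a regular convolution; only the sampling stride inside the window changes, not the weight count.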
78 Sub-Pixel Convolution and Convolution Transpose Commonly used for upscaling
79 Upscaling using a box filter: a fixed operation with a constant filter. Input W × H, output 2W × 2H.
82 Sub-Pixel Convolution, how it works: a one-channel W × H input is convolved with 4 trained filters (for 2× upscaling), and the resulting 4 channels are reshuffled into a one-channel 2W × 2H output.
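The "reshuffle" step is a depth-to-space rearrangement: at each position, the group of 4 channel values becomes one 2 × 2 block of output pixels. A sketch in plain Python (channels-first nested lists; the layout is illustrative):

```python
def pixel_shuffle(channels, r=2):
    """Depth-to-space: r*r channels of HxW become one channel of (r*H)x(r*W)."""
    H, W = len(channels[0]), len(channels[0][0])
    out = [[0] * (W * r) for _ in range(H * r)]
    for i in range(H):
        for j in range(W):
            for c in range(r * r):
                # channel index c picks the position inside the r x r block
                out[i * r + c // r][j * r + c % r] = channels[c][i][j]
    return out

# 4 one-pixel channels -> one 2x2 output block (2x upscale of a 1x1 input)
chans = [[[1]], [[2]], [[3]], [[4]]]
print(pixel_shuffle(chans))  # [[1, 2], [3, 4]]
```

Unlike box-filter upscaling, the 4 filters feeding this reshuffle are trained, so the network learns how to fill in the sub-pixel detail.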
85 Convolution Transpose, how it works: the W × H input is first expanded into a 2W × 2H intermediate result, and a convolution over that intermediate produces the upscaled output.
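The 2W × 2H intermediate result comes from inserting zeros between the input samples before running a regular convolution. A sketch of that expansion step in plain Python (illustrative, not MPS API):

```python
def zero_insert(image, stride=2):
    """Insert zeros between samples: HxW input -> (stride*H)x(stride*W) grid."""
    H, W = len(image), len(image[0])
    out = [[0] * (W * stride) for _ in range(H * stride)]
    for i in range(H):
        for j in range(W):
            out[i * stride][j * stride] = image[i][j]
    return out

print(zero_insert([[1, 2], [3, 4]]))
# [[1, 0, 2, 0], [0, 0, 0, 0], [3, 0, 4, 0], [0, 0, 0, 0]]
```

The trained convolution that follows spreads each input value over its neighborhood, filling in the inserted zeros and producing the upscaled output.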
93 New Convolution Primitives, example: colorizing black and white images. The colorization network* runs Convolution, Dilated Convolution, Batch Normalization, Convolution Transpose, and SoftMax layers from input to output. Dilated Convolution integrates wider global context; Convolution Transpose upscales the output. (*Colorful Image Colorization, Richard Zhang, Phillip Isola, Alexei A. Efros, ECCV 2016)
97 Demo Image colorization
98 Performance Improvements in iOS (higher is better), percentage improvement on the Inception-v3 network*: iPhone 6s 22%, iPhone 7 Plus 22%, iPad Pro 9.7" 29%, iPad Pro 10.5" 21%. (*Rethinking the Inception Architecture for Computer Vision, Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna, CVPR 2015)
101 Neural Network Graph API, overview (NEW): describe a neural network using the graph API. Filter nodes represent operations (Convolution, Pooling (Avg.), Pooling (Max.), Fully-Connected, SoftMax, Concatenation); image nodes represent data (Image).
106 Neural Network Graph API, ease of use: compact representation; save and restore across platforms (NSSecureCoding); initialize once, reuse; execute the graph on the GPU with a single call; no intermediate images to manage, just input/output; auto-configuration of image sizes, padding, and centering. The MetalImageRecognition code sample is 4× less code with the NN Graph API.
113 Neural Network Graph API, deliver best performance (NEW): easy to parallelize between CPU and GPU; fuses graph nodes; executes graph nodes concurrently; optimizes away Concatenation nodes.
119 Filter Nodes, Convolution node: create a MPSCNNConvolutionNode with a data source provider

let conv1 = MPSCNNConvolutionNode(source: MPSNNImageNode(handle: nil),
                                  weights: MyWeights(file: "conv1.dat"))
122 Feeding Parameters to the Convolution Layer: just-in-time loading and purging of weights data to minimize memory footprint

class MyWeights: NSObject, MPSCNNConvolutionDataSource {
    // Initialize the data source object
    init(file: String) { }

    public func load() -> Bool { }
    public func descriptor() -> MPSCNNConvolutionDescriptor { }
    public func weights() -> UnsafeMutableRawPointer { }
    public func purge() { }
}
125 // Example: create a graph (conv1 → pool1 → conv2 → pool2 → conv3 → pool3 → conv4 → fc1 → fc2)

func makeGraph() -> MPSNNImageNode {
    let conv1 = MPSCNNConvolutionNode(source: MPSNNImageNode(handle: nil),
                                      weights: MyWeights(file: "conv1.dat"))
    let pool1 = MPSCNNPoolingMaxNode(source: conv1.resultImage, filterSize: 2)
    let conv2 = MPSCNNConvolutionNode(source: pool1.resultImage,
                                      weights: MyWeights(file: "conv2.dat"))
    let pool2 = MPSCNNPoolingMaxNode(source: conv2.resultImage, filterSize: 2)
    let conv3 = MPSCNNConvolutionNode(source: pool2.resultImage,
                                      weights: MyWeights(file: "conv3.dat"))
    let pool3 = MPSCNNPoolingMaxNode(source: conv3.resultImage, filterSize: 2)
    let conv4 = MPSCNNConvolutionNode(source: pool3.resultImage,
                                      weights: MyWeights(file: "conv4.dat"))
    let fc1 = MPSCNNFullyConnectedNode(source: conv4.resultImage,
                                       weights: MyWeights(file: "fc1.dat"))
    let fc2 = MPSCNNFullyConnectedNode(source: fc1.resultImage,
                                       weights: MyWeights(file: "fc2.dat"))
    return fc2.resultImage
}
130 // Example: execute the graph on the GPU

// Metal setup
let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()!
let commandBuffer = commandQueue.makeCommandBuffer()!

// Initialize the graph
let graph = MPSNNGraph(device: device, resultImage: makeGraph())

// Create the input image
let input = MPSImage(texture: texture, ...)

// Encode the graph
let output = graph?.encode(to: commandBuffer, sourceImages: [input])

// Tell the GPU to start executing work and wait until the GPU work is done
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
137 [Timeline: with this synchronous pattern, the CPU encodes a task, then waits for the GPU to execute it before encoding the next one; bubbles of idle time appear on both the CPU and the GPU between tasks.]
138 // Example: execute the graph on the GPU asynchronously

// Metal setup
let device = MTLCreateSystemDefaultDevice()!

// Initialize the graph
let graph = MPSNNGraph(device: device, resultImage: makeGraph())

// Create the input image
let input = MPSImage(texture: texture, ...)

// Encode the graph
let output = graph?.executeAsync(withSourceImages: [input]) { resultImage, error in
    // Check for error and use resultImage inside the closure
}

// Don't wait; encode a new GPU task
143 [Timeline: with asynchronous execution, the CPU keeps encoding tasks 2, 3, 4 and onward while the GPU executes the earlier ones; both stay busy and the bubbles disappear.]
144 Demo Inception-v3 using Neural Network Graph API
145 Agenda: Recap on Convolutional Neural Networks (CNN); Convolutional Neural Networks, New Primitives; Neural Network Graph API; Recurrent Neural Networks (RNN)
146 What Are Recurrent Neural Networks?
147 CNN: one-to-one. One input: an image.
148 CNN: one-to-one. Inference maps one input (an image) to one output (a set of probabilities, e.g. dog, grass).
149 RNN: sequences, one-to-many (builds on CNN inference).
150 RNN: sequences, one-to-many. CNN inference produces one input (a set of probabilities); RNN inference turns it into a sequence of outputs: the words of an image caption, "A black and white dog laying in the grass".
151 RNN: sequences, many-to-many. The sequence of inputs is a sentence in English: "A black and white dog laying in the grass".
152 RNN: sequences, many-to-many. Inference maps the sequence of inputs (a sentence in English) to a sequence of outputs (the translated sentence), e.g. Russian "Чёрно-белая собака лежит на траве" or Finnish "Mustan ja valkoisen värinen koira makaa ruohikolla".
153 Recurrent Neural Networks, new primitives (NEW): Single Gate RNN, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Minimally Gated Unit (MGU)
154 Single Gate RNN: the Recurrent Unit enables the previous output to affect the output of subsequent iterations (diagram: Input → Recurrent Unit → Output, with the output fed back into the unit).
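The recurrence on this slide is compact enough to write out. Below is a minimal sketch of one Single Gate RNN step in plain Python (an illustration of the math, not the MPS API; the toy weights are made up): the new output is h_t = tanh(W·x_t + U·h_{t-1} + b), so the previous output feeds back in and lets earlier inputs affect later outputs.

```python
import math

def rnn_step(x, h_prev, W, U, b):
    """One Single Gate RNN step: h_t = tanh(W*x_t + U*h_{t-1} + b)."""
    return [
        math.tanh(
            sum(W[i][j] * x[j] for j in range(len(x)))
            + sum(U[i][j] * h_prev[j] for j in range(len(h_prev)))
            + b[i]
        )
        for i in range(len(b))
    ]

# Because the previous output feeds back in, earlier inputs affect later outputs.
W = [[0.5, 0.0], [0.0, 0.5]]  # input weights (toy values)
U = [[0.1, 0.0], [0.0, 0.1]]  # recurrent weights (toy values)
b = [0.0, 0.0]
h = [0.0, 0.0]
for x in ([1.0, 0.0], [0.0, 1.0]):
    h = rnn_step(x, h, W, U, b)
# h[0] is nonzero even though the second input's first component is 0:
# the first input is remembered through the recurrent connection.
```

In MPS the same step runs on the GPU as matrix multiplies plus point-wise activations; this sketch only shows why the feedback connection gives the network memory.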
155 Long Short-Term Memory (LSTM): built from Single Gate RNNs; has an internal Memory Cell; gates control the information flow inside the LSTM and what is stored in the Memory Cell.
156 (Diagram: Input → LSTM with Memory Cell → Output.)
157 LSTM Architecture (diagram, built up step by step through slide 164). Notation: M = matrix-matrix or matrix-vector multiply; * and + = point-wise operations.
160 Forget Gate: the previous output and the input (each through M) decide what to keep from the old memory (old memory * forget gate).
161 Input Gate and Cell Gate: the previous output and the input (each through M) decide how the new input affects the new memory.
163 New memory = old memory * forget gate + input gate * cell gate.
164 Output Gate: the previous output, the current input, and the new memory decide the new output (output gate applied point-wise to the new memory).
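The gate diagram above boils down to a handful of multiplies and point-wise operations per step. Below is a minimal pure-Python sketch of one LSTM step with scalar state (an illustration of the arithmetic, not the MPS API; the weight values are assumptions for the example):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step with scalar state. w maps gate name -> (Wx, Wh, b)."""
    def gate(name, act):
        wx, wh, b = w[name]
        return act(wx * x + wh * h_prev + b)
    f = gate("forget", sigmoid)   # what to keep from the old memory
    i = gate("input", sigmoid)    # how much of the new candidate to write
    g = gate("cell", math.tanh)   # candidate new memory
    o = gate("output", sigmoid)   # how the memory shapes the new output
    c = f * c_prev + i * g        # new Memory Cell contents
    h = o * math.tanh(c)          # new output
    return h, c

# Toy weights: every gate uses (Wx, Wh, b) = (1.0, 0.0, 0.0).
w = {name: (1.0, 0.0, 0.0) for name in ("forget", "input", "cell", "output")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.5, w=w)
```

The Memory Cell c is the only path that can carry information across many iterations nearly unchanged, which is what lets an LSTM remember over long sequences; in MPS each gate's multiply runs as an optimized matrix operation on the GPU.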
165 // Example: Creating a LSTM RNN

// Create a LSTM layer descriptor
let descriptor = MPSLSTMDescriptor()
descriptor.inputFeatureChannels = inputSize
descriptor.outputFeatureChannels = outputSize

// Create and initialize gate weights with trained parameters, using a data source provider
// for just-in-time loading and purging of weights
descriptor.forgetGateInputWeights = MyWeights(file: "forgetGateWeights.dat")
descriptor.cellGateInputWeights = MyWeights(file: "cellGateWeights.dat")
// Initialize the rest of the gates …

// Metal setup
let device = MTLCreateSystemDefaultDevice()!
// Also get commandQueue and commandBuffer

// Create a LSTM layer
let layer = MPSRNNMatrixInferenceLayer(device: device, rnnDescriptor: descriptor)
169 // Example: Running a LSTM RNN on the GPU

// Create input and output data
var inputSequence: [MPSMatrix] = []
var outputSequence: [MPSMatrix] = []
for _ in 0..<N {
    // Matrix size is (1, inputSize), inputSize is the number of columns
    inputSequence.append(MPSMatrix(…))
    // Matrix size is (1, outputSize), outputSize is the number of columns
    outputSequence.append(MPSMatrix(…))
}

// Submit work to GPU
layer.encodeSequence(commandBuffer: commandBuffer,
                     sourceMatrices: inputSequence,
                     destinationMatrices: outputSequence,
                     recurrentInputState: nil,
                     recurrentOutputStates: nil)

// Tell GPU to start executing work
commandBuffer.commit()
172 Example: Image Captioning, Training. Training on captioned images produces the Trained Parameters: a CNN learns to determine what is depicted in the image, and an RNN learns to generate the image caption.
175 Example: Image Captioning, Inference. The CNN determines what is depicted in the image; the RNN generates the image caption. The trained parameters control the CNN layers and the RNN gates. For the example photo, the result is: a man riding a wave on top of a surfboard.
180 Example: Image Captioning, Inference: "a man riding a wave on top of a surfboard". Determine what is depicted in the image with Inception-v3; generate the image caption with an LSTM (with Memory Cell). Image Captioning Network* *Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge, Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, IEEE Transactions on Pattern Analysis and Machine Intelligence.
181 Example: Image Captioning, LSTM initialization phase. The image is run through Inception-v3 (Convolution, Pooling (Avg.), Pooling (Max.), Fully-Connected, and SoftMax layers); the resulting feature vector initializes the LSTM Memory Cell.
185 Example: Image Captioning, caption generation phase. Input the sentence-start token to the LSTM; the output is the 3 best one-word captions. Feed those back in to get the 3 best two-word captions, and repeat, carrying the Memory Cell state along, until the 3 best N-word captions reach the end.
190 Caption Generation, iteration 1 → iteration 2. Iteration 1 candidates: man, a, the. Iteration 2 extends each: man → man on, man in, man surfing; a → a man, a person, a surfer; the → the man, the surfer, the young. Top three captions kept: a man, a person, a surfer.
196 Caption Generation, iteration 2 → iteration 3. The top three captions (a man, a person, a surfer) are extended: a man riding, a man on, a man is; a person riding, a person on, a person in; a surfer is, a surfer riding, a surfer in. Top three kept: a man riding, a man on, a man is.
198 Caption Generation, iteration 3 → iteration 4. The top three captions (a man riding, a man on, a man is) are extended: a man riding a, a man riding on, a man riding the; a man on a, a man on his, a man on the; a man is surfing, a man is riding, a man is on.
200 Caption Generation Top three captions: 1. a man riding a wave on top of a surfboard 2. a man on a surfboard riding a wave 3. a man riding a wave on a surfboard
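The iteration tables above are a beam search of width 3: at every step each surviving caption is extended by candidate next words, the extended captions are scored, and only the three most probable are kept. Below is a compact sketch in plain Python (the toy vocabulary and probability table are invented for illustration; the session's model scores next words with the LSTM):

```python
import math

def beam_search(step_fn, start_token, beam_width=3, max_len=4):
    """step_fn(sequence) -> {next_word: probability}. Keeps the beam_width
    most probable sequences at every iteration, as in the slides."""
    beam = [([start_token], 0.0)]  # (words, log-probability)
    for _ in range(max_len):
        candidates = []
        for words, score in beam:
            for word, p in step_fn(words).items():
                candidates.append((words + [word], score + math.log(p)))
        # Keep only the beam_width best extended captions.
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return [words[1:] for words, _ in beam]  # drop the start token

# Toy "model": a fixed table of next-word probabilities keyed on the last word.
def toy_step(words):
    table = {"<s>": {"a": 0.5, "the": 0.3, "man": 0.2},
             "a":   {"man": 0.6, "person": 0.3, "surfer": 0.1},
             "man": {"riding": 0.5, "on": 0.3, "is": 0.2}}
    return table.get(words[-1], {"a": 0.6, "the": 0.4})

captions = beam_search(toy_step, "<s>", beam_width=3, max_len=3)
```

Summing log-probabilities instead of multiplying raw probabilities avoids underflow on long captions; the ranking is identical.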
202 Demo Image captioning CNN + LSTM
203 Summary. GPU-accelerated primitives: expanded support for Image Processing and Convolutional Neural Networks; added support for Linear Algebra and Recurrent Neural Networks; optimized for iOS and macOS. New Neural Network Graph API.
204 Related Sessions Introducing Metal 2 Executive Ballroom Tuesday 1:50PM Introducing Core ML Hall 3 Tuesday 3:10PM VR with Metal 2 Hall 3 Wednesday 10:00AM Vision Framework: Building on Core ML Hall 2 Wednesday 3:10PM Core ML in depth Hall 3 Thursday 09:00AM Accelerate and Sparse Solvers Executive Ballroom Thursday 10:00AM Metal 2 Optimization and Debugging Grand Ballroom B Thursday 3:10PM
205 Labs Metal 2 Lab Technology Lab Friday 09:00AM 12:00PM
206 More Information
More informationDeep Neural Network Evaluation
Lecture 8: Deep Neural Network Evaluation Visual Computing Systems Training/evaluating deep neural networks Technique leading to many high-profile AI advances in recent years Speech recognition/natural
More informationSupplementary Material for: Video Prediction with Appearance and Motion Conditions
Supplementary Material for Video Prediction with Appearance and Motion Conditions Yunseok Jang 1 2 Gunhee Kim 2 Yale Song 3 A. Architecture Details (Section 3.2) We provide architecture details of our
More informationCNNS FROM THE BASICS TO RECENT ADVANCES. Dmytro Mishkin Center for Machine Perception Czech Technical University in Prague
CNNS FROM THE BASICS TO RECENT ADVANCES Dmytro Mishkin Center for Machine Perception Czech Technical University in Prague ducha.aiki@gmail.com OUTLINE Short review of the CNN design Architecture progress
More informationMachine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,
Machine Learning 10-701, Fall 2015 Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October 6, 2015 Eric Xing @ CMU, 2015 1 A perennial challenge in computer vision: feature engineering SIFT Spin image
More informationS8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer
S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer 2 100 倍以上速く 本当に可能ですか? 2 DOUGLAS ADAMS BABEL FISH Neural Machine Translation Unit 3 4 OVER 100X FASTER, IS IT REALLY POSSIBLE?
More informationPerceptron: This is convolution!
Perceptron: This is convolution! v v v Shared weights v Filter = local perceptron. Also called kernel. By pooling responses at different locations, we gain robustness to the exact spatial location of image
More informationMachine Learning. MGS Lecture 3: Deep Learning
Dr Michel F. Valstar http://cs.nott.ac.uk/~mfv/ Machine Learning MGS Lecture 3: Deep Learning Dr Michel F. Valstar http://cs.nott.ac.uk/~mfv/ WHAT IS DEEP LEARNING? Shallow network: Only one hidden layer
More informationWhat s New in Core Data?
Session App Frameworks #WWDC17 What s New in Core? Persisting since 2004 210 Melissa Turner, Core Engineer Rishi Verma, Core Engineer 2017 Apple Inc. All rights reserved. Redistribution or public display
More informationECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University
ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University Lab 4: Binarized Convolutional Neural Networks Due Wednesday, October 31, 2018, 11:59pm
More informationDeep Learning Accelerators
Deep Learning Accelerators Abhishek Srivastava (as29) Samarth Kulshreshtha (samarth5) University of Illinois, Urbana-Champaign Submitted as a requirement for CS 433 graduate student project Outline Introduction
More informationNeural Networks with Input Specified Thresholds
Neural Networks with Input Specified Thresholds Fei Liu Stanford University liufei@stanford.edu Junyang Qian Stanford University junyangq@stanford.edu Abstract In this project report, we propose a method
More informationOPTIMIZING PERFORMANCE OF RECURRENT NEURAL NETWORKS
April 4-7, 2016 Silicon Valley OPTIMIZING PERFORMANCE OF RECURRENT NEURAL NETWORKS Jeremy Appleyard, 7 April 2016 RECURRENT NEURAL NETWORKS Output is fed into input Perform the same operation repeatedly
More informationConvolutional Neural Networks
NPFL114, Lecture 4 Convolutional Neural Networks Milan Straka March 25, 2019 Charles University in Prague Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics unless otherwise
More informationFace Recognition A Deep Learning Approach
Face Recognition A Deep Learning Approach Lihi Shiloh Tal Perl Deep Learning Seminar 2 Outline What about Cat recognition? Classical face recognition Modern face recognition DeepFace FaceNet Comparison
More informationGetting started with Caffe. Jon Barker, Solutions Architect
Getting started with Caffe Jon Barker, Solutions Architect Caffe tour Overview Agenda Example applications Setup Performance Hands-on lab preview 2 A tour of Caffe 3 What is Caffe? An open framework for
More informationDeep Learning on Graphs
Deep Learning on Graphs with Graph Convolutional Networks Hidden layer Hidden layer Input Output ReLU ReLU, 22 March 2017 joint work with Max Welling (University of Amsterdam) BDL Workshop @ NIPS 2016
More informationWhat s New in ARKit 2
Session #WWDC18 What s New in ARKit 2 602 Arsalan Malik, ARKit Engineer Reinhard Klapfer, ARKit Engineer 2018 Apple Inc. All rights reserved. Redistribution or public display not permitted without written
More informationThe Hitchhiker s Guide to TensorFlow:
The Hitchhiker s Guide to TensorFlow: Beyond Recurrent Neural Networks (sort of) Keith Davis @keithdavisiii iamthevastidledhitchhiker.github.io Topics Kohonen/Self-Organizing Maps LSTMs in TensorFlow GRU
More informationObject Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal
Object Detection Lecture 10.3 - Introduction to deep learning (CNN) Idar Dyrdal Deep Learning Labels Computational models composed of multiple processing layers (non-linear transformations) Used to learn
More informationBuilding Visually Rich User Experiences
Session App Frameworks #WWDC17 Building Visually Rich User Experiences 235 Noah Witherspoon, Software Engineer Warren Moore, Software Engineer 2017 Apple Inc. All rights reserved. Redistribution or public
More informationXilinx ML Suite Overview
Xilinx ML Suite Overview Yao Fu System Architect Data Center Acceleration Xilinx Accelerated Computing Workloads Machine Learning Inference Image classification and object detection Video Streaming Frame
More informationMIXED PRECISION TRAINING OF NEURAL NETWORKS. Carl Case, Senior Architect, NVIDIA
MIXED PRECISION TRAINING OF NEURAL NETWORKS Carl Case, Senior Architect, NVIDIA OUTLINE 1. What is mixed precision training with FP16? 2. Considerations and methodology for mixed precision training 3.
More informationCIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm
CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm Instructions This is an individual assignment. Individual means each student must hand in their
More informationDeep Learning Based Real-time Object Recognition System with Image Web Crawler
, pp.103-110 http://dx.doi.org/10.14257/astl.2016.142.19 Deep Learning Based Real-time Object Recognition System with Image Web Crawler Myung-jae Lee 1, Hyeok-june Jeong 1, Young-guk Ha 2 1 Department
More informationIntroduction to Deep Learning for Facial Understanding Part III: Regional CNNs
Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs Raymond Ptucha, Rochester Institute of Technology, USA Tutorial-9 May 19, 218 www.nvidia.com/dli R. Ptucha 18 1 Fair Use Agreement
More informationImplementing Long-term Recurrent Convolutional Network Using HLS on POWER System
Implementing Long-term Recurrent Convolutional Network Using HLS on POWER System Xiaofan Zhang1, Mohamed El Hadedy1, Wen-mei Hwu1, Nam Sung Kim1, Jinjun Xiong2, Deming Chen1 1 University of Illinois Urbana-Champaign
More informationMetal. GPU-accelerated advanced 3D graphics rendering and data-parallel computation. source rebelsmarket.com
Metal GPU-accelerated advanced 3D graphics rendering and data-parallel computation source rebelsmarket.com Maths The heart and foundation of computer graphics source wallpoper.com Metalmatics There are
More informationarxiv: v1 [cs.cv] 20 Mar 2017
I2T2I: LEARNING TEXT TO IMAGE SYNTHESIS WITH TEXTUAL DATA AUGMENTATION Hao Dong, Jingqing Zhang, Douglas McIlwraith, Yike Guo arxiv:1703.06676v1 [cs.cv] 20 Mar 2017 Data Science Institute, Imperial College
More informationLearning Visual Semantics: Models, Massive Computation, and Innovative Applications
Learning Visual Semantics: Models, Massive Computation, and Innovative Applications Part II: Visual Features and Representations Liangliang Cao, IBM Watson Research Center Evolvement of Visual Features
More informationXES Tensorflow Process Prediction using the Tensorflow Deep-Learning Framework
XES Tensorflow Process Prediction using the Tensorflow Deep-Learning Framework Demo Paper Joerg Evermann 1, Jana-Rebecca Rehse 2,3, and Peter Fettke 2,3 1 Memorial University of Newfoundland 2 German Research
More information