Using Metal 2 for Compute

1 Using Metal 2 for Compute. Session 608, Graphics and Games, #WWDC17. Anna Tikhonova, GPU Software Engineer. © 2017 Apple Inc. All rights reserved. Redistribution or public display not permitted without written permission from Apple.

2 Metal 2 Ecosystem: Metal API and language, GPU Tools, MetalKit, Metal Performance Shaders

5 Metal Performance Shaders (MPS) NEW: GPU accelerated primitives for Image Processing, Linear Algebra, and Machine Learning Inference. Optimized for iOS and macOS. See also: What's New in Metal, Part 2 (WWDC 2016); What's New in Metal, Part 2 (WWDC 2015).

6 Image Processing

7 Image Processing: primitives available in iOS 10: Convolution, Equalization and Specification, Gaussian Blur, Median, Box, Tent, Thresholding, Sobel, Transpose, Morphology, Image Integral, Lanczos Resampling, Color Conversion, Histogram, Gaussian Pyramid

8 Image Processing: new primitives (NEW): Image Keypoints; Bilinear Rescale; Image Statistics; Element-wise Arithmetic Operations, with broadcasting

9 Linear Algebra

10 Linear Algebra: new primitives (NEW): Matrix-Matrix Multiplication; Matrix-Vector Multiplication; Triangular Matrix Factorization and Linear Solvers

13 Data Representations. MPSVector: interprets data in MTLBuffer as a 1-dimensional array. MPSMatrix: interprets data in MTLBuffer as a rectangular array, in row-major order. MPSTemporaryMatrix: allocated from MTLHeap; use for most of your intermediate matrices.

14 MPSVector and MPSMatrix input types: Single Precision Floating-Point, Half Precision Floating-Point, 16-bit Signed Integer, 8-bit Signed Integer

15 MPSVector code example: create a vector of size N
    // Create a Metal buffer large enough for N Float32 elements
    let buffer = device.makeBuffer(length: N * MemoryLayout<Float32>.size)!
    // Create a vector descriptor
    let descriptor = MPSVectorDescriptor(length: N, dataType: .float32)
    // Create a vector with descriptor
    let vector = MPSVector(buffer: buffer, descriptor: descriptor)

19 MPSMatrix code example: create a matrix with M rows and N columns
    // Get the recommended bytes per row value to use for sizing a Metal buffer
    let bytesPerRow = MPSMatrixDescriptor.rowBytes(forColumns: N, dataType: .float32)
    // Create a Metal buffer with the recommended bytes per row
    let buffer = device.makeBuffer(length: M * bytesPerRow)!
    // Create a matrix descriptor
    let descriptor = MPSMatrixDescriptor(rows: M, columns: N, rowBytes: bytesPerRow, dataType: .float32)
    // Create a matrix with descriptor
    let matrix = MPSMatrix(buffer: buffer, descriptor: descriptor)
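The row-major layout with a padded bytes-per-row value can be sketched on the CPU as follows. This is only an illustration: `RowMajorLayout` is a hypothetical helper (not an MPS type), the elements are assumed to be 32-bit floats, and the 16-byte row padding is an assumed value rather than what `rowBytes(forColumns:dataType:)` would actually return.

```swift
// Hypothetical helper, not part of MPS: models a row-major matrix whose rows
// are padded to `rowBytes` bytes.
struct RowMajorLayout {
    let rows: Int
    let columns: Int
    let rowBytes: Int   // bytes from the start of one row to the start of the next

    // Byte offset of element (row, column), assuming 32-bit float elements.
    func byteOffset(row: Int, column: Int) -> Int {
        precondition(row >= 0 && row < rows && column >= 0 && column < columns)
        return row * rowBytes + column * MemoryLayout<Float>.stride
    }
}

// A 2 x 3 matrix whose rows are padded from 12 to 16 bytes (assumed padding).
let layout = RowMajorLayout(rows: 2, columns: 3, rowBytes: 16)
let offset = layout.byteOffset(row: 1, column: 2)   // 1 * 16 + 2 * 4
```

Because of the padding, the buffer is sized as M * bytesPerRow rather than M * N * 4 bytes.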

23 Primitives. Matrix-Matrix and Matrix-Vector Multiplication: API modeled after standard BLAS GEMM and GEMV interfaces. Triangular Matrix Factorization and Linear Solvers: API modeled after standard LAPACK decomposition and solve interfaces.

24 // Example: Matrix-Matrix Multiply: C = A B
    // Create matrices A, B and C
    let A = MPSMatrix(buffer: ABuffer,
                      descriptor: MPSMatrixDescriptor(rows: M, columns: K, rowBytes: ARowBytes, dataType: .float32))
    let B = MPSMatrix(buffer: BBuffer,
                      descriptor: MPSMatrixDescriptor(rows: K, columns: N, rowBytes: BRowBytes, dataType: .float32))
    let C = MPSMatrix(buffer: CBuffer,
                      descriptor: MPSMatrixDescriptor(rows: M, columns: N, rowBytes: CRowBytes, dataType: .float32))

25 // Example: Matrix-Matrix Multiply: C = A B
    // Perform Metal setup
    let device = MTLCreateSystemDefaultDevice()!
    let commandQueue = device.makeCommandQueue()!
    let commandBuffer = commandQueue.makeCommandBuffer()!
    // Create a Matrix-Matrix Multiplication kernel
    let mmKernel = MPSMatrixMultiplication(device: device, resultRows: M, resultColumns: N, interiorColumns: K)
    // Encode kernel to the command buffer
    mmKernel.encode(commandBuffer: commandBuffer, leftMatrix: A, rightMatrix: B, resultMatrix: C)
    // Tell GPU to start doing the work
    commandBuffer.commit()
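To make the roles of M, N, and K concrete, here is a minimal CPU reference for C = A B on row-major arrays. It is a sketch for checking dimensions and results, not the MPS implementation; the function name `multiply` and the unpadded row-major layout are assumptions made for the illustration.

```swift
// CPU reference for C = A * B with row-major, unpadded storage.
// A is m x k, B is k x n, and the result C is m x n.
func multiply(_ a: [Float], _ b: [Float], m: Int, n: Int, k: Int) -> [Float] {
    precondition(a.count == m * k && b.count == k * n)
    var c = [Float](repeating: 0, count: m * n)
    for i in 0..<m {
        for j in 0..<n {
            var sum: Float = 0
            // k is the shared "interior" dimension.
            for p in 0..<k {
                sum += a[i * k + p] * b[p * n + j]
            }
            c[i * n + j] = sum
        }
    }
    return c
}

// A is 2 x 3, B is 3 x 2, so the result is 2 x 2.
let product = multiply([1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], m: 2, n: 2, k: 3)
```

Here m and n correspond to the result rows and columns, and k to the interior columns passed to the kernel.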

29 Sample Code: MPSMatrixMultiplication. Triangular Matrix Factorization and Linear Solvers: coming soon.

30 Machine Learning

31 Machine Learning at Apple: architecture. Applications; Domain Specific Frameworks (Vision, NLP); ML Framework (Core ML); ML Performance Primitives (Accelerate, MPS)

32 What Is Deep Learning?

35 [Image labeled: panda]

37 [Image labeled: house, ocean, dress, dog, girl, sunset, bicycle, giraffe, horse, ramp, man, plant, skateboard, lights]

38 Training and Inference. [Figure: training to classify images with the labels cat, rabbit, dog, giraffe, horse]

41 Training. [Figure: training on labeled images (cat, rabbit, dog, giraffe, horse) produces Trained Parameters]

43 Inference. [Figure: an Input Image is passed to a CNN that uses the Trained Parameters; of the labels cat, rabbit, dog, giraffe, horse, the output shown is cat]

44 Agenda: Recap on Convolutional Neural Networks (CNN). See also: What's New in Metal, Part 2 (WWDC 2016).

45 Agenda: Recap on Convolutional Neural Networks (CNN); Convolutional Neural Networks: New Primitives; Neural Network Graph API; Recurrent Neural Networks (RNN)

47 What Are Convolutional Neural Networks?

50 Convolutional Neural Networks: biologically-inspired, resemble the visual cortex. Hierarchical representation: organized into a hierarchy of layers; higher-level features are derived from lower-level features. Think of a feature as a filter that filters data for that feature.

51 Convolutional Neural Networks: primitives available in iOS 10: Convolution; Fully-Connected; Pooling (Average, Max); Normalization (Cross-Channel, Local Contrast, Spatial); Softmax; Neuron (Linear, ReLU, Sigmoid, TanH, Absolute)

53 Convolution: core building block; recognizes features in input

54 1 filter, 3 x 3; 1-channel input; 1-channel output
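The single-filter case above can be sketched on the CPU as follows. This is a minimal illustration rather than the MPS kernel: it assumes stride 1, zero padding at the borders so the output has the same size as the input, and no kernel flip (the usual CNN convention). The name `convolve3x3` is made up for the example.

```swift
// Applies one 3 x 3 filter to a single-channel image stored in row-major order.
// Assumptions: stride 1, zero padding, no kernel flip.
func convolve3x3(_ input: [Float], width: Int, height: Int, kernel: [Float]) -> [Float] {
    precondition(input.count == width * height && kernel.count == 9)
    var output = [Float](repeating: 0, count: width * height)
    for y in 0..<height {
        for x in 0..<width {
            var sum: Float = 0
            for ky in 0..<3 {
                for kx in 0..<3 {
                    let sy = y + ky - 1
                    let sx = x + kx - 1
                    // Samples outside the image count as zero.
                    if sy >= 0 && sy < height && sx >= 0 && sx < width {
                        sum += input[sy * width + sx] * kernel[ky * 3 + kx]
                    }
                }
            }
            output[y * width + x] = sum
        }
    }
    return output
}
```

Each output pixel is the weighted sum of the 3 x 3 neighborhood around the corresponding input pixel, which is how a filter responds to a feature at that position.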

60 16 5x5 filters, that is 3*16 5x5 filter kernels (one per input channel for each output channel); 3-channel input 40 x 40; 16-channel output 40 x 40
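The 3*16 figure can be turned into a weight count with a short calculation. The helper below is hypothetical and counts filter weights only; bias terms, if any, are deliberately left out.

```swift
// Number of filter weights in a convolution layer (bias terms excluded).
func convolutionWeightCount(inputChannels: Int, outputChannels: Int,
                            kernelWidth: Int, kernelHeight: Int) -> Int {
    return inputChannels * outputChannels * kernelWidth * kernelHeight
}

// 3 input channels, 16 output channels, 5 x 5 kernels: 3 * 16 * 5 * 5 weights.
let weightCount = convolutionWeightCount(inputChannels: 3, outputChannels: 16,
                                         kernelWidth: 5, kernelHeight: 5)
```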

65 Convolutional Neural Networks: new primitives (NEW): New Convolution weight types; Binary and XNOR Convolution; Sub-Pixel Convolution; Dilated Convolution; Convolution Transpose; L2Norm Pooling; Dilated Max Pooling; Log Softmax; Resampling (Lanczos, Bilinear); Upsampling; Arithmetic Operators (Addition, Subtraction, Multiplication, Division); New Neuron layers (Hard Sigmoid, SoftPlus, SoftSign, ELU)

67 Convolution: filter weight types (NEW): Single Precision Floating-Point. To reduce memory footprint and improve performance: Half Precision Floating-Point, 8-bit Integer, Binary.

68 Convolution Primitives (NEW): Standard; Binary and XNOR; Dilated; Sub-Pixel; Transpose

69 Binary and XNOR Convolution: same operation as regular Convolution; improved performance; less memory

71 Binary and XNOR Convolution. Binary Convolution: full-sized input, binary weights. XNOR Convolution: binary input, binary weights. [Figure: Input and Weights for Regular Convolution, Binary Convolution, and XNOR Convolution]
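When both input and weights are binary, a dot product can be computed with bitwise XNOR and a bit count. The sketch below shows that general trick, assuming each bit encodes +1 (bit set) or -1 (bit clear); the slides do not say how MPS implements XNOR Convolution internally, so treat this purely as an illustration. `xnorDot` is a hypothetical name.

```swift
// Dot product of two vectors of +1/-1 values packed into the low `count` bits
// of a UInt64 (bit set = +1, bit clear = -1). Assumed encoding, for illustration.
func xnorDot(_ a: UInt64, _ b: UInt64, count: Int) -> Int {
    precondition(count >= 1 && count <= 64)
    let mask: UInt64 = count == 64 ? UInt64.max : (UInt64(1) << UInt64(count)) - 1
    // XNOR is 1 where the bits agree, i.e. where the product of the values is +1.
    let matches = (~(a ^ b) & mask).nonzeroBitCount
    // Each match contributes +1, each mismatch -1.
    return 2 * matches - count
}
```

A single XNOR and bit count replaces up to 64 multiply-adds, which gives some intuition for the performance and memory claims on the slides.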

74 Dilated Convolution: comparison to regular convolution. [Figure: Input, Output, 3 x 3 kernel]

76 Dilated Convolution: how it works. [Figure: Input, Output, 3 x 3 kernel with dilationFactorX = 2, dilationFactorY = 2]
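The effect of a dilation factor of 2 on a 3-tap kernel can be sketched in one dimension. This is a CPU illustration under stated assumptions (odd kernel size, zero padding, stride 1), not the MPS kernel, and all function names are made up for the example.

```swift
// Offsets, relative to the output position, at which a dilated kernel samples.
func tapOffsets(kernelSize: Int, dilation: Int) -> [Int] {
    precondition(kernelSize % 2 == 1 && dilation >= 1)
    let half = kernelSize / 2
    return (-half...half).map { $0 * dilation }
}

// Width of the input region covered by the dilated kernel.
func effectiveKernelSize(kernelSize: Int, dilation: Int) -> Int {
    return (kernelSize - 1) * dilation + 1
}

// 1-D dilated convolution with zero padding and stride 1.
func dilatedConvolve1D(_ input: [Float], kernel: [Float], dilation: Int) -> [Float] {
    let offsets = tapOffsets(kernelSize: kernel.count, dilation: dilation)
    return input.indices.map { (i: Int) -> Float in
        var sum: Float = 0
        for (tap, offset) in offsets.enumerated() {
            let j = i + offset
            if j >= 0 && j < input.count {
                sum += input[j] * kernel[tap]
            }
        }
        return sum
    }
}
```

With a dilation of 2, a 3-tap kernel covers 5 input samples while still using only 3 weights, which is how dilation widens the context without adding parameters.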

78 Sub-Pixel Convolution and Convolution Transpose Commonly used for upscaling

79 Upscaling using a box filter: fixed operation with a constant filter. Input W x H; output 2W x 2H.
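As a point of comparison for the trained upscaling primitives, a fixed 2x box-filter upscale can be sketched as plain pixel replication. That reading of "box filter" is an assumption made for this illustration, as is the function name `boxUpscale2x`.

```swift
// 2x upscaling by pixel replication: every input pixel becomes a 2 x 2 block.
// Nothing here is trained; the operation is the same for every image.
func boxUpscale2x(_ input: [Float], width: Int, height: Int) -> [Float] {
    precondition(input.count == width * height)
    let outWidth = 2 * width
    var output = [Float](repeating: 0, count: 4 * width * height)
    for y in 0..<(2 * height) {
        for x in 0..<outWidth {
            output[y * outWidth + x] = input[(y / 2) * width + (x / 2)]
        }
    }
    return output
}
```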

84 Sub-Pixel Convolution: how it works. One-channel input W x H; 4 filters (Trained Parameters) for 2x upscaling; reshuffle; one-channel output 2W x 2H.
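The reshuffle step can be sketched on the CPU: the four W x H results of the four filters are interleaved into one 2W x 2H image. The channel-to-position mapping used here (channel index dy * 2 + dx lands at offset (dx, dy) inside each 2 x 2 block) is an assumption for illustration; the slides do not spell out the ordering MPS uses.

```swift
// Interleaves four W x H channels into one 2W x 2H image.
// Assumed ordering: channel (dy * 2 + dx) fills position (2x + dx, 2y + dy).
func reshuffle2x(channels: [[Float]], width: Int, height: Int) -> [Float] {
    precondition(channels.count == 4 && channels.allSatisfy { $0.count == width * height })
    let outWidth = 2 * width
    var output = [Float](repeating: 0, count: 4 * width * height)
    for y in 0..<height {
        for x in 0..<width {
            for dy in 0..<2 {
                for dx in 0..<2 {
                    output[(2 * y + dy) * outWidth + (2 * x + dx)] =
                        channels[dy * 2 + dx][y * width + x]
                }
            }
        }
    }
    return output
}
```

Unlike the box filter, the four filters that feed this step are trained, so the upscaled pixels are learned rather than copied.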

85 Convolution Transpose: how it works. Input W x H.

87 Convolution Transpose: how it works. Intermediate Result 2W x 2H; Output W x H.
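One common way to build a 2W x 2H intermediate result is to spread the input pixels apart and fill the gaps with zeros before convolving. The sketch below shows only that spreading step and assumes this zero-insertion scheme; the slides do not state exactly how the intermediate result is formed, and `insertZeros2x` is a hypothetical name.

```swift
// Places each input pixel at even coordinates of a 2W x 2H image, zeros elsewhere.
// Assumed scheme for the intermediate result; a convolution would follow.
func insertZeros2x(_ input: [Float], width: Int, height: Int) -> [Float] {
    precondition(input.count == width * height)
    let outWidth = 2 * width
    var output = [Float](repeating: 0, count: 4 * width * height)
    for y in 0..<height {
        for x in 0..<width {
            output[(2 * y) * outWidth + 2 * x] = input[y * width + x]
        }
    }
    return output
}
```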

96 New Convolution Primitives. Example: colorizing black and white images. Colorization network* (Input to Output): Convolution, Dilated Convolution, Batch Normalization, Convolution Transpose, SoftMax. Dilated Convolution: integrate wider global context. Convolution Transpose: upscale output.
*Colorful Image Colorization, Richard Zhang, Phillip Isola, Alexei A. Efros, ECCV 2016

97 Demo Image colorization

99 Performance Improvements in iOS: percentage improvement for the Inception-v3 network* (higher is better): iPhone 6S 22%, iPhone 7 Plus 22%, iPad Pro 9.7" 29%, iPad Pro 10.5" 21%.
*Rethinking the Inception Architecture for Computer Vision, Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna, CVPR 2015

105 Neural Network Graph API: overview (NEW). Describe neural network using graph API. Filter nodes: operations. Image nodes: data. [Figure: graph of Convolution, Pooling (Avg.), Pooling (Max.), Fully-Connected, SoftMax, Concatenation, and Image nodes]

112 Neural Network Graph API: ease of use. Compact representation; save and restore across platforms (NSSecureCoding); initialize once, reuse; execute graph on GPU with single call; no intermediate images to manage, just input/output; auto-configuration of image sizes, padding, centering. MetalImageRecognition code sample*: 4x less code with NN Graph API.

117 Neural Network Graph API: deliver best performance (NEW). Easy to parallelize between CPU and GPU; fuse graph nodes; execute graph nodes concurrently; optimize away Concatenation nodes. [Figure: graph of Convolution, Pooling (Avg.), Pooling (Max.), Fully-Connected, SoftMax, Concatenation, and Image nodes]

119 Filter Nodes: convolution node. Create an MPSCNNConvolutionNode with a data source provider
    let conv1 = MPSCNNConvolutionNode(source: MPSNNImageNode(handle: nil),
                                      weights: MyWeights(file: "conv1.dat"))

122 Feeding Parameters to Convolution Layer: just-in-time loading and purging of weights data; minimize memory footprint
    class MyWeights: NSObject, MPSCNNConvolutionDataSource {
        // Initialize the data source object
        init(file: String) { ... }
        public func load() -> Bool { ... }
        public func descriptor() -> MPSCNNConvolutionDescriptor { ... }
        public func weights() -> UnsafeMutableRawPointer { ... }
        public func purge() { ... }
    }
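The load/purge pattern itself can be sketched without Metal. The class below is a hypothetical stand-in, not an MPSCNNConvolutionDataSource implementation: it only illustrates that weights are held in memory between a `load()` and the matching `purge()`. The `loader` closure stands in for whatever reads the file.

```swift
// Hypothetical stand-in for a just-in-time weight provider (not an MPS type).
final class LazyWeights {
    private let loader: () -> [Float]?      // e.g. reads a weights file; assumed
    private(set) var values: [Float]?       // nil while the weights are purged

    init(loader: @escaping () -> [Float]?) {
        self.loader = loader
    }

    // Brings the weights into memory; returns false if loading failed.
    func load() -> Bool {
        values = loader()
        return values != nil
    }

    // Releases the weights so they stop occupying memory.
    func purge() {
        values = nil
    }
}
```

Keeping the weights out of memory except while they are being consumed is what lets a large network keep a small footprint.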

129 // Example: create a graph (conv1, pool1, conv2, pool2, conv3, pool3, conv4, fc1, fc2)
    func makeGraph() -> MPSNNImageNode {
        let conv1 = MPSCNNConvolutionNode(source: MPSNNImageNode(handle: nil), weights: MyWeights(file: "conv1.dat"))
        let pool1 = MPSCNNPoolingMaxNode(source: conv1.resultImage, filterSize: 2)
        let conv2 = MPSCNNConvolutionNode(source: pool1.resultImage, weights: MyWeights(file: "conv2.dat"))
        let pool2 = MPSCNNPoolingMaxNode(source: conv2.resultImage, filterSize: 2)
        let conv3 = MPSCNNConvolutionNode(source: pool2.resultImage, weights: MyWeights(file: "conv3.dat"))
        let pool3 = MPSCNNPoolingMaxNode(source: conv3.resultImage, filterSize: 2)
        let conv4 = MPSCNNConvolutionNode(source: pool3.resultImage, weights: MyWeights(file: "conv4.dat"))
        let fc1 = MPSCNNFullyConnectedNode(source: conv4.resultImage, weights: MyWeights(file: "fc1.dat"))
        let fc2 = MPSCNNFullyConnectedNode(source: fc1.resultImage, weights: MyWeights(file: "fc2.dat"))
        return fc2.resultImage
    }

130 // Example: execute graph on the GPU
    // Metal setup
    let device = MTLCreateSystemDefaultDevice()!
    let commandQueue = device.makeCommandQueue()!
    let commandBuffer = commandQueue.makeCommandBuffer()!
    // Initialize graph
    let graph = MPSNNGraph(device: device, resultImage: makeGraph())
    // Create input image
    let input = MPSImage(texture: texture, ...)
    // Encode graph
    let output = graph?.encode(to: commandBuffer, sourceImages: [input])
    // Tell GPU to start executing work and wait until GPU work is done
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

137 [Timeline figure for the example above, CPU and GPU rows over time: the CPU encodes task1, then the GPU executes task1 while the CPU waits (bubble); the CPU then encodes task2 while the GPU is idle (bubble); the GPU executes task2, and so on]

138 // Example: execute graph on the GPU asynchronously
    // Metal setup
    let device = MTLCreateSystemDefaultDevice()!
    // Initialize graph
    let graph = MPSNNGraph(device: device, resultImage: makeGraph())
    // Create input image
    let input = MPSImage(texture: texture, ...)
    // Encode graph
    let output = graph?.executeAsync(sourceImages: [input]) {
        resultImage, error in
        // check for error and use resultImage inside closure
    }
    // Don't wait, encode new GPU task

143 [Timeline figure for the asynchronous example, CPU and GPU rows over time: the CPU encodes task1 through task6 back to back while the GPU executes task1 through task5 in parallel, with no bubbles shown]
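The difference between the two timelines can be estimated with a small model. This is a back-of-the-envelope sketch, not a measurement: it assumes every task takes the same fixed encode time on the CPU and the same fixed execute time on the GPU, and the function names are made up for the illustration.

```swift
// Total time when the CPU waits for the GPU after every task (bubbles on both sides).
func serializedTime(tasks: Int, encode: Double, execute: Double) -> Double {
    return Double(tasks) * (encode + execute)
}

// Total time when the CPU keeps encoding while the GPU executes earlier tasks.
func pipelinedTime(tasks: Int, encode: Double, execute: Double) -> Double {
    var encodeDone = 0.0    // time at which the CPU finished encoding the current task
    var executeDone = 0.0   // time at which the GPU finished the previous task
    for _ in 0..<tasks {
        encodeDone += encode
        // The GPU starts a task once it is encoded and the GPU is free.
        executeDone = max(encodeDone, executeDone) + execute
    }
    return executeDone
}
```

For example, with 3 tasks, 1 time unit to encode and 2 to execute, the serialized model gives 9 units and the pipelined model 7, because encoding overlaps with execution.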

144 Demo Inception-v3 using Neural Network Graph API

145 Agenda: Recap on Convolutional Neural Networks (CNN); Convolutional Neural Networks, New Primitives; Neural Network Graph API; Recurrent Neural Networks (RNN)

146 What Are Recurrent Neural Networks?

147-148 CNN: one-to-one. One input (image) -> CNN inference -> one output (set of probabilities, e.g. "dog", "grass").

149-150 RNN sequences: one-to-many. One input (set of probabilities from CNN inference) -> RNN inference -> sequence of outputs (words / image caption, e.g. "A black and white dog laying in the grass").

151-152 RNN sequences: many-to-many. Sequence of inputs (sentence in English: "A black and white dog laying in the grass") -> RNN inference -> sequence of outputs (translated sentence: "Чёрно-белая собака лежит на траве", "Mustan ja valkoisen värinen koira makaa ruohikolla").

153 Recurrent Neural Networks: new primitives (NEW). Single Gate; Long Short-Term Memory (LSTM); Gated Recurrent Unit (GRU); Minimally Gated Unit (MGU)

154 Single Gate RNN: the Recurrent Unit enables previous output to affect the output of subsequent iterations. (Diagram: Input -> Recurrent Unit -> Output)
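
The feedback described on this slide can be sketched in plain Swift, without Metal. This is a minimal illustration, not the MPS implementation: it assumes the common formulation output = tanh(W * input + U * previousOutput + b), which the slides do not spell out, and the type RecurrentUnit, the helper matVec, and all weights are hypothetical.

```swift
import Foundation

// Multiply a row-major matrix by a vector.
func matVec(_ m: [[Double]], _ v: [Double]) -> [Double] {
    return m.map { row in
        zip(row, v).reduce(0.0) { acc, pair in acc + pair.0 * pair.1 }
    }
}

// Hypothetical single-gate recurrent unit (assumed tanh activation).
struct RecurrentUnit {
    var inputWeights: [[Double]]      // W, applied to the current input
    var recurrentWeights: [[Double]]  // U, applied to the previous output
    var bias: [Double]                // b

    // One iteration: the previous output feeds back into the new output.
    func step(input x: [Double], previousOutput h: [Double]) -> [Double] {
        let wx = matVec(inputWeights, x)
        let uh = matVec(recurrentWeights, h)
        return bias.indices.map { tanh(wx[$0] + uh[$0] + bias[$0]) }
    }

    // Run a whole sequence, starting from an all-zero previous output.
    func run(_ inputs: [[Double]]) -> [[Double]] {
        var h = [Double](repeating: 0.0, count: bias.count)
        var outputs: [[Double]] = []
        for x in inputs {
            h = step(input: x, previousOutput: h)
            outputs.append(h)
        }
        return outputs
    }
}

// Toy 1x1 weights, chosen only for illustration.
let unit = RecurrentUnit(inputWeights: [[1.0]], recurrentWeights: [[1.0]], bias: [0.0])
let outputs = unit.run([[1.0], [0.0]])
print(outputs)
```

The second input is zero, yet the second output is not, because the first output is fed back through the recurrent weights.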

155 Long Short-Term Memory (LSTM): built from Single Gate RNNs; has an internal Memory Cell; gates control information flow inside the LSTM and what is stored in the Memory Cell. (Diagram: Input -> LSTM with Memory Cell -> Output)

157-164 LSTM Architecture (diagram, built up over several slides)
    Legend: M = Matrix-Matrix or Matrix-Vector Multiply; * and + = point-wise operations
    Data flow: Old Memory -> * -> + -> New Memory; Input -> LSTM -> Output
    Forget Gate (M on Previous Output, M on Input): what to keep from old memory
    Input Gate and Cell Gate (each M on Previous Output, M on Input): how new input affects new memory
    Output Gate (M on Previous Output, M on Input): how previous output, current input, new memory affect new output
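
The gate structure in the diagram can be written out as a short, Metal-free Swift sketch. It assumes the standard LSTM equations (sigmoid for the forget, input, and output gates, tanh for the cell gate, no peephole connections); the slides only name the gates, so the exact activations, the types Gate and LSTMCell, and the toy weights are assumptions rather than what MPS does internally.

```swift
import Foundation

func sigmoid(_ v: Double) -> Double { return 1.0 / (1.0 + exp(-v)) }

// One gate: the two "M" boxes in the diagram (one multiply on the input,
// one on the previous output) plus a bias. Hypothetical type.
struct Gate {
    var inputWeights: [[Double]]
    var recurrentWeights: [[Double]]
    var bias: [Double]

    func preActivation(_ x: [Double], _ h: [Double]) -> [Double] {
        return bias.indices.map { r -> Double in
            var sum = bias[r]
            for (c, value) in x.enumerated() { sum += inputWeights[r][c] * value }
            for (c, value) in h.enumerated() { sum += recurrentWeights[r][c] * value }
            return sum
        }
    }
}

// Hypothetical LSTM cell using the assumed standard equations.
struct LSTMCell {
    var forgetGate, inputGate, cellGate, outputGate: Gate

    func step(input x: [Double], previousOutput h: [Double], oldMemory c: [Double])
        -> (output: [Double], newMemory: [Double]) {
        let f = forgetGate.preActivation(x, h).map(sigmoid)    // what to keep from old memory
        let i = inputGate.preActivation(x, h).map(sigmoid)     // how much new input to admit
        let g = cellGate.preActivation(x, h).map { tanh($0) }  // candidate new memory content
        let o = outputGate.preActivation(x, h).map(sigmoid)    // how much memory to expose
        // Point-wise: new memory = f * old memory + i * g
        let newMemory = c.indices.map { f[$0] * c[$0] + i[$0] * g[$0] }
        // Point-wise: output = o * tanh(new memory)
        let output = c.indices.map { o[$0] * tanh(newMemory[$0]) }
        return (output, newMemory)
    }
}

// Toy cell with all-zero weights: every sigmoid gate is 0.5, the cell gate is 0.
let zero = Gate(inputWeights: [[0.0]], recurrentWeights: [[0.0]], bias: [0.0])
let cell = LSTMCell(forgetGate: zero, inputGate: zero, cellGate: zero, outputGate: zero)
let result = cell.step(input: [1.0], previousOutput: [0.0], oldMemory: [2.0])
print(result.newMemory, result.output)
```

With these zero weights the forget gate keeps half of the old memory and nothing new is written, so the example isolates the "what to keep from old memory" path.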

165 // Example: Creating an LSTM RNN
    // Create an LSTM layer descriptor
    let descriptor = MPSLSTMDescriptor()
    descriptor.inputFeatureChannels = inputSize
    descriptor.outputFeatureChannels = outputSize
    // Create and initialize gate weights with trained parameters, using a data source provider
    // for just-in-time loading and purging of weights
    descriptor.forgetGateInputWeights = MyWeights(file: "forgetgateweights.dat")
    descriptor.cellGateInputWeights = MyWeights(file: "cellgateweights.dat")
    // Initialize the rest of the gates
    // Metal setup
    let device = MTLCreateSystemDefaultDevice()!
    // Also get commandQueue and commandBuffer
    // Create an LSTM layer
    let layer = MPSRNNMatrixInferenceLayer(device: device, rnnDescriptor: descriptor)

169 // Example: Running an LSTM RNN on the GPU
    // Create input and output data
    var inputSequence: [MPSMatrix] = []
    var outputSequence: [MPSMatrix] = []
    for _ in 0..<N {
        // Matrix size is (1, inputSize), inputSize is number of columns
        inputSequence.append(MPSMatrix(...))
        // Matrix size is (1, outputSize), outputSize is number of columns
        outputSequence.append(MPSMatrix(...))
    }
    // Submit work to GPU
    layer.encodeSequence(commandBuffer: commandBuffer,
                         sourceMatrices: inputSequence,
                         destinationMatrices: outputSequence,
                         recurrentInputState: nil,
                         recurrentOutputStates: nil)
    // Tell GPU to start executing work
    commandBuffer.commit()

172-174 Example: Image Captioning, Training. Training to caption images: a CNN determines what is depicted in the image and an RNN generates the image caption. (Diagram: images with captions -> CNN + RNN -> Trained Parameters)

175-179 Example: Image Captioning, Inference. CNN: determine what is depicted in the image. RNN: generate image caption. Trained Parameters control CNN layers; Trained Parameters control RNN gates. Example caption: "a man riding a wave on top of a surfboard".

180 Example: Image Captioning, Inference. Determine what is depicted in the image: Inception-v3. Generate image caption: LSTM with Memory Cell. Example caption: "a man riding a wave on top of a surfboard". Image Captioning Network*
    *Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, IEEE Transactions on Pattern Analysis and Machine Intelligence

181-184 Example: Image Captioning, LSTM initialization phase. Legend: Convolution, Pooling (Avg.), Pooling (Max.), Fully-Connected, SoftMax. (Diagram: Inception-v3 -> Feature vector -> LSTM with Memory Cell)

185-189 Example: Image Captioning, Caption generation phase. Legend: Convolution, Pooling (Avg.), Pooling (Max.), Fully-Connected, SoftMax. (Diagram: Input: sentence start token -> LSTM with Memory Cell -> Output: 3 best one-word captions -> LSTM with Memory Cell -> 3 best two-word captions -> ... -> LSTM with Memory Cell -> 3 best N-word captions -> End)

190-199 Caption Generation (tables with columns Caption and Probability for successive iterations, each followed by "Top three captions:"; no probability values appear in this transcript)
    Iteration 1: man | a | the
    Iteration 2: man on, man in, man surfing | a man, a person, a surfer | the man, the surfer, the young
    Iteration 3: a man riding, a man on, a man is | a person riding, a person on, a person in | a surfer is, a surfer riding, a surfer in
    Iteration 4: a man riding a, a man riding on, a man riding the | a man on a, a man on his, a man on the | a man is surfing, a man is riding, a man is on

200 Caption Generation Top three captions: 1. a man riding a wave on top of a surfboard 2. a man on a surfboard riding a wave 3. a man riding a wave on a surfboard
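
The iteration tables above (expand each of the three best captions by one word, then keep the three best results) resemble a beam search of width 3. The following is a minimal sketch under that reading, not the demo's code: the closure nextWords is a hypothetical stand-in for one LSTM iteration followed by SoftMax, the tokens "<s>" and "</s>" are assumed start and end markers, and every probability in the toy table is made up.

```swift
import Foundation

// A partial caption and the sum of the log-probabilities of its words.
struct Candidate {
    var words: [String]
    var logProbability: Double
}

// Beam search: keep only the `beamWidth` best captions after each iteration.
func beamSearch(start: String, end: String, beamWidth: Int, maxLength: Int,
                nextWords: ([String]) -> [(word: String, probability: Double)]) -> [Candidate] {
    var beam = [Candidate(words: [start], logProbability: 0.0)]
    var finished: [Candidate] = []
    for _ in 0..<maxLength {
        // Extend every caption in the beam by every proposed next word.
        var expanded: [Candidate] = []
        for candidate in beam {
            for next in nextWords(candidate.words) {
                expanded.append(Candidate(
                    words: candidate.words + [next.word],
                    logProbability: candidate.logProbability + log(next.probability)))
            }
        }
        // Keep the best few; captions that reached the end token are set aside.
        expanded.sort { $0.logProbability > $1.logProbability }
        beam = []
        for candidate in expanded.prefix(beamWidth) {
            if candidate.words.last == end {
                finished.append(candidate)
            } else {
                beam.append(candidate)
            }
        }
        if beam.isEmpty { break }
    }
    let all = (finished + beam).sorted { $0.logProbability > $1.logProbability }
    return Array(all.prefix(beamWidth))
}

// Made-up next-word probabilities keyed by the last word (illustration only).
let table: [String: [(word: String, probability: Double)]] = [
    "<s>": [(word: "a", probability: 0.6), (word: "the", probability: 0.3), (word: "man", probability: 0.1)],
    "a": [(word: "man", probability: 0.7), (word: "surfer", probability: 0.2), (word: "person", probability: 0.1)],
    "the": [(word: "man", probability: 0.5), (word: "surfer", probability: 0.5)],
    "man": [(word: "</s>", probability: 1.0)],
    "surfer": [(word: "</s>", probability: 1.0)],
    "person": [(word: "</s>", probability: 1.0)]
]

let results = beamSearch(start: "<s>", end: "</s>", beamWidth: 3, maxLength: 4) { words in
    table[words.last ?? ""] ?? []
}
let best = results[0]
print(best.words.joined(separator: " "))
```

Summing log-probabilities instead of multiplying raw probabilities is a common design choice that avoids underflow on long captions; the ranking is the same.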

202 Demo Image captioning CNN + LSTM

203 Summary: GPU accelerated primitives; expanded support for Image Processing and Convolutional Neural Networks; added support for Linear Algebra and Recurrent Neural Networks; optimized for iOS and macOS; new Neural Network Graph API

204 Related Sessions
    Introducing Metal 2 | Executive Ballroom | Tuesday 1:50PM
    Introducing Core ML | Hall 3 | Tuesday 3:10PM
    VR with Metal 2 | Hall 3 | Wednesday 10:00AM
    Vision Framework: Building on Core ML | Hall 2 | Wednesday 3:10PM
    Core ML in depth | Hall 3 | Thursday 09:00AM
    Accelerate and Sparse Solvers | Executive Ballroom | Thursday 10:00AM
    Metal 2 Optimization and Debugging | Grand Ballroom B | Thursday 3:10PM

205 Labs: Metal 2 Lab | Technology Lab | Friday 09:00AM-12:00PM

206 More Information


Wu Zhiwen.

Wu Zhiwen. Wu Zhiwen zhiwen.wu@intel.com Agenda Background information OpenCV DNN module OpenCL acceleration Vulkan backend Sample 2 What is OpenCV? Open Source Compute Vision (OpenCV) library 2500+ Optimized algorithms

More information

CSC 578 Neural Networks and Deep Learning

CSC 578 Neural Networks and Deep Learning CSC 578 Neural Networks and Deep Learning Fall 2018/19 7. Recurrent Neural Networks (Some figures adapted from NNDL book) 1 Recurrent Neural Networks 1. Recurrent Neural Networks (RNNs) 2. RNN Training

More information

CIS 660. Image Searching System using CNN-LSTM. Presented by. Mayur Rumalwala Sagar Dahiwala

CIS 660. Image Searching System using CNN-LSTM. Presented by. Mayur Rumalwala Sagar Dahiwala CIS 660 using CNN-LSTM Presented by Mayur Rumalwala Sagar Dahiwala AGENDA Problem in Image Searching? Proposed Solution Tools, Library and Dataset used Architecture of Proposed System Implementation of

More information

All You Want To Know About CNNs. Yukun Zhu

All You Want To Know About CNNs. Yukun Zhu All You Want To Know About CNNs Yukun Zhu Deep Learning Deep Learning Image from http://imgur.com/ Deep Learning Image from http://imgur.com/ Deep Learning Image from http://imgur.com/ Deep Learning Image

More information

Binary Convolutional Neural Network on RRAM

Binary Convolutional Neural Network on RRAM Binary Convolutional Neural Network on RRAM Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang Dept. of E.E, Tsinghua National Laboratory for Information Science and Technology (TNList) Tsinghua

More information

FUSION MODEL BASED ON CONVOLUTIONAL NEURAL NETWORKS WITH TWO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION

FUSION MODEL BASED ON CONVOLUTIONAL NEURAL NETWORKS WITH TWO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION Please contact the conference organizers at dcasechallenge@gmail.com if you require an accessible file, as the files provided by ConfTool Pro to reviewers are filtered to remove author information, and

More information

Deep Convolutional Neural Networks. Nov. 20th, 2015 Bruce Draper

Deep Convolutional Neural Networks. Nov. 20th, 2015 Bruce Draper Deep Convolutional Neural Networks Nov. 20th, 2015 Bruce Draper Background: Fully-connected single layer neural networks Feed-forward classification Trained through back-propagation Example Computer Vision

More information

Artificial Intelligence Introduction Handwriting Recognition Kadir Eren Unal ( ), Jakob Heyder ( )

Artificial Intelligence Introduction Handwriting Recognition Kadir Eren Unal ( ), Jakob Heyder ( ) Structure: 1. Introduction 2. Problem 3. Neural network approach a. Architecture b. Phases of CNN c. Results 4. HTM approach a. Architecture b. Setup c. Results 5. Conclusion 1.) Introduction Artificial

More information

Convolutional Networks for Text

Convolutional Networks for Text CS11-747 Neural Networks for NLP Convolutional Networks for Text Graham Neubig Site https://phontron.com/class/nn4nlp2017/ An Example Prediction Problem: Sentence Classification I hate this movie very

More information

Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks

Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks Show, Discriminate, and Tell: A Discriminatory Image Captioning Model with Deep Neural Networks Zelun Luo Department of Computer Science Stanford University zelunluo@stanford.edu Te-Lin Wu Department of

More information

Convolutional-Recursive Deep Learning for 3D Object Classification

Convolutional-Recursive Deep Learning for 3D Object Classification Convolutional-Recursive Deep Learning for 3D Object Classification Richard Socher, Brody Huval, Bharath Bhat, Christopher D. Manning, Andrew Y. Ng NIPS 2012 Iro Armeni, Manik Dhar Motivation Hand-designed

More information

ABC-CNN: Attention Based CNN for Visual Question Answering

ABC-CNN: Attention Based CNN for Visual Question Answering ABC-CNN: Attention Based CNN for Visual Question Answering CIS 601 PRESENTED BY: MAYUR RUMALWALA GUIDED BY: DR. SUNNIE CHUNG AGENDA Ø Introduction Ø Understanding CNN Ø Framework of ABC-CNN Ø Datasets

More information

Mastering Drag and Drop

Mastering Drag and Drop Session App Frameworks #WWDC17 Mastering Drag and Drop 213 Tom Adriaenssen, UIKit Wenson Hsieh, WebKit Robb Böhnke, UIKit 2017 Apple Inc. All rights reserved. Redistribution or public display not permitted

More information

Deep Learning Applications

Deep Learning Applications October 20, 2017 Overview Supervised Learning Feedforward neural network Convolution neural network Recurrent neural network Recursive neural network (Recursive neural tensor network) Unsupervised Learning

More information

Modern User Interaction on ios

Modern User Interaction on ios App Frameworks #WWDC17 Modern User Interaction on ios Mastering the UIKit UIGesture System Session 219 Dominik Wagner, UIKit Engineer Michael Turner, UIKit Engineer Glen Low, UIKit Engineer 2017 Apple

More information

Deep Neural Network Evaluation

Deep Neural Network Evaluation Lecture 8: Deep Neural Network Evaluation Visual Computing Systems Training/evaluating deep neural networks Technique leading to many high-profile AI advances in recent years Speech recognition/natural

More information

Supplementary Material for: Video Prediction with Appearance and Motion Conditions

Supplementary Material for: Video Prediction with Appearance and Motion Conditions Supplementary Material for Video Prediction with Appearance and Motion Conditions Yunseok Jang 1 2 Gunhee Kim 2 Yale Song 3 A. Architecture Details (Section 3.2) We provide architecture details of our

More information

CNNS FROM THE BASICS TO RECENT ADVANCES. Dmytro Mishkin Center for Machine Perception Czech Technical University in Prague

CNNS FROM THE BASICS TO RECENT ADVANCES. Dmytro Mishkin Center for Machine Perception Czech Technical University in Prague CNNS FROM THE BASICS TO RECENT ADVANCES Dmytro Mishkin Center for Machine Perception Czech Technical University in Prague ducha.aiki@gmail.com OUTLINE Short review of the CNN design Architecture progress

More information

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU, Machine Learning 10-701, Fall 2015 Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October 6, 2015 Eric Xing @ CMU, 2015 1 A perennial challenge in computer vision: feature engineering SIFT Spin image

More information

S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer

S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer S8822 OPTIMIZING NMT WITH TENSORRT Micah Villmow Senior TensorRT Software Engineer 2 100 倍以上速く 本当に可能ですか? 2 DOUGLAS ADAMS BABEL FISH Neural Machine Translation Unit 3 4 OVER 100X FASTER, IS IT REALLY POSSIBLE?

More information

Perceptron: This is convolution!

Perceptron: This is convolution! Perceptron: This is convolution! v v v Shared weights v Filter = local perceptron. Also called kernel. By pooling responses at different locations, we gain robustness to the exact spatial location of image

More information

Machine Learning. MGS Lecture 3: Deep Learning

Machine Learning. MGS Lecture 3: Deep Learning Dr Michel F. Valstar http://cs.nott.ac.uk/~mfv/ Machine Learning MGS Lecture 3: Deep Learning Dr Michel F. Valstar http://cs.nott.ac.uk/~mfv/ WHAT IS DEEP LEARNING? Shallow network: Only one hidden layer

More information

What s New in Core Data?

What s New in Core Data? Session App Frameworks #WWDC17 What s New in Core? Persisting since 2004 210 Melissa Turner, Core Engineer Rishi Verma, Core Engineer 2017 Apple Inc. All rights reserved. Redistribution or public display

More information

ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University

ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University Lab 4: Binarized Convolutional Neural Networks Due Wednesday, October 31, 2018, 11:59pm

More information

Deep Learning Accelerators

Deep Learning Accelerators Deep Learning Accelerators Abhishek Srivastava (as29) Samarth Kulshreshtha (samarth5) University of Illinois, Urbana-Champaign Submitted as a requirement for CS 433 graduate student project Outline Introduction

More information

Neural Networks with Input Specified Thresholds

Neural Networks with Input Specified Thresholds Neural Networks with Input Specified Thresholds Fei Liu Stanford University liufei@stanford.edu Junyang Qian Stanford University junyangq@stanford.edu Abstract In this project report, we propose a method

More information

OPTIMIZING PERFORMANCE OF RECURRENT NEURAL NETWORKS

OPTIMIZING PERFORMANCE OF RECURRENT NEURAL NETWORKS April 4-7, 2016 Silicon Valley OPTIMIZING PERFORMANCE OF RECURRENT NEURAL NETWORKS Jeremy Appleyard, 7 April 2016 RECURRENT NEURAL NETWORKS Output is fed into input Perform the same operation repeatedly

More information

Convolutional Neural Networks

Convolutional Neural Networks NPFL114, Lecture 4 Convolutional Neural Networks Milan Straka March 25, 2019 Charles University in Prague Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics unless otherwise

More information

Face Recognition A Deep Learning Approach

Face Recognition A Deep Learning Approach Face Recognition A Deep Learning Approach Lihi Shiloh Tal Perl Deep Learning Seminar 2 Outline What about Cat recognition? Classical face recognition Modern face recognition DeepFace FaceNet Comparison

More information

Getting started with Caffe. Jon Barker, Solutions Architect

Getting started with Caffe. Jon Barker, Solutions Architect Getting started with Caffe Jon Barker, Solutions Architect Caffe tour Overview Agenda Example applications Setup Performance Hands-on lab preview 2 A tour of Caffe 3 What is Caffe? An open framework for

More information

Deep Learning on Graphs

Deep Learning on Graphs Deep Learning on Graphs with Graph Convolutional Networks Hidden layer Hidden layer Input Output ReLU ReLU, 22 March 2017 joint work with Max Welling (University of Amsterdam) BDL Workshop @ NIPS 2016

More information

What s New in ARKit 2

What s New in ARKit 2 Session #WWDC18 What s New in ARKit 2 602 Arsalan Malik, ARKit Engineer Reinhard Klapfer, ARKit Engineer 2018 Apple Inc. All rights reserved. Redistribution or public display not permitted without written

More information

The Hitchhiker s Guide to TensorFlow:

The Hitchhiker s Guide to TensorFlow: The Hitchhiker s Guide to TensorFlow: Beyond Recurrent Neural Networks (sort of) Keith Davis @keithdavisiii iamthevastidledhitchhiker.github.io Topics Kohonen/Self-Organizing Maps LSTMs in TensorFlow GRU

More information

Object Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal

Object Detection Lecture Introduction to deep learning (CNN) Idar Dyrdal Object Detection Lecture 10.3 - Introduction to deep learning (CNN) Idar Dyrdal Deep Learning Labels Computational models composed of multiple processing layers (non-linear transformations) Used to learn

More information

Building Visually Rich User Experiences

Building Visually Rich User Experiences Session App Frameworks #WWDC17 Building Visually Rich User Experiences 235 Noah Witherspoon, Software Engineer Warren Moore, Software Engineer 2017 Apple Inc. All rights reserved. Redistribution or public

More information

Xilinx ML Suite Overview

Xilinx ML Suite Overview Xilinx ML Suite Overview Yao Fu System Architect Data Center Acceleration Xilinx Accelerated Computing Workloads Machine Learning Inference Image classification and object detection Video Streaming Frame

More information

MIXED PRECISION TRAINING OF NEURAL NETWORKS. Carl Case, Senior Architect, NVIDIA

MIXED PRECISION TRAINING OF NEURAL NETWORKS. Carl Case, Senior Architect, NVIDIA MIXED PRECISION TRAINING OF NEURAL NETWORKS Carl Case, Senior Architect, NVIDIA OUTLINE 1. What is mixed precision training with FP16? 2. Considerations and methodology for mixed precision training 3.

More information

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm Instructions This is an individual assignment. Individual means each student must hand in their

More information

Deep Learning Based Real-time Object Recognition System with Image Web Crawler

Deep Learning Based Real-time Object Recognition System with Image Web Crawler , pp.103-110 http://dx.doi.org/10.14257/astl.2016.142.19 Deep Learning Based Real-time Object Recognition System with Image Web Crawler Myung-jae Lee 1, Hyeok-june Jeong 1, Young-guk Ha 2 1 Department

More information

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs Raymond Ptucha, Rochester Institute of Technology, USA Tutorial-9 May 19, 218 www.nvidia.com/dli R. Ptucha 18 1 Fair Use Agreement

More information

Implementing Long-term Recurrent Convolutional Network Using HLS on POWER System

Implementing Long-term Recurrent Convolutional Network Using HLS on POWER System Implementing Long-term Recurrent Convolutional Network Using HLS on POWER System Xiaofan Zhang1, Mohamed El Hadedy1, Wen-mei Hwu1, Nam Sung Kim1, Jinjun Xiong2, Deming Chen1 1 University of Illinois Urbana-Champaign

More information

Metal. GPU-accelerated advanced 3D graphics rendering and data-parallel computation. source rebelsmarket.com

Metal. GPU-accelerated advanced 3D graphics rendering and data-parallel computation. source rebelsmarket.com Metal GPU-accelerated advanced 3D graphics rendering and data-parallel computation source rebelsmarket.com Maths The heart and foundation of computer graphics source wallpoper.com Metalmatics There are

More information

arxiv: v1 [cs.cv] 20 Mar 2017

arxiv: v1 [cs.cv] 20 Mar 2017 I2T2I: LEARNING TEXT TO IMAGE SYNTHESIS WITH TEXTUAL DATA AUGMENTATION Hao Dong, Jingqing Zhang, Douglas McIlwraith, Yike Guo arxiv:1703.06676v1 [cs.cv] 20 Mar 2017 Data Science Institute, Imperial College

More information

Learning Visual Semantics: Models, Massive Computation, and Innovative Applications

Learning Visual Semantics: Models, Massive Computation, and Innovative Applications Learning Visual Semantics: Models, Massive Computation, and Innovative Applications Part II: Visual Features and Representations Liangliang Cao, IBM Watson Research Center Evolvement of Visual Features

More information

XES Tensorflow Process Prediction using the Tensorflow Deep-Learning Framework

XES Tensorflow Process Prediction using the Tensorflow Deep-Learning Framework XES Tensorflow Process Prediction using the Tensorflow Deep-Learning Framework Demo Paper Joerg Evermann 1, Jana-Rebecca Rehse 2,3, and Peter Fettke 2,3 1 Memorial University of Newfoundland 2 German Research

More information