Deep learning for music, galaxies and plankton
Sander Dieleman, May 17, 2016
I. Galaxies
http://www.galaxyzoo.org
The Galaxy Challenge: automate this classification process. Competition on [logo]. Diagram: colour image -> model -> predictions
The data: 140 000 JPEG colour images; dimensions: 424 x 424; train: 61 578 images; test: 79 975 images
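The solution: a convnet with 7 layers
input: 45 x 45 x 3 (RGB)
conv 1: 32 filters, 6 x 6 -> 40 x 40; max pooling -> 20 x 20
conv 2: 64 filters, 5 x 5 -> 16 x 16; max pooling -> 8 x 8
conv 3: 128 filters, 3 x 3 -> 6 x 6
conv 4: 128 filters, 3 x 3 -> 4 x 4; max pooling -> 2 x 2 (labelled 128 x 16 on the diagram)
dense: 2048, maxout(2)
dense: 2048, maxout(2)
output: 37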
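The feature-map sizes on this slide (45, 40, 20, 16, 8, 6, 4, 2) can be checked with a little arithmetic. The sketch below assumes stride-1 convolutions without padding and non-overlapping 2x2 max pooling; these settings are inferred from the numbers and not stated on the slide, and the helper names are made up for illustration.

```python
def conv_out(size, kernel):
    """Output width of an unpadded, stride-1 convolution (assumed setting)."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output width of non-overlapping max pooling (assumed 2x2)."""
    return size // pool


# (filter size, is the layer followed by max pooling?) as read from the slide
LAYERS = [(6, True), (5, True), (3, False), (3, True)]


def trace_sizes(input_size=45):
    """Return the feature-map width after every convolution and pooling step."""
    sizes = [input_size]
    for kernel, pooled in LAYERS:
        sizes.append(conv_out(sizes[-1], kernel))
        if pooled:
            sizes.append(pool_out(sizes[-1]))
    return sizes
```

Calling `trace_sizes()` walks from the 45 x 45 input down to the final 2 x 2 maps.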
Shallow learning: training examples $x_n$ -> extracted features $\phi_n$ -> shallow model $f_\theta(\phi_n)$ -> predictions $y_n$
Deep learning: training examples $x_n$ -> deep model $f_{\theta_k}(\cdots f_{\theta_2}(f_{\theta_1}(x_n)))$ -> predictions $y_n$
Deep learning vs. traditional neural networks: output layer and a single hidden layer (traditional) vs. output layer and several hidden layers (deep); rectified linear units: y = max(x, 0)
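The rectified linear unit y = max(x, 0) is simple enough to write down directly. The sketch below also stacks a few hidden layers to show where the nonlinearity sits; `forward`, the layer sizes and the weights are hypothetical and only serve as illustration.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: y = max(x, 0), applied elementwise."""
    return np.maximum(x, 0)


def forward(x, weights, biases):
    """Pass x through hidden layers with ReLU, then a linear output layer."""
    *hidden, last = list(zip(weights, biases))
    for w, b in hidden:
        x = relu(x @ w + b)
    w, b = last
    return x @ w + b
```

For example, `relu(np.array([-2.0, 0.0, 3.0]))` sets the negative entry to zero and leaves the others unchanged.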
Convolutional neural networks: local connectivity, translation invariance. Diagram labels: fully connected, convolutional, flatten
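A small experiment makes the translation property concrete: shifting the input shifts the output of a convolution by the same amount, so a maximum taken over the whole output does not change. This is a minimal NumPy sketch with explicit loops, not an efficient implementation; the function name is made up for illustration.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Unpadded, stride-1 2D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # each output unit only sees a local kh x kw patch (local connectivity)
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


kernel = np.arange(9, dtype=float).reshape(3, 3)

image = np.zeros((10, 10))
image[2:4, 2:4] = [[1.0, 2.0], [3.0, 4.0]]           # small pattern away from the border
shifted = np.roll(image, shift=(3, 2), axis=(0, 1))  # same pattern, moved down 3 and right 2

response = conv2d_valid(image, kernel)
shifted_response = conv2d_valid(shifted, kernel)
```

Here `shifted_response` is `response` moved by the same offset, and both have the same maximum. Near the image border the correspondence breaks down, which is why the pattern is placed away from the edges.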
Preprocessing: cropping and downsampling: 424 x 424 -> 207 x 207 -> 69 x 69
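The two preprocessing steps can be sketched in a few lines of NumPy. The crop is assumed to be centred and the 3x downsampling is done by block averaging; the slide only gives the sizes, so both choices are assumptions.

```python
import numpy as np


def center_crop(image, size):
    """Cut a size x size window from the middle of an (H, W, C) image."""
    h, w = image.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return image[top:top + size, left:left + size]


def downsample(image, factor):
    """Average non-overlapping factor x factor blocks (assumed method)."""
    h, w, c = image.shape
    blocks = image.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


def preprocess(image):
    """424 x 424 -> 207 x 207 -> 69 x 69, as on the slide."""
    return downsample(center_crop(image, 207), 3)
```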
Data augmentation: rotation, translation, rescaling, flipping, ...
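A simplified version of such an augmentation step is sketched below. It only uses transformations that need no interpolation: flips, rotations by multiples of 90 degrees, and circular shifts. Arbitrary-angle rotation and rescaling would need an affine warp and are left out; the function name and the `max_shift` parameter are hypothetical.

```python
import numpy as np


def augment(image, rng, max_shift=4):
    """Return a randomly flipped, rotated (k * 90 degrees) and shifted copy
    of a square (H, W, C) image."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]                       # horizontal flip
    out = np.rot90(out, k=int(rng.integers(4)))  # rotates the first two axes
    dy, dx = (int(s) for s in rng.integers(-max_shift, max_shift + 1, size=2))
    # np.roll wraps around at the border; a real pipeline would pad or crop instead
    return np.roll(out, shift=(dy, dx), axis=(0, 1))
```

Because a fresh random transformation is drawn every time, the network rarely sees exactly the same input twice.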
Network architecture: exploiting rotation invariance
Training large CNNs requires GPU acceleration. Hardware: Intel Core i7 3930K at 3.2 GHz, 6 cores; 32GB RAM; NVIDIA GeForce GTX 680 2GB / 4GB (2x)
The filters learned in the first convolutional layer (panels: Red, Green, Blue)
Feature maps per stage: input; layer 1 (40x40); pooling 1 (20x20); layer 2 (16x16); pooling 2 (8x8); layer 3 (6x6); layer 4 (4x4); pooling 4 (2x2)
Blog post: http://benanne.github.io/2014/04/05/galaxy-zoo.html
Code: https://github.com/benanne/kaggle-galaxies
II. Plankton
Pieter, Jonas, Iryna, Jeroen, Lionel, Sander, Aäron
Preprocessing and data augmentation: rescale; zoom, rotate, translate, flip, shear, stretch
Network architecture based on OxfordNet (Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan & Zisserman, ICLR 2015). Legend: 3x3 convolution; 3x3 overlapping pooling, stride 2; fully connected layer
Cyclic pooling: rotations of 0°, 90°, 180°, 270°
Cyclic pooling. Legend: 3x3 convolution; cyclic slicing; 3x3 pooling, stride 2; cyclic pooling; fully connected layer
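The idea behind cyclic slicing and cyclic pooling can be sketched as follows: present the four rotations of the input to the same feature extractor, then pool the four results. The pooled features no longer depend on which of the four orientations the input arrived in. In this sketch `features` is a hypothetical stand-in for the convolutional part of the network, and mean pooling is an assumption; the slide does not say which pooling function was used.

```python
import numpy as np


def cyclic_slice(image):
    """Stack the rotations by 0, 90, 180 and 270 degrees: shape (4, H, W)."""
    return np.stack([np.rot90(image, k) for k in range(4)])


def features(image):
    """Hypothetical feature extractor that is NOT rotation invariant by itself."""
    h, w = image.shape
    weights = np.arange(h * w, dtype=float).reshape(h, w)
    return np.array([np.sum(image * weights), np.sum(image * weights.T)])


def cyclic_pool(stacked_features):
    """Pool over the four orientations (mean pooling is an assumption here)."""
    return stacked_features.mean(axis=0)


def invariant_features(image):
    """Slice, extract features per orientation, then pool across orientations."""
    views = cyclic_slice(image)
    return cyclic_pool(np.stack([features(view) for view in views]))
```

Rotating the input by 90 degrees only permutes the four views, so `invariant_features` returns the same values for an image and its rotated copy.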
Cyclic rolling: rotations of 0°, 90°, 180°, 270°
Pseudo-labeling: test set predictions from various models -> averaged test set predictions...
Pseudo-labeling: mixed training batch = training data + labels (0.67) and testing data + averaged test set predictions (0.33); larger training set! strong regularizing effect
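A mixed training batch of this kind could be assembled as sketched below. The 0.67 / 0.33 split comes from the slide; everything else (function names, sampling without replacement within a batch, one-hot training labels so that they can be concatenated with the soft pseudo-labels) is an assumption made for illustration.

```python
import numpy as np


def average_predictions(model_predictions):
    """Average class probabilities from several models: (M, N, C) -> (N, C)."""
    return np.mean(model_predictions, axis=0)


def mixed_batch(x_train, y_train, x_test, pseudo_labels, batch_size, rng,
                train_fraction=0.67):
    """Draw a batch with roughly 67% labelled and 33% pseudo-labelled examples.

    y_train is expected as one-hot rows; pseudo_labels are averaged probabilities.
    """
    n_train = int(round(batch_size * train_fraction))
    n_test = batch_size - n_train
    train_idx = rng.choice(len(x_train), size=n_train, replace=False)
    test_idx = rng.choice(len(x_test), size=n_test, replace=False)
    x = np.concatenate([x_train[train_idx], x_test[test_idx]])
    y = np.concatenate([y_train[train_idx], pseudo_labels[test_idx]])
    return x, y
```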
Traditional CV features: image size in pixels; image moments (capturing size and shape); Haralick texture features
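The first two kinds of features are easy to compute by hand. The sketch below returns the image size together with a few low-order moments (total mass, centroid, second-order central moments). Which moments were actually used is not stated on the slide, so this selection is an assumption; Haralick texture features are left out because they need a co-occurrence matrix.

```python
import numpy as np


def size_and_moment_features(image):
    """image: 2D array of non-negative intensities with a non-zero sum."""
    h, w = image.shape
    ys, xs = np.mgrid[:h, :w]
    m00 = image.sum()                      # total "mass", related to object size
    cx = (xs * image).sum() / m00          # centroid, x
    cy = (ys * image).sum() / m00          # centroid, y
    mu20 = ((xs - cx) ** 2 * image).sum() / m00          # spread along x
    mu02 = ((ys - cy) ** 2 * image).sum() / m00          # spread along y
    mu11 = ((xs - cx) * (ys - cy) * image).sum() / m00   # orientation / skew
    return np.array([h, w, m00, cx, cy, mu20, mu02, mu11])
```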
Model averaging: ensembling...
Model averaging: test-time augmentation; quasi-random affine transformations...
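Test-time augmentation amounts to predicting on several transformed copies of the same input and averaging the results. In the sketch below the eight flip and 90-degree rotation combinations stand in for the quasi-random affine transformations mentioned on the slide; `predict_fn` is a hypothetical callable that maps an image to a prediction vector.

```python
import numpy as np


def predict_with_tta(predict_fn, image, transforms):
    """Average the predictions over all transformed copies of the image."""
    predictions = [predict_fn(transform(image)) for transform in transforms]
    return np.mean(predictions, axis=0)


# Stand-in transformations: optional horizontal flip followed by k * 90 degree rotation.
flip_rot_transforms = [
    lambda im, k=k, flip=flip: np.rot90(im[:, ::-1] if flip else im, k)
    for flip in (False, True)
    for k in range(4)
]
```

The same averaging function also covers plain ensembling: replace the list of transformed inputs by a list of predictions from different models.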
Model averaging: bagging; same networks retrained on different subsets...
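Bagging can be sketched generically: retrain the same model on different subsets of the training data and average the predictions. Here the subsets are bootstrap samples drawn with replacement, which is the classic recipe but an assumption as far as this talk is concerned; `fit_fn` is a hypothetical function that trains on a subset and returns a prediction function.

```python
import numpy as np


def bootstrap_subsets(n_examples, n_models, rng):
    """One index array per model, sampled with replacement (assumed scheme)."""
    return [rng.integers(0, n_examples, size=n_examples) for _ in range(n_models)]


def bagged_prediction(fit_fn, x_train, y_train, x_test, n_models, rng):
    """Train n_models copies on different subsets and average their predictions."""
    predictions = []
    for idx in bootstrap_subsets(len(x_train), n_models, rng):
        model = fit_fn(x_train[idx], y_train[idx])  # returns a callable model
        predictions.append(model(x_test))
    return np.mean(predictions, axis=0)
```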
Software and hardware. Hardware: lots of GPUs (Tesla K40, GeForce GTX 680, GeForce GTX 980). Software: Theano + Lasagne; very fast prototyping through automatic differentiation and graph optimisations
Blog post: http://benanne.github.io/2015/03/17/plankton.html
Code: https://github.com/benanne/kaggle-ndsb
Reservoir Lab: http://reslab.elis.ugent.be
Sander Dieleman: http://benanne.github.io, @sedielem
Iryna Korshunova: http://irakorshunova.github.io
Lionel Pigou: http://lpigou.github.io
Pieter Buteneers: http://playn.be, @pieterbuteneers
III. Music
Collaborative filtering: use listening patterns for recommendation. + good performance; - cold start problem (many niche items that only appeal to a small audience)
Content-based: use audio content and/or metadata (artist, title) for recommendation. - worse performance; + no usage data required (allows for all items to be recommended regardless of popularity)
There is a large semantic gap between audio signals and listener preference. Diagram labels: audio signals; genre, mood, popularity, time, lyrical themes, location, instrumentation
The long tail (plot of # listeners, from popular to unpopular songs): not enough data to recommend these songs!
Rich get richer (plot of # listeners against popularity)
Latent factor models: project users and songs into the same latent space. Diagram labels: similar songs, dissimilar songs, good recommendations
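Once users and songs live in the same latent space, recommending reduces to comparing vectors. The sketch below scores songs for a user with a dot product and compares songs with cosine similarity. Both are common choices for latent factor models but are assumptions here, and the function names are made up for illustration.

```python
import numpy as np


def recommend(user_factors, song_factors, top_k=3):
    """Return the indices of the top_k songs with the highest dot-product score."""
    scores = song_factors @ user_factors
    return np.argsort(-scores)[:top_k]


def cosine_similarity(a, b):
    """Similarity between two latent vectors; close to 1 means 'similar songs'."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```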
Predict latent factors from music audio signals: audio signals -> regression model -> latent factors
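The regression setup can be illustrated with the simplest possible model. The sketch below fits a ridge regression from fixed-length audio feature vectors to latent factor vectors. It is a linear stand-in chosen for brevity, not the model from the talk; the function names and the `alpha` parameter are hypothetical.

```python
import numpy as np


def fit_ridge(features, factors, alpha=1.0):
    """Closed-form ridge regression: features (N, D) -> factors (N, K)."""
    d = features.shape[1]
    gram = features.T @ features + alpha * np.eye(d)
    return np.linalg.solve(gram, features.T @ factors)


def predict_factors(features, weights):
    """Predict latent factors for new songs from their audio features."""
    return features @ weights
```

Songs without any listening data can then be placed in the latent space from their audio alone, which is what makes this approach useful for the cold start problem.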
Qualitative evaluation: visualisation of predicted usage patterns (t-SNE)
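Network architecture: spectrograms (30 seconds) -> latent factors
input: spectrogram, 599 x 128
conv 1: 256 filters, length 4; 4x MP -> 149
conv 2: 256 filters, length 4; 2x MP -> 73
conv 3: 512 filters, length 4; 2x MP -> 35
global temporal pooling: mean, max, L2 -> 1536
dense: 2048
dense: 2048
output: 40 latent factors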
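Global temporal pooling collapses the time axis so that the dense layers see a fixed-length vector. With the three pooling functions named on the slide (mean, max, L2) and 512 feature maps over 35 time steps, this gives 3 x 512 = 1536 values. The sketch assumes that "L2" means the L2 norm over time; the exact definition is not given on the slide.

```python
import numpy as np


def global_temporal_pooling(feature_maps):
    """feature_maps: (time, channels) -> concatenated mean, max and L2 over time."""
    mean = feature_maps.mean(axis=0)
    maximum = feature_maps.max(axis=0)
    l2 = np.sqrt((feature_maps ** 2).sum(axis=0))  # assumed: L2 norm over time
    return np.concatenate([mean, maximum, l2])
```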
Blog post: http://benanne.github.io/2014/08/05/spotify-cnns.html