Deep learning for music, galaxies and plankton

1 Deep learning for music, galaxies and plankton Sander Dieleman May 17,

2 I. Galaxies

5 The Galaxy Challenge: automate this classification process (competition on ?). Setup: colour image → model → predictions

6 The data: 140 000 JPEG colour images of 424 x 424 pixels, split into a training set and a test set

7 The solution: a convnet with 7 layers. Architecture diagram: RGB input, 32 filters in the first convolutional layer, max pooling down to 20x20, 8x8 and 2x2, followed by two dense maxout(2) layers of 2048 units

8 Shallow learning: training examples $x_n$ → extracted features $\phi_n$ → shallow model $f_\theta(\phi_n)$ → predictions $y_n$

9 Deep learning: training examples $x_n$ → deep model $f_{\theta_k}(\cdots f_{\theta_2}(f_{\theta_1}(x_n)))$ → predictions $y_n$

10 Deep learning vs. traditional neural networks: a traditional network has a single hidden layer below the output layer

11 Deep learning vs. traditional neural networks: a deep network stacks several hidden layers below the output layer

12 Deep learning vs. traditional neural networks: the hidden layers use rectified linear units, $y = \max(x, 0)$
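
The contrast between shallow and deep models can be made concrete with a tiny NumPy forward pass: a deep model is simply the composition $f_{\theta_k}(\cdots f_{\theta_2}(f_{\theta_1}(x)))$ of layers, here with ReLU hidden layers. This is an illustrative sketch only; the layer sizes and random weights are hypothetical and no training is shown.

```python
import numpy as np

def relu(x):
    # Rectified linear unit, applied elementwise: y = max(x, 0)
    return np.maximum(x, 0.0)

def dense(W, b, activation=None):
    # Returns one layer f_theta(x) = activation(x @ W + b), with theta = (W, b)
    def layer(x):
        z = x @ W + b
        return activation(z) if activation is not None else z
    return layer

def compose(layers):
    # Deep model: f_theta_k(... f_theta_2(f_theta_1(x))), applying the layers in order
    def model(x):
        for layer in layers:
            x = layer(x)
        return x
    return model

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 3]  # hypothetical: 8 inputs, two hidden layers of 16 units, 3 outputs
layers = []
for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
    W = rng.normal(scale=0.1, size=(n_in, n_out))
    b = np.zeros(n_out)
    hidden = i < len(sizes) - 2  # every layer except the last is a hidden ReLU layer
    layers.append(dense(W, b, relu if hidden else None))

model = compose(layers)
y = model(rng.normal(size=(5, 8)))  # a batch of 5 examples
print(y.shape)  # (5, 3)
```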

14 Convolutional neural networks: convolutional layers use local connectivity, which gives translation invariance; their output is flattened and fed into fully connected layers

16 Preprocessing: cropping and downsampling the 424 x 424 images to 69 x 69
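
A minimal NumPy sketch of the cropping and downsampling step. The slide only gives the 424 x 424 input and the 69 x 69 result; the central 207 x 207 crop followed by 3 x 3 block averaging is an assumption, chosen because 207 / 3 = 69.

```python
import numpy as np

def center_crop(img, size):
    # Keep the central size x size region of an (H, W, C) image
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def downsample(img, factor):
    # Average non-overlapping factor x factor blocks (sides must be divisible by factor)
    h, w, c = img.shape
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

img = np.random.default_rng(0).random((424, 424, 3))  # stand-in for a 424 x 424 RGB image
crop = center_crop(img, 207)   # hypothetical crop size
small = downsample(crop, 3)    # 207 / 3 = 69
print(small.shape)  # (69, 69, 3)
```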

17 Data augmentation: rotation, translation, rescaling, flipping, …
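
Random affine augmentation (rotation, translation, rescaling, flipping) can be sketched with an inverse mapping: for every output pixel, look up where it came from in the input. This illustrative NumPy version uses nearest-neighbour sampling and zero padding; the parameter ranges in `random_augment` are hypothetical and not taken from the slides.

```python
import numpy as np

def affine_matrix(angle_deg, scale, flip, shift):
    # Linear part: rotation times scaling, applied after an optional horizontal flip
    a = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    flip_m = np.diag([1.0, -1.0]) if flip else np.eye(2)
    return scale * rot @ flip_m, np.asarray(shift, dtype=float)

def warp(img, A, t):
    # Forward transform about the image centre c: q -> A (q - c) + c + t.
    # Each output pixel p samples the input at A^-1 (p - c - t) + c (nearest neighbour).
    h, w = img.shape[:2]
    c = np.array([(h - 1) / 2, (w - 1) / 2])
    rows, cols = np.mgrid[0:h, 0:w]
    coords = np.stack([rows.ravel(), cols.ravel()]).astype(float)  # (2, h*w), (row, col)
    src = np.linalg.inv(A) @ (coords - c[:, None] - t[:, None]) + c[:, None]
    src = np.rint(src).astype(int)
    valid = (src[0] >= 0) & (src[0] < h) & (src[1] >= 0) & (src[1] < w)
    out = np.zeros((h * w,) + img.shape[2:], dtype=img.dtype)  # zero padding outside
    out[valid] = img[src[0, valid], src[1, valid]]
    return out.reshape(img.shape)

def random_augment(img, rng):
    # Hypothetical ranges: any angle, zoom in [1/1.3, 1.3], 50% flips, shifts of +-4 pixels
    A, t = affine_matrix(angle_deg=rng.uniform(0, 360),
                         scale=np.exp(rng.uniform(np.log(1 / 1.3), np.log(1.3))),
                         flip=rng.random() < 0.5,
                         shift=rng.uniform(-4, 4, size=2))
    return warp(img, A, t)
```

Sampling by inverse mapping guarantees every output pixel gets a value (or padding), unlike pushing input pixels forward, which can leave holes after rotation or zoom.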

18 Network architecture: exploiting rotation invariance
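
The slides do not spell out the mechanism in text. One way to exploit rotation invariance, sketched here as an assumption, is to show the network several rotated and flipped views of each image, process every view with the same feature extractor (shared weights) and concatenate the results. `toy_extractor` is a hypothetical stand-in for the convolutional layers.

```python
import numpy as np

def dihedral_views(img):
    # The 8 symmetries of a square image: 4 rotations, each with and without a flip
    views = []
    for base in (img, img[:, ::-1]):
        for k in range(4):
            views.append(np.rot90(base, k, axes=(0, 1)))
    return np.stack(views)  # (8, H, W, C); requires H == W

def shared_features(views, extractor):
    # The same extractor (same weights) is applied to every view; outputs are concatenated
    return np.concatenate([extractor(v) for v in views])

def toy_extractor(view):
    # Hypothetical stand-in for the convnet: per-channel mean and max
    return np.concatenate([view.mean(axis=(0, 1)), view.max(axis=(0, 1))])

img = np.random.default_rng(0).random((45, 45, 3))
views = dihedral_views(img)
feats = shared_features(views, toy_extractor)
print(feats.shape)  # (48,): 8 views x (3 means + 3 maxes)
```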

21 Training large CNNs requires GPU acceleration. Hardware: Intel Core i7 3930K at 3.2 GHz (6 cores), 32 GB RAM, 2x NVIDIA GeForce GTX 680 (2 GB / 4 GB)

22 The filters learned in the first convolutional layer, shown per colour channel (red, green, blue)

23 Feature maps through the network: input → layer 1 (40x40) → pooling 1 (20x20) → layer 2 (16x16) → pooling 2 (8x8) → layer 3 (6x6) → layer 4 (4x4) → pooling 4 (2x2)
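
The feature map sizes above follow from simple arithmetic. The sketch below assumes a 45 x 45 input, "valid" convolutions (no padding, stride 1) with filter sizes 6, 5, 3 and 3, and 2 x 2 max pooling. None of these values is stated on the slide; they are the ones that reproduce every listed size.

```python
def conv_out(size, filter_size):
    # 'valid' convolution, stride 1: the map shrinks by filter_size - 1
    return size - filter_size + 1

def pool_out(size, pool_size):
    # Non-overlapping max pooling
    return size // pool_size

size = 45  # assumed input size
trace = []
for name, op, arg in [("layer 1", conv_out, 6), ("pooling 1", pool_out, 2),
                      ("layer 2", conv_out, 5), ("pooling 2", pool_out, 2),
                      ("layer 3", conv_out, 3), ("layer 4", conv_out, 3),
                      ("pooling 4", pool_out, 2)]:
    size = op(size, arg)
    trace.append((name, size))

print(trace)
# [('layer 1', 40), ('pooling 1', 20), ('layer 2', 16), ('pooling 2', 8),
#  ('layer 3', 6), ('layer 4', 4), ('pooling 4', 2)]
```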

38 II. Plankton

39 Team: Pieter, Jonas, Iryna, Jeroen, Lionel, Sander, Aäron

42 Preprocessing and data augmentation: rescale; zoom, rotate, translate, flip, shear, stretch

43 Network architecture based on OxfordNet: 3x3 convolutions, 3x3 overlapping pooling with stride 2, fully connected layers (Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan & Zisserman, ICLR)
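
The motivation for stacking 3x3 convolutions, as in OxfordNet, is that a stack of small filters covers the same receptive field as one larger filter while using fewer parameters (two 3x3 layers see a 5x5 patch). The helper below computes the receptive field with the standard recurrence; the layer lists are hypothetical examples, not the actual plankton architecture.

```python
def receptive_field(layers):
    # layers: list of (kernel_size, stride). The field grows by (k - 1) * jump,
    # where jump is the product of the strides of all earlier layers.
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

print(receptive_field([(3, 1)] * 2))  # 5: two 3x3 convolutions
print(receptive_field([(3, 1)] * 3))  # 7: three 3x3 convolutions
# two 3x3 convs, 3x3 overlapping pooling with stride 2, two more 3x3 convs
print(receptive_field([(3, 1), (3, 1), (3, 2), (3, 1), (3, 1)]))  # 15
```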

44 Cyclic pooling

45 Cyclic pooling: the input goes through cyclic slicing, then the stack of 3x3 convolutions and 3x3 pooling layers (stride 2), then cyclic pooling, then the fully connected layer
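
A minimal NumPy sketch of cyclic slicing and cyclic pooling, assuming square inputs and mean pooling over the four orientations (another reduction can be passed via `reduce`). `toy_net` is a hypothetical stand-in for the convolutional stack. Whatever the network computes, pooling over all four rotations makes the output invariant to 90 degree rotations of the input, which the last line checks.

```python
import numpy as np

def cyclic_slice(x):
    # x: (B, H, W, C) square images. Stack the 0, 90, 180 and 270 degree rotations
    # along the batch axis -> (4B, H, W, C); slice k holds rotation k
    return np.concatenate([np.rot90(x, k, axes=(1, 2)) for k in range(4)], axis=0)

def cyclic_pool(features, reduce=np.mean):
    # features: (4B, ...) outputs for the stacked orientations -> pool over them -> (B, ...)
    four_b = features.shape[0]
    return reduce(features.reshape(4, four_b // 4, *features.shape[1:]), axis=0)

rng = np.random.default_rng(0)
W = rng.normal(size=(8 * 8 * 1, 10))

def toy_net(batch):
    # Hypothetical stand-in for the network: flatten, then one fixed linear map
    return batch.reshape(len(batch), -1) @ W

x = rng.random((2, 8, 8, 1))
out = cyclic_pool(toy_net(cyclic_slice(x)))
out_rot = cyclic_pool(toy_net(cyclic_slice(np.rot90(x, 1, axes=(1, 2)))))
print(out.shape, np.allclose(out, out_rot))  # (2, 10) True
```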

46 Cyclic rolling
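
Cyclic rolling can be sketched in the same setting. The realignment convention is an assumption: slice k is taken to come from the input rotated by k × 90 degrees, and the maps borrowed from the other slices are rotated back into the current slice's orientation before being concatenated along the channel axis, so the number of feature maps grows fourfold.

```python
import numpy as np

def cyclic_roll(fmaps):
    # fmaps: (4B, H, W, C), slice k computed from the input rotated by k * 90 degrees.
    # For slice i, take slices (i + m) % 4 for m = 0..3, rotate each by -m * 90 degrees
    # to align it with slice i, and concatenate along channels -> (4B, H, W, 4C)
    four_b = fmaps.shape[0]
    s = fmaps.reshape(4, four_b // 4, *fmaps.shape[1:])
    rolled = []
    for i in range(4):
        parts = [np.rot90(s[(i + m) % 4], -m, axes=(1, 2)) for m in range(4)]
        rolled.append(np.concatenate(parts, axis=-1))
    return np.concatenate(rolled, axis=0)

# If the maps of each slice are exact rotations of those of slice 0,
# every channel group of slice i ends up equal to rot90(f0, i)
rng = np.random.default_rng(0)
f0 = rng.random((2, 5, 5, 3))
fmaps = np.concatenate([np.rot90(f0, k, axes=(1, 2)) for k in range(4)], axis=0)
out = cyclic_roll(fmaps)
print(out.shape)  # (8, 5, 5, 12)
```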

47 Pseudo-labeling: test set predictions from various models are averaged

48 Pseudo-labeling: each mixed training batch contains 0.33 testing data + averaged test set predictions and 0.67 training data + labels. This gives a larger training set and has a strong regularizing effect!
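
The mixed training batch can be sketched as follows. The helper name `mixed_batch`, the one-hot training labels and the toy data are assumptions for illustration; only the 0.33 / 0.67 split and the use of averaged test set predictions as targets come from the slides.

```python
import numpy as np

def mixed_batch(x_train, y_train, x_test, y_test_avg, batch_size, rng, test_fraction=0.33):
    # About one third of the batch comes from the test set, with the averaged model
    # predictions as targets; the rest comes from the labelled training set
    n_test = int(round(batch_size * test_fraction))
    n_train = batch_size - n_test
    i_tr = rng.choice(len(x_train), n_train, replace=False)
    i_te = rng.choice(len(x_test), n_test, replace=False)
    x = np.concatenate([x_train[i_tr], x_test[i_te]])
    y = np.concatenate([y_train[i_tr], y_test_avg[i_te]])
    perm = rng.permutation(batch_size)  # interleave the two sources
    return x[perm], y[perm]

rng = np.random.default_rng(0)
n_classes = 5  # hypothetical
x_train = rng.random((100, 16))
y_train = np.eye(n_classes)[rng.integers(0, n_classes, 100)]  # one-hot labels
x_test = rng.random((50, 16))
preds = rng.dirichlet(np.ones(n_classes), size=(3, 50))  # toy predictions of 3 models
y_test_avg = preds.mean(axis=0)                           # averaged test set predictions
xb, yb = mixed_batch(x_train, y_train, x_test, y_test_avg, 64, rng)
print(xb.shape, yb.shape)  # (64, 16) (64, 5)
```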

49 Traditional CV features: image size in pixels; image moments (capturing size and shape); Haralick texture features
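
A sketch of the size and moment features, assuming a grayscale image stored as a 2D NumPy array. The feature dictionary is hypothetical. The second-order central moments are normalised by the total intensity, which makes them an intensity-weighted covariance of the pixel coordinates. Haralick texture features are not reproduced here.

```python
import numpy as np

def raw_moment(img, p, q):
    # M_pq = sum over pixels of x^p * y^q * I(x, y), with x = column and y = row
    rows, cols = np.indices(img.shape)
    return np.sum((cols ** p) * (rows ** q) * img)

def shape_features(img):
    m00 = raw_moment(img, 0, 0)  # total intensity ("mass")
    cx, cy = raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00  # centroid
    rows, cols = np.indices(img.shape)
    mu20 = np.sum((cols - cx) ** 2 * img) / m00
    mu02 = np.sum((rows - cy) ** 2 * img) / m00
    mu11 = np.sum((cols - cx) * (rows - cy) * img) / m00
    return {"size_px": img.size, "mass": m00, "centroid": (cx, cy),
            "mu20": mu20, "mu02": mu02, "mu11": mu11}

img = np.zeros((5, 9))
img[2, 1:8] = 1.0  # a horizontal bar: spread along x, none along y
print(shape_features(img))
```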

50 Model averaging: ensembling...

51 Model averaging: test-time augmentation with quasi-random affine transformations...

52 Model averaging: bagging, i.e. the same networks retrained on different subsets...
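
The three averaging schemes combine naturally. In this sketch, test-time augmentation averages one model over transformed copies of the input, ensembling averages several models, and bagging is illustrated with bootstrap resampling of the training indices, which is one common way to obtain "different subsets" and an assumption here. The toy softmax models and the identity/flip transforms are hypothetical.

```python
import numpy as np

def tta_predict(predict, x, transforms):
    # Test-time augmentation: average one model's predictions over transformed inputs
    return np.mean([predict(t(x)) for t in transforms], axis=0)

def ensemble_predict(models, x, transforms, weights=None):
    # Ensembling: (weighted) average of the TTA predictions of several models
    preds = np.stack([tta_predict(m, x, transforms) for m in models])
    return np.average(preds, axis=0, weights=weights)

def bootstrap_indices(n, n_models, rng):
    # Bagging: each model gets its own resampled set of training indices
    return [rng.choice(n, size=n, replace=True) for _ in range(n_models)]

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.random((4, 6, 6))                        # 4 toy "images"
Ws = [rng.normal(size=(36, 3)) for _ in range(2)]
models = [lambda b, W=W: softmax(b.reshape(len(b), -1) @ W) for W in Ws]
transforms = [lambda b: b, lambda b: b[:, :, ::-1]]  # identity and horizontal flip
p = ensemble_predict(models, x, transforms)
print(p.shape)  # (4, 3)
```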

53 Software and hardware: lots of GPUs (Tesla K40, GeForce GTX 680, GeForce GTX 980); Theano + Lasagne, for very fast prototyping through automatic differentiation and graph optimisations

http://benanne.github.io/2015/03/17/plankton.html https://github.

Sander Dieleman http://benanne.github.

54 Reservoir Lab: Sander Dieleman, Iryna Korshunova, Lionel Pigou, Pieter Buteneers

55 III. Music

56 Collaborative filtering: use listening patterns for recommendation. + Good performance. - Cold start problem: many niche items only appeal to a small audience

57 Content-based: use audio content and/or metadata (artist, title) for recommendation. - Worse performance. + No usage data required, which allows all items to be recommended regardless of popularity

58 There is a large semantic gap between audio signals and listener preference. Factors in between include genre, mood, instrumentation, lyrical themes, popularity, time and location

59 The long tail (# listeners per song, from popular to unpopular): the many unpopular songs have too few listeners, so there is not enough data to recommend these songs!

60 Rich get richer (# listeners vs. popularity): songs that are already popular get recommended more, which makes them even more popular

61 Latent factor models: project users and songs into the same latent space. Similar songs lie close together and dissimilar songs far apart; songs close to a user make good recommendations
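
With users and songs in the same latent space, recommending reduces to ranking songs by their dot product with a user's vector, and song similarity can be measured with cosine similarity. The 2D vectors below are hypothetical; in practice they would come from factorising the listening data.

```python
import numpy as np

def recommend(user_vec, song_vecs, k=3, exclude=()):
    # Score every song by its dot product with the user's latent vector, highest first
    scores = song_vecs @ user_vec
    skip = set(exclude)
    order = [int(i) for i in np.argsort(-scores) if int(i) not in skip]
    return order[:k]

def most_similar_songs(song_vecs, i, k=3):
    # Cosine similarity between song i and every song in the shared latent space
    unit = song_vecs / np.linalg.norm(song_vecs, axis=1, keepdims=True)
    sims = unit @ unit[i]
    order = [int(j) for j in np.argsort(-sims) if int(j) != i]
    return order[:k]

songs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])  # hypothetical
user = np.array([1.0, 0.0])
print(recommend(user, songs, k=2))               # [0, 2]
print(recommend(user, songs, k=2, exclude=(0,)))  # [2, 1]
print(most_similar_songs(songs, 0, k=1))          # [2]
```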

62 Predict latent factors from music audio signals: audio signals → regression model → latent factors

63 Qualitative evaluation: visualisation of predicted usage patterns (t-SNE)

68 Convnet on spectrograms (30 seconds) predicting latent factors. Diagram: convolutional layers with max pooling (4x, 2x, 2x; 256 and 512 feature maps), global temporal pooling (mean, max, L2; 1536 values), fully connected layers (2048 units), 40 latent factors as output
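
Global temporal pooling turns a sequence of feature vectors over time into a fixed-size vector. A minimal sketch, assuming L2 pooling means the Euclidean norm over time (a root-mean-square variant is another option) and a hypothetical 35 x 512 feature matrix. Concatenating mean, max and L2 over 512 channels gives 3 × 512 = 1536 values, consistent with the 1536 that appears in the diagram.

```python
import numpy as np

def global_temporal_pooling(features):
    # features: (T, C) activations over T time steps.
    # Pool each channel over time with mean, max and L2 norm, then concatenate -> (3C,)
    mean = features.mean(axis=0)
    mx = features.max(axis=0)
    l2 = np.sqrt((features ** 2).sum(axis=0))
    return np.concatenate([mean, mx, l2])

feats = np.random.default_rng(0).random((35, 512))  # hypothetical: 35 steps, 512 maps
pooled = global_temporal_pooling(feats)
print(pooled.shape)  # (1536,)
```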

69 Blog post:
