Deep learning for music, galaxies and plankton

Deep learning for music, galaxies and plankton. Sander Dieleman, May 17, 2016 1

I. Galaxies 2

http://www.galaxyzoo.org 3

The Galaxy Challenge: automate this classification process. Competition on Kaggle. Colour image → model → predictions 5

The data: 140 000 JPEG colour images; dimensions: 424 x 424; train: 61 578 images; test: 79 975 images 6

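The solution: a convnet with 7 layers. Input: 45x45, 3 channels (RGB). Layer 1: 6x6 convolution, 32 filters, 40x40; max pooling = 20x20. Layer 2: 5x5 convolution, 64 filters, 16x16; max pooling = 8x8. Layer 3: 3x3 convolution, 128 filters, 6x6. Layer 4: 3x3 convolution, 128 filters, 4x4; max pooling = 2x2. Concatenation: 128 x 16. Layers 5 and 6: fully connected, 2048 units, maxout(2). Layer 7: 37 outputs. 7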

Shallow learning: training examples $x_n$ → extracted features $\phi_n$ → shallow model $f_\theta(\phi_n)$ → predictions $y_n$ 8

Deep learning: training examples $x_n$ → deep model $f_{\theta_k}(\cdots f_{\theta_2}(f_{\theta_1}(x_n)))$ → predictions $y_n$ 9

Deep learning vs. traditional neural networks: output layer, hidden layer 10

Deep learning vs. traditional neural networks: output layer, hidden layers 11

Deep learning vs. traditional neural networks: output layer, hidden layers, rectified linear units y = max(x, 0) 12
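
A minimal sketch of a deep model as a composition of layers with rectified linear units, written with NumPy. The layer sizes and the random weights are hypothetical and chosen only for illustration; the output layer is left linear here.

```python
import numpy as np

def relu(x):
    # Rectified linear unit: y = max(x, 0), applied elementwise.
    return np.maximum(x, 0)

def forward(x, params):
    # Apply each hidden layer in turn: affine map followed by ReLU.
    # The last (output) layer is left linear in this sketch.
    h = x
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return h @ W + b

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 3]  # hypothetical: input, two hidden layers, output
params = [(rng.normal(size=(m, n)) * 0.1, np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
y = forward(rng.normal(size=(5, 8)), params)
print(y.shape)  # (5, 3)
```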

Convolutional neural networks: local connectivity, translation invariance (diagram labels: fully connected, convolutional, flatten) 14

Preprocessing: cropping and downsampling (424 x 424 → 207 x 207 → 69 x 69) 16
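
The crop and downsample step can be sketched as follows. This is an illustration under stated assumptions: a centre crop, and downsampling by averaging 3x3 blocks. The slide gives only the image sizes, not the interpolation method, and the random array stands in for a decoded JPEG.

```python
import numpy as np

def center_crop(img, size):
    # Keep the central size x size window of an (H, W, C) image.
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def downsample(img, factor):
    # Average non-overlapping factor x factor blocks (an assumed choice
    # of interpolation; the slides do not say which method was used).
    h, w, c = img.shape
    blocks = img.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))

img = np.random.default_rng(0).random((424, 424, 3))  # stand-in for a JPEG
small = downsample(center_crop(img, 207), 3)
print(small.shape)  # (69, 69, 3)
```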

Data augmentation: rotation, translation, rescaling, flipping 17
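
One way to combine these perturbations is to sample a single affine matrix per image. The sketch below only builds the matrix; warping the image with it is left to an image library. The parameter ranges (`max_shift`, `scale_range`) are hypothetical defaults, not the values used in the talk.

```python
import numpy as np

def random_affine(rng, max_shift=4.0, scale_range=(1 / 1.3, 1.3)):
    # Sample the parameters of one random augmentation.
    # The ranges are hypothetical defaults, not the values used in the talk.
    angle = rng.uniform(0.0, 2.0 * np.pi)
    scale = np.exp(rng.uniform(*np.log(scale_range)))  # log-uniform rescaling
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)
    flip = -1.0 if rng.random() < 0.5 else 1.0

    c, s = np.cos(angle), np.sin(angle)
    rotate = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    rescale = np.diag([scale * flip, scale, 1.0])  # flip mirrors the x axis
    shift = np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])
    # Applied right to left: flip/rescale, then rotate, then translate.
    return shift @ rotate @ rescale

M = random_affine(np.random.default_rng(0))
print(M.shape)  # (3, 3)
```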

Network architecture: exploiting rotation invariance 18

Training large CNNs requires GPU acceleration. CPU: Intel Core i7 3930K at 3.2 GHz, 6 cores; 32GB RAM; GPU: NVIDIA GeForce GTX 680 2GB / 4GB (2x) 21

The filters learned in the first convolutional layer (panels: red, green, blue) 22

input → layer 1 (40x40) → pooling 1 (20x20) → layer 2 (16x16) → pooling 2 (8x8) → layer 3 (6x6) → layer 4 (4x4) → pooling 4 (2x2) 23
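
The feature-map sizes listed above can be checked with a few lines of arithmetic. The sketch assumes stride-1 convolutions without padding, non-overlapping 2x2 max pooling, a 45x45 input and filter widths of 6, 5, 3 and 3, as read from the architecture slide.

```python
def conv_out(size, kernel):
    # Output width of a convolution with stride 1 and no padding.
    return size - kernel + 1

def pool_out(size, window=2):
    # Output width of non-overlapping max pooling.
    return size // window

size = 45  # input patch width, taken from the architecture slide
sizes = []
# (filter width, pooled afterwards?) for the four convolutional layers
for kernel, pooled in [(6, True), (5, True), (3, False), (3, True)]:
    size = conv_out(size, kernel)
    sizes.append(size)
    if pooled:
        size = pool_out(size)
        sizes.append(size)
print(sizes)  # [40, 20, 16, 8, 6, 4, 2]
```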

Blog post: http://benanne.github.io/2014/04/05/galaxy-zoo.html Code: https://github.com/benanne/kaggle-galaxies 37

II. Plankton 38

Pieter, Jonas, Iryna, Jeroen, Lionel, Sander, Aäron 39

Preprocessing and data augmentation: rescale; zoom, rotate, translate, flip, shear, stretch 42

Network architecture based on OxfordNet: 3x3 convolution; 3x3 overlapping pooling, stride 2; fully connected layer. Reference: Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan & Zisserman, ICLR 2015 43

Cyclic pooling: 0°, 90°, 180°, 270°

Cyclic pooling: 3x3 convolution; cyclic slicing; 3x3 pooling, stride 2; cyclic pooling; fully connected layer 45
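
A small NumPy sketch of cyclic slicing and cyclic pooling. The random linear map `W` is a hypothetical stand-in for the convolutional stack, and mean pooling is one possible choice of pooling function. The final check illustrates the point of the construction: rotating the input by 90° leaves the pooled output unchanged.

```python
import numpy as np

def cyclic_slice(batch):
    # Stack the 0, 90, 180 and 270 degree rotations of every image.
    # batch has shape (N, H, W); the result has shape (4, N, H, W).
    return np.stack([np.rot90(batch, k, axes=(1, 2)) for k in range(4)])

def cyclic_pool(features):
    # Combine the four orientation-specific feature vectors.
    # Mean pooling is one option; other permutation-invariant
    # functions would work as well.
    return features.mean(axis=0)

rng = np.random.default_rng(0)
W = rng.normal(size=(8 * 8, 5))  # hypothetical stand-in for the conv stack

def model(batch):
    views = cyclic_slice(batch)                   # (4, N, 8, 8)
    feats = views.reshape(4, len(batch), -1) @ W  # (4, N, 5)
    return cyclic_pool(feats)                     # (N, 5)

x = rng.random((3, 8, 8))
same = np.allclose(model(x), model(np.rot90(x, 1, axes=(1, 2))))
print(same)  # True
```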

Cyclic rolling: 0°, 90°, 180°, 270°

Pseudo-labeling: test set predictions from various models → averaged test set predictions

Pseudo-labeling: mixed training batch = 0.67 training data + labels, 0.33 testing data + averaged test set predictions. Larger training set! Strong regularizing effect.
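
The mixed training batch can be sketched as below. The 0.67 / 0.33 split comes from the slide; the batch size, the array shapes and the fake model predictions are hypothetical.

```python
import numpy as np

def mixed_batch(rng, x_train, y_train, x_test, p_test,
                batch_size=128, test_frac=0.33):
    # Draw roughly one third of the batch from the test set, using the
    # averaged model predictions p_test as soft targets, and the rest
    # from the labelled training set.
    n_test = int(round(batch_size * test_frac))
    n_train = batch_size - n_test
    i = rng.choice(len(x_train), size=n_train, replace=False)
    j = rng.choice(len(x_test), size=n_test, replace=False)
    x = np.concatenate([x_train[i], x_test[j]])
    t = np.concatenate([y_train[i], p_test[j]])
    return x, t

rng = np.random.default_rng(0)
n_classes = 4
x_train = rng.random((500, 10))
y_train = np.eye(n_classes)[rng.integers(0, n_classes, size=500)]  # one-hot
x_test = rng.random((300, 10))
# Pseudo-labels: predictions averaged over several (here: fake) models.
preds = rng.dirichlet(np.ones(n_classes), size=(3, 300))
p_test = preds.mean(axis=0)

x, t = mixed_batch(rng, x_train, y_train, x_test, p_test)
print(x.shape, t.shape)  # (128, 10) (128, 4)
```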

Traditional CV features: image size in pixels; image moments (capturing size and shape); Haralick texture features 49

Model averaging: ensembling... 50

Model averaging: test-time augmentation (quasi-random affine transformations...) 51

Model averaging: bagging (same networks retrained on different subsets...) 52
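
A minimal sketch of averaging predictions over augmented copies of the input and over several models. The linear "networks" and the flips used as transformations are hypothetical stand-ins; the talk mentions quasi-random affine transformations, which are not reproduced here, and bagging (retraining on different subsets) is not shown.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict_tta(model, x, transforms):
    # Average the predicted probabilities over several transformed
    # copies of the same inputs (test-time augmentation).
    return np.mean([model(t(x)) for t in transforms], axis=0)

def ensemble(models, x, transforms):
    # Uniform average over models; a weighted average is another option.
    return np.mean([predict_tta(m, x, transforms) for m in models], axis=0)

rng = np.random.default_rng(0)
# Hypothetical stand-ins: linear "networks" on flattened 8x8 images.
weights = [rng.normal(size=(64, 3)) for _ in range(4)]
models = [lambda x, W=W: softmax(x.reshape(len(x), -1) @ W) for W in weights]
transforms = [lambda x: x, lambda x: x[:, :, ::-1], lambda x: x[:, ::-1, :]]

probs = ensemble(models, rng.random((5, 8, 8)), transforms)
print(probs.shape)  # (5, 3)
```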

Software and hardware. Lots of GPUs: Tesla K40, GeForce GTX 680, GeForce GTX 980. Theano + Lasagne: very fast prototyping through automatic differentiation and graph optimisations 53

Blog post: http://benanne.github.io/2015/03/17/plankton.html Code: https://github.com/benanne/kaggle-ndsb Reservoir Lab: http://reslab.elis.ugent.be Sander Dieleman: http://benanne.github.io @sedielem; Iryna Korshunova: http://irakorshunova.github.io; Lionel Pigou: http://lpigou.github.io; Pieter Buteneers: http://playn.be @pieterbuteneers 54

III. Music

Collaborative filtering: use listening patterns for recommendation. + good performance; - cold start problem. Many niche items that only appeal to a small audience 56

Content-based: use audio content and/or metadata (artist, title) for recommendation. - worse performance; + no usage data required: allows for all items to be recommended regardless of popularity 57

There is a large semantic gap between audio signals and listener preference. Factors involved: genre, mood, popularity, time, lyrical themes, location, instrumentation 58

The long tail (plot of # listeners per song, from popular to unpopular): not enough data to recommend these songs! 59

Rich get richer (plot of # listeners against popularity) 60

Latent factor models: project users and songs into the same latent space. Similar songs end up close together, dissimilar songs far apart; songs close to a user are good recommendations 61
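
A toy illustration of how a shared latent space is used. The factor matrices below are random placeholders (in practice they would be learned from listening data), and the sizes and the helper names `recommend` and `similar_songs` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_songs, n_factors = 6, 10, 4  # hypothetical sizes
U = rng.normal(size=(n_users, n_factors))  # user factors
V = rng.normal(size=(n_songs, n_factors))  # song factors

def recommend(user, k=3):
    # Score every song by its dot product with the user's factors
    # and return the indices of the k best-scoring songs.
    scores = V @ U[user]
    return np.argsort(-scores)[:k]

def similar_songs(song, k=3):
    # Cosine similarity between song factors; the query itself is skipped.
    Vn = V / np.linalg.norm(V, axis=1, keepdims=True)
    sims = Vn @ Vn[song]
    sims[song] = -np.inf
    return np.argsort(-sims)[:k]

print(recommend(0), similar_songs(0))
```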

Predict latent factors from music audio signals: audio signals → regression model → latent factors 62
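
To make the regression step concrete, here is a sketch that swaps in closed-form ridge regression for the regression model. This is a simple stand-in, not the convolutional network used in the talk; the feature dimensions and the synthetic data are hypothetical. Once fitted, the model predicts latent factors for a song that has no listening data.

```python
import numpy as np

def fit_ridge(X, V, lam=1.0):
    # Closed-form ridge regression from audio features X (n_songs x d)
    # to latent factors V (n_songs x k), minimising squared error.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ V)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))  # hypothetical audio features
W_true = rng.normal(size=(20, 4))
V = X @ W_true + 0.01 * rng.normal(size=(200, 4))  # synthetic latent factors

W = fit_ridge(X, V, lam=1e-3)
x_new = rng.normal(size=(1, 20))  # a song with no listening data
v_new = x_new @ W                 # predicted latent factors
print(v_new.shape)  # (1, 4)
```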

Qualitative evaluation: visualisation of predicted usage patterns (t-SNE) 63

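Spectrograms (30 seconds): 599 frames x 128 frequency bins. Convolution, filter width 4, 256 filters; 4x MP = 149. Convolution, filter width 4, 256 filters; 2x MP = 73. Convolution, filter width 4, 512 filters; 2x MP = 35. Global temporal pooling (mean, max, L2) = 1536. Fully connected 2048, fully connected 2048. Latent factors: 40. 68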
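
The global temporal pooling step can be sketched in NumPy. The sketch assumes 512 feature maps over 35 time steps, which is consistent with the 1536 (3 x 512) pooled features on the slide; the random input is a placeholder for the output of the last convolutional layer.

```python
import numpy as np

def global_temporal_pool(features):
    # features has shape (time, channels). Pool over the whole time axis
    # with three statistics and concatenate them.
    mean = features.mean(axis=0)
    peak = features.max(axis=0)
    l2 = np.sqrt((features ** 2).sum(axis=0))
    return np.concatenate([mean, peak, l2])

feats = np.random.default_rng(0).random((35, 512))  # 35 time steps, 512 maps
pooled = global_temporal_pool(feats)
print(pooled.shape)  # (1536,)
```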

Blog post: http://benanne.github.io/2014/08/05/spotify-cnns.html