Populating the Galaxy Zoo: Real-time Image Classification with SQL Server R Services. David M Smith (@revodavid), R Community Lead, Microsoft Algorithms and Data Science
Session goals: The Origin and Eventual Fate of the Universe; Computer Vision and Deep Neural Networks; Deploying a Convolutional Neural Network Using Microsoft R and SQL Server
Image Credit: NASA / Hubble
Image Credit: NASA / Hubble (http://sploid.gizmodo.com/the-incredibly-huge-size-of-andromeda-1493036499)
Whirlpool Galaxy (M51) and companion galaxy
Grand design spiral galaxy M81
Barred spiral galaxy NGC 1300
Elliptical galaxy IC 2006
Centaurus A, from European Southern Observatory: http://www.eso.org
NGC 3125 Forming Ancient Image: http://www.nasa.gov/image-feature/goddard/2016/hubble-views-a-galaxy-fit-to-burst
Forming to Ancient: Spiral galaxies; Elliptical galaxies; Collisions and other events. Labels: M10, M50, ESO 3250G004. Image credit: NASA, ESA, K. Kuntz (JHU), F. Bresolin (University of Hawaii), J. Trauger (Jet Propulsion Lab), J. Mould (NOAO), Y.-H. Chu (University of Illinois, Urbana), and STScI
The Hubble tuning fork Source: Wikipedia
Galaxies in observable universe: 100 billion (Hubble deep field); 200 billion (Hubble ultra deep); 2 trillion. Source: http://www.nasa.gov/feature/goddard/2016/hubble-reveals-observable-universe-contains-10-times-more-galaxies-than-previously-thought
Professional astronomers: thousands of images. Citizen data science: 250K images. Computer vision: millions of images. Image: The Astronomer by Johannes Vermeer (Wikipedia)
Demonstration
Data -> Hidden layer(s) -> Outcome
Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng
A two-dimensional array of pixels -> Neural network -> Spiral or Elliptical
Rotation, scaling, translation -> Neural network
Match pieces of the image
Convolution: matches a specific shape (kernel) across the entire image; automatic feature generation
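To make the idea of matching a kernel across an image concrete, here is a minimal sketch in base R. The names `conv2d`, `img` and `kernel` are illustrative and not part of RevoScaleR or MicrosoftML; the sketch slides the kernel without flipping it, which is what neural network libraries usually mean by convolution.

```r
# Slide a kernel across an image matrix and record how well it matches
# at each position. conv2d is a hypothetical helper, not a package function.
conv2d <- function(img, kernel, stride = 1) {
  kh <- nrow(kernel)
  kw <- ncol(kernel)
  rows <- seq(1, nrow(img) - kh + 1, by = stride)
  cols <- seq(1, ncol(img) - kw + 1, by = stride)
  out <- matrix(0, length(rows), length(cols))
  for (i in seq_along(rows)) {
    for (j in seq_along(cols)) {
      patch <- img[rows[i]:(rows[i] + kh - 1),
                   cols[j]:(cols[j] + kw - 1), drop = FALSE]
      out[i, j] <- sum(patch * kernel)  # large where patch resembles kernel
    }
  }
  out
}

img <- diag(5)      # 5x5 image containing a diagonal line
kernel <- diag(3)   # 3x3 kernel that looks for a diagonal line
feature_map <- conv2d(img, kernel)
```

The feature map is largest where the diagonal kernel lines up with the diagonal in the image and zero elsewhere. In a trained network the kernel values are learned rather than chosen by hand.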
Convolution -> Pooling -> Convolution -> Pooling -> Spiral or Elliptical. Layers can be repeated several (or many) times.
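Pooling shrinks a feature map by keeping one summary value per window. The sketch below shows max pooling in base R; `max_pool` is an illustrative name, and the 2x2 window is an assumption chosen for the example rather than a setting from the talk.

```r
# Keep the largest value in each window of a feature map.
# max_pool is a hypothetical helper, not a package function.
max_pool <- function(x, size = 2, stride = size) {
  rows <- seq(1, nrow(x) - size + 1, by = stride)
  cols <- seq(1, ncol(x) - size + 1, by = stride)
  out <- matrix(0, length(rows), length(cols))
  for (i in seq_along(rows)) {
    for (j in seq_along(cols)) {
      out[i, j] <- max(x[rows[i]:(rows[i] + size - 1),
                         cols[j]:(cols[j] + size - 1)])
    }
  }
  out
}

x <- matrix(1:16, nrow = 4)  # filled column by column
pooled <- max_pool(x)        # 4x4 input becomes 2x2
```

Each output cell holds the maximum of one non-overlapping 2x2 block, so small shifts of a feature within a block leave the pooled value unchanged.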
R Usage Growth (Rexer Data Miner Survey, 2007-2015): 76% of analytic professionals report using R; 36% select R as their primary tool. Language Popularity: IEEE Spectrum Top Programming Languages, 2016
ConnectR, Microsoft R Open, RevoScaleR, MicrosoftML, DistributedR. Available in: Microsoft R Server 9, SQL Server 2016/2017
Load the required R packages, run the neural network, use GPU acceleration, and specify hyperparameters:
    # Load the required R packages
    library(RevoScaleR)
    library(MicrosoftML)
    # Run the neural network
    model <- rxNeuralNet(
      formula,
      data = galaxy_data,
      netDefinition = netDefinition,
      type = "multiClass",
      acceleration = "gpu",    # use GPU acceleration
      miniBatchSize = 32,      # hyperparameters
      initWtsDiameter = 0.1,
      numIterations = 50)
What about the network definition? It is written in NET#: https://docs.microsoft.com/en-us/azure/machinelearning/machine-learning-azure-ml-netsharpreference-guide
NET# network definition:
    input pixels [3, 50, 50];
    hidden conv1 [64, 24, 24] rlinear from pixels convolve {
      KernelShape = [3, 5, 5];
      Stride = [1, 2, 2];
      MapCount = 64;
    }
    hidden rnorm1 [64, 11, 11] from conv1 response norm {
      KernelShape = [1, 4, 4];
      Stride = [1, 2, 2];
    }
    hidden pool1 [64, 9, 9] from rnorm1 max pool {
      KernelShape = [1, 3, 3];
    }
    hidden hid1 [256] rlinear from pool1 all;
    hidden hid2 [256] rlinear from hid1 all;
    output Class [13] softmax from hid2 all;
https://docs.microsoft.com/en-us/azure/machinelearning/machine-learning-azure-ml-netsharpreference-guide
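The layer shapes in the definition follow from the kernel and stride sizes. As a rough check, the sketch below applies the usual unpadded output-size rule, floor((input - kernel) / stride) + 1, to each layer. `out_size` is an illustrative helper, and a stride of 1 is assumed for pool1 because the slide gives none.

```r
# Output size along one dimension for a kernel applied without padding.
# out_size is a hypothetical helper, not part of MicrosoftML.
out_size <- function(input, kernel, stride = 1) {
  (input - kernel) %/% stride + 1
}

# Depth: a kernel spanning all 3 colour channels yields 1 slice per map,
# and MapCount = 64 gives 64 maps.
conv1_depth <- out_size(3, kernel = 3) * 64

# conv1 -> rnorm1: 24 wide, kernel 4, stride 2
rnorm1 <- out_size(24, kernel = 4, stride = 2)

# rnorm1 -> pool1: kernel 3, stride 1 (assumed)
pool1 <- out_size(rnorm1, kernel = 3)

# pixels -> conv1 with no padding: 50 wide, kernel 5, stride 2
conv1_unpadded <- out_size(50, kernel = 5, stride = 2)
```

The rule reproduces the 64 maps, the 11 for rnorm1 and the 9 for pool1. For conv1 it gives 23 rather than the declared 24, so the slide's value presumably relies on a padding setting that is not shown; that is an inference from the arithmetic, not something the talk states.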
Network layers: input [3, 50, 50] (input images) -> convolve, rlinear, kernel [3, 5, 5], 64 maps -> response norm (normalize), kernel [1, 4, 4] -> max pool (max pooling), kernel [1, 3, 3] -> all, all (fully connected) -> output [13], softmax
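The softmax output layer turns 13 raw scores into class probabilities that sum to 1. A minimal base R sketch follows; the scores are made up for illustration, and `softmax` is written here by hand rather than taken from a package.

```r
# Convert raw scores into probabilities. Subtracting the maximum first
# avoids overflow in exp() without changing the result.
softmax <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

scores <- c(2, 1, 0, rep(-1, 10))  # 13 hypothetical raw scores
probs <- softmax(scores)
predicted_class <- which.max(probs)  # index of the most likely class
```

The predicted galaxy class is simply the one with the highest probability.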
Architecture on Azure: Azure storage (storage blob: images); SQL Server; Train model; Data Science Virtual Machine; Skyserver database; SQL2016 R Services; Azure N Series GPU VM; Web
Train neural network using GPU on Azure (GPU = graphics processing unit). CPU: 30 hrs; GPU: 3 hrs
Call to remote SQL Server instance with R inside
Data Scientist: interacts directly with data; creates models and experiments. Data Analyst/DBA: manages data and analytics together. Extensibility: R Integration; R Analytic Library (open source/Microsoft R); T-SQL Interface; Relational Data. Example Solutions: fraud detection; sales forecasting; warehouse efficiency; predictive maintenance. How is it Integrated? T-SQL calls a Stored Procedure; script is run in SQL through extensibility model; result sets sent through Web API to database or applications. Benefits: faster deployment of ML models; less data movement, faster insights; work with large datasets (mitigate R memory and scalability limitations)
Demonstration
Publish service with mrsdeploy. Easy Deployment: Data Scientist, Microsoft R Client, publishService (mrsdeploy package). Microsoft R Server configured for operationalizing R analytics: Services / Sessions. Easy Setup: in-cloud or on-prem; adding nodes to scale; high availability & load balancing. Remote execution server: Data Scientist, Microsoft R Client (mrsdeploy package). Easy Consumption, Easy Integration: Developer
Training images, augmented with rotation: 100K * 3. Layers in deep network: 8. Weights to compute in network: 176K. Weight updates per second: 2.5B. Computing time on Azure N series GPU: 1.8 hours. Overall accuracy, training data: 88%. Overall accuracy, test data: 55%. The technique works, but has scope for improvement!
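The figure of 100K * 3 training images suggests each image contributed three training examples through rotation. The talk does not say which angles were used, so the sketch below assumes 90-degree steps purely for illustration; `rotate90` is a hypothetical helper acting on a single-channel image stored as a matrix.

```r
# Rotate a matrix 90 degrees clockwise: reverse the row order, then transpose.
# rotate90 is a hypothetical helper; the angles used in the talk are not given.
rotate90 <- function(x) t(x[nrow(x):1, , drop = FALSE])

img <- matrix(1:4, nrow = 2)  # tiny stand-in for one image channel

# Three versions of the same image: original, 90 and 180 degrees (assumed)
augmented <- list(img, rotate90(img), rotate90(rotate90(img)))
```

Because a galaxy's class does not depend on its orientation, each rotated copy keeps the label of the original image.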
55% Overall accuracy on test data
Convolutional neural nets can predict galaxy class. You can use R Server to train and deploy a model. Use Azure GPU machines for faster training. Deploy to SQL Server.
Easy deployment: build the model first; deploy as a web service instantly
Johannes Vermeer, The Astronomer
R Open: open source R; compatible with CRAN; MKL for fast linear algebra. Microsoft R Server: R Open; RTVS; DeployR; ConnectR (connectivity to databases and Hadoop); ScaleR (parallel computing, large scale analytics); DistributedR (distributed computing, cross-platform portability)
Scalable computing, storage and services
SQL Server 2016 Enterprise Edition: SQL Server Query Processor; SQL Server R Services. Integration Facilities: component integration; launchers; parameter passing; results return; console output return; parallel data exchange (RTM); stored procedures; package administration. Microsoft R Open (open source R interpreter): 100% open source R; fully CRAN compatible; accelerated math. Algorithm Library (fast, parallel, storage efficient algorithms): data prep; descriptive stats; sampling; statistical tests; predictive models; variable selection; clustering; classification. Custom APIs for R + CRAN. Parallel scoring