Populating the Galaxy Zoo

Populating the Galaxy Zoo: Real-time Image Classification with SQL Server R Services
David M Smith (@revodavid), R Community Lead, Microsoft Algorithms and Data Science

Meet me at the Community Zone
After this session, you can speak with me in the Community Zone. We might:
Discuss additional questions
Review parts of my session in more detail
Network
Take selfies

Session goals
The Origin and Eventual Fate of the Universe
Computer Vision and Deep Neural Networks
Deploying a Convolutional Neural Network Using Microsoft R and SQL Server

Image Credit: NASA / Hubble

Image Credit: NASA / Hubble, via http://sploid.gizmodo.com/the-incredibly-huge-size-of-andromeda-1493036499

Whirlpool Galaxy (M51) and companion galaxy

Grand design spiral galaxy M81

Barred spiral galaxy NGC 1300

Elliptical galaxy IC 2006

Centaurus A, from European Southern Observatory: http://www.eso.org

NGC 3125 (timeline axis: Forming to Ancient). Image: http://www.nasa.gov/image-feature/goddard/2016/hubble-views-a-galaxy-fit-to-burst

Galaxy types on a timeline from Forming to Ancient. Slide labels: Spiral galaxies; Elliptical galaxies; M10; M50; Collisions and other events; ESO 3250G004.
Image credits: NASA, ESA, K. Kuntz (JHU), F. Bresolin (University of Hawaii), J. Trauger (Jet Propulsion Lab), J. Mould (NOAO), Y.-H. Chu (University of Illinois, Urbana), and STScI

The Hubble tuning fork (Source: Wikipedia)

Galaxies in observable universe: estimates of 100 billion, 200 billion, and 2 trillion (slide labels: Hubble deep field, Hubble ultra deep).
Source: http://www.nasa.gov/feature/goddard/2016/hubble-reveals-observable-universe-contains-10-times-more-galaxies-than-previously-thought

Professional astronomers: thousands of images
Citizen data science: 250K images
Computer vision: millions of images
Image: The Astronomer by Johannes Vermeer (Wikipedia)

Demonstration

Neural network structure: data, hidden layer(s), outcome

Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng

A two-dimensional array of pixels goes into the neural network, which outputs a class: Spiral or Elliptical.

Rotation, scaling, translation: variations in the image that the neural network has to cope with.

Match pieces of the image

Convolution
Matches a specific shape (kernel) across the entire image
Automatic feature generation
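To make the kernel-matching idea concrete, here is a minimal sketch in plain Python. The 5x5 image and the diagonal kernel are hypothetical illustrations, not data from the talk. The function slides the kernel over every position and records how strongly each patch matches. Strictly speaking this computes cross-correlation, which is what convolutional layers typically do in practice.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` across `image` (no padding) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(0, ih - kh + 1, stride):
        row = []
        for c in range(0, iw - kw + 1, stride):
            # Sum of element-wise products between the kernel and the patch.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += kernel[i][j] * image[r + i][c + j]
            row.append(total)
        out.append(row)
    return out


# Hypothetical 5x5 image containing a diagonal stroke.
image = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]

# Hypothetical kernel that responds to a diagonal shape.
diagonal_kernel = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

response = convolve2d(image, diagonal_kernel)
for row in response:
    print(row)
```

The response is largest wherever the patch under the kernel looks like the kernel itself. In a trained network the kernel values are learned rather than written by hand, which is where the "automatic feature generation" comes from.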

Layers can be repeated several (or many) times: convolution, pooling, convolution, pooling, and finally the output classes (Spiral, Elliptical).
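The pooling step can be sketched in a few lines of plain Python. This is an illustrative max-pooling function over a hypothetical 4x4 feature map, not code from the talk. Each window is reduced to its largest value, so the map shrinks and the next layer sees a coarser summary of a larger image region.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window (no padding)."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(0, w - size + 1, stride)
        ]
        for r in range(0, h - size + 1, stride)
    ]


# Hypothetical 4x4 feature map (for example, the output of a convolution).
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 0, 3, 6],
]

pooled = max_pool2d(feature_map)
print(pooled)
```

Because only the strongest response in each window survives, a small shift of the input usually leaves the pooled output unchanged, which helps with translated images.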

R Usage Growth (Rexer Data Miner Survey, 2007-2015): 76% of analytic professionals report using R; 36% select R as their primary tool.
Language Popularity: IEEE Spectrum Top Programming Languages, 2016.

Components: ConnectR, Microsoft R Open, RevoScaleR, MicrosoftML, DistributedR.
Available in: Microsoft R Server 9, SQL Server 2016/2017.

Training the network in R:

    # Load the required R packages
    library(RevoScaleR)
    library(MicrosoftML)

    # Run the neural network
    model <- rxNeuralNet(
        formula,
        data = galaxy_data,
        netDefinition = netDefinition,
        type = "multiClass",
        # Use GPU acceleration
        acceleration = "gpu",
        # Specify hyperparameters
        miniBatchSize = 32,
        initWtsDiameter = 0.1,
        numIterations = 50)

What about the network definition? It is written in NET#:
https://docs.microsoft.com/en-us/azure/machinelearning/machine-learning-azure-ml-netsharpreference-guide

NET# network definition:

    input pixels [3, 50, 50];
    hidden conv1 [64, 24, 24] rlinear from pixels convolve {
        KernelShape = [3, 5, 5];
        Stride = [1, 2, 2];
        MapCount = 64;
    }
    hidden rnorm1 [64, 11, 11] from conv1 response norm {
        KernelShape = [1, 4, 4];
        Stride = [1, 2, 2];
    }
    hidden pool1 [64, 9, 9] from rnorm1 max pool {
        KernelShape = [1, 3, 3];
    }
    hidden hid1 [256] rlinear from pool1 all;
    hidden hid2 [256] rlinear from hid1 all;
    output Class [13] softmax from hid2 all;

Reference: https://docs.microsoft.com/en-us/azure/machinelearning/machine-learning-azure-ml-netsharpreference-guide
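The layer sizes in the definition follow from the kernel shapes and strides. As a rough check, here is a small Python sketch of the usual sliding-window arithmetic. It assumes no padding, and a stride of 1 where the definition gives none. The helper `output_size` is a hypothetical name for illustration, not part of NET# or MicrosoftML.

```python
def output_size(input_size, kernel, stride=1):
    """Output length along one dimension for an unpadded sliding window."""
    return (input_size - kernel) // stride + 1


# rnorm1: conv1 is 24 wide, kernel 4, stride 2.
rnorm1 = output_size(24, kernel=4, stride=2)

# pool1: rnorm1 is 11 wide, kernel 3, no stride given (assumed 1).
pool1 = output_size(rnorm1, kernel=3)

# conv1: 50-pixel input, kernel 5, stride 2, under the no-padding assumption.
conv1_unpadded = output_size(50, kernel=5, stride=2)

# Depth: a kernel 3 deep over 3 colour channels gives depth 1 per map,
# so MapCount = 64 yields 64 output maps.
conv1_depth = output_size(3, kernel=3) * 64

print(rnorm1, pool1, conv1_unpadded, conv1_depth)
```

The rnorm1 and pool1 widths come out as 11 and 9, matching the declared shapes. For conv1 the unpadded formula gives 23 rather than the declared 24, so the real model presumably relies on padding or on input preparation that the slide does not show.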

Network layers:
input [3, 50, 50]: input images
convolve (rlinear), kernel [3, 5, 5]: 64 maps
response norm, kernel [1, 4, 4]: normalize
max pool, kernel [1, 3, 3]: max pooling
all, all: fully connected
output [13], softmax: output
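The softmax output layer turns the 13 raw scores into class probabilities. A minimal Python sketch follows. The scores are made up for illustration, and the function is the standard softmax formula rather than anything specific to MicrosoftML.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the 13 galaxy classes; class 4 scores highest.
scores = [0.0] * 13
scores[4] = 2.0

probs = softmax(scores)

# The predicted class is the index with the largest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, round(probs[predicted], 3))
```

Because the outputs sum to 1, they can be read as the model's confidence in each of the 13 classes.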

Architecture (on Azure): Azure storage (storage blob) holding the images; SQL Server (SQL2016 R Services) with the Skyserver database; Data Science Virtual Machine (Azure N Series GPU VM) to train the model; web front end.

Train the neural network using a GPU on Azure (GPU = graphics processing unit).
Training time: CPU 30 hrs, GPU 3 hrs.

Call to remote SQL Server instance with R inside

Data Scientist: interacts directly with data; creates models and experiments.
Data Analyst/DBA: manages data and analytics together.
Diagram labels: Relational Data; T-SQL Interface; Extensibility; R Integration; Analytic Library (open source/Microsoft R).
Example Solutions: fraud detection; sales forecasting; warehouse efficiency; predictive maintenance.
How is it integrated? T-SQL calls a stored procedure. The script is run in SQL through the extensibility model. Result sets are sent through Web API to database or applications.
Benefits: faster deployment of ML models; less data movement, faster insights; work with large datasets (mitigate R memory and scalability limitations).

Demonstration

Publish service with mrsdeploy
Easy Deployment: Data Scientist, Microsoft R Client, publishService (mrsdeploy package).
Easy Setup: Microsoft R Server configured for operationalizing R analytics; in-cloud or on-prem; adding nodes to scale; high availability & load balancing; remote execution server.
Easy Consumption: services / sessions.
Easy Integration: Developer.

100K * 3: Training images, augmented with rotation
8: Layers in deep network
176K: Weights to compute in network
2.5B: Weight updates per second
1.8 hours: Computing time on Azure N series GPU
88%: Overall accuracy - training data
55%: Overall accuracy - test data
The technique works, but has scope for improvement!
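Augmenting with rotation means adding rotated copies of each training image, so the network sees the same galaxy in several orientations. The talk does not say which angles were used. The Python sketch below assumes 90-degree steps and produces three versions per image purely as an illustration. `rotate90` and `augment` are hypothetical helper names.

```python
def rotate90(image):
    """Rotate a 2-D list of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image plus two rotated copies (assumed angles)."""
    r90 = rotate90(image)
    r180 = rotate90(r90)
    return [image, r90, r180]


# Hypothetical 2x2 "image" so the rotations are easy to follow.
image = [
    [1, 2],
    [3, 4],
]

versions = augment(image)
for v in versions:
    print(v)
```

Every rotated copy keeps the label of the original image, so the training set triples in size without any new labelling work.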

55% Overall accuracy on test data

Convolutional neural nets can predict galaxy class
You can use R Server to train and deploy a model
Use Azure GPU machines for faster training
Deploy to SQL Server

Easy deployment: build the model first, then deploy as a web service instantly.

Johannes Vermeer, The Astronomer

R Open: open source R; compatible with CRAN; MKL for fast linear algebra.
Microsoft R Server: R Open, RTVS, DeployR.
ConnectR: connectivity to databases and Hadoop.
ScaleR: parallel computing; large scale analytics.
DistributedR: distributed computing; cross-platform portability.

Scalable computing, storage and services

SQL Server 2016 Enterprise Edition: SQL Server Query Processor; SQL Server R Services.
Integration Facilities: Component Integration; Launchers; Parameter Passing; Results Return; Console Output Return; Parallel Data Exchange (RTM); Stored Procedures; Package Administration.
Microsoft R Open: Open Source R Interpreter; 100% Open Source R; Fully CRAN Compatible; Accelerated Math.
Algorithm Library: Fast, Parallel, Storage Efficient Algorithms; Data Prep; Descriptive Stats; Sampling; Statistical Tests; Predictive Models; Variable Selection; Clustering; Classification; Custom APIs for R + CRAN; Parallel Scoring.