Creating a Cloud-Based Machine Learning Platform at the Petabyte Scale

1 Creating a Cloud-Based Machine Learning Platform at the Petabyte Scale. Alex Sadovsky (alex.sadovsky@oracle.com), Director of Data Science, Oracle Data Cloud

2 Our mission: Unlocking the value of data to help advertisers connect more deeply with customers

3 Our data: over 5 billion global cookie IDs, 1.05 billion mobile device IDs, over 15 million domains, 1.1 trillion page views, and $3 trillion in annual observed consumer spending. Coverage reaches even North Korea. [Figure: activity map, shaded from lowest activity to highest activity]

4 This gives us a full view of the consumer: what they BUY, what they DO, where they GO, and who they ARE.

5 Down to an extremely detailed level: Married: Yes; Kids: 2; Education level: College grad; Online sites: Kayak, travel to Italy; Travel: high-end hotels; Cars: crossover vehicle; DMA: San Diego; Hobby: Photography. Buys high-end women's apparel, drives a Honda CR-V, avid home cook, organic product purchaser, buys children's apparel, favorite wine is Pinot Noir, shops at luxury retailers.

6 We use machine learning to go from data to targeted advertising: (1) start with an action a client wants to find more of; (2) add in all of our consumer data; (3) apply machine learning to model consumer action patterns; (4) use the models to create audiences of potential future purchasers; (5) target those audiences across the digital landscape.
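The five steps above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the platform's actual model. Each consumer is described by hypothetical binary attributes, similar to those on the previous slide. The client's action defines the positive examples. Bernoulli naive Bayes with add-one smoothing stands in for whatever models the platform really trains. All function names, attributes, and user IDs are hypothetical.

```python
import math

def fit_bernoulli_nb(rows, labels, alpha=1.0):
    """Per-feature log-likelihood ratios for 'took the action' vs 'did not'.

    Add-alpha (Laplace) smoothing keeps every probability away from 0 and 1.
    """
    pos = [r for r, y in zip(rows, labels) if y]
    neg = [r for r, y in zip(rows, labels) if not y]
    on, off = [], []
    for j in range(len(rows[0])):
        p1 = (sum(r[j] for r in pos) + alpha) / (len(pos) + 2 * alpha)
        p0 = (sum(r[j] for r in neg) + alpha) / (len(neg) + 2 * alpha)
        on.append(math.log(p1 / p0))              # evidence when attribute j is 1
        off.append(math.log((1 - p1) / (1 - p0)))  # evidence when attribute j is 0
    prior = math.log(len(pos) / len(neg))
    return prior, on, off

def score(model, row):
    """Log-odds that a consumer with these attributes takes the action."""
    prior, on, off = model
    return prior + sum(on[j] if x else off[j] for j, x in enumerate(row))

def build_audience(consumers, converters, size):
    """Train on everyone, then rank consumers who have NOT yet taken the action."""
    ids = list(consumers)
    model = fit_bernoulli_nb([consumers[i] for i in ids], [i in converters for i in ids])
    candidates = [i for i in ids if i not in converters]
    candidates.sort(key=lambda i: score(model, consumers[i]), reverse=True)
    return candidates[:size]

# Hypothetical attributes: [organic_purchaser, avid_home_cook, drives_crossover]
consumers = {
    "u1": [1, 1, 0], "u2": [1, 1, 1], "u3": [0, 1, 0],  # took the client's action
    "u4": [0, 0, 1], "u5": [0, 0, 0], "u6": [1, 0, 1],
    "u7": [0, 1, 1], "u8": [0, 0, 0],
}
print(build_audience(consumers, converters={"u1", "u2", "u3"}, size=2))  # ['u7', 'u6']
```

Consumer u7 ranks first because it shares the one attribute every converter has (avid_home_cook), even though its other attributes match less well.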

7 We do this on a stable, self-built machine learning platform. Every month we create 10,000+ models, each with terabytes of data and best-in-class machine learning accuracy.

8 So let's discuss how we do it! How can developers create a machine learning platform that can use as much data as possible, take a data-driven approach to machine learning, and scale to meet business needs?

9 Agenda: 1. Letting data tell the story; 2. Creating effective cloud infrastructure; 3. Constructing an end-to-end platform

10 Letting data tell the story: The data. Data is our most precious asset, but we often have too much of it for a single computer. We can use Hadoop to create large-scale, distributed data sets from data stored in cloud object stores. We can then use Hive to access this data in simple SQL, which makes complex multi-computer computation as simple as:

    SELECT SUM(spend) FROM consumers WHERE age > 21
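Hive turns a query like this into distributed jobs that run across the cluster. Here is a minimal sketch of the underlying idea in plain Python; it is not Hive's implementation. It assumes the table is split into partitions held by different workers, simulated here with threads. Each worker applies the WHERE filter and computes a partial SUM over its own rows. A final step then adds the partial sums, which works because SUM can be computed in pieces and combined. The table rows are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(partition, min_age=21):
    """One worker's share: apply WHERE age > min_age, then SUM(spend) over local rows."""
    return sum(row["spend"] for row in partition if row["age"] > min_age)

def distributed_sum(partitions, min_age=21):
    """Run partial_sum on every partition in parallel, then add the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda p: partial_sum(p, min_age), partitions)
        return sum(partials)

# Hypothetical rows of a `consumers` table, split across three workers.
partitions = [
    [{"age": 34, "spend": 120.0}, {"age": 19, "spend": 40.0}],
    [{"age": 22, "spend": 75.5}],
    [{"age": 21, "spend": 10.0}, {"age": 45, "spend": 200.0}],
]
print(distributed_sum(partitions))  # 395.5
```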

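11 Letting data tell the story: Modeling. The wrong way to model data: manually pick a modeling method for your data and apply it. A method that looks right over one range of the data can be badly wrong over another. [Figure: two plots of height against age, one from newborn to 10 years and one from newborn to 100 years]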

12 Letting data tell the story: Modeling. No Free Lunch Theorem: for any algorithm, elevated performance on one class of problems is offset by worse performance on another class. So what can we do? Try multiple algorithms/models on the training data, and use machine learning to pick the one, or the combination, that works best. We can do it easily in under 50 lines of code!
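The slide's original code is not preserved. Below is a standard-library sketch of the same idea; it is not the scikit-learn example from the talk. Three candidate models are scored with 5-fold cross-validation, and the one with the lowest error wins. The toy data is hypothetical and echoes the height-versus-age plots: height grows linearly until age 18, then stays flat.

```python
import statistics

def fit_mean(train):
    """Baseline: always predict the mean of the training targets."""
    mu = statistics.fmean(y for _, y in train)
    return lambda x: mu

def fit_linear(train):
    """Ordinary least squares fit of y = a + b * x."""
    mx = statistics.fmean(x for x, _ in train)
    my = statistics.fmean(y for _, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_knn(train, k=3):
    """k-nearest neighbors: average the targets of the k closest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return statistics.fmean(y for _, y in nearest)
    return predict

def cv_mse(fit, data, folds=5):
    """Mean squared error under k-fold cross-validation (every folds-th point held out)."""
    errors = []
    for i in range(folds):
        test = data[i::folds]
        train = [p for j, p in enumerate(data) if j % folds != i]
        model = fit(train)
        errors.extend((model(x) - y) ** 2 for x, y in test)
    return statistics.fmean(errors)

def select_model(candidates, data, folds=5):
    """Score every candidate; return the best name and all scores."""
    scores = {name: cv_mse(fit, data, folds) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical toy data: height (cm) grows until age 18, then stays flat.
data = [(age, 50 + 7 * min(age, 18)) for age in range(101)]
best, scores = select_model(
    {"mean": fit_mean, "linear": fit_linear, "3-nearest neighbors": fit_knn}, data)
print(best)  # 3-nearest neighbors
```

A straight line would fit the first ten years well but misses the plateau. Letting cross-validated error make the choice avoids committing to it by hand.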

13 Letting data tell the story: Python. Pros: for machine learning, Python lets us quickly write code that is easy to read and can interface with other languages and software. Libraries such as scikit-learn, as in the previous example, give access to hundreds of machine learning algorithms. Cons: limited to single machines; not great for big data out of the box. So how can we fix that?

14 Letting data tell the story: Python + Spark. Spark is a distributed, in-memory big data processing engine, and PySpark gives us the simplicity of a Python API with the power of Spark:

    from pyspark import SparkContext
    sc = SparkContext.getOrCreate()        # create (or reuse) Spark's entry point
    text_file = sc.textFile("hdfs://...")  # distributed dataset, one record per line
    counts = text_file.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
    counts.saveAsTextFile("hdfs://...")    # each partition writes its own part file

Distributed computing in 5 lines of code! But what if we want to use deep learning?
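To show what flatMap, map, and reduceByKey do without a cluster, here is a single-process Python sketch of the same word count. It is an illustration, not Spark's implementation. The input is split into partitions, and each partition runs flatMap and map and then combines its counts locally. The partial counts are finally merged by key. Spark's reduceByKey similarly merges values within each partition before shuffling data between machines, which keeps network traffic small. The input text is hypothetical.

```python
from collections import Counter
from functools import reduce

def count_partition(lines):
    """One executor's work: flatMap lines into words, map each word to (word, 1),
    and combine the 1s for each word locally before anything leaves the machine."""
    local = Counter()
    for line in lines:
        for word in line.split(" "):  # flatMap: one line -> many words
            local[word] += 1          # map to (word, 1), then reduce by key locally
    return local

def word_count(partitions):
    """Merge per-partition counts by key, as the shuffle and final reduceByKey do."""
    return reduce(lambda a, b: a + b, map(count_partition, partitions), Counter())

partitions = [["to be or", "not to be"], ["to see"]]  # hypothetical input, 2 partitions
counts = word_count(partitions)
print(counts["to"], counts["be"], counts["see"])  # 3 2 1
```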

15 Letting data tell the story: Deep learning. TensorFlow offers the simplicity of a Python API on top of a C/C++ engine, and makes GPU and distributed processing easy.

16 Letting data tell the story: Python + TensorFlow

    import numpy as np
    import tensorflow as tf  # TensorFlow 1.x only: tf.contrib was removed in TensorFlow 2.0
    from sklearn.datasets import load_iris
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    iris = load_iris()
    X = iris.data[:, (2, 3)].astype(np.float32)  # petal length, petal width
    y = (iris.target == 0).astype(np.int64)      # 1 for Iris setosa, 0 otherwise
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

    tf.logging.set_verbosity(tf.logging.INFO)
    feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(X_train)
    clf = tf.contrib.learn.DNNClassifier(hidden_units=[2, 3], n_classes=2, feature_columns=feature_columns)
    clf.fit(x=X_train, y=y_train, batch_size=50, steps=500)
    y_pred = clf.predict(x=X_test, batch_size=50, as_iterable=False)
    print(accuracy_score(y_test, y_pred))

Deep learning in under 21 lines of code!

17 Agenda: 1. Letting data tell the story; 2. Creating effective cloud infrastructure; 3. Constructing an end-to-end platform

18 Creating effective cloud infrastructure: Why cloud? Workloads may be CPU bound; CPU + GPU bound; or network, memory, and CPU bound. The needs of data science change all the time, and the cloud lets us keep our infrastructure current and fit for the task at hand. But how can we efficiently keep track of all of our cloud databases, networks, and computers?

19 Creating effective cloud infrastructure: Infrastructure as code. The same kinds of resources are declared in Terraform for two clouds:

    # Amazon Web Services
    provider "aws" { ... }
    resource "aws_vpc" "default" { ... }
    resource "aws_instance" "model_cpu" { ... }

    # Oracle Bare Metal Cloud
    provider "baremetal" { ... }
    resource "baremetal_core_virtual_network" "default" { ... }
    resource "baremetal_core_instance" "model_cpu" { ... }
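Here is a toy Python sketch of the idea behind these resource declarations; it is not Terraform's implementation. The code declares the desired set of resources. A planner compares that set with what currently exists, and only the differences become actions. Terraform's real planner also tracks dependencies between resources and distinguishes in-place updates from replacements; this sketch ignores both. The resource settings and the `old_worker` resource are hypothetical.

```python
def plan(desired, current):
    """Return the (action, address) steps that turn `current` into `desired`.

    Both arguments map a resource address such as "aws_instance.model_cpu"
    to that resource's settings.
    """
    actions = []
    for address in sorted(desired.keys() | current.keys()):
        if address not in current:
            actions.append(("create", address))
        elif address not in desired:
            actions.append(("destroy", address))
        elif desired[address] != current[address]:
            actions.append(("update", address))
    return actions

desired = {
    "aws_vpc.default": {"cidr_block": "10.0.0.0/16"},
    "aws_instance.model_cpu": {"instance_type": "c5.4xlarge"},
}
current = {
    "aws_vpc.default": {"cidr_block": "10.0.0.0/16"},
    "aws_instance.old_worker": {"instance_type": "m4.large"},
}
print(plan(desired, current))
# [('create', 'aws_instance.model_cpu'), ('destroy', 'aws_instance.old_worker')]
```

Planning against an empty desired state destroys everything, which mirrors what `terraform destroy` does.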

20 Creating effective cloud infrastructure: Multiple environments. The same code brings up the test, staging, and production environments:

    $ terraform apply     # bring up the entire cloud infrastructure
    $ terraform destroy   # shut down the cloud infrastructure
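In Terraform, one common way to reuse the same code across environments is input variables, or a separate workspace per environment. The Python sketch below only illustrates the principle, not Terraform itself. One shared base definition is combined with small per-environment overrides, so test, staging, and production differ only where intended. Every key and value here is hypothetical.

```python
# Hypothetical settings shared by every environment.
BASE = {"model_cpu_count": 2, "instance_shape": "small", "gpu": False}

# Hypothetical per-environment differences; everything else comes from BASE.
OVERRIDES = {
    "test": {},
    "staging": {"model_cpu_count": 4},
    "production": {"model_cpu_count": 20, "instance_shape": "large", "gpu": True},
}

def environment_config(name):
    """Build one environment's settings from the shared base plus its overrides."""
    if name not in OVERRIDES:
        raise ValueError(f"unknown environment: {name!r}")
    return {**BASE, **OVERRIDES[name], "environment": name}

for env in ("test", "staging", "production"):
    print(env, environment_config(env)["model_cpu_count"])
```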

21 Creating effective cloud infrastructure: How can we package different software and code? Containers! All machine learning code can be stored in Docker containers. Like a virtual machine, they provide resource isolation and allocation (memory, CPU), and they abstract away the OS.

22 Creating effective cloud infrastructure: Failure recovery
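The diagram on this slide is not preserved, so the following is only a sketch of one common failure-recovery pattern, not the platform's documented mechanism. A failed step is retried with capped exponential backoff and random jitter. That way, a transient fault such as a lost worker or a throttled API call does not fail the whole job. The function and parameter names are hypothetical.

```python
import random
import time

def retry(task, attempts=5, base_delay=0.5, max_delay=30.0,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call task(); if it raises one of `retry_on`, wait and try again.

    The wait before retry n is a random time up to min(max_delay, base_delay * 2**n)
    ("full jitter"), so many failed workers do not all retry at the same moment.
    Exceptions not listed in `retry_on` propagate immediately.
    """
    for attempt in range(attempts):
        try:
            return task()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))

calls = {"n": 0}

def flaky_task():
    """Hypothetical job step that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("lost connection to worker")
    return "done"

print(retry(flaky_task, sleep=lambda seconds: None))  # done
```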

23 Agenda: 1. Letting data tell the story; 2. Creating effective cloud infrastructure; 3. Constructing an end-to-end platform

24 Constructing an end-to-end platform: flexible (service oriented), scalable (for data and for compute), and robust (containerized and failsafe).

25 Creating a cloud-based machine learning platform at the petabyte scale. Lessons learned:
1. Use a scalable data infrastructure to make use of all data.
2. Let the data drive machine learning decisions.
3. Use the cloud to create a flexible, scalable, and robust platform.
Thank you!


More information

Cloud Computing. Technologies and Types

Cloud Computing. Technologies and Types Cloud Computing Cloud Computing Technologies and Types Dell Zhang Birkbeck, University of London 2017/18 The Technological Underpinnings of Cloud Computing Data centres Virtualisation RESTful APIs Cloud

More information

nolearn Documentation

nolearn Documentation nolearn Documentation Release 0.6 Daniel Nouri September 06, 2016 Contents 1 Installation 3 2 Modules 5 2.1 nolearn.cache............................................ 5 2.2 nolearn.dbn.............................................

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Nabil Abdennadher nabil.abdennadher@hesge.ch 2017/2018 1 Plan Context Definition Market Cloud service models Cloud deployments models Key drivers to adopting the Cloud Barriers

More information

High Performance and Cloud Computing (HPCC) for Bioinformatics

High Performance and Cloud Computing (HPCC) for Bioinformatics High Performance and Cloud Computing (HPCC) for Bioinformatics King Jordan Georgia Tech January 13, 2016 Adopted From BIOS-ICGEB HPCC for Bioinformatics 1 Outline High performance computing (HPC) Cloud

More information

Smart Data Catalog DATASHEET

Smart Data Catalog DATASHEET DATASHEET Smart Data Catalog There is so much data distributed across organizations that data and business professionals don t know what data is available or valuable. When it s time to create a new report

More information

2014 年 3 月 13 日星期四. From Big Data to Big Value Infrastructure Needs and Huawei Best Practice

2014 年 3 月 13 日星期四. From Big Data to Big Value Infrastructure Needs and Huawei Best Practice 2014 年 3 月 13 日星期四 From Big Data to Big Value Infrastructure Needs and Huawei Best Practice Data-driven insight Making better, more informed decisions, faster Raw Data Capture Store Process Insight 1 Data

More information

Deep learning prevalence. first neuroscience department. Spiking Neuron Operant conditioning First 1 Billion transistor processor

Deep learning prevalence. first neuroscience department. Spiking Neuron Operant conditioning First 1 Billion transistor processor WELCOME TO Operant conditioning 1938 Spiking Neuron 1952 first neuroscience department 1964 Deep learning prevalence mid 2000s The Turing Machine 1936 Transistor 1947 First computer science department

More information

5/24/ MVP SQL Server: Architecture since 2010 MCT since 2001 Consultant and trainer since 1992

5/24/ MVP SQL Server: Architecture since 2010 MCT since 2001 Consultant and trainer since 1992 2014-05-20 MVP SQL Server: Architecture since 2010 MCT since 2001 Consultant and trainer since 1992 @SoQooL http://blog.mssqlserver.se Mattias.Lind@Sogeti.se 1 The evolution of the Microsoft data platform

More information

CREATIVITY MAKES THE DIFFERENCE

CREATIVITY MAKES THE DIFFERENCE CREATIVITY MAKES THE DIFFERENCE Your school has a big challenge: preparing Generation Z for a rapidly changing world and jobs that don t yet exist. Along with learning digital skills, your students need

More information

MATLAB. Senior Application Engineer The MathWorks Korea The MathWorks, Inc. 2

MATLAB. Senior Application Engineer The MathWorks Korea The MathWorks, Inc. 2 1 Senior Application Engineer The MathWorks Korea 2017 The MathWorks, Inc. 2 Data Analytics Workflow Business Systems Smart Connected Systems Data Acquisition Engineering, Scientific, and Field Business

More information

Iggy Fernandez Infrastructure as Code demo by Artem Danielov

Iggy Fernandez Infrastructure as Code demo by Artem Danielov Iggy Fernandez Infrastructure as Code demo by Artem Danielov NoCOUG Journal Trends and Game-Changers in the Cloud 2 Latest Issue Trends and Game-Changers in the Cloud 3 NoCOUG Journal Archive wget www.nocoug.org/journal/nocoug_journal_{2001..2018}{02..12..3}.pdf

More information

Leading Innovation in the Data Center

Leading Innovation in the Data Center Leading Innovation in the Data Center Bjørn R. Martinussen Technical Solutions ArchitectARCHITECT Oslo, October 2012 1 Agenda IT Challenges Today EMC + Cisco + Intel Cisco Unified Data Center Joint DC

More information

Netezza The Analytics Appliance

Netezza The Analytics Appliance Software 2011 Netezza The Analytics Appliance Michael Eden Information Management Brand Executive Central & Eastern Europe Vilnius 18 October 2011 Information Management 2011IBM Corporation Thought for

More information

CIT 668: System Architecture. Amazon Web Services

CIT 668: System Architecture. Amazon Web Services CIT 668: System Architecture Amazon Web Services Topics 1. AWS Global Infrastructure 2. Foundation Services 1. Compute 2. Storage 3. Database 4. Network 3. AWS Economics Amazon Services Architecture Regions

More information

THE INTERNET IS LOADED WITH TOO MUCH CONTENT

THE INTERNET IS LOADED WITH TOO MUCH CONTENT THE INTERNET IS LOADED WITH TOO MUCH CONTENT As a publisher, you re contributing to the 2.5 quintillion bytes of data that is made everyday! The old method of publishing tons of content isn t as effective

More information

AtSNP Infrastructure a case study for searching billions of records while providing significant cost savings over cloud providers

AtSNP Infrastructure a case study for searching billions of records while providing significant cost savings over cloud providers AtSNP Infrastructure a case study for searching billions of records while providing significant cost savings over cloud providers Christopher Harrison, Sündüz Keleş, Rebecca Hudson, Sunyoung Shin and Inês

More information

Samu Konttinen, CEO Q4 / 2017 CORPORATE SECURITY REVENUE GROWTH ACCELERATED TO 16%

Samu Konttinen, CEO Q4 / 2017 CORPORATE SECURITY REVENUE GROWTH ACCELERATED TO 16% Samu Konttinen, CEO Q4 / 2017 CORPORATE SECURITY REVENUE GROWTH ACCELERATED TO 16% 1 AGENDA Key takeaways from Q4 Key figures Business review for 2017 Outlook 2018 Outlook for 2018-2021 Financials FAQ

More information

COLUMBUS. Business Solutions. Cloud Unified Communications & Interoperability. Javier Pereira Director of Product Development November 2011

COLUMBUS. Business Solutions. Cloud Unified Communications & Interoperability. Javier Pereira Director of Product Development November 2011 COLUMBUS Business Solutions Cloud Unified Communications & Interoperability Javier Pereira Director of Product Development November 2011 Industry Tendencies Gartner Reported that the second priority for

More information

Supercomputing made super human

Supercomputing made super human Supercomputing made super human The New Age of Accelerated Computing: A History of Innovation and Optimization in Computing Steve Hebert, Cofounder and CEO, Nimbix 2 1880 census had taken eight years to

More information

Empowering 21st Century Learning with EcoStruxure for Data Centers

Empowering 21st Century Learning with EcoStruxure for Data Centers Empowering 21st Century Learning with EcoStruxure for Data Centers 2 We believe that effective use of educational technology will help promote lifelong learning and foster academic excellence while preparing

More information

Leveraging AI on the Cloud to transform your business. Florida Business Analytics Forum 2018 at University of South Florida

Leveraging AI on the Cloud to transform your business. Florida Business Analytics Forum 2018 at University of South Florida Leveraging AI on the Cloud to transform your business Florida Business Analytics Forum 2018 at University of South Florida 1 My (unusual) path to Google Neural networks at NOAA 2 DNNs solved image analysis

More information