Image Processing on the Cloud. Outline

Similar documents
Welcome to the New Era of Cloud Computing

Data Centers and Cloud Computing

Data Centers and Cloud Computing. Slides courtesy of Tim Wood

5 Fundamental Strategies for Building a Data-centered Data Center

Data Centers and Cloud Computing. Data Centers

Amazon Web Services Cloud Computing in Action. Jeff Barr

Basics of Cloud Computing Lecture 2. Cloud Providers. Satish Srirama

Improving Oceanographic Anomaly Detection Using High Performance Computing

Leveraging Software-Defined Storage to Meet Today and Tomorrow s Infrastructure Demands

IBM SmartCloud Resilience offers cloud-based services to support a more rapid, reliable and cost-effective enterprise-wide resiliency.

Next-Generation Cloud Platform

White Paper. Platform9 ROI for Hybrid Clouds

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?

Looking ahead with IBM i. 10+ year roadmap

Emerging Technologies for HPC Storage

Cloud Computing. UCD IT Services Experience

A Cloud-based Architecture for Processing 3D Mars Terrain

Dell Technologies IoT Solution Surveillance with Genetec Security Center

On demand Satellite Image Processing Next generation technology for processing Terabytes of imagery on the Cloud White Paper July, 2011

Accuracy Assessment of Ames Stereo Pipeline Derived DEMs Using a Weighted Spatial Dependence Model

Basics of Cloud Computing Lecture 2. Cloud Providers. Satish Srirama

Cloud Storage with AWS: EFS vs EBS vs S3 AHMAD KARAWASH

PERFORMANCE CHARACTERIZATION OF MICROSOFT SQL SERVER USING VMWARE CLOUD ON AWS PERFORMANCE STUDY JULY 2018

Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects?

Evaluation Report: HP StoreFabric SN1000E 16Gb Fibre Channel HBA

Virtualization Overview. Joel Jaeggli AFNOG SS-E 2013

Paperspace. Architecture Overview. 20 Jay St. Suite 312 Brooklyn, NY Technical Whitepaper

Magellan Project. Jeff Broughton NERSC Systems Department Head October 7, 2009

Hyper-converged storage for Oracle RAC based on NVMe SSDs and standard x86 servers

A Cloud WHERE PHYSICAL ARE TOGETHER AT LAST

Introducing SUSE Enterprise Storage 5

Life In The Flash Director - EMC Flash Strategy (Cross BU)

QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER

EXTRACT DATA IN LARGE DATABASE WITH HADOOP

Storage Optimization with Oracle Database 11g

A10 HARMONY CONTROLLER

Practical MySQL Performance Optimization. Peter Zaitsev, CEO, Percona July 02, 2015 Percona Technical Webinars

HIGH PERFORMANCE SANLESS CLUSTERING THE POWER OF FUSION-IO THE PROTECTION OF SIOS

TPP On The Cloud. Joe Slagel

EBOOK: VMware Cloud on AWS: Optimized for the Next-Generation Hybrid Cloud

escience in the Cloud: A MODIS Satellite Data Reprojection and Reduction Pipeline in the Windows

MATE-EC2: A Middleware for Processing Data with Amazon Web Services

ClickHouse on MemCloud Performance Benchmarks by Altinity

Data Curation Practices at the Oak Ridge National Laboratory Distributed Active Archive Center

Data Protection at Cloud Scale. A reference architecture for VMware Data Protection using CommVault Simpana IntelliSnap and Dell Compellent Storage

CS / Cloud Computing. Recitation 3 September 9 th & 11 th, 2014

The BioHPC Nucleus Cluster & Future Developments

Processing and dissemination of satellite remote sensing data in an heterogeneous environment.

IBM Emulex 16Gb Fibre Channel HBA Evaluation

Building Software to Translate

THE DEFINITIVE GUIDE FOR AWS CLOUD EC2 FAMILIES

Ioan Raicu. Everyone else. More information at: Background? What do you want to get out of this course?

Cloud Computing 4/17/2016. Outline. Cloud Computing. Centralized versus Distributed Computing Some people argue that Cloud Computing. Cloud Computing.

A High-Performance Storage and Ultra- High-Speed File Transfer Solution for Collaborative Life Sciences Research

Hadoop/MapReduce Computing Paradigm

VMware Virtual SAN Technology

Low-cost BYO Mass Storage Project

2013 AWS Worldwide Public Sector Summit Washington, D.C.

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Introduction to Hadoop. Owen O Malley Yahoo!, Grid Team

AWS: Basic Architecture Session SUNEY SHARMA Solutions Architect: AWS

The Fastest And Most Efficient Block Storage Software (SDS)

DELL POWERVAULT MD FAMILY MODULAR STORAGE THE DELL POWERVAULT MD STORAGE FAMILY

Forget about the Clouds, Shoot for the MOON

An Oracle White Paper June Enterprise Database Cloud Deployment with Oracle SuperCluster T5-8

vrealize Business Standard User Guide

Data Analytics and Storage System (DASS) Mixing POSIX and Hadoop Architectures. 13 November 2016

How to Keep UP Through Digital Transformation with Next-Generation App Development

A Comparative Study of High Performance Computing on the Cloud. Lots of authors, including Xin Yuan Presentation by: Carlos Sanchez

The Cirrus Research Computing Cloud

SCALABLE TRAJECTORY DESIGN WITH COTS SOFTWARE. x8534, x8505,

MySQL In the Cloud. Migration, Best Practices, High Availability, Scaling. Peter Zaitsev CEO Los Angeles MySQL Meetup June 12 th, 2017.

Dell PowerVault MD Family. Modular storage. The Dell PowerVault MD storage family

WVU RESEARCH COMPUTING INTRODUCTION. Introduction to WVU s Research Computing Services

Cloud Computing 1. CSCI 4850/5850 High-Performance Computing Spring 2018

Deep Storage for Exponential Data. Nathan Thompson CEO, Spectra Logic

HPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances)

Design a Remote-Office or Branch-Office Data Center with Cisco UCS Mini

How to Cloud for Earth Scientists: An Introduction

HPC Storage Use Cases & Future Trends

Design a Remote-Office or Branch-Office Data Center with Cisco UCS Mini

Virtual Desktop Infrastructure (VDI) Bassam Jbara

CS 425 / ECE 428 Distributed Systems Fall 2015

Oracle Database and Application Solutions

What is Cloud Computing? What are the Private and Public Clouds? What are IaaS, PaaS, and SaaS? What is the Amazon Web Services (AWS)?

1 Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Big Data solution benchmark

Setup Lab. A quick guide to infrastructure tools for EPL371

MAGMA: a New Generation

Malling U3A Computer Group. Cloud Storage. Chris Daly 3 rd April 2017

LUNAR TEMPERATURE CALCULATIONS ON A GPU

EMC XtremIO All-Flash Applications. Sonny Aulakh VP, Sales Engineering November 2014

Cross-layer Optimization for Virtual Machine Resource Management

Microsoft Exchange Server 2010 workload optimization on the new IBM PureFlex System

Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System

Empirical Evaluation of Latency-Sensitive Application Performance in the Cloud

PROCESSING THE GAIA DATA IN CNES: THE GREAT ADVENTURE INTO HADOOP WORLD

Grow Your Business & Expand Your Service Offerings

The MapReduce Framework

AWS Solutions Architect Associate (SAA-C01) Sample Exam Questions

Transcription:

Mars Science Laboratory! Image Processing on the Cloud Emily Law Cloud Computing Workshop ESIP 2012 Summer Meeting July 14 th, 2012 1/26/12! 1 Outline Cloud computing @ JPL SDS Lunar images Challenge Image tiling process Implementations Analysis Summary 2 2! 1

1/26/12 Science Data Systems Cover a wide variety of domain disciplines Solar system exploration, Astrophysics, Earth science, Biomedicine, etc, Each has its own communities, standards and systems But, there is a set of common components & constraints Some can greatly benefit from proven cloud computing technology Data Processing Data Repository Data Distribution 3 3! Earth Science Data Systems On Board Processing TDRS Network Science Data Processing Data Acquisi+on and Command Research Network w/ Cloud Storage & Computation Mission Mission Opera+on Opera+ons s Instrume nt Instrument Opera+on Opera+ons s L0A L0A Processing Processin EDOS/ g Ground Data Systems Science Teams NASA Mission/ Mul+- Mission Science Data & Science Data Centers Manage Analysis, Modeling and Applica+on Environments Applica+ons Decision Support Archive Data Centers Other Data Systems (e.g. NOAA) 4 4! 2

1/26/12 Lunar Modeling and Mapping Project (LMMP) Provides science and exploration community a suite of lunar mapping and modeling tools and products that support the lunar exploration activities The tools and products are made available through a common, intuitive NASA portal Utilizes open standards and facilitates platform and application independent access 5 5! Challenge How to make these large images usable by desktop computers, mobile devices and other memory constrained products? 6 6! 3

1/26/12 Tiling Process Divides images into small tiles Combines and shrinks for the next zoom level Iterates till the zoom level has only 1 tile 7 7! Using Hadoop Hadoop is an implementation of Google s Map-Reduce algorithm Map Function Takes a subset of the data, performs a computation, and returns an output. Reduce Function Consolidates outputs from the map function to generate another output 8 8! 4

In-House Implementation Test image, 2.77 gigabytes LRO LOLA (Lunar Orbiter Laser Altimeter) colorized digital elevation map which produced 9.1 gigabytes set of tiles Ran Hadoop on local machines in the lab 2 Sun Fire x4170 machines running dual Xeon X5570 processors with 72 GBs of RAM with a heterogeneous mix of Solaris 10 and Linux Performance was excellent Machines are costly to maintain, especially since these tasks are bursty 9 9! Cloud Implementation Using Amazon EC2 Amazon EC2 is a cloud computing infrastructure allowing users to rent virtual machines Installed Hadoop Elastic MapReduce framework on a number of EC2 instances Output image files stored on Amazon S3, a cloud storage system 10 10! 5

Configurations Configuration 1 - In-House 2x Sun Fire 4170 72 GB RAM, 64 GB SSD Storage $10K each, plus administration and infrastructure costs Configuration 2-20 EC2 Large 20 EC2 Large Instances (4 Compute Units ~ 4x1GHz Xeon) 7.5 GB RAM, 850 GB Storage $0.34/instance/hour plus bandwidth Configuration 3-4 EC2 CC 4 EC2 Cluster Compute Instances (33.5 Compute Units) 23 GB RAM, 1.69 TB Storage $1.60/instance/hour plus bandwidth 11 11! Performance Configuration 3 4 EC2 CC Instances Configuration 2 20 EC2 Large Instances Configuration 1 2 Sun Fire 4170 Storage Time (Writing to persistent storage) Processing Time Upload Time (Set up processing) 0 20 40 60 80 100 120 Min 12 12! 6

Cost In-House Implementation Total Cost: $20K + SA + infrastructure 20 EC2 Large Processing: 2h 20 $0.34 = $13.60 Bandwidth: 3GB $.10 = $.30 Storage: 10GB $.14 = $1.40/month Total Cost: $15.30 4 EC2 CC Processing: 1h 4 $1.60 = $6.40 Bandwidth: 3GB $.10 = $.30 Storage: 10GB $.14 = $1.40/month Total Cost: $8.10 13 13! Performance Analysis In-House Implementation Fastest overall Did not need to export data to remote systems Most expensive from a cost-benefit perspective Cloud Implementation Upload and storage time a consideration Network speed between Hadoop nodes a significant consideration Most cost-effective for occasional, computationally intensive jobs Copyright PRE-DECISIONAL 2012 California DRAFT; Institute For of planning Technology. and discussion Government purposes sponsorship only acknowledged.! 14 14! 7

Conclusion Hadoop framework provides a simple programmatic interface for developing distributed computing applications for problems that are parallelizable Problems that required large amounts of data will depend on the interconnect speeds between nodes Cloud computing gives a cost-effective infrastructure to use compute capacity as needed In designing applications for cloud, must consider the performance of locally run machines vs. the price of cloud instances Security should also be considered in using public infrastructure We are using a hybrid system where private data is hosted locally while public data is on the cloud Copyright PRE-DECISIONAL 2012 California DRAFT; Institute For planning of Technology. and discussion Government purposes only sponsorship acknowledged.! 15 15! 8