Using the Open Science Data Cloud for Data Science Research. Robert Grossman University of Chicago Open Cloud Consortium June 17, 2013

1 Using the Open Science Data Cloud for Data Science Research Robert Grossman University of Chicago Open Cloud Consortium June 17, 2013

2 Discoveries Team: you and your colleagues correlation + algorithms + Instrument: 3000 cores / 5 PB OSDC science cloud + Data: 1 PB of OSDC data across several disciplines

3 Part 1 What Instrument Do we Use to Make Big Data Discoveries? How do we build a datascope?

4 What is big data? W? KW? MW? TB? PB? EB?

5 An algorithm and computing infrastructure is big-data scalable if adding a rack (or container) of data (and corresponding processors) allows you to do the same computation in the same time but over more data.
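A minimal sketch of this definition, assuming a toy model in which every rack holds a fixed amount of data and scans only its own share at a fixed rate. The figures `tb_per_rack` and `tb_per_hour_per_rack` are illustrative assumptions, not OSDC measurements.

```python
def scan_time_hours(racks, tb_per_rack=100.0, tb_per_hour_per_rack=20.0):
    """Wall-clock hours to scan all data when each rack processes its local data."""
    total_tb = racks * tb_per_rack                   # data grows with the rack count
    total_tb_per_hour = racks * tb_per_hour_per_rack  # so does aggregate throughput
    return total_tb / total_tb_per_hour


# In this idealized model, adding racks adds data but leaves the run time unchanged.
for racks in (1, 2, 10):
    print(racks, "racks:", racks * 100.0, "TB in", scan_time_hours(racks), "hours")
```

A real system only approximates this: any step that funnels data through a shared resource (a single network link, a central index) makes the time grow with the rack count.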

6 Commercial Cloud Service Provider (CSP) 15 MW Data Center Monitoring, network security and forensics Automatic provisioning and infrastructure management Accounting and billing 100,000 servers 1 PB DRAM 100s of PB of disk Customer Facing Portal ~1 Tbps egress bandwidth 25 operators for 15 MW Commercial Cloud Data center network

7 OSDC's vote for a datascope: a (boutique) data center scale facility with a big-data scalable analytic infrastructure.

8 Discoveries Team: you and your colleagues correlation + algorithms + Instrument: 3000 cores / 5 PB OSDC science cloud + Data: 1 PB of OSDC data across several disciplines

9 Some Examples of Big Data Science
Discipline | Duration | Size | # Devices
HEP - LHC | 10 years | 15 PB/year* | One
Astronomy - LSST | 10 years | 12 PB/year** | One
Genomics - NGS | 2-4 years | 0.5 TB/genome | 1000s
*At full capacity, the Large Hadron Collider (LHC), the world's largest particle accelerator, is expected to produce more than 15 million gigabytes of data each year. This ambitious project connects and combines the IT power of more than 140 computer centres in 33 countries. Source: http://press.web.cern.ch/public/en/spotlight/spotlightgrid_ en.html
**As it carries out its 10-year survey, LSST will produce over 15 terabytes of raw astronomical data each night (30 terabytes processed), resulting in a database catalog of 22 petabytes and an image archive of 100 petabytes. Source: http:// News/enews/teragrid html

10 One large instrument Many smaller instruments

11 Part 2. What is a Cloud and Why Do We Care?

12 There Are Two Essential Characteristics of a Cloud 1. Self service 2. Scale Clouds enable you to compute over large amounts of data without the necessity of first downloading the data. Clouds can be designed to be secure and compliant.

13 Self Service

14 Scale

15 Types of Clouds
Public Clouds: Amazon
Private Clouds: Run internally by universities or companies
Community Clouds: Run by organizations (either formally or informally), such as the Open Cloud Consortium

16 vs. Amazon Web Services (AWS)?
Scale
Simplicity of a credit card
Wide variety of offerings.
Community clouds, science clouds, etc.
Lower cost (at medium scale)
Data too important for commercial cloud
Computing over scientific data is a core competency
Can support any required governance / security
OCC supports AWS interop and bursting when permissible.

17 NFP Science Clouds
 | Science Clouds | Commercial Clouds
POV | Democratize access to data. Integrate data to make discoveries. Long term archive. | As long as you pay the bill; as long as the business model holds.
Data & Storage | Data intensive computing & HP storage | Internet style scale out and object-based storage
Flows | Large & small data flows | Lots of small web flows
Streams | Streaming processing required | NA
Accounting | Essential | Essential
Lock in | Moving environment between CSPs essential | Lock in is good
Interop | Critical, but difficult | Customers will drive to some degree

18 Essential Services for a Science CSP
Support for data intensive computing
Support for big data flows
Account management, authentication and authorization services
Health and status monitoring
Billing and accounting
Ability to rapidly provision infrastructure
Security services, logging, event reporting
Access to large amounts of public data
High performance storage
Simple data export and import services

19 Sci CSP services Data scientist Datascope Science Cloud Service Provider (Sci CSP)

20 Cloud Services Operations Centers (CSOC) The OSDC operates a Cloud Services Operations Center (or CSOC). It is a CSOC focused on supporting Science Clouds for researchers. Compare to a Network Operations Center or NOC. Both are an important part of cyber infrastructure for big data science.

21 Sci CSP services Data scientist Datascope Science Cloud Service Provider (Sci CSP) Cloud Service Operations Center (CSOC)

22 Part 3 Data Science

23 Establish best practices, strategies for data science in general and discipline specific data science in particular Models and algorithms Data General and discipline specific software applications and tools Data Analytic infrastructure Foundations of data science

24 What are the foundations for data science?

25 Theory to Big Data Spectrum Mathematical theorems Traditional statistical modeling (Semi-)Automating statistical modeling Simple counts and statistics over big data No data Small data Medium data GB TB PB OSDC Datascope MW Big data

26 Part 4 The Open Science Data Cloud

27 Discoveries Team: you and your colleagues correlation + algorithms + Instrument: 3000 cores / 5 PB OSDC science cloud + Data: 1 PB of OSDC data across several disciplines

28 2013 Open Science Data Cloud (IaaS) Compliance, & security (OpenFISMA) Infrastructure automation & management (Yates) Accounting & billing (Salesforce.com) Science Cloud SW & Services 5 PB 2013 (OpenStack & GlusterFS) Customer Facing Portal (Tukey) ~ Gbps bandwidth 5 engineers to operate 0.5 MW Science Cloud Data center network Virtual Machine (VM) containing common applications & pipelines Tukey (OSDC portal & middleware v0.3) Yates (infrastructure automation and management v0.1)

29 Tukey Tukey (based in part on Horizon). We have factored out digital ID service, file sharing, and transport from Bionimbus and Matsu.

30 Yates Automated installation of the OSDC software stack on a rack of computers. Based upon Chef. Version 0.1

31 UDR UDT is a high performance network transport protocol. UDR = rsync + UDT. It is easy for an average systems administrator to keep 100s of TB of distributed data synchronized. We are using it to distribute c. 1 PB from the OSDC.
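As a rough illustration of why the transport layer matters at this scale, the sketch below estimates how long a bulk copy takes. The 10 Gbps link speed and the `efficiency` factor are assumptions chosen for illustration; the slides do not state the rate at which the data is moved.

```python
def transfer_days(terabytes, gbps, efficiency=1.0):
    """Days needed to move `terabytes` (decimal TB) over a `gbps` link.

    `efficiency` is the fraction of the nominal link rate actually achieved.
    """
    bits = terabytes * 1e12 * 8                    # TB -> bytes -> bits
    seconds = bits / (gbps * 1e9 * efficiency)     # bits / (bits per second)
    return seconds / 86400                         # seconds -> days


# 1 PB (1000 TB) over a fully used 10 Gbps link takes a little over 9 days.
print(round(transfer_days(1000, 10), 2))
# The same copy at half the nominal rate takes twice as long.
print(round(transfer_days(1000, 10, efficiency=0.5), 2))
```

A protocol that sustains a larger fraction of the link rate over long distances shortens the copy in direct proportion, which is the motivation for pairing rsync with UDT.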

32 Open Science Data Cloud Services Digital ID services Data sharing services Data transport services (UDR) What other core services are essential? Of course, working groups and applications always add their own services. These core services will hopefully make the OSDC attractive as a platform (PaaS) for scientific discovery.

33 U.S. based not-for-profit corporation. Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud. Manages cloud computing infrastructure to support medical and health care research: Biomedical Commons Cloud. Manages cloud computing testbeds: Open Cloud Testbed.

34 OCC Members & Partners
Companies: Cisco, Yahoo!, Intel,
Universities: University of Chicago, Northwestern Univ., Johns Hopkins, Calit2, ORNL, University of Illinois at Chicago,
Federal agencies and labs: NASA
International Partners: Univ. Edinburgh, AIST (Japan), Univ. Amsterdam,
Partners: National Lambda Rail

35 Tukey Yates + + Third party open source software Open source software developed by the OCC and open standards Data center Data with permissions Authorization of users access to data Policies, procedures, controls, etc. + + Governance, legal agreements Sustainability model

36 Part 5 OSDC Data

37 Discoveries Team: you and your colleagues correlation + algorithms + Instrument: 3000 cores / 5 PB OSDC science cloud + Data: 1 PB of OSDC data across several disciplines

38

39 OSDC Public Data Sets Over 800 TB of open access data in the OSDC Earth sciences data Biological sciences data Social sciences data Digital humanities

40 Part 6 OSDC Working Groups Just look around you

41 Matsu Working Group: Clouds to Support Earth Science matsu.opensciencedatacloud.org

42 Analytic Services NoSQL-based Analytic Services Matsu Architecture Storage for WMS tiles and derived data products NoSQL Database Presentation Services Matsu Web Map Tile Service (WMTS) Images at different zoom layers suitable for OGC Web Mapping Server Workflow Services MR-based Analytic Services Streaming Analytic Services Matsu MR-based Tiling Service MapReduce used to process Level n to Level n+1 data and to partition images for different zoom levels Hadoop HDFS Level 0, Level 1 and Level 2 images Web Coverage Processing Service (WCPS)

43 Hadoop- Based Re- Analysis Zoom Level 1: 4 images Zoom Level 2: 16 images Zoom Level 3: 64 images Zoom Level 4: 256 images
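The image counts on this slide follow a 4^z pattern: each zoom level splits every tile of the previous level into a 2 x 2 block. A small sketch, assuming a square tile pyramid; the function names are hypothetical and are not part of the Matsu code.

```python
def grid_side(zoom):
    """Number of tiles along one edge of the map at a given zoom level."""
    return 2 ** zoom


def tiles_at_zoom(zoom):
    """Total tiles at a zoom level: a grid_side x grid_side square."""
    return grid_side(zoom) ** 2


for zoom in range(1, 5):
    print(f"Zoom Level {zoom}: {tiles_at_zoom(zoom)} images")
```

Because the tile count quadruples with each level, deeper zoom levels dominate the work, which is one reason the tiling is spread across a cluster with MapReduce.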

44 Bionimbus Working Group bionimbus.opensciencedatacloud.org (biological data)

45 Bionimbus Protected Data Cloud

46 Analyzing Data From The Cancer Genome Atlas (TCGA)
Current Practice
1. Apply to dbGaP for access to data.
2. Hire staff, set up and operate a secure, compliant computing environment to manage TB of data.
3. Get environment approved by your research center.
4. Set up analysis pipelines.
5. Download data from CGHub (takes days to weeks).
6. Begin analysis.
With Protected Data Cloud (PDC)
1. Apply to dbGaP for access to data.
2. Use your eRA Commons credentials to log in to the PDC, select the data that you want to analyze, and the pipelines that you want to use.
3. Begin analysis.

47 One Million Genomes Sequencing a million genomes would most likely fundamentally change the way we understand genomic variation. The genomic data for a patient is about 1 TB (including samples from both tumor and normal tissue). One million genomes is about 1000 PB or 1 EB. With compression, it may be about 100 PB. At $1000/genome, the sequencing would cost about $1B.
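The estimates on this slide can be checked with a few lines of arithmetic. The sketch below uses decimal units (1 PB = 1000 TB, 1 EB = 1000 PB); the 10x compression ratio is an assumption inferred from the slide's 1000 PB and 100 PB figures, not a stated value.

```python
TB_PER_PB = 1000
PB_PER_EB = 1000

genomes = 1_000_000
tb_per_patient = 1            # tumor plus normal tissue, per the slide
usd_per_genome = 1000
compression_ratio = 10        # assumption implied by 1000 PB -> 100 PB

raw_pb = genomes * tb_per_patient / TB_PER_PB     # 1000.0 PB
raw_eb = raw_pb / PB_PER_EB                       # 1.0 EB
compressed_pb = raw_pb / compression_ratio        # 100.0 PB
sequencing_usd = genomes * usd_per_genome         # 1_000_000_000 USD

print(raw_pb, "PB raw =", raw_eb, "EB")
print(compressed_pb, "PB compressed")
print(sequencing_usd, "USD to sequence")
```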

48 Big data driven discovery on 1,000,000 genomes and 1 EB of data. Genomic-driven diagnosis. Improved understanding of genomic science. Genomic-driven drug development. Precision diagnosis and treatment. Preventive health care.

49 Biomedical Commons Cloud (BCC) Working Group Medical Research Center C Medical Research Center A Cloud for Public Data Cloud for Controlled Genomic Data Medical Research Center B Cloud for EMR, PHI, data Example: Open Cloud Consortium's Biomedical Commons Cloud (BCC) Hospital D

50
Resource | Who uses | Who operates
Open Science Data Cloud (OSDC) | Pan science data for researchers | Open Cloud Consortium (OCC) supported by University OCC members
Biomedical Commons Clouds (BCC) | (International) biomedical researchers | OCC Biomedical Commons Cloud Working Group supported by OCC University members
Bionimbus Protected Data Cloud | Genomics researchers | University of Chicago supported by the OCC

51 OpenFlow-Enabled Hadoop WG When running Hadoop, some map and reduce jobs take significantly longer than others. These are stragglers and can significantly slow down a MapReduce computation. Stragglers are common (dirty secret about Hadoop). Infoblox and UChicago are leading an OCC Working Group on OpenFlow-enabled Hadoop that will provide additional bandwidth to stragglers. We have a testbed for a wide area version of this project.
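A toy model of why stragglers hurt, assuming all tasks of a phase run in parallel and the phase cannot finish until its slowest task does. The task durations are invented for illustration and are not measurements from Hadoop or the OSDC testbed.

```python
def phase_time(task_seconds):
    """A parallel phase ends only when its slowest task ends."""
    return max(task_seconds)


normal_tasks = [60] * 100              # 100 tasks of 60 s each
with_straggler = [60] * 99 + [300]     # one task is starved and takes 300 s

print(phase_time(normal_tasks))        # time without a straggler
print(phase_time(with_straggler))      # one slow task sets the pace for all
print(phase_time(with_straggler) / phase_time(normal_tasks))  # slowdown factor
```

In this model a single slow task out of 100 stretches the whole phase by a factor of five, so speeding up only the stragglers (for example, by giving them more bandwidth) recovers most of the lost time.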

52 OSDC PIRE Project We select OSDC PIRE Fellows (US citizens or permanent residents): We give them tutorials and training on big data science. We provide them fellowships to work with OSDC international partners. We give them preferred access to the OSDC. Nominate your favorite scientist as an OSDC PIRE Fellow. (look for PIRE)

53 Part 7 Key Questions for This Workshop

54
Question 1. How can we add partner sites at other locations that extend the OSDC? In particular, how can we extend the OSDC to sites around the world? How can the OSDC interoperate with other science clouds?
Question 2. What data can we add to the OSDC to facilitate data intensive cross-disciplinary discoveries?
Question 3. How can we build a plugin structure so that Tukey can be extended by other users and by other communities?
Question 4. What tools and applications can we add to the OSDC to facilitate data intensive cross-disciplinary discoveries?
Question 5. How can we better integrate digital IDs and file sharing services into the OSDC?
Question 6. What are 3-5 grand challenge questions that leverage the OSDC?

55 Questions

56 Robert Grossman is a faculty member at the University of Chicago. He is the Chief Research Informatics Officer for the Biological Sciences Division, a Faculty Member and Senior Fellow at the Computation Institute and the Institute for Genomics and Systems Biology, and a Professor of Medicine in the Section of Genetic Medicine. His research group focuses on big data, biomedical informatics, data science, cloud computing, and related areas. He is also the Founder and a Partner of Open Data Group, which has been building predictive models over big data for companies for over ten years. He recently wrote a book for the general reader that discusses big data (among other topics) called The Structure of Digital Computing: From Mainframes to Big Data, which can be purchased from Amazon. He blogs occasionally about big data at rgrossman.com.


More information

The Fermilab HEPCloud Facility: Adding 60,000 Cores for Science! Burt Holzman, for the Fermilab HEPCloud Team HTCondor Week 2016 May 19, 2016

The Fermilab HEPCloud Facility: Adding 60,000 Cores for Science! Burt Holzman, for the Fermilab HEPCloud Team HTCondor Week 2016 May 19, 2016 The Fermilab HEPCloud Facility: Adding 60,000 Cores for Science! Burt Holzman, for the Fermilab HEPCloud Team HTCondor Week 2016 May 19, 2016 My last Condor Week talk 2 05/19/16 Burt Holzman Fermilab HEPCloud

More information

GENI Laboratory Exercises for a Cloud Computing course

GENI Laboratory Exercises for a Cloud Computing course GENI Laboratory Exercises for a Cloud Computing course Prasad Calyam, Ph.D. Assistant Professor, Department of Computer Science NSF Workshop on GENI in Education, October 26 th 2013 What is Cloud Computing?

More information

CDIS Biomedical Data Commons

CDIS Biomedical Data Commons CDIS Biomedical Data Commons Computational Life Science Seminar Series October 18, 2017 Michael Fitzsimons Center for Data Intensive Science Agenda What is a Data Commons? Data Commons at CDIS NCI GDC

More information

Basics of Cloud Computing Lecture 2. Cloud Providers. Satish Srirama

Basics of Cloud Computing Lecture 2. Cloud Providers. Satish Srirama Basics of Cloud Computing Lecture 2 Cloud Providers Satish Srirama Outline Cloud computing services recap Amazon cloud services Elastic Compute Cloud (EC2) Storage services - Amazon S3 and EBS Cloud managers

More information

Op#mizing MapReduce for Highly- Distributed Environments

Op#mizing MapReduce for Highly- Distributed Environments Op#mizing MapReduce for Highly- Distributed Environments Abhishek Chandra Associate Professor Department of Computer Science and Engineering University of Minnesota hep://www.cs.umn.edu/~chandra 1 Big

More information

Private Cloud at IIT Delhi

Private Cloud at IIT Delhi Private Cloud at IIT Delhi Success Story Engagement: Long Term Industry: Education Offering: Private Cloud Deployment Business Challenge IIT Delhi, one of the India's leading educational Institute wanted

More information

6,000 Cameras in Time Square 210 million Cameras worldwide

6,000 Cameras in Time Square 210 million Cameras worldwide SMILE!! You are on camera 75 $mes per day Average American ci$zen can be caught on camera 1:29 Camera to person ra$o World Wide 6,000 Cameras in Time Square 210 million Cameras worldwide What is the LTO

More information

De BiG Grid e-infrastructuur digitaal onderzoek verbonden

De BiG Grid e-infrastructuur digitaal onderzoek verbonden Graphics: Real Time Monitor, Gidon Moont, Imperial College London, see http://gridportal.hep.ph.ic.ac.uk/rtm/ De BiG Grid e-infrastructuur digitaal onderzoek verbonden David Groep, Nikhef KennisKring Amsterdam

More information

Yajing (Phillis)Tang. Walt Wells

Yajing (Phillis)Tang. Walt Wells Building on the NOAA Big Data Project for Academic Research: An OCC Maria Patterson Perspective Zachary Flamig Yajing (Phillis)Tang Walt Wells Robert Grossman We have a problem The commoditization of sensors

More information

Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan

Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan Storage Virtualization Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan Storage Virtualization In computer science, storage virtualization uses virtualization to enable better functionality

More information

e-infrastructures in FP7 INFO DAY - Paris

e-infrastructures in FP7 INFO DAY - Paris e-infrastructures in FP7 INFO DAY - Paris Carlos Morais Pires European Commission DG INFSO GÉANT & e-infrastructure Unit 1 Global challenges with high societal impact Big Science and the role of empowered

More information

Vision of the Software Defined Data Center (SDDC)

Vision of the Software Defined Data Center (SDDC) Vision of the Software Defined Data Center (SDDC) Raj Yavatkar, VMware Fellow Vijay Ramachandran, Sr. Director, Storage Product Management Business transformation and disruption A software business that

More information

Maimonides Medical Center s Quest for Operational Continuity Via Real-Time Data Accessibility

Maimonides Medical Center s Quest for Operational Continuity Via Real-Time Data Accessibility Maimonides Medical Center is a non-profit, non-sectarian hospital located in Borough Park, in the New York City borough of Brooklyn, in the U.S. state of New York. Maimonides is both a treatment facility

More information

Grid Computing a new tool for science

Grid Computing a new tool for science Grid Computing a new tool for science CERN, the European Organization for Nuclear Research Dr. Wolfgang von Rüden Wolfgang von Rüden, CERN, IT Department Grid Computing July 2006 CERN stands for over 50

More information

Cyberinfrastructure Framework for 21st Century Science & Engineering (CIF21)

Cyberinfrastructure Framework for 21st Century Science & Engineering (CIF21) Cyberinfrastructure Framework for 21st Century Science & Engineering (CIF21) NSF-wide Cyberinfrastructure Vision People, Sustainability, Innovation, Integration Alan Blatecky Director OCI 1 1 Framing the

More information

Azure Certification BootCamp for Exam (Developer)

Azure Certification BootCamp for Exam (Developer) Azure Certification BootCamp for Exam 70-532 (Developer) Course Duration: 5 Days Course Authored by CloudThat Description Microsoft Azure is a cloud computing platform and infrastructure created for building,

More information

The Human Variant Database

The Human Variant Database The Human Variant Database Mya Warren Michael Smith Genome Sciences Centre Vancouver BC Bioinforma=cs is Big Data Human genome has 3 billion nucleo=de bases 60 thousand genes 10-20 thousand proteins Bioinforma=cs

More information

Demystifying the Cloud With a Look at Hybrid Hosting and OpenStack

Demystifying the Cloud With a Look at Hybrid Hosting and OpenStack Demystifying the Cloud With a Look at Hybrid Hosting and OpenStack Robert Collazo Systems Engineer Rackspace Hosting The Rackspace Vision Agenda Truly a New Era of Computing 70 s 80 s Mainframe Era 90

More information

globus online Software-as-a-Service for Research Data Management

globus online Software-as-a-Service for Research Data Management globus online Software-as-a-Service for Research Data Management Steve Tuecke Deputy Director, Computation Institute University of Chicago & Argonne National Laboratory Big Science built on Globus Toolkit

More information

LHC and LSST Use Cases

LHC and LSST Use Cases LHC and LSST Use Cases Depots Network 0 100 200 300 A B C Paul Sheldon & Alan Tackett Vanderbilt University LHC Data Movement and Placement n Model must evolve n Was: Hierarchical, strategic pre- placement

More information

Real- &me Archiving of Spontaneous Events (Use- Case : Hurricane Sandy)

Real- &me Archiving of Spontaneous Events (Use- Case : Hurricane Sandy) Archive- it Partner Mee&ng, Annapolis, Maryland December 3, 2012 Real- &me Archiving of Spontaneous Events (Use- Case : Hurricane Sandy) Kiran ChiBuri, Digital Library Research Laboratory, Virginia Tech.

More information

BUSINESS DATA LAKE FADI FAKHOURI, SR. SYSTEMS ENGINEER, ISILON SPECIALIST. Copyright 2016 EMC Corporation. All rights reserved.

BUSINESS DATA LAKE FADI FAKHOURI, SR. SYSTEMS ENGINEER, ISILON SPECIALIST. Copyright 2016 EMC Corporation. All rights reserved. BUSINESS DATA LAKE FADI FAKHOURI, SR. SYSTEMS ENGINEER, ISILON SPECIALIST 1 UNSTRUCTURED DATA GROWTH 75% 78% 80% 2015 71 EB 2016 106 EB 2017 133 EB Total Capacity Shipped, Worldwide % of Unstructured Data

More information

Hadoop محبوبه دادخواه کارگاه ساالنه آزمایشگاه فناوری وب زمستان 1391

Hadoop محبوبه دادخواه کارگاه ساالنه آزمایشگاه فناوری وب زمستان 1391 Hadoop محبوبه دادخواه کارگاه ساالنه آزمایشگاه فناوری وب زمستان 1391 Outline Big Data Big Data Examples Challenges with traditional storage NoSQL Hadoop HDFS MapReduce Architecture 2 Big Data In information

More information

Welcome to the SIHO itransact portal.

Welcome to the SIHO itransact portal. Provider and Vendor Access Portal One stop access for your guide to utilizing SIHO s new itransact platform. Welcome to the SIHO itransact portal. Primary access codes will be given to key contacts at

More information

Europe and its Open Science Cloud: the Italian perspective. Luciano Gaido Plan-E meeting, Poznan, April

Europe and its Open Science Cloud: the Italian perspective. Luciano Gaido Plan-E meeting, Poznan, April Europe and its Open Science Cloud: the Italian perspective Luciano Gaido (gaido@to.infn.it) Plan-E meeting, Poznan, April 27 2017 Background Italy has a long-standing expertise and experience in the management

More information

Part 2: Computing and Networking Capacity (for research and instructional activities)

Part 2: Computing and Networking Capacity (for research and instructional activities) National Science Foundation Part 2: Computing and Networking Capacity (for research and instructional activities) FY 2013 Survey of Science and Engineering Research Facilities Who should be contacted if

More information

Introduction to the Mathematics of Big Data. Philippe B. Laval

Introduction to the Mathematics of Big Data. Philippe B. Laval Introduction to the Mathematics of Big Data Philippe B. Laval Fall 2017 Introduction In recent years, Big Data has become more than just a buzz word. Every major field of science, engineering, business,

More information

DRAGEN Bio-IT Platform Enabling the Global Genomic Infrastructure

DRAGEN Bio-IT Platform Enabling the Global Genomic Infrastructure TM DRAGEN Bio-IT Platform Enabling the Global Genomic Infrastructure About DRAGEN Edico Genome s DRAGEN TM (Dynamic Read Analysis for GENomics) Bio-IT Platform provides ultra-rapid secondary analysis of

More information

Farsight Genome Systems

Farsight Genome Systems Customer Success Story Farsight Genome Systems ClearDATA Helps Speed to Market Groundbreaking Genomic Testing Solution Page 2 of 5 Farsight Genome Systems ClearDATA Helps Speed to Market Groundbreaking

More information

Cybersecurity Curricular Guidelines

Cybersecurity Curricular Guidelines Cybersecurity Curricular Guidelines Ma2 Bishop, University of California Davis, co-chair Diana Burley The George Washington University, co-chair Sco2 Buck, Intel Corp. Joseph J. Ekstrom, Brigham Young

More information

Lesson 14: Cloud Computing

Lesson 14: Cloud Computing Yang, Chaowei et al. (2011) 'Spatial cloud computing: how can the geospatial sciences use and help shape cloud computing?', International Journal of Digital Earth, 4: 4, 305 329 GEOG 482/582 : GIS Data

More information

Archiving to The Cloud?

Archiving to The Cloud? Why Archiving to The Cloud might prove more problematic than first envisioned. Archiving to The Cloud? White paper on Things to consider when archiving to The Cloud. Ray Quattromini MD Fortuna Power Systems

More information

Amazon Web Services. Foundational Services for Research Computing. April Mike Kuentz, WWPS Solutions Architect

Amazon Web Services. Foundational Services for Research Computing. April Mike Kuentz, WWPS Solutions Architect Amazon Web Services Foundational Services for Research Computing Mike Kuentz, WWPS Solutions Architect April 2017 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Global Infrastructure

More information

Village Software. Security Assessment Report

Village Software. Security Assessment Report Village Software Security Assessment Report Version 1.0 January 25, 2019 Prepared by Manuel Acevedo Helpful Village Security Assessment Report! 1 of! 11 Version 1.0 Table of Contents Executive Summary

More information

2017 Resource Allocations Competition Results

2017 Resource Allocations Competition Results 2017 Resource Allocations Competition Results Table of Contents Executive Summary...3 Computational Resources...5 CPU Allocations...5 GPU Allocations...6 Cloud Allocations...6 Storage Resources...6 Acceptance

More information

ICN for Cloud Networking. Lotfi Benmohamed Advanced Network Technologies Division NIST Information Technology Laboratory

ICN for Cloud Networking. Lotfi Benmohamed Advanced Network Technologies Division NIST Information Technology Laboratory ICN for Cloud Networking Lotfi Benmohamed Advanced Network Technologies Division NIST Information Technology Laboratory Information-Access Dominates Today s Internet is focused on point-to-point communication

More information

Distributed Research Networks: Lessons from the Field

Distributed Research Networks: Lessons from the Field Distributed Research Networks: Lessons from the Field The Learning System Summit Sponsored by the Joseph H. Kanter Family Founda8on The Na(onal Press Club Washington, DC May 17-18, 2012 Jeffrey Brown,

More information

Next-generation IT Platforms Delivering New Value through Accumulation and Utilization of Big Data

Next-generation IT Platforms Delivering New Value through Accumulation and Utilization of Big Data Next-generation IT Platforms Delivering New Value through Accumulation and Utilization of Big Data 46 Next-generation IT Platforms Delivering New Value through Accumulation and Utilization of Big Data

More information

Examining Public Cloud Platforms

Examining Public Cloud Platforms Examining Public Cloud Platforms A Survey Copyright 2012 Chappell & Associates Agenda What is Cloud Computing? Cloud Platform Technologies: An Overview Public Cloud Platforms: Reviewing the Terrain What

More information

A VO-friendly, Community-based Authorization Framework

A VO-friendly, Community-based Authorization Framework A VO-friendly, Community-based Authorization Framework Part 1: Use Cases, Requirements, and Approach Ray Plante and Bruce Loftis NCSA Version 0.1 (February 11, 2005) Abstract The era of massive surveys

More information

User Community Driven Development in Trust and Identity Services

User Community Driven Development in Trust and Identity Services User Community Driven Development in Trust and Identity Services Ann Harding, SWITCH Internet2 Global Summit 27 April 2015 Washington DCs Agenda Trust and Iden.ty Landscape GÉANT Research Community Engagement

More information