Bionimbus: Lessons from a Petabyte-Scale Science Cloud Service Provider (CSP)
1 Bionimbus: Lessons from a Petabyte-Scale Science Cloud Service Provider (CSP) Robert Grossman Institute for Genomics & Systems Biology Center for Research Informatics Computation Institute Department of Medicine University of Chicago & Open Data Group September 11, 2012
2 The OSDC & Bionimbus Teams Open Science Data Cloud (OSDC) Team Matt Greenway, Allison Heath, Ray Powell, Rafael Suarez. Major funding for the OSDC is provided by the Gordon and Betty Moore Foundation. Bionimbus Team Elizabeth Bartom, Casey Brown, Jason Grundstad, David Hanley, Nicolas Negre, Tom Stricker, Matt Slattery, Rebecca Spokony & Kevin White. Bionimbus is a joint project between Laboratory for Advanced Computing & White Lab at the University of Chicago and uses in part OSDC infrastructure.
3 Let's Step Back 20 Years : Petabyte Access & Storage Solutions (PASS) Project for SSC. It developed & benchmarked federated relational, OO DB, object stores, & column-oriented data warehouse solutions at the TB-scale.
4 A picture of CERN's Large Hadron Collider (LHC). The LHC took about a decade to construct, and cost about $4.75 billion. Source of picture: Conrad Melvin, Creative Commons BY-SA 2.0,
5 Part 1. Genomics as a Big Data Science
6 Source: Lincoln Stein
7 One Million Genomes. Sequencing a million genomes would most likely fundamentally change the way we understand genomic variation. The genomic data for a patient is about 1 TB (including samples from both tumor and normal tissue). One million genomes is about 1000 PB or 1 EB. With compression, it may be about 100 PB. At $1000/genome, the sequencing would cost about $1B.
8 Big data driven discovery on 1,000,000 genomes and 1 EB of data. Genomic-driven diagnosis. Improved understanding of genomic science. Genomic-driven drug development. Precision diagnosis and treatment. Preventive health care.
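The arithmetic on this slide can be checked with a short sketch. The constant and function names are illustrative, decimal units are assumed, and the 10x compression ratio is only what the slide's 1 EB and 100 PB figures imply, not a measured value.

```python
# Back-of-the-envelope check of the one-million-genomes figures.
# Decimal units are assumed: 1 PB = 1000 TB, 1 EB = 1000 PB.
TB_PER_PATIENT = 1            # tumor + normal samples, per the slide
GENOMES = 1_000_000
COST_PER_GENOME_USD = 1_000
COMPRESSION_RATIO = 10        # assumption implied by 1 EB -> about 100 PB


def project_totals(genomes=GENOMES):
    """Return raw and compressed storage plus sequencing cost for a cohort."""
    raw_pb = genomes * TB_PER_PATIENT / 1000
    return {
        "raw_pb": raw_pb,
        "raw_eb": raw_pb / 1000,
        "compressed_pb": raw_pb / COMPRESSION_RATIO,
        "sequencing_cost_usd": genomes * COST_PER_GENOME_USD,
    }


totals = project_totals()
print(totals)
```

Changing `GENOMES` or `COMPRESSION_RATIO` shows how sensitive the storage estimate is to cohort size and to how well the data compresses.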
9 ER+ TNBC With genomics, we can stratify diseases and treat each stratum differently. Source: White Lab, University of Chicago.
10 Clonal Evolution of Tumors. Tumors evolve temporally and spatially. Source: Mel Greaves & Carlo C. Maley, Clonal evolution in cancer, Nature, Volume 481, pages , 2012.
11 Combinations of Rare Alleles. [Figure: penetrance (high, intermediate, modest, low) plotted against allele frequency (very rare, rare, uncommon, common). Regions shown: rare alleles causing Mendelian disease; low-frequency variants with intermediate penetrance; rare variants of small effect, very hard to identify by genetic means; rare examples of high-penetrance common variants influencing common disease; most common variants implicated in common disease by GWA.] Source: Mark McCarthy
12 TCGA Analysis of Lung Cancer Source: The Cancer Genome Atlas Research Network, Comprehensive genomic characterization of squamous cell lung cancers, Nature, 2012, doi: /nature cases of SQCC (lung cancer) Matched tumor & normal Mean of 360 exonic mutations, 323 CNV, & 165 rearrangements per tumor
13 Some Examples of Big Data Science
Discipline          Duration    Size             # Devices
HEP - LHC           10 years    15 PB/year*      One
Astronomy - LSST    10 years    12 PB/year**     One
Genomics - NGS      2-4 years   0.5 TB/genome    1000s
*At full capacity, the Large Hadron Collider (LHC), the world's largest particle accelerator, is expected to produce more than 15 million Gigabytes of data each year. This ambitious project connects and combines the IT power of more than 140 computer centres in 33 countries. Source:
**As it carries out its 10-year survey, LSST will produce over 15 terabytes of raw astronomical data each night (30 terabytes processed), resulting in a database catalog of 22 petabytes and an image archive of 100 petabytes. Source:
14 One large instrument Many smaller instruments
15 Part 2. What Instrument Do We Use to Make Big Data Discoveries? How do we build a datascope?
16 TB? PB? EB? ZB? What is big data?
17 Another way: opencompute.org. Think of data as big if you measure it in MW, as in: Facebook's Prineville Data Center is 30 MW.
18 An algorithm and computing infrastructure is big-data scalable if adding a rack (or container) of data (and corresponding processors) allows you to do the same computation in the same time but over more data.
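One way to read this definition is as a weak-scaling test: the data per rack stays fixed, racks are added, and the wall-clock time should stay roughly flat. The following is a minimal sketch under that reading. The function name, the 10% tolerance, and the measurements are all hypothetical and do not come from the talk.

```python
def is_big_data_scalable(runs, tolerance=0.10):
    """Check a set of (racks, data_tb, seconds) measurements.

    Returns True when data grows in proportion to the number of racks and
    every run time stays within `tolerance` of the smallest-rack baseline.
    """
    runs = sorted(runs)
    base_racks, base_data, base_time = runs[0]
    for racks, data_tb, seconds in runs[1:]:
        same_data_per_rack = abs(data_tb / racks - base_data / base_racks) < 1e-9
        time_flat = abs(seconds - base_time) <= tolerance * base_time
        if not (same_data_per_rack and time_flat):
            return False
    return True


# Hypothetical measurements, for illustration only.
flat = [(1, 950, 3600), (2, 1900, 3650), (4, 3800, 3700)]
degrading = [(1, 950, 3600), (2, 1900, 5400)]

print(is_big_data_scalable(flat))
print(is_big_data_scalable(degrading))
```

The first series keeps the run time near one hour as racks and data double, so it passes. The second takes 50% longer on twice the data, so it does not.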
19 Commercial Cloud Service Provider (CSP) 15 MW Data Center Monitoring, network security and forensics Automatic provisioning and infrastructure management Accounting and billing 100,000 servers 1 PB DRAM 100s of PB of disk Customer Facing Portal ~1 Tbps egress bandwidth 25 operators for 15 MW Commercial Cloud Data center network
20 What are some of the important differences between commercial and research-focused CSPs?
21 Science Clouds
                  Science CSP                                    Commercial CSP
POV               Democratize access to data. Integrate data     As long as you pay the bill; as long as
                  to make discoveries. Long term archive.        the business model holds.
Data & Storage    Data intensive computing & HP storage          Internet style scale out and object-based storage
Flows             Large data flows in and out                    Lots of small web flows
Streams           Streaming processing required                  NA
Accounting        Essential                                      Essential
Lock in           Moving environment between CSPs essential      Lock in is good
22 Part 3. The Open Cloud Consortium's Open Science Data Cloud
23 U.S.-based not-for-profit corporation. Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud. Manages cloud computing testbeds: Open Cloud Testbed.
24 Cloud Services Operations Centers (CSOC). The OSDC operates a Cloud Services Operations Center (or CSOC). It is a CSOC focused on supporting Science Clouds for researchers. Compare to a Network Operations Center or NOC. Both are an important part of cyber infrastructure for big data science.
25 Different Styles of OSDC Racks. 2012 OSDC rack design (draft): 950 TB / rack, 600 cores / rack. Design 1: Put cores over spindles. Higher cost but easy to compute over all the data. Design 2: Separate (some of the) storage from the compute.
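The draft rack figures give a quick way to estimate how many racks a target capacity would need. This is a rough sketch only. It assumes decimal units and raw capacity, ignores replication and file system overhead, and the function name is illustrative.

```python
import math

TB_PER_RACK = 950      # from the 2012 draft rack design
CORES_PER_RACK = 600   # from the 2012 draft rack design


def racks_needed(target_pb):
    """Racks required to hold target_pb petabytes of raw capacity."""
    return math.ceil(target_pb * 1000 / TB_PER_RACK)


for target_pb in (1, 10, 100):
    n = racks_needed(target_pb)
    print(target_pb, "PB ->", n, "racks,", n * CORES_PER_RACK, "cores")
```

Under these assumptions, a design that puts cores over spindles also fixes the ratio of compute to storage, at roughly 1.6 TB per core.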
26 Open Science Data Cloud Monitoring, compliance, & security Automatic provisioning and infrastructure management Accounting and billing (OSDC) Science Cloud SW & Services 3 PB PB 2012 able to scale to 100 PB? 5-12 operators to operate 1-5 MW Science Cloud Data center network Customer Facing Portal (Tukey) ~100 Gbps bandwidth OSDC Data Stack based upon OpenStack, Hadoop, GlusterFS, UDT,
27 OSDC Philosophy. We try to automate as much as possible (we automate the setup & operations of a rack). We try to write as little software as possible. Each project is a bit different, but in general: We assign (permanent) IDs to data managed by the OSDC and manage associated metadata. We assign and enforce permissions for users & groups of users and for files/objects, collections of files/objects, and collections of collections. We support RESTful interfaces. We do accounting for storage and core-hours.
28 Some Of Our Biggest Mistakes. Not charging for services. This resulted in a lot of bad behavior. Trying to support donated equipment without adequate staff. Being too optimistic about when big data software would be ready for prime time. Some problems with big data software don't show up at less than the full scale of the OSDC, but we have only one OSDC and it is difficult to test at this scale.
29 Essential Services for a Science CSP: support for data intensive computing; support for big data flows; account management, authentication and authorization services; health and status monitoring; billing and accounting; ability to rapidly provision infrastructure; security services, logging, event reporting; access to large amounts of public data; high performance storage; simple data export and import services.
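The ID, permission, and accounting ideas on this slide can be sketched in a few lines. This is an illustrative model only, not the OSDC implementation. The `Node` class, the `user:`/`group:` principal strings, the random hex ID standing in for a permanent ID, and the usage ledger are all assumptions made for the example.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """A file/object or a collection; collections may contain collections."""
    name: str
    readers: set = field(default_factory=set)   # users or groups granted read access
    children: list = field(default_factory=list, repr=False, compare=False)
    parent: "Node | None" = field(default=None, repr=False, compare=False)
    id: str = field(default_factory=lambda: uuid.uuid4().hex)  # stand-in for a permanent ID

    def add(self, child):
        """Attach a child file or collection and return it."""
        child.parent = self
        self.children.append(child)
        return child


def can_read(principals, node):
    """True if any principal is granted on the node or on an enclosing collection."""
    while node is not None:
        if principals & node.readers:
            return True
        node = node.parent
    return False


# Accounting ledger keyed by project ID: storage (TB-months) and core-hours.
usage = defaultdict(lambda: {"tb_months": 0.0, "core_hours": 0.0})


def record_usage(project_id, tb_months=0.0, core_hours=0.0):
    """Accumulate storage and compute usage for a project."""
    usage[project_id]["tb_months"] += tb_months
    usage[project_id]["core_hours"] += core_hours


# A collection of collections: project -> samples -> file.
project = Node("project-x", readers={"group:lab"})
samples = project.add(Node("samples"))
bam = samples.add(Node("sample1.bam"))

print(can_read({"user:alice", "group:lab"}, bam))  # grant inherited from the project
print(can_read({"user:bob"}, bam))

record_usage(project.id, tb_months=2.5, core_hours=400)
record_usage(project.id, core_hours=100)
print(usage[project.id])
```

Walking up the parent chain is one simple way to make a grant on a collection apply to everything inside it. A production system would also need write permissions, revocation, and persistent storage for the IDs.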
30 [Figure: number of projects (1000s, 100s, 10s) plotted against data size (small, medium to large, very large). Individual scientists & small projects: public infrastructure. Community based science via Science as a Service: shared community infrastructure. Very large projects: dedicated infrastructure.]
31 Part 4. Bionimbus Bionimbus is a joint project between Laboratory For Advanced Computing & the White Lab at the University of Chicago.
32 Step 1. Prepare a Sample
33 Step 2. Log in to Bionimbus and get a Bionimbus Key.
34 Step 3. Send your sample to the sequencing center.
35 Step 4. Log in to Bionimbus and view your data.
36 Step 5. Use Bionimbus to perform standard and custom pipelines. Bionimbus can launch multiple virtual machines.
37 Bionimbus Virtual Machine Releases. Peak Calling: MAT, MA2C, PeakSeq, MACS, SPP. Quality Control: Various. Alignment & Genotyping: Bowtie, TopHat, Samtools, Picard.
38 Software Tools: Moving Genomes
39 Bionimbus Community Genomic Cloud researcher 1K genomes PubMed etc. Cloud for Public Data Personal dropbox + compute
40 Bionimbus Private Genomic Cloud researcher 1K genomes PubMed etc. Cloud for Public Data Personal dropbox & compute Cloud for Controlled Data TCGA dbGaP
41 Bionimbus Private Biomedical Cloud researcher 1K genomes PubMed etc. Cloud for Public Data Personal dropbox plus compute Cloud for Controlled Data TCGA dbGaP Scatter, gather queries Clinical Research Data Warehouse Cloud for PHI data
42 External sequencing partner Step 3b. Return variant calls, CNV, annotation Step 2. Send sample to be sequenced. Step 4. Secure data routing to appropriate cloud based upon BID. Internal Sequencers Step 3a. Return raw reads. Bionimbus Private Cloud UC BID Generator Step 5. Cloud based analysis using IGSB and 3rd-party tools and applications. Step 1. Get Bionimbus ID (BID), assign project, private/community, public cloud, etc. Bionimbus Community Cloud Bionimbus Private Cloud XY dbGaP Amazon
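Steps 1 and 4 of this workflow, issuing a BID tied to a project and a destination cloud and later routing returned data by that BID, can be sketched as follows. This is a hypothetical illustration. The BID format, the cloud labels, and the in-memory registry are assumptions and do not describe the actual Bionimbus BID generator.

```python
import uuid
from dataclasses import dataclass

# Hypothetical destination labels, loosely following the clouds named on the slide.
CLOUDS = {"community", "private", "public"}


@dataclass(frozen=True)
class BidRecord:
    bid: str
    project: str
    cloud: str


_registry = {}  # in-memory stand-in for the BID generator's database


def new_bid(project, cloud):
    """Step 1: issue a BID and record the project and destination cloud."""
    if cloud not in CLOUDS:
        raise ValueError(f"unknown cloud: {cloud}")
    bid = "BID-" + uuid.uuid4().hex[:12]
    _registry[bid] = BidRecord(bid, project, cloud)
    return bid


def route(bid):
    """Step 4: return the destination cloud for data tagged with this BID."""
    return _registry[bid].cloud  # raises KeyError for an unregistered BID


bid = new_bid("demo-project", "private")
print(bid, "->", route(bid))
```

Because the destination is fixed when the BID is issued, the sequencing center only needs to tag returned reads with the BID and never has to know which cloud the data belongs in.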
43 web2py-based Front End (Eucalyptus, OpenStack) (PostgreSQL) Utility Cloud Services (IDs, etc.) Database Services Data Ingestion Services Analysis Pipelines & Re-analysis Services Data Cloud Services Intercloud Services (UDT, replication) (Hadoop, Sector/Sphere)
44 >300 ChIP datasets -Chromatin/RNA timecourse -CBP -PolII -Pho/silencers -HDACs -Insulators -TFs Predictions 537 silencers 2,307 new promoters 12,285 enhancers 14,145 insulators Negre et al. Nature
45 Part 5. Managing One Million Genomes
46 Relational databases Summary level ( TB) Enrich with clinical data NoSql & scientific databases NoSql, DFS, file overlays? Variation (VCF) Files (1-10 PB) (Genomic variation) Sequence (BAM) Files ( PB) (Sequence data in binary form)
47 Acknowledgements Major funding and support for the Open Science Data Cloud is provided by the Gordon and Betty Moore Foundation, which has provided $2M of funding to the OSDC to launch Phase 1 of the project ( ). Moore Foundation funding is used to support the OSDC-Adler, Sullivan and Root facilities. Additional funding for the OSDC has been provided by the following sponsors: The OCC-Y Hadoop Cluster (approximately 1000 cores and 1 PB of storage) was donated by Yahoo! in Cisco provides the OSDC access to the Cisco C-Wave, which connects OSDC data centers with 10 Gbps wide area networks. NSF awarded the OSDC a 5-year ( ) $3.5M PIRE award to train scientists to use the OSDC and to further develop the underlying technology. OSDC technology for high performance data transport is supported in part by NSF Award The StarLight Facility in Chicago enables the OSDC to connect to over 30 high performance research networks around the world at 10 Gbps or higher, with an increasing number of 100 Gbps connections. The OSDC is managed by the Open Cloud Consortium, a 501(c)(3) not-for-profit corporation. If you are interested in providing funding or donating equipment or services, please contact us at info@opensciencedatacloud.org.
48 For more information You can find some more information on my blog: rgrossman.com. Some of my technical papers are also available there. My address is robert.grossman at uchicago dot edu
49 Sources for images The image of the hard disk is from Norlando Pobre, Creative Commons. The image of the Facebook Prineville Data Center is from the Intel Free Press, Creative Commons BY 2.0. The image of the LHC is from Conrad Melvin, Creative Commons BY-SA 2.0,
Using the Open Science Data Cloud for Data Science Research. Robert Grossman University of Chicago Open Cloud Consor=um June 17, 2013
Using the Open Science Data Cloud for Data Science Research Robert Grossman University of Chicago Open Cloud Consor=um June 17, 2013 Discoveries Team: you and your colleagues correla=on + algorithms +
More informationFlorida International University
Florida International University PARTNERSHIP FOR INTERNATIONAL RESEARCH AND EDUCATION TERENA June 3 rd,2013 Julio Ibarra, PhD. Assistant Vice President of Technology Augmented Research (CIARA) The Open
More informationCDIS Biomedical Data Commons
CDIS Biomedical Data Commons Computational Life Science Seminar Series October 18, 2017 Michael Fitzsimons Center for Data Intensive Science Agenda What is a Data Commons? Data Commons at CDIS NCI GDC
More informationDRAGEN Bio-IT Platform Enabling the Global Genomic Infrastructure
TM DRAGEN Bio-IT Platform Enabling the Global Genomic Infrastructure About DRAGEN Edico Genome s DRAGEN TM (Dynamic Read Analysis for GENomics) Bio-IT Platform provides ultra-rapid secondary analysis of
More informationThe Data exacell DXC. J. Ray Scott DXC PI May 17, 2016
The Data exacell DXC J. Ray Scott DXC PI May 17, 2016 DXC Leadership Mike Levine Co-Scientific Director Co-PI Nick Nystrom Senior Director of Research Co-PI Ralph Roskies Co-Scientific Director Co-PI Robin
More informationData Centres in the Virtual Observatory Age
Data Centres in the Virtual Observatory Age David Schade Canadian Astronomy Data Centre A few things I ve learned in the past two days There exist serious efforts at Long-Term Data Preservation Alliance
More informationGiovanni Lamanna LAPP - Laboratoire d'annecy-le-vieux de Physique des Particules, Université de Savoie, CNRS/IN2P3, Annecy-le-Vieux, France
Giovanni Lamanna LAPP - Laboratoire d'annecy-le-vieux de Physique des Particules, Université de Savoie, CNRS/IN2P3, Annecy-le-Vieux, France ERF, Big data & Open data Brussels, 7-8 May 2014 EU-T0, Data
More informationGenomics on Cisco Metacloud + SwiftStack
Genomics on Cisco Metacloud + SwiftStack Technology is a large component of driving discovery in both research and providing timely answers for clinical treatments. Advances in genomic sequencing have
More informationCSD3 The Cambridge Service for Data Driven Discovery. A New National HPC Service for Data Intensive science
CSD3 The Cambridge Service for Data Driven Discovery A New National HPC Service for Data Intensive science Dr Paul Calleja Director of Research Computing University of Cambridge Problem statement Today
More informationConference The Data Challenges of the LHC. Reda Tafirout, TRIUMF
Conference 2017 The Data Challenges of the LHC Reda Tafirout, TRIUMF Outline LHC Science goals, tools and data Worldwide LHC Computing Grid Collaboration & Scale Key challenges Networking ATLAS experiment
More informationDatabase Management Systems
Database Management Systems Fall 2017 Knowledge is of two kinds: we know a subject ourselves, or we know where we can find information upon it. -- Samuel Johnson (1709-1784) Queries for Today Why? Who?
More information5 Fundamental Strategies for Building a Data-centered Data Center
5 Fundamental Strategies for Building a Data-centered Data Center June 3, 2014 Ken Krupa, Chief Field Architect Gary Vidal, Solutions Specialist Last generation Reference Data Unstructured OLTP Warehouse
More informationCSE6331: Cloud Computing
CSE6331: Cloud Computing Leonidas Fegaras University of Texas at Arlington c 2019 by Leonidas Fegaras Cloud Computing Fundamentals Based on: J. Freire s class notes on Big Data http://vgc.poly.edu/~juliana/courses/bigdata2016/
More informationCERN openlab II. CERN openlab and. Sverre Jarp CERN openlab CTO 16 September 2008
CERN openlab II CERN openlab and Intel: Today and Tomorrow Sverre Jarp CERN openlab CTO 16 September 2008 Overview of CERN 2 CERN is the world's largest particle physics centre What is CERN? Particle physics
More informationThe Bionimbus PDC: Obtaining Access FAQ
The Bionimbus PDC: Obtaining Access FAQ TABLE OF CONTENTS PREREQUISITES 3 LEGAL DOCUMENTS 3 SECURITY TRAINING 3 GENERAL GUIDELINES 4 AUTH METHOD 1: USING AN ERA TO GAIN ACCESS TO A DBGAP DATASET 5 GETTING
More informationigeni: International Global Environment for Network Innovations
igeni: International Global Environment for Network Innovations Joe Mambretti, Director, (j-mambretti@northwestern.edu) International Center for Advanced Internet Research (www.icair.org) Northwestern
More informationThe Cambridge Bio-Medical-Cloud An OpenStack platform for medical analytics and biomedical research
The Cambridge Bio-Medical-Cloud An OpenStack platform for medical analytics and biomedical research Dr Paul Calleja Director of Research Computing University of Cambridge Global leader in science & technology
More informationGrid Computing: dealing with GB/s dataflows
Grid Computing: dealing with GB/s dataflows Jan Just Keijser, Nikhef janjust@nikhef.nl David Groep, NIKHEF 3 May 2012 Graphics: Real Time Monitor, Gidon Moont, Imperial College London, see http://gridportal.hep.ph.ic.ac.uk/rtm/
More informationIntroduction to Grid Computing
Milestone 2 Include the names of the papers You only have a page be selective about what you include Be specific; summarize the authors contributions, not just what the paper is about. You might be able
More informationFrom Internet Data Centers to Data Centers in the Cloud
From Internet Data Centers to Data Centers in the Cloud This case study is a short extract from a keynote address given to the Doctoral Symposium at Middleware 2009 by Lucy Cherkasova of HP Research Labs
More informationICN for Cloud Networking. Lotfi Benmohamed Advanced Network Technologies Division NIST Information Technology Laboratory
ICN for Cloud Networking Lotfi Benmohamed Advanced Network Technologies Division NIST Information Technology Laboratory Information-Access Dominates Today s Internet is focused on point-to-point communication
More informationEGI: Linking digital resources across Eastern Europe for European science and innovation
EGI- InSPIRE EGI: Linking digital resources across Eastern Europe for European science and innovation Steven Newhouse EGI.eu Director 12/19/12 EPE 2012 1 EGI European Over 35 countries Grid Secure sharing
More informationQLIK INTEGRATION WITH AMAZON REDSHIFT
QLIK INTEGRATION WITH AMAZON REDSHIFT Qlik Partner Engineering Created August 2016, last updated March 2017 Contents Introduction... 2 About Amazon Web Services (AWS)... 2 About Amazon Redshift... 2 Qlik
More informationACCI Recommendations on Long Term Cyberinfrastructure Issues: Building Future Development
ACCI Recommendations on Long Term Cyberinfrastructure Issues: Building Future Development Jeremy Fischer Indiana University 9 September 2014 Citation: Fischer, J.L. 2014. ACCI Recommendations on Long Term
More informationCC-IN2P3: A High Performance Data Center for Research
April 15 th, 2011 CC-IN2P3: A High Performance Data Center for Research Toward a partnership with DELL Dominique Boutigny Agenda Welcome Introduction to CC-IN2P3 Visit of the computer room Lunch Discussion
More informationGigabyte Bandwidth Enables Global Co-Laboratories
Gigabyte Bandwidth Enables Global Co-Laboratories Prof. Harvey Newman, Caltech Jim Gray, Microsoft Presented at Windows Hardware Engineering Conference Seattle, WA, 2 May 2004 Credits: This represents
More informationScience 2.0 VU Big Science, e-science and E- Infrastructures + Bibliometric Network Analysis
W I S S E N n T E C H N I K n L E I D E N S C H A F T Science 2.0 VU Big Science, e-science and E- Infrastructures + Bibliometric Network Analysis Elisabeth Lex KTI, TU Graz WS 2015/16 u www.tugraz.at
More informationGrid Computing: dealing with GB/s dataflows
Grid Computing: dealing with GB/s dataflows Jan Just Keijser, Nikhef janjust@nikhef.nl David Groep, NIKHEF 21 March 2011 Graphics: Real Time Monitor, Gidon Moont, Imperial College London, see http://gridportal.hep.ph.ic.ac.uk/rtm/
More informationBig Data 2015: Sponsor and Participants Research Event ""
Big Data 2015: Sponsor and Participants Research Event "" Center for Large-scale Data Systems Research, CLDS! San Diego Supercomputer Center! UC San Diego! Agenda" Welcome and introductions! SDSC: Who
More informationThe National Center for Genome Analysis Support as a Model Virtual Resource for Biologists
The National Center for Genome Analysis Support as a Model Virtual Resource for Biologists Internet2 Network Infrastructure for the Life Sciences Focused Technical Workshop. Berkeley, CA July 17-18, 2013
More informationWhat is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?
Simple to start What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? What is the maximum download speed you get? Simple computation
More informationData publication and discovery with Globus
Data publication and discovery with Globus Questions and comments to outreach@globus.org The Globus data publication and discovery services make it easy for institutions and projects to establish collections,
More informationFlexible HPC for Bio-informatics. Peter Clapham
Flexible HPC for Bio-informatics Peter Clapham Overview Overview of the Sanger Institute How our data flow works today New scientific demands Private cloud deployment Transitional and future challenges
More informationSPARC 2 Consultations January-February 2016
SPARC 2 Consultations January-February 2016 1 Outline Introduction to Compute Canada SPARC 2 Consultation Context Capital Deployment Plan Services Plan Access and Allocation Policies (RAC, etc.) Discussion
More informationCloudLab. Updated: 5/24/16
2 The Need Addressed by Clouds are changing the way we look at a lot of problems Impacts go far beyond Computer Science but there's still a lot we don't know, from perspective of Researchers (those who
More informationBig Data - Some Words BIG DATA 8/31/2017. Introduction
BIG DATA Introduction Big Data - Some Words Connectivity Social Medias Share information Interactivity People Business Data Data mining Text mining Business Intelligence 1 What is Big Data Big Data means
More informationThe Canadian CyberSKA Project
The Canadian CyberSKA Project A. G. Willis (on behalf of the CyberSKA Project Team) National Research Council of Canada Herzberg Institute of Astrophysics Dominion Radio Astrophysical Observatory May 24,
More information2013 AWS Worldwide Public Sector Summit Washington, D.C.
2013 AWS Worldwide Public Sector Summit Washington, D.C. EMR for Fun and for Profit Ben Butler Sr. Manager, Big Data butlerb@amazon.com @bensbutler Overview 1. What is big data? 2. What is AWS Elastic
More informationirods at TACC: Secure Infrastructure for Open Science Chris Jordan
irods at TACC: Secure Infrastructure for Open Science Chris Jordan What is TACC? Texas Advanced Computing Center Cyberinfrastructure Resources for Open Science University of Texas System 9 Academic, 6
More informationSummary of Data Management Principles
Large Synoptic Survey Telescope (LSST) Summary of Data Management Principles Steven M. Kahn LPM-151 Latest Revision: June 30, 2015 Change Record Version Date Description Owner name 1 6/30/2015 Initial
More informationYajing (Phillis)Tang. Walt Wells
Building on the NOAA Big Data Project for Academic Research: An OCC Maria Patterson Perspective Zachary Flamig Yajing (Phillis)Tang Walt Wells Robert Grossman We have a problem The commoditization of sensors
More informationBIG DATA TESTING: A UNIFIED VIEW
http://core.ecu.edu/strg BIG DATA TESTING: A UNIFIED VIEW BY NAM THAI ECU, Computer Science Department, March 16, 2016 2/30 PRESENTATION CONTENT 1. Overview of Big Data A. 5 V s of Big Data B. Data generation
More informationBest Practices and Performance Tuning on Amazon Elastic MapReduce
Best Practices and Performance Tuning on Amazon Elastic MapReduce Michael Hanisch Solutions Architect Amo Abeyaratne Big Data and Analytics Consultant ANZ 12.04.2016 2016, Amazon Web Services, Inc. or
More informationThe NIH Collaboratory Distributed Research Network: A Privacy Protecting Method for Sharing Research Data Sets
The NIH Collaboratory Distributed Research Network: A Privacy Protecting Method for Sharing Research Data Sets Jeffrey Brown, Lesley Curtis, and Rich Platt June 13, 2014 Previously The NIH Collaboratory:
More informationScientific data processing at global scale The LHC Computing Grid. fabio hernandez
Scientific data processing at global scale The LHC Computing Grid Chengdu (China), July 5th 2011 Who I am 2 Computing science background Working in the field of computing for high-energy physics since
More informationeresearch UCT Jason van Rooyen, PhD eresearch Analyst
eresearch UCT Jason van Rooyen, PhD eresearch Analyst www.eresearch.uct.ac.za Libraries http://www.canberra.edu.au/research/ucresearch/e-research Libraries eresearch is 21 st century discovery through
More informationAn Overview of the Open Science Data Cloud
An Overview of the Open Science Data Cloud Robert L. Grossman University of Illinois at Chicago Michal Sabala University of Illinois at Chicago Yunhong Gu University of Illinois at Chicago Alex Szalay
More informationTravelling securely on the Grid to the origin of the Universe
1 Travelling securely on the Grid to the origin of the Universe F-Secure SPECIES 2007 conference Wolfgang von Rüden 1 Head, IT Department, CERN, Geneva 24 January 2007 2 CERN stands for over 50 years of
More informationMitigating Risk of Data Loss in Preservation Environments
Storage Resource Broker Mitigating Risk of Data Loss in Preservation Environments Reagan W. Moore San Diego Supercomputer Center Joseph JaJa University of Maryland Robert Chadduck National Archives and
More informationNew strategies of the LHC experiments to meet the computing requirements of the HL-LHC era
to meet the computing requirements of the HL-LHC era NPI AS CR Prague/Rez E-mail: adamova@ujf.cas.cz Maarten Litmaath CERN E-mail: Maarten.Litmaath@cern.ch The performance of the Large Hadron Collider
More informationDecrypting your genome data privately in the cloud
Decrypting your genome data privately in the cloud Marc Sitges Data Manager@Made of Genes @madeofgenes The Human Genome 3.200 M (x2) Base pairs (bp) ~20.000 genes (~30%) (Exons ~1%) The Human Genome Project
More informationTurning Data Science into a reality with TIBCO Spotfire
Turning Data Science into a reality with TIBCO Spotfire Eduardo Gonzalez-Couto, Ph.D. Product Manager, PerkinElmer Informatics Basel, 3 rd November 2016 Safe Harbor Statement This document shows current
More informationDe BiG Grid e-infrastructuur digitaal onderzoek verbonden
Graphics: Real Time Monitor, Gidon Moont, Imperial College London, see http://gridportal.hep.ph.ic.ac.uk/rtm/ De BiG Grid e-infrastructuur digitaal onderzoek verbonden David Groep, Nikhef KennisKring Amsterdam
More informationData Mining and Warehousing
Data Mining and Warehousing Sangeetha K V I st MCA Adhiyamaan College of Engineering, Hosur-635109. E-mail:veerasangee1989@gmail.com Rajeshwari P I st MCA Adhiyamaan College of Engineering, Hosur-635109.
More informationThe CEDA Archive: Data, Services and Infrastructure
The CEDA Archive: Data, Services and Infrastructure Kevin Marsh Centre for Environmental Data Archival (CEDA) www.ceda.ac.uk with thanks to V. Bennett, P. Kershaw, S. Donegan and the rest of the CEDA Team
More informationInsight: that s for NSA Decision making: that s for Google, Facebook. so they find the best way to push out adds and products
What is big data? Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
More informationTowards a Strategy for Data Sciences at UW
Towards a Strategy for Data Sciences at UW Albrecht Karle Department of Physics June 2017 High performance compu0ng infrastructure: Perspec0ves from Physics Exis0ng infrastructure and projected future
More informationGrid Computing a new tool for science
Grid Computing a new tool for science CERN, the European Organization for Nuclear Research Dr. Wolfgang von Rüden Wolfgang von Rüden, CERN, IT Department Grid Computing July 2006 CERN stands for over 50
More informationIntroduction to FREE National Resources for Scientific Computing. Dana Brunson. Jeff Pummill
Introduction to FREE National Resources for Scientific Computing Dana Brunson Oklahoma State University High Performance Computing Center Jeff Pummill University of Arkansas High Peformance Computing Center
More informatione-infrastructures in FP7 INFO DAY - Paris
e-infrastructures in FP7 INFO DAY - Paris Carlos Morais Pires European Commission DG INFSO GÉANT & e-infrastructure Unit 1 Global challenges with high societal impact Big Science and the role of empowered
More informationMaximizing Public Data Sources for Sequencing and GWAS
Maximizing Public Data Sources for Sequencing and GWAS February 4, 2014 G Bryce Christensen Director of Services Questions during the presentation Use the Questions pane in your GoToWebinar window Agenda
More informationA VO-friendly, Community-based Authorization Framework
A VO-friendly, Community-based Authorization Framework Part 1: Use Cases, Requirements, and Approach Ray Plante and Bruce Loftis NCSA Version 0.1 (February 11, 2005) Abstract The era of massive surveys
More informationTHE NATIONAL DATA SERVICE(S) & NDS CONSORTIUM A Call to Action for Accelerating Discovery Through Data Services we can Build Ed Seidel
THE NATIONAL DATA SERVICE(S) & NDS CONSORTIUM A Call to Action for Accelerating Discovery Through Data Services we can Build Ed Seidel National Center for Supercomputing Applications University of Illinois
More informationForget about the Clouds, Shoot for the MOON