e-research in support of climate science


1 e-research in support of climate science Bryan Lawrence, Rutherford Appleton Laboratory, reporting the efforts of dozens of other folks in major international projects including, but not limited to: BADC itself (Juckes, Pascoe, Stephens, Kershaw...), CMIP5 (Taylor, Stouffer), Metafor (Guilyardi, Steenman-Clark, Devine...), IS-ENES (Joussaume), Earth System Grid and Earth System Curator (Balaji, DeLuca, Foster, Middleton, Williams) (none of whom were consulted about the content of this talk), the Global Organisation for Earth System Science Portals, and the new Earth System Grid Federation.

2 Outline The Climate Data Problem: the data deluge; volume, heterogeneity and choice; evaluating Australia in CMIP3. Climate Models, and the problem of understanding the differences between models and the simulations they produce: a brief introduction to Metafor. An introduction to the CMIP5 Information Ecosystem: aims and objectives of CMIP5; a global problem (global simulations, simulated globally; global deployment of information systems); the Earth System Grid Federation; bringing the information flow together; a tour through the CMIP5 information implementation. Exascale Futures: some snippets from UK and European strategy thinking; interesting problems ahead? Can we use the next generation of HPC? And if we can, what will we do with the data?

3 In the beginning: observations. WMO images from J. Lafeuille, 2006. All linked up, with global data distribution. The World Meteorological Organisation has been doing e-infrastructure for years!

4 Compare with Research Instruments Some fixed, many deployed in an ad hoc manner!

5 NERC Mobile Research Sensors Slide courtesy of Alan Gadian, NCAS

6 Simulation Data Deluge. (Figure with axis marks at 2 PB, 4 PB and 8 PB; BADC.) Figures courtesy of Gary Strand (NCAR)

7 and let's not leave out EO! Source: ESA GSCB Workshop June 2009

8 20th Century Reanalysis. Using data collected under the umbrella of the Atmospheric Circulation Reconstructions over the Earth (ACRE) initiative, assimilating (only) surface observations of synoptic pressure, monthly sea surface temperature and sea ice distribution to produce the reanalysis. Data available from Jan 1871 to 2008 from NOAA ESRL, but: 1 GB/year/variable, 56 ensemble members (+ mean and spread), 10 variables (there are more), 120 years = 70 TB...
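A quick sanity check of that arithmetic (a sketch in Python, using the slide's own round numbers):

    # Back-of-envelope check of the 20th Century Reanalysis volume quoted above.
    gb_per_year_per_variable = 1      # 1 GB/year/variable
    members = 56 + 2                  # 56 ensemble members + mean and spread
    variables = 10                    # "10 variables (there are more)"
    years = 120                       # the round number used on the slide

    total_gb = gb_per_year_per_variable * members * variables * years
    print(f"{total_gb / 1000:.0f} TB")   # ~70 TB, as quoted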

9 Old Weather / New Results (modern max & min compared with HMS Dorothea, 1818; Brohan et al. 2010)

10 and so to metadata/provenance. Neither of the last two examples would be possible without metadata: ship logbooks with location and time, along with measurements (actually the measurements themselves were metadata for the ship logs), and station data with information about location and calibration. But both demonstrate problems with lack of metadata too: how were those ship measurements made, and with what accuracy? Did that station move, and if so, did anyone write it down? (Movements often lead to discontinuities in data records.) Research data systems generate a wealth of information, usually recorded for a specific task. With sufficient ancillary information that data could be repurposed, reinterpreted, and reused, but mostly it isn't! The sheer amount of data can overwhelm one's ability to reuse it if one cannot: discover data, discriminate between similar datasets, and get at basic facts as to what was done, how, and why.

11 Take Home Points (1) Lots of data out there From routine sensor networks (such as the met networks), and From ad hoc deployment of instruments, not to mention A huge variety of simulations. Data volume increasing exponentially (that's obvious, but pay attention to the units, scale matters!) Heterogeneity increasing (a different sensor on every lamp post?) Great science is done by reusing data collected both for specific purposes and for generic reuse. It's generally hard to reuse data, because ancillary information is not as well kept as the data (and often not even recognised as useful). It will be seen that costs (research time+ money) for handling big data are falling in new places (or at least places where historically we have failed to spend)

12 Climate 101. Observations (AR4 WG1 Fig. 1.3), Projections (AR4 WG1 Fig. 10.4). It all seems rather simple, doesn't it? Nice consumable curves... Enough for mitigation policy perhaps, but enough for adaptation policy?

13 Climate: Delving Deeper... IPCC Fourth Assessment Report. (Not much agreement in AR4: no stippling.) Spatial and temporal subsetting, statistics over models...

14 So why was Australia not stippled? Interannual variability means that our projections need to start in the right state (and have the processes to capture that variability correctly). Model uncertainty means that we may not believe our model(s) (any or all) have the relevant resolution and/or physics to capture (all) important processes. Scenario uncertainty means that we are not sure of the impact of different economic and emission futures. (Australia is unlucky: some regions are more predictable than others, and the global mean is much more predictable than any region.) So what were the salient differences between the models? (Forget looking at the code, these models have millions of lines of code each!) (Figure: internal variability, scenario uncertainty and model uncertainty for the global mean; Hawkins and Sutton, Climate Dynamics, 2010.)

15 ...and deeper: CMIP3: What models did what? AR4 WG1 Table 10.4. Rows: models, and their output types. Columns: experiments and projections. (Three layers of complexity: models, experiments, output, each of which is itself complex.)

16 Digression: What is a model? Lots of interesting models actually! So, can we discuss the difference between two models (and their families of submodels)? Image: from J. Lafeuille, 2006

17 IPCC assessment reports: FAR: 1990, SAR: 1995, TAR: 2001, AR4: 2007, AR5: 2013

18 State of the Art: Model Comparison 1: Tabulate some interesting property (and the author grafts hard to get the information). Guilyardi, E. (2006): El Niño - mean state - seasonal cycle interactions in a multi-model ensemble. Clim. Dyn., 26.

19 State of the Art: Model Comparison 2: Provide some (slightly) organised citation material (and BOTH author and readers graft hard to get the information). Kharin et al., Journal of Climate, 2007. Dai, A., J. Climate, 2006.

20 State of the Art: Model Comparison 3: Calculate and tabulate some interesting properties and bury them in a table or figure. Guilyardi, E. (2006): El Niño - mean state - seasonal cycle interactions in a multi-model ensemble. Clim. Dyn., 26.

21 Why does this information detail matter? Surely a technical paper can make lots of technical references, and those in the know are in the know? By and large, the climate projections community is actually a group of communities: from next-generation experimenters, to big GCM modellers, to regional modellers, to impacts assessment modelling, to impacts and adaptation modelling. Information does not easily flow between communities!

22 We've already seen concepts: Experiments (e.g. specific scenario projections); Models, with specific (experiment dependent) resolution & grids and combinations of sub-component models for specific processes; and Simulations: models run for specific experiments (and hence specific boundary conditions, e.g. CO2 projections), with specific output variables, frequencies, and durations, run on specific platforms. Metafor: a 2.2 M euro EC project to deliver a Common Information Model to document these concepts, particularly in the context of supporting the next IPCC assessment report; associated infrastructure to collect and view such documentation; and to build an accompanying governance structure...
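As an illustration of how these concepts relate (not the actual Metafor CIM schema, just a hypothetical sketch):

    # Hypothetical sketch of the experiment / model / simulation relationships
    # described above; the real Common Information Model is richer than this.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Experiment:
        name: str                                    # e.g. a specific scenario projection
        boundary_conditions: List[str] = field(default_factory=list)

    @dataclass
    class ModelComponent:
        name: str                                    # e.g. "Atmosphere", "Ocean", "Sea Ice"
        scientific_properties: dict = field(default_factory=dict)

    @dataclass
    class Model:
        name: str
        components: List[ModelComponent] = field(default_factory=list)

    @dataclass
    class Simulation:
        model: Model
        experiment: Experiment
        platform: str                                # machine the simulation ran on
        output_variables: List[str] = field(default_factory=list)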

23

24 (Yes, we know we shouldn't have this sort of detail in the UML, and shortly it won't be.) Aerosol, Atmosphere, Atmospheric Chemistry, Land Ice, Land Surface, Ocean Biogeochem, Ocean, Sea Ice

25 Scientific Properties: Controlled Vocabularies, developed with the expert community using mindmaps and some rules to aid machine post-processing. (Everyone can use mindmaps: no a priori semantic technology knowledge required.)

26 A piece of the mindmap XML
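For illustration, a hedged sketch of how such a mindmap might be post-processed into vocabulary terms, assuming a FreeMind-style XML file of nested <node TEXT="..."> elements (the real Metafor mindmaps carry additional markup rules):

    # Illustrative only: walk a FreeMind-style mindmap and emit leaf vocabulary
    # terms as "parent / child" paths.  The file name is hypothetical.
    import xml.etree.ElementTree as ET

    def walk(node, path=()):
        term = node.get("TEXT", "")
        here = path + (term,) if term else path
        children = node.findall("node")
        if not children:
            yield " / ".join(here)
        for child in children:
            yield from walk(child, here)

    tree = ET.parse("ocean_vocab.mm")            # hypothetical mindmap file
    for entry in walk(tree.getroot().find("node")):
        print(entry)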

27 Vocabulary-driven content in a web-based human entry tool (a great advertisement for Python and Django)

28 CMIP5 CMIP5: Fifth Coupled Model Intercomparison Project Global community activity under the auspices of the World Meteorological Organisation (WMO) via the World Climate Research Programme (WCRP) Aim: to address outstanding scientific questions that arose as part of the AR4 process, improve understanding of climate, and to provide estimates of future climate change that will be useful to those considering its possible consequences. Method: standard set of model simulations in order to: evaluate how realistic the models are in simulating the recent past, provide projections of future climate change on two time scales, near term (out to about 2035) and long term (out to 2100 and beyond), and understand some of the factors responsible for differences in model projections, including quantifying some key feedbacks such as those involving clouds and the carbon cycle

29 Introduction to CMIP5: The Experiments (two more families). Take home points here: many distinct experiments, with very different characteristics, which influence the configuration of the models (what they can do, and how they should be interpreted). >5500 yrs (from Karl Taylor)

30 Simple View of CMIP5. Centennial & longer projections/simulations: e.g. pre-industrial control, historical, RCPs, Paleo; perturbation examples: natural forcings only, GHG forcings only, abrupt 4xCO2 runs, longer integrations. Decadal: 10 and 30 year hindcasts and predictions; perturbation examples: hindcasts without volcanoes. Atmosphere only: AMIP; perturbation example: aqua-planet.

31 CMIP5 in numbers. Simulations: ~90,000 years, ~60 experiments, ~20 modelling centres (from around the world) using ~30 major(*) model configurations; ~1-2 million output atomic datasets; ~10s of petabytes of output; ~2 petabytes of CMIP5 requested output; ~1 petabyte of CMIP5 replicated output, which will be replicated at a number of sites (including ours), starting to arrive in the next few weeks. Of the replicants: ~220 TB decadal, ~540 TB long term, ~220 TB atmos-only; ~80 TB of 3-hourly data; ~215 TB of ocean 3D monthly data; ~250 TB for the cloud feedbacks; ~10 TB of land biogeochemistry (from the long term experiments alone). Download time? ~45 TB of ocean 3D fields for the decadal experiments: 2 days (10 Gbit link to yourself), 2 weeks (1 Gbit link to yourself).
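The download-time figures follow from simple arithmetic; the sketch below assumes sustained rates of about 3 Gbit/s on a well-used 10 Gbit link and about 0.4 Gbit/s on a well-used 1 Gbit link (the same effective rates used on the "Real networks" slide later), which is an assumption on my part:

    # Idealised transfer time: volume / sustained rate (no protocol overheads).
    def transfer_days(volume_tb, rate_gbit_s):
        return volume_tb * 1e12 * 8 / (rate_gbit_s * 1e9) / 86400

    print(f"{transfer_days(45, 3.0):.1f} days")   # ~1.4 days  -> "2 days"
    print(f"{transfer_days(45, 0.4):.1f} days")   # ~10.4 days -> "2 weeks"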

32 Handling the data! Earth System Grid (ESG): a US Department of Energy funded project to support the delivery of CMIP5 data to the community. Consists of distributed data node software (to publish data), tools (Live Access Server, LAS; Bulk Data Mover, BDM; security systems etc.) and gateway software (to provide catalog and services). A major technical challenge. Earth System Grid FEDERATION (ESGF): a global initiative to deploy the ESG (and other) software to support timely access to the data, minimum international movement of the data, and long term access to significant versions of the CMIP5 data. A major social challenge as well as a technical challenge.

33 Earth System Grid Data Nodes. Access control; THREDDS Data Server; toolbox: GridFTP, ESG Publisher (inc. CDAT), Live Access Server, (other) OPeNDAP servers, Bulk Data Mover (BDM); databases: THREDDS catalog, Postgres & MyProxy; filesystem: preferably with a DRS layout. Deployed by data providers to expose their data via Earth System Grid Gateways.

34 Metadata Harvest: ESGF starts with Data Nodes: 20 to 30, globally distributed, each with o(...) TB.

35 Metadata Harvest: ESGF Data Nodes (with data) and Gateways. Data nodes publish to Gateways; Gateways replicate metadata (all data visible on all gateways).

36 Metadata Harvest: ESGF Nodes (originals and replicas) & Gateways. Data replication: data replicated to core archives.

37 CMIP5: Handling the metadata. Three streams of provenance metadata: A) archive metadata, B) browse metadata, C) character metadata. A: Archive metadata: three levels of information from the file system: I. CF compliance in the NetCDF files; II. extra CMIP5 required attributes, including a unique identifier within each file; III. use of the Data Reference Syntax (DRS) to help maintain version information; compliance enforced by the ESG publisher. B: Browse metadata, added independently of the archive, exploiting Metafor controlled vocabularies via a customised CMIP5 questionnaire; compliance enforced by CMIP5 quality control systems, leading to: C: Character metadata (data assessment). Three concepts to follow up on: 1) the Metafor questionnaire, 2) CMIP5 quality control, 3) combining the streams (the information pipeline to the Earth System Grid Gateways).
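As an illustration of the archive-metadata checks (stream A), a hedged sketch that looks for a few of the CMIP5-required global attributes in a file; the attribute subset and the filename are illustrative, and the real ESG publisher and QC tools check far more, including full CF compliance:

    # Sketch: verify a handful of CMIP5-style global attributes are present.
    import netCDF4

    REQUIRED = ["project_id", "institute_id", "model_id", "experiment_id",
                "frequency", "tracking_id"]        # illustrative subset only

    def check_file(path):
        with netCDF4.Dataset(path) as ds:
            present = set(ds.ncattrs())            # global attribute names
        return [a for a in REQUIRED if a not in present]

    # Hypothetical file name, for illustration:
    print(check_file("tas_3hr_HadGEM2-ES_rcp45_r1i1p1.nc"))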

38 B: Metafor Questionnaire

39 Metafor questionnaire: many parts...

40 C: Metafor Quality Control. Specialises the ISO 19115 DQ package: a CIM_Quality record describes quality for a remote resource (identified by URL and internal URI), extrinsic rather than intrinsic, with 0..n CIM_Issues, CIM_Reports and CIM_Measures recorded against log files etc. Maps nicely onto the upcoming Open Annotation Model.

41 CMIP5 qctool (courtesy of Metafor) (another great advertisement for Python and Django)

42 Bringing it all together for CMIP5. The players: 1) NetCDF CF Conventions + CMIP5 extensions (orange), 2) Earth System Grid + Earth System Curator (green), 3) Metafor (yellow), conversion code (light blue/grey). Glue: Atom
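A minimal sketch of what the "Atom glue" looks like: wrapping a pointer to a CIM document in an Atom entry for downstream harvesting. The element names follow the Atom syndication format; the identifier and URL are made up:

    # Build a minimal Atom entry referencing a (hypothetical) CIM document.
    import xml.etree.ElementTree as ET
    from datetime import datetime, timezone

    ATOM = "http://www.w3.org/2005/Atom"
    ET.register_namespace("", ATOM)

    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = "urn:uuid:00000000-0000-0000-0000-000000000000"
    ET.SubElement(entry, f"{{{ATOM}}}title").text = "CIM simulation document: HadGEM2-ES rcp45 r1i1p1"
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = datetime.now(timezone.utc).isoformat()
    ET.SubElement(entry, f"{{{ATOM}}}link", rel="alternate",
                  href="http://example.org/cim/simulation.xml")   # hypothetical

    print(ET.tostring(entry, encoding="unicode"))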

43 Earth System Grid Gateways. Earth System Grid data nodes publish to a gateway (essentially the gateway harvests the information in their TDS catalogs); gateways provide a search interface both to the harvested data and to metadata harvested from the Metafor questionnaire. (Gateways shown: NCAR, PCMDI.)

44 ESG Gateways

45

46

47 Access Control and Delivery: (1) Via Gateway. Wget scripts access secure data using MyProxy and X509 certificates.
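The generated wget scripts essentially present a short-lived X509 credential (obtained via MyProxy after login) with each request. A rough Python equivalent, with hypothetical URL and credential file names:

    # Hedged sketch: fetch a secured file presenting an X509 proxy credential.
    import requests

    url = "https://example-data-node.badc.rl.ac.uk/thredds/fileServer/cmip5/tas_day.nc"  # made up
    resp = requests.get(url, cert="esg_credentials.pem", stream=True)  # cert+key in one PEM
    resp.raise_for_status()
    with open("tas_day.nc", "wb") as out:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            out.write(chunk)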

48 Access Control and Delivery: (2) Direct from Data Node. Also walk the directory tree via PyDAP at BADC.

49 Access Control and Delivery: (2) Direct from Data Node

50 Access Control and Delivery: (2) Direct from Data Node

51 Access Control and Delivery: (2) Direct from Data Node. ESG (and CEDA) have comprehensive access control middleware, suitable for use in browsers and on the command line, federated globally!

52 Data also available via the normal BADC archive: HadGEM2-ES/rcp45/3hr/atmos/3hr/r1i1p1/latest
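That path follows the CMIP5 Data Reference Syntax (DRS); a small sketch that splits such a path into named components (the field names are the commonly quoted DRS ones, shown here for illustration only):

    # Split a DRS-style path into its components (field names illustrative).
    DRS_FIELDS = ["model", "experiment", "frequency", "modeling_realm",
                  "mip_table", "ensemble", "version"]

    def parse_drs(path):
        parts = [p for p in path.strip("/").split("/") if p]
        return dict(zip(DRS_FIELDS, parts))

    print(parse_drs("HadGEM2-ES/rcp45/3hr/atmos/3hr/r1i1p1/latest"))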

53 Access through NetCDF. The archive can be accessed directly from applications using the latest beta release of the NetCDF library (4.1.2 beta2, coming soon). Users will need to register, download a security certificate, and compile software with the new library; further details will be made available shortly. Plot by Roland Schweitzer (NOAA) demonstrating direct access to the BADC archive test data node (CMIP3 data, from a FERRET program running on his machine).
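A rough Python analogue of that FERRET example, assuming a netCDF4 library built with OPeNDAP support; the URL is illustrative, not a real endpoint:

    # Open a remote dataset over OPeNDAP as if it were a local file.
    import netCDF4

    url = "http://example-data-node.badc.rl.ac.uk/thredds/dodsC/cmip3/tas_day_example.nc"  # made up
    ds = netCDF4.Dataset(url)              # only metadata is read at this point
    tas = ds.variables["tas"][0, :, :]     # subsetting pulls just this slice over the network
    print(tas.shape, ds.variables["tas"].units)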

54 The pieces of CMIP5 support (from a BADC perspective): not just about storage!
HARDWARE: data storage (approx. 1,000 TB), replication system, faster network, servers to deliver and process data.
SOFTWARE: sub-setting, batch processing, QC & versioning systems.
COLLABORATION: international effort.
METADATA DEVELOPMENTS: describing models, experiments and datasets; standard format and description for all.
USAGE TOOLKITS: re-gridding, format conversion, visualisation, analysis platform.
MOHC DATA SUPPORT: data handling of MOHC models, checking and QC, connection to tools, interfaces to data.
NERC DATA SUPPORT: data handling of HIGEM and Paleo models, format conversion, checking and QC, connection to tools.
INTERNATIONAL DATA SUPPORT: harmonisation, data handling of models, checking and QC, connection to tools.
UK community engagement with the impacts community; public sector, general public and private sector access; development of derived products.

55 Status summary, 20/01/2011:
MOHC data available from BADC: now.
Data from other modelling centres: mid-February.
Login facilities at BADC: now.
ESA datasets prepared for validation: scoping.
Data processing by BADC: scoping.
Enhanced metadata describing model configurations: data entry in progress; portal view: weeks to months.
Quality control: in progress.
Access to data via the NetCDF library: coming soon.
NASA datasets for validation: in preparation.
Issuing DOIs (for accurate data citation): later this year.

56 Moving forward. The Earth System Grid is a U.S. project; there will undoubtedly be successor projects (key role of ESG Curator and the NOAA Global Interoperability Project). The Earth System Grid Federation is a global activity, led by the Global Organisation for Earth System Science Portals (GO-ESSP). In Europe, we are underpinning ESGF via two EC funded projects: Metafor (which we have seen a lot of) and IS-ENES (InfraStructure for a European Network for Earth Simulation), and much national work too of course. Metafor and IS-ENES are working on complementary information architectures. Metafor will finish in 2011; IS-ENES has some years to run. (Metafor will leave an international governance system in place for the Common Information Model.) ExArch: a G8 funded proposal (awarded, waiting on EPSRC...).

57 And now a segue from data modelling and systems for HPC data to producing HPC data via modelling!

58 Computing Futures. (Figure: Top 500 #1 machine, best available UK climate system, and bottom climate system (~ tier 2).)

59 HPC Futures: 10 Years. Expect the first exascale systems before 2020 (2015/16 according to IBM), available for tier 1 in roughly 2022, and tier 2 in ... Similarly, petascale systems will be (in principle) available for climate from next year, but more practically from around ... By 2022, petascale machines will be deployable in departments, and certainly accessible to the NERC HPC community. Today's HPC performance will be near the desktop. Condor pools (or equivalent) will be the bane of data management... BOINC-class activity will likely remain equivalent to one big machine, but data-wise network links will constrain them from being a data issue.

60 Three Questions: (1) Can we (climate) effectively (or otherwise) use that HPC? The answer might depend on (2) How would we want to use it (science-wise)? And it might depend on (3) What infrastructure needs to be in place to utilise it? Do we have it? That depends on the science choices but in general, NO!

61 Infrastructure: A hardware and software ecosystem!

62 Kiviat Diagram for Climate Science

63 Scientific Goals? European strategy for climate likely to emphasize: predictability (and limits therein); sensitivity and uncertainty; reliability (at the regional level); validation; attribution of observed signals in the instrumental record; reproduction (and understanding) of glacial-interglacial cycles. These push the computational requirements in different directions (sometimes but not always simultaneously): effect of resolution, process understanding; ensemble size; complexity and resolution; data assimilation, processes (such as volcanism); carbon cycle, slow earth processes, long duration.

64 But not all computations are the same. (Figure annotations: longer execution times, more memory; coupling: more nodes or longer execution time; longer execution times, more memory (more nodes, or more memory per node, and the need for better science); duration; coupling and data handling; more computers and/or nodes.) They all involve handling more data, and all but duration require more code!

65 Realistic Progress via Punctuated Equilibrium? Thus far it has taken roughly five years for every doubling of climate model resolution, so this would be a pretty aggressive speed of improvement; but the recent emphasis on complexity is likely to be replaced by an emphasis on resolution for a while...

66 John Shalf, LBNL: May 2010. BUT: ... (accessed 24/01/2011)

67 Capacity Computing in +10. Plenty of computing available: but what sort? What we can't exploit right now: memory per node! Time to fill memory!
Timescale: Current | ... | ...
Peak Flop/s: 20 Peta | ... P | 1 Exa
Memory: 1 PB | 5 PB | ... PB (*)
Node Flop/s: 200 G | 400 G | 1-10 T
Node Memory BW: 40 GB/s | 100 GB/s | ... GB/s
Node Concurrency: 32 | o(100) | o(1000)
Interconnect BW: 10 GB/s | 25 GB/s | 50 GB/s
# Nodes: 100,... | ...,000 | o(million)
Total Concurrency: 3 million | 50 million | o(billion)
Total I/O: 2 TB/s | 10 TB/s | 20 TB/s
Power: ~10 MW | ~10 MW | ~30 MW (*)
MTTI (Interruptions): Days | Days | O(1 day)
Storage: 30 PB | 150 PB | 300 PB
Source: ... with (*) modifications from ...

68 Software Progress Status. Some views of community readiness. According to Ken Batcher, "A supercomputer is a device for turning compute-bound problems into I/O-bound problems." The views: "Scalable code, network infrastructure, application environments, and parallel libraries will appear with little lead time!" "More computing? Different computing? Bigger ensembles! No problem!" ...which is a little unfair, but I think it is fair to say that the community underestimates the effort ahead!

69 New Dynamic Cores Not Enough! NERC and Met Office: NGWCP, five years, NERC funding of 4.4M (~ 8 FTE x 4 years?), oceans and atmospheres! New algorithms needed: massive parallelisation, replace the dynamic core, initialisation (assimilation), evaluation (EO etc), framework and coupling? My take: a necessary activity: it catches us (UK) up in scalability to international peers. A compromise activity: the most scalable algorithms are not necessarily the best algorithms. It will get us to the front edge of current petascale (to exploit what is available NOW!) but not to exploit what could be departmental-scale petascale in ten years (50 million concurrency)! So not sufficient!

70 Programming Models and Frameworks. Consider completely new programming models? e.g.: instead of splitting the model domain onto a specific decomposition, queue work items into a processor queue. New algorithms needed for global sums etc. Hide more of the metal from the science (which is going to be hard to do). However, whatever the programming model, the framework has to be smarter: more dynamic behaviour (including in-run decomposition onto processors as adaptive grids and/or solvers get more iterative); handle checkpointing cleverly! Would be rather fun to find out whether one could do that with a multi-million way parallelisation of a shallow water equation model! A European framework and I/O layer, with national assemblies of model components?
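A toy sketch of the work-queue idea using Python multiprocessing: workers pull domain tiles from a shared queue rather than owning a fixed decomposition, so load balance can adapt when some tiles become more expensive. Real models would also need halo exchanges and global sums, which this deliberately ignores:

    # Toy work-queue decomposition: idle workers grab the next tile.
    import multiprocessing as mp

    def process_tile(tile):
        i0, i1, j0, j1 = tile
        # ... timestep the (shallow water) equations on this tile ...
        return tile, (i1 - i0) * (j1 - j0)       # stand-in for real work/cost

    def tiles(n, size):
        for i in range(0, n, size):
            for j in range(0, n, size):
                yield (i, min(i + size, n), j, min(j + size, n))

    if __name__ == "__main__":
        with mp.Pool(processes=4) as pool:
            # imap_unordered behaves like a work queue over the tile list
            for tile, cost in pool.imap_unordered(process_tile, tiles(1024, 128)):
                pass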

71 And what about data? More sub-models: more I/O, more evaluation, more archive; data could scale faster than CPU! (Figure annotations: data scales with CPU (unless I/O frequency changes); I/O and data DON'T scale with CPU (some of which is spent on the timestep); I/O and archive scale with CPU (ensemble statistics alone probably not enough); data complexity problems, plus I/O and archive!)

72 The Capacity Data Future. (Table comparing CMIP5 (2012, 1 degree), CMIP6 2017A (0.5 degree), CMIP6 2017B (25 km) and a later CMIP (10 km): atmos gridpoints, vertical levels, % extra variables, storage (GB/model-month), storage ratio to 2012, and compute ratio to 2012 both for data alone and including the CFL timestep; archive size growing from ~1 PB by a factor of order 500(?).) From a data point of view: increased resolution may be less of a problem than more science and bigger ensembles plus the wide availability of petascale computing (? likely to be significantly > 2). Exascale data: still a HARD HPC problem.
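The scaling argument behind the table, as a back-of-envelope sketch (taking 1 degree as roughly 100 km and ignoring extra vertical levels and variables, which are my simplifying assumptions): halving the grid spacing quadruples the horizontal gridpoints and hence the output volume, while the CFL condition also halves the timestep, so compute grows roughly as the cube:

    # Rough data/compute scaling with horizontal resolution, relative to ~1 degree.
    def ratios(dx_km, ref_km=100.0):
        data = (ref_km / dx_km) ** 2        # gridpoints, hence output volume
        compute = (ref_km / dx_km) ** 3     # gridpoints x CFL-limited timestep
        return data, compute

    for dx in (100, 50, 25, 10):
        d, c = ratios(dx)
        print(f"{dx:>3} km: data x{d:,.0f}, compute x{c:,.0f}")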

73 The Capability Data Future. Not planning for this as a data-sharing activity yet! (John Shalf, LBNL, May 2010; accessed 24/01/2011.)

74 We can produce it, and we can store it, but can we move it? Moore's Law: CPU power doubles every 18 months. Gilder's Law: total communication bandwidth triples every twelve months (doubling time about 8 months). Experience of the RAL Wide Area Network (WAN) is consistent with Gilder's Law, with a doubling time a little over 7 months over the last 15 years. Sounds good, but Nielsen's Law: internet bandwidth doubles every ... months. High performance network cards have moved from 1 Gbit/s to 10 Gbit/s in a decade: Local Area Network (LAN) access performance doubling every 21 months. In 1998, the BADC moved to Token Ring (100 Mb/s) to cope with ECMWF data rates. In 2010, BADC is seriously trying to exploit 10 Gbit/s cards to cope with CMIP5 data replication and delivery, but fighting with the WAN: headline speed is not easily achievable. Internationally (and for the Met Office) we still exploit sneakernet (sending personal RAID systems and USB disks via courier): very staff-time intensive, and we would prefer to use the WAN! In 2022 we can expect WAN performance to cope with massive distribution, but the LAN will limit massive I/O at any particular location unless we take action!
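A back-of-envelope projection of those growth laws from 2010 to 2022, using the doubling times quoted above (WAN taken as roughly 7-8 months, LAN as 21 months, CPU as 18 months); purely illustrative:

    # Project growth factors over 12 years from the quoted doubling times (months).
    doubling_months = {"WAN (Gilder / RAL experience)": 7.5,
                       "LAN (NIC access speed)": 21,
                       "CPU (Moore)": 18}

    years = 12
    for name, months in doubling_months.items():
        growth = 2 ** (years * 12 / months)
        print(f"{name}: x{growth:,.0f} by 2022")
    # WAN capacity grows orders of magnitude faster than LAN access speed,
    # hence "LAN limiting massive I/O ... unless we take action".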

75 Real networks and CMIP5. Volumes and download times at net rates of 0.4 Gbit/s and 3 Gbit/s:
(1) 100-year atomic dataset of daily data for a one-degree grid with ten levels (100x365x360x180x10x4 bytes): 500M; 10 s; 2 s.
(2) 6-hourly version of (1): 2 GB; 40 s; 7 s.
(3) 20-model aggregation of (1): 10 GB; 4 min; 30 s.
(4) 10 variables of (3): 100 GB; 30 min; 6 min.
(5) 10 users doing (4): 1 TB; 5 hr; 1 hr.
(6) Ocean 3D fields from the decadal experiments (most folks doing decadal prediction?): 45 TB; o(2) weeks; o(2) days.
(7) The CFMIP atmos-only fields (cloud forecasting community?): 125 TB; o(1) month; o(5) days.
(8) One RAID server (worst case single data loss?): 140 TB; 5-6 weeks; o(1) week.
(9) Entire archive (rebuild archive?): 1 PB; 8 months; 1 month.
We can probably support ONE 10 Gbit/s user and up to TEN 1 Gbit/s users.

76 Practicalities of Data Volumes. (Table: download times in hours for 10 years of (all) data from one ensemble member, at 1 degree, 0.5 degree, 25 km and 10 km atmos resolution, for 1 Gbit/s, 10 Gbit/s and 100 Gbit/s links.) Clearly users won't be downloading ALL data from ensembles, now or in the future! ExArch data comparison: must avoid N x N data movements between N data production sites! Prefer N movements to M (<N) processing sites, and exploit better algorithms! Data movement might be practical M x N times, but only with lightpaths in place (lightpath = a dedicated network slice over dark fibre, providing guaranteed bandwidth unlike a shared network). Want dynamic lightpaths: drop them when not in use.

77 Modelling Post-Processing Infrastructure: UK Plan? Local Analysis Cloud (users control the software environment but can mount high performance disk WITH DATA ALREADY THERE). Why light paths? For CMIP5, synchronising a 1 PB archive at the 1% level implies 10 TB/day of movement, which implies a 1 Gbit/s requirement. International light paths will cost money!

78 Our Exascale Future. SI prefixes and their status (after Stuart Feldman, Google, 2010):
k (kilo, thousand, 10^3): count on fingers
M (mega, million, 10^6): trivial
G (giga, billion, 10^9): small
T (tera, trillion, 10^12): real
P (peta, quadrillion, 10^15): challenging
E (exa, quintillion, 10^18): aspirational
Z (zetta, sextillion, 10^21): wacko
Y (yotta, septillion, 10^24): science fiction
Too easy to say! (bnl)

79 e-infrastructure for Climate: a roadmap.
TODAY: Models developed independently and integrated (sometimes) in parochial frameworks, with various levels of support for their usage communities. IN TEN YEARS: Models developed by communities working with common coding conventions and shared support.
TODAY: Data held in local archives (sometimes nationally), with IS-ENES working on distributed database concepts, but poor distributed access; difficulties with network bandwidth abound. IN TEN YEARS: Data held in distributed archives, with key data sets aggregated and replicated as necessary, well understood routes for moving data as necessary, and cloud-like processing facilities.
TODAY: Good and improving support for CF-netCDF in the modelling community, plus immature model descriptions from Metafor, and Observations and Measurements standards starting to be examined by the community. IN TEN YEARS: CF continues to be supported, but is now prevalent in the EO and observational communities; the information and vocabularies built by Metafor are maintained by an international community; O&M in wide use.
TODAY: Data manipulation and access tools developed independently and with various levels of support. IN TEN YEARS: Data manipulation and access tools are developed by communities working with common conventions and shared support.


C3S components. Copernicus Climate Change Service. climate. Data Store. Sectoral Information System. Outreach and Dissemination C3S components Climate Data Store Sectoral Information System Evaluation and Quality Control Outreach and Dissemination ECVs past, present and future Observed, reanalysed and simulated Derived climate

More information

Bruce Wright, John Ward, Malcolm Field, Met Office, United Kingdom

Bruce Wright, John Ward, Malcolm Field, Met Office, United Kingdom The Met Office s Logical Store Bruce Wright, John Ward, Malcolm Field, Met Office, United Kingdom Background are the lifeblood of the Met Office. However, over time, the organic, un-governed growth of

More information

The University of Oxford campus grid, expansion and integrating new partners. Dr. David Wallom Technical Manager

The University of Oxford campus grid, expansion and integrating new partners. Dr. David Wallom Technical Manager The University of Oxford campus grid, expansion and integrating new partners Dr. David Wallom Technical Manager Outline Overview of OxGrid Self designed components Users Resources, adding new local or

More information

Cluster-Level Google How we use Colossus to improve storage efficiency

Cluster-Level Google How we use Colossus to improve storage efficiency Cluster-Level Storage @ Google How we use Colossus to improve storage efficiency Denis Serenyi Senior Staff Software Engineer dserenyi@google.com November 13, 2017 Keynote at the 2nd Joint International

More information

Exaflood Optics 1018

Exaflood Optics 1018 Exaflood Optics 10 18 Good News from US in 2009! Since 2000 US residential bandwidth grew 54X US wireless bandwidth grew 542X Total consumer bandwidth grew 91X Total per capita consumer BW grew 84X Total

More information

Introduction of SeaDataNet and EMODNET Working towards a harmonised data infrastructure for marine data. Peter Thijsse MARIS

Introduction of SeaDataNet and EMODNET Working towards a harmonised data infrastructure for marine data. Peter Thijsse MARIS Introduction of SeaDataNet and EMODNET Working towards a harmonised data infrastructure for marine data Peter Thijsse MARIS CLIPC & IS-ENES 2 workshop, KNMI, November 2014 Outline 1. Introduction to marine

More information

CSE 124: Networked Services Lecture-17

CSE 124: Networked Services Lecture-17 Fall 2010 CSE 124: Networked Services Lecture-17 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa10/cse124 11/30/2010 CSE 124 Networked Services Fall 2010 1 Updates PlanetLab experiments

More information

Meteorology and Python

Meteorology and Python Meteorology and Python desperately trying to forget technical details Claude Gibert, Europython 2011 Background Meteorology - NWP Numerical Weather Prediction ECMWF European Centre for Medium-Range Weather

More information

The Fusion Distributed File System

The Fusion Distributed File System Slide 1 / 44 The Fusion Distributed File System Dongfang Zhao February 2015 Slide 2 / 44 Outline Introduction FusionFS System Architecture Metadata Management Data Movement Implementation Details Unique

More information

Conference The Data Challenges of the LHC. Reda Tafirout, TRIUMF

Conference The Data Challenges of the LHC. Reda Tafirout, TRIUMF Conference 2017 The Data Challenges of the LHC Reda Tafirout, TRIUMF Outline LHC Science goals, tools and data Worldwide LHC Computing Grid Collaboration & Scale Key challenges Networking ATLAS experiment

More information

High-Performance Scientific Computing

High-Performance Scientific Computing High-Performance Scientific Computing Instructor: Randy LeVeque TA: Grady Lemoine Applied Mathematics 483/583, Spring 2011 http://www.amath.washington.edu/~rjl/am583 World s fastest computers http://top500.org

More information

High Throughput WAN Data Transfer with Hadoop-based Storage

High Throughput WAN Data Transfer with Hadoop-based Storage High Throughput WAN Data Transfer with Hadoop-based Storage A Amin 2, B Bockelman 4, J Letts 1, T Levshina 3, T Martin 1, H Pi 1, I Sfiligoi 1, M Thomas 2, F Wuerthwein 1 1 University of California, San

More information