Percipient StorAGe for Exascale Data Centric Computing Computing for the Exascale
1 Percipient StorAGe for Exascale Data Centric Computing
Computing for the Exascale
Shaun de Witt, Culham Centre for Fusion Energy, UK
2nd Technical Meeting on Fusion Data Processing, Validation and Analysis - June 2nd 2017
Per-cip-i-ent (pər-sĭp′ē-ənt) adj. Having the power of perceiving, especially perceiving keenly and readily. n. One that perceives.
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No
2 Storage Problems at Extreme Scale
- Storage cannot keep up with compute!
- Way too much data
- Way too much energy to move data
- How to use new storage devices is unclear
- Opportunity: Big Data Analytics and Extreme Computing overlap
3 SAGE Project Goal
- Extreme Computing
  - Changing I/O needs
  - HDDs cannot keep up
- Big (Massive!) Data Analysis
  - Avoid data movements
  - Manage & process extremely large data sets
- Need Exascale Data Centric Computing Systems
  - Big Data Extreme Computing (BDEC) systems
- SAGE validates a BDEC system which can ingest, store, process and manage extreme amounts of data
4 Consortium Coordinator: Seagate
5 SAGE Overall Objectives
- Provide/validate a novel storage architecture
  - Object storage with a very flexible API, driving a
  - Multi-tiered hierarchical storage system
  - Providing integrated compute capability
- Purpose: increase overall scientific throughput!
  - Co-designed with use cases
  - Integrated with ecosystem tools
- Provide a roadmap of component technologies for achieving Extreme Scale
  - Including programming models and access methods
- European excellence in the area of Exascale Data Centric Computing by targeting:
  - HPC & Big Data technology influencers
  - Scientific communities & infrastructure/wider markets
6 SAGE Overall Objectives
- Tracking very strongly European Exascale and HPC objectives
  - Very active participation in ETP4HPC
  - Strategic Research Agenda (SRA) goals
  - SRA2: continue to be extremely well aligned
  - Driving future H2020 projects (FETHPC2, ESDs, etc.)
- Synergistic with worldwide initiatives
  - ECP (Exascale Computing Project)
  - BDEC (Big Data Extreme Compute)
7 SAGE Relevance to Fusion
- ITER data analysis
  - Prompt analysis, pre-emptive caching, ...
- Engineering and modelling
- Note: PPPL is already working on Exascale development with XGC. Opportunities for collaboration?
8 Research Areas
9 Applications
Primary Goal
- Demonstrate use cases & co-design the system
Methodology
- Obtain requirements from:
  - Various use cases
  - Detailed profiling supported by tools
- Feed back requirements to the platform (Co-Design)
Diagram labels: Co-Design, Requirements, Automated Application Characterization w/ Tools
10 Applications
- CCFE fusion energy applications
  - Analytics of Log-files for Fusion (ALF)
  - Spectre: Providing near real time feedback on plasma
- Finite element analysis using ParaFEM
- Savu: Tomography reconstruction and processing pipeline
- Ray: Distributed assembly of metagenome
- JURASSIC: Fast radiative transfer model simulation code
- ipic3d: Particle-in-Cell code for simulations of space plasma
- NEST: Simulator for spiking neural network models
- Angelia: Benchmarking framework for Apache Flink
11 Percipient Storage Overview
Goal
- Build the data centric computing platform
Methodology
- Advanced Object Storage
- New NVRAM technologies in the I/O stack
- Ability for I/O to accept computation
  - Incl. memory as part of storage tiers
- API for massive data ingest and extreme I/O
- Commodity server & computing components in the I/O stack
12 Percipient Storage Stack
13 API Layers
14 SAGE Unique Features
15 Tiered Object Storage
HSM design
- Data organization is based on the Mero composite layout
- Data migration decisions come from:
  - User access
  - A knowledge base filled from all Mero information (hints, events, ...)
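The migration idea on this slide (decisions driven by user access and by hints held in a knowledge base) can be illustrated with a toy policy. This is only a sketch under assumptions: the tier names, thresholds, the `pin_fast` hint and every class and function name below are hypothetical and are not taken from Mero or the SAGE HSM.

```python
from dataclasses import dataclass, field

# Hypothetical tier names, fastest first; not taken from the slides.
TIERS = ["nvram", "flash", "disk", "archive"]


@dataclass
class ObjectRecord:
    """What a toy 'knowledge base' knows about one stored object."""
    obj_id: str
    tier: str = "disk"
    recent_accesses: int = 0
    hints: set = field(default_factory=set)


def choose_tier(rec, hot_threshold=10, cold_threshold=1):
    """Pick a target tier from access counts and hints (illustrative policy)."""
    idx = TIERS.index(rec.tier)
    if "pin_fast" in rec.hints or rec.recent_accesses >= hot_threshold:
        return TIERS[max(idx - 1, 0)]               # promote by one tier
    if rec.recent_accesses <= cold_threshold:
        return TIERS[min(idx + 1, len(TIERS) - 1)]  # demote by one tier
    return rec.tier                                 # leave in place


def migrate(records):
    """Apply the policy and return the list of (object, from, to) moves."""
    moves = []
    for rec in records:
        target = choose_tier(rec)
        if target != rec.tier:
            moves.append((rec.obj_id, rec.tier, target))
            rec.tier = target
    return moves
```

A real HSM would also weigh capacity, cost and in-flight I/O; the point here is only that the decision is computed from recorded accesses and hints rather than set by hand.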
16 Function Offloading
17 Containers
Logical grouping of objects
- System properties: access latency, resilience, ...
- Object properties: formats, access mechanism
- Scientific properties: location, time, diagnostic
- Event based: shot, earthquake, hurricane, ...
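As a rough illustration of grouping objects by their properties, the sketch below selects a "container" as the set of objects whose metadata matches some criteria. The metadata keys, object ids and the `make_container` helper are hypothetical; the slides do not describe the actual container interface.

```python
def make_container(objects, **criteria):
    """Return the ids of objects whose metadata matches every criterion."""
    return [
        obj_id
        for obj_id, meta in objects.items()
        if all(meta.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical object metadata mixing object, scientific and event properties.
objects = {
    "obj1": {"format": "hdf5", "diagnostic": "magnetics", "shot": 1001},
    "obj2": {"format": "hdf5", "diagnostic": "thomson", "shot": 1001},
    "obj3": {"format": "netcdf", "diagnostic": "magnetics", "shot": 1002},
}

shot_1001 = make_container(objects, shot=1001)              # event based
magnetics = make_container(objects, diagnostic="magnetics")  # scientific property
```

The same object can belong to several containers at once, which is the main difference from placing it in a single directory.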
18 Built in FLINK connector Maybe NN are better for low latency
19 HDF5 and NetCDF Integration
20 MPI and PGAS
Status:
- Partitioned Global Address Space (PGAS) for hierarchical storage
  - Realized via MPI windows allocated on storage
  - Can substitute MPI I/O, eliminating the distinction between programming interfaces for memory and storage
- Implemented in PMPI and MPICH (available as open source on GitHub)
  - Uses MPI hints
  - No change to the MPI standard
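The core idea of a window allocated on storage, where the same load/store style of access works for memory and for persistent data, can be approximated in a single process with a memory-mapped file. This is an analogy only: it uses Python's standard `mmap` module rather than MPI, and the file name, offsets and helper names are assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile


def open_window(path, size):
    """Create (or reuse) a file and map it so it can be addressed like memory."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        os.ftruncate(fd, size)
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)  # the mapping keeps its own handle to the file


def put(window, offset, value):
    """Store one little-endian double at a byte offset in the window."""
    struct.pack_into("<d", window, offset, value)


def get(window, offset):
    """Load one little-endian double from a byte offset in the window."""
    return struct.unpack_from("<d", window, offset)[0]


def demo():
    path = os.path.join(tempfile.mkdtemp(), "window.bin")
    win = open_window(path, 4096)
    put(win, 8, 3.5)   # looks like a memory store
    win.flush()        # but the data lands in the file
    win.close()
    win = open_window(path, 4096)  # remap: the value is still there
    value = get(win, 8)
    win.close()
    return value
```

In the MPI version described on the slide, the window would additionally be remotely accessible from other ranks, with the choice of backing storage passed through hints.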
21 SAGE Feature Analysis
- ipic3d
  - PGAS I/O
  - Function shipping for data analysis
- JURASSIC
  - Function off-loading or run-time system for data pre-processing (data extraction) and data compression/decompression
  - Asynchronous I/O with data staging using a semi-persistent cache
- NEST
  - Data post-processing (data analysis) using the run-time system
- Savu
  - Native HDF5 support
  - Native block store with Ceph-like interface
  - Function off-loading for slicing of data regions using a Python interface
- Spectre
  - Apache Flink for parallelising FFT and buffered streaming of data to storage
doi: /j.fusengdes
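Function off-loading for slicing, as listed for Savu, amounts to sending a small function to where the data lives instead of pulling the whole object to the client. The sketch below contrasts the two; `StorageNode`, `fetch`, `offload` and `slice_region` are hypothetical names, not the SAGE or Clovis interface.

```python
class StorageNode:
    """Toy stand-in for a storage node that can also run small functions."""

    def __init__(self, objects):
        self._objects = objects

    def fetch(self, obj_id):
        """Conventional path: the whole object is moved to the caller."""
        return list(self._objects[obj_id])

    def offload(self, obj_id, fn):
        """Offloaded path: fn runs next to the data, only its result moves."""
        return fn(self._objects[obj_id])


def slice_region(start, stop):
    """Build a function that extracts one region of an object."""
    return lambda data: data[start:stop]


node = StorageNode({"frame": list(range(1000))})
whole = node.fetch("frame")                           # 1000 elements transferred
region = node.offload("frame", slice_region(10, 13))  # 3 elements transferred
```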
22 pNFS Services on Object Storage
- pNFS is a standard parallel file system protocol
  - Separates metadata access from data access
- The POSIX namespace is stored in the Mero KV store
- File data is stored in Mero objects
Diagram labels: Clients, Network, pNFS Protocol, Mero Protocol, MD Server, Mero KVS, Mero Objects
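The separation on this slide (namespace in a key-value store, file data in objects, clients going to the data directly once they hold a layout) can be sketched with two in-memory stand-ins. All class names, the object-id scheme and the stripe size are assumptions for illustration; this is not the pNFS wire protocol or the Mero API.

```python
class MetadataServer:
    """Namespace only: maps a path to the ordered object ids holding its data."""

    def __init__(self):
        self.kv = {}  # stand-in for the KV store

    def create(self, path, object_ids):
        self.kv[path] = list(object_ids)

    def layout(self, path):
        return self.kv[path]


class ObjectStore:
    """Data only: maps an object id to its bytes."""

    def __init__(self):
        self.objects = {}

    def write(self, obj_id, data):
        self.objects[obj_id] = bytes(data)

    def read(self, obj_id):
        return self.objects[obj_id]


def write_file(md, store, path, data, stripe=4):
    """Split data into stripe-sized objects, then record the layout."""
    object_ids = []
    for start in range(0, len(data), stripe):
        obj_id = f"{path}#{start // stripe}"
        store.write(obj_id, data[start:start + stripe])
        object_ids.append(obj_id)
    md.create(path, object_ids)


def read_file(md, store, path):
    """One metadata lookup, then data is read straight from the objects."""
    return b"".join(store.read(obj_id) for obj_id in md.layout(path))
```

Because the metadata server is consulted only for the layout, many clients can read the objects in parallel without funnelling data through it.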
23 Extreme resiliency for applications
Distributed Transactions
- Groups of storage (including I/O) operations that are atomic in the face of certain failures, known as allowed failures
Allowed Failures
- Transient network failures
- Node crash and restart
Distributed Transactions Manager (DTM)
- Creates transactions
- Controls transactions
Actions
- Scatter-gather write of data into a Mero object
- Scatter-gather read of data from a Mero object
- Creation of a new sub-directory
- Renaming of a file
- Writing of a data unit reconstructed from parity blocks in a spare unit
Diagram labels: Mero, DTM (repeated for each node)
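The atomicity property described here, where a group of operations either all take effect or none do when an allowed failure occurs, can be shown with a single-process toy. It stages changes on a copy and publishes them only if every operation succeeds. This is not how the Mero DTM is implemented; the `Transaction` class, `AllowedFailure` exception and the operation helpers are hypothetical.

```python
import copy


class AllowedFailure(Exception):
    """Stands in for a transient network failure or a node crash and restart."""


class Transaction:
    """Group operations so they apply to the state all together or not at all."""

    def __init__(self, state):
        self.state = state
        self.ops = []

    def add(self, op):
        self.ops.append(op)

    def commit(self):
        staged = copy.deepcopy(self.state)  # work on a private copy
        try:
            for op in self.ops:
                op(staged)
        except AllowedFailure:
            return False                    # nothing was published
        self.state.clear()
        self.state.update(staged)           # publish all changes together
        return True


def write_object(obj_id, data):
    def op(state):
        state["objects"][obj_id] = data
    return op


def mkdir(name):
    def op(state):
        state["dirs"].append(name)
    return op


def fail_transiently(state):
    raise AllowedFailure("transient network failure")
```

A distributed implementation has to reach the same outcome across several nodes, which is where logging and recovery come in; the sketch only shows the guarantee the application sees.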
24 Integration, Demonstration
Goal
- Hardware definition, integration and demonstration
Methodology
- Design and bring-up of SAGE hardware
  - Seagate hardware
  - Atos hardware
- Integration of all the software components
  - Juelich Supercomputer Center (JSC)
- Demonstrate use cases
- Extrapolate performance to Exascale
- Study other object stores vis-à-vis Mero
SAGE hardware prototype built, shipped and integrated at JS
25 Alignment with European Goals [ETP4HPC SRA]
SAGE is extremely well aligned to the broader goals for Europe in the area of Storage, I/O and Energy Efficiency:
- M-BIO-1: Tightly coupled storage class memory I/O systems demo
- M-BIO-3: Multi-tiered heterogeneous storage system demo
- M-BIO-5: Big data analytics tools developed for HPC use
- M-BIO-6: Active storage capability demonstrated
- M-BIO-8: Extreme scale multi-tier data management tools available
- M-ARCH-3: New compute nodes and storage architecture use NVRAM
- M-ENER-X: Addresses energy goals by avoiding data movements (100x more energy to move data compared to compute!)
- M-ENER-FT-10: Application survival on unreliable hardware
26 Expected Impacts & Innovation
- Primary European storage platform for Extreme Scale (Applicability: Big Science and BDEC)
- Key inputs into European roadmapping
- Key inputs into Data Intensive Research programs
- Commercial and market impacts (Storage, Systems & Tools)
27 Acknowledgements
- Sai Narasimhamurthy (Seagate)
- Dirk Pleiter (Forschungszentrum Jülich)
- Stefano Markidis (Kungliga Tekniska Högskolan)
Questions?
28 Backup
29 Services, Systemware & Tools
Goal
- Explore tools and services on top of Mero
  - HSM PoC implementation ready
  - pNFS PoC implementation ready
  - Performance analysis tools framework ready
  - Completed scoping/architecture of data integrity checking
Methodology
- HSM: methods to automatically move data across tiers
- pNFS: parallel file system access on Mero
- Scale-out object storage integrity checking service provision
- Allinea performance analysis tools provision
30 Mero Extreme scale Features & NVRAM
Goal
- Mero object storage platform development (w/ Clovis API)
- Evaluate NVRAM options
Methodology
- Co-design Extreme Scale object store components
  - Concept & architecture of key Mero Exascale components
  - NVRAM state of the art studies
  - Low level system software/emulation of NVRAM
- Study NVRAM technologies incl. emulation
31 Programming Models and Analytics
- PGAS for SAGE proof of concept
- Following up from MPI-IO gap analysis
- Runtimes proof of concept
- Detailed architecture of data analytics
Goal
- Explore usage of SAGE by programming models, runtimes and data analytics solutions
Methodology
- Usage of SAGE through MPI and PGAS
  - Adapt MPI-IO for SAGE
  - Adapt PGAS for SAGE
- Runtimes for SAGE
  - Pre/post processing
  - Volume rendering
  - Exploit caching hierarchy
- Data analytics methods on top of Clovis
  - Apache Flink over Clovis, looking beyond Hadoop
  - Exploit NVRAM as extension of memory
32 SAGE to lay the foundation for a European storage platform to be #1 at Extreme Scale SAGE Project Ambition
33 M9 Review Recommendations
[Recommendation 1] As part of the validation process, we recommend the consortium to include criteria to evaluate the performance of the proposed storage system. These criteria will serve to monitor internally the progress of the project and will not entail additional obligations towards the European Commission.
- WP5 Discussion
- A PEWG (Performance Evaluation Working Group) was set up
- Methods to continuously track the performance of the SAGE system
[Recommendation 2] As part of the dissemination activities concerning future work, we recommend the consortium to better advertise the SAGE project to the scientific community, for instance through the publication of scientific papers. In order to facilitate user engagement, we also suggest exploring the possibility to give access to the prototype to users outside the consortium.
- More focus on open publications (WP6 Discussion)
- Initiation of activity to provide access to the prototype (WP5 Discussion)
34 Definition of Terms
- Exascale/Extreme Computing: Computing, Exaflop and beyond
- Object Storage: Grouping data into user defined objects. No particular notion of grouping in hierarchical trees.
- Data Centric Computing: Computing that depends on and generates lots of data. Classical HPC was mainly only about simulations, data was secondary.
- Parallel File Systems: Popular paradigm for accessing storage in HPC by parallelizing I/O to the storage subsystem
- pNFS: A type of parallel file system
- NVM/NVRAM: Non Volatile Memory
- MPI: Popular programming model in HPC; MPI-IO is the I/O library
35 WP6: Dissemination, Exploitation..
Goal
- Dissemination, Exploitation & Collaboration
Methodology
- Disseminate SAGE through events, conferences, publications and talks
- Exploitation through exploring marketing opportunity and expanding European IP
- Collaboration with other European and international projects
- Seeking influence on de facto standard methods
Activities
- Continued website updates
- Continued publications
- Continued social media
- Continued participation in key events & talks
- Press releases and press coverage
- Discussion with potential users of SAGE technology
Welcome to the 2017 Charm++ Workshop! Laxmikant (Sanjay) Kale http://charm.cs.illinois.edu Parallel Programming Laboratory Department of Computer Science University of Illinois at Urbana Champaign 2017
More informationGOING ARM A CODE PERSPECTIVE
GOING ARM A CODE PERSPECTIVE ISC18 Guillaume Colin de Verdière JUNE 2018 GCdV PAGE 1 CEA, DAM, DIF, F-91297 Arpajon, France June 2018 A history of disruptions All dates are installation dates of the machines
More informationRuntime Data Management on Non-volatile Memory-based Heterogeneous Memory for Task-Parallel Programs
Runtime Data Management on Non-volatile Memory-based Heterogeneous Memory for Task-Parallel Programs Kai Wu Jie Ren University of California, Merced PASA Lab Dong Li SC 18 1 Non-volatile Memory is Promising
More informationIBM Spectrum NAS, IBM Spectrum Scale and IBM Cloud Object Storage
IBM Spectrum NAS, IBM Spectrum Scale and IBM Cloud Object Storage Silverton Consulting, Inc. StorInt Briefing 2017 SILVERTON CONSULTING, INC. ALL RIGHTS RESERVED Page 2 Introduction Unstructured data has
More informationJoachim Biercamp Deutsches Klimarechenzentrum (DKRZ) With input from Peter Bauer, Reinhard Budich, Sylvie Joussaume, Bryan Lawrence.
Joachim Biercamp Deutsches Klimarechenzentrum (DKRZ) With input from Peter Bauer, Reinhard Budich, Sylvie Joussaume, Bryan Lawrence. The ESiWACE project has received funding from the European Union s Horizon
More informationThe Google File System. Alexandru Costan
1 The Google File System Alexandru Costan Actions on Big Data 2 Storage Analysis Acquisition Handling the data stream Data structured unstructured semi-structured Results Transactions Outline File systems
More informationBUSINESS DATA LAKE FADI FAKHOURI, SR. SYSTEMS ENGINEER, ISILON SPECIALIST. Copyright 2016 EMC Corporation. All rights reserved.
BUSINESS DATA LAKE FADI FAKHOURI, SR. SYSTEMS ENGINEER, ISILON SPECIALIST 1 UNSTRUCTURED DATA GROWTH 75% 78% 80% 2015 71 EB 2016 106 EB 2017 133 EB Total Capacity Shipped, Worldwide % of Unstructured Data
More informationDELIVERABLE. D3.1 - TransformingTransport Website. TT Project Title. Project Acronym
Ref. Ares(2017)844805-15/02/2017 DELIVERABLE D3.1 - TransformingTransport Website Project Acronym TT Project Title Transforming Transport Grant Agreement number 731932 Call and topic identifier ICT-15-2016-2017
More informationNext-generation IT Platforms Delivering New Value through Accumulation and Utilization of Big Data
Next-generation IT Platforms Delivering New Value through Accumulation and Utilization of Big Data 46 Next-generation IT Platforms Delivering New Value through Accumulation and Utilization of Big Data
More informationStreaming Analytics with Apache Flink. Stephan
Streaming Analytics with Apache Flink Stephan Ewen @stephanewen Apache Flink Stack Libraries DataStream API Stream Processing DataSet API Batch Processing Runtime Distributed Streaming Data Flow Streaming
More informationOpen Data Standards for Administrative Data Processing
University of Pennsylvania ScholarlyCommons 2018 ADRF Network Research Conference Presentations ADRF Network Research Conference Presentations 11-2018 Open Data Standards for Administrative Data Processing
More informationHPC IN EUROPE. Organisation of public HPC resources
HPC IN EUROPE Organisation of public HPC resources Context Focus on publicly-funded HPC resources provided primarily to enable scientific research and development at European universities and other publicly-funded
More informationCoordinating Parallel HSM in Object-based Cluster Filesystems
Coordinating Parallel HSM in Object-based Cluster Filesystems Dingshan He, Xianbo Zhang, David Du University of Minnesota Gary Grider Los Alamos National Lab Agenda Motivations Parallel archiving/retrieving
More informationCREATING SMART TRANSPORT SERVICES BY FACILITATING THE RE-USE OF OPEN GIS DATA
OPEN TRANSPORT NET TOMAS MILDORF 16 JUNE 2014 INSPIRE CONFERENCE 2014, AALBORG, DENMARK CREATING SMART TRANSPORT SERVICES BY FACILITATING THE RE-USE OF OPEN GIS DATA 2 1 OTN AT A GLANCE Full title OpenTransportNet
More informationHDFS Architecture. Gregory Kesden, CSE-291 (Storage Systems) Fall 2017
HDFS Architecture Gregory Kesden, CSE-291 (Storage Systems) Fall 2017 Based Upon: http://hadoop.apache.org/docs/r3.0.0-alpha1/hadoopproject-dist/hadoop-hdfs/hdfsdesign.html Assumptions At scale, hardware
More informationCenter Extreme Scale CS Research
Center Extreme Scale CS Research Center for Compressible Multiphase Turbulence University of Florida Sanjay Ranka Herman Lam Outline 10 6 10 7 10 8 10 9 cores Parallelization and UQ of Rocfun and CMT-Nek
More informationHPC Storage Use Cases & Future Trends
Oct, 2014 HPC Storage Use Cases & Future Trends Massively-Scalable Platforms and Solutions Engineered for the Big Data and Cloud Era Atul Vidwansa Email: atul@ DDN About Us DDN is a Leader in Massively
More informationIntroduction to Big-Data
Introduction to Big-Data Ms.N.D.Sonwane 1, Mr.S.P.Taley 2 1 Assistant Professor, Computer Science & Engineering, DBACER, Maharashtra, India 2 Assistant Professor, Information Technology, DBACER, Maharashtra,
More informationLustre Beyond HPC. Presented to the Lustre* User Group Beijing October 2013
Lustre Beyond HPC Presented to the Lustre* User Group Beijing October 2013 Brent Gorda General Manager High Performance Data Division, Intel Corpora:on Agenda From Whamcloud to Intel Today s Storage Challenges
More information