Accelerating Genome Assembly with Power8
Transcription
1 Accelerating Genome Assembly with Power8. Seung-Jong Park, Ph.D., School of EECS, CCT, Louisiana State University. Revolutionizing the Datacenter. Join the Conversation #OpenPOWERSummit
2 Agenda: The Genome Assembly Problem; Accelerating Graph Construction with POWER8; Accelerating Graph Simplification with CAPI Flash
3 The Genome Assembly Problem
4 Challenges for Genome Assemblers. NGS technologies have outpaced Moore's Law. Assemblers need software with extreme scalability and an HPC platform with more compute cycles, extreme I/O performance, and huge storage space. The workload is data and compute intensive: NGS genome reads (TBs) go into the HPC system, and a reconstructed genome (MBs/GBs) comes out.
5 MapReduce-based Graph Construction
[Figure: sequencing reads pass through Map tasks that emit key:value pairs (a k-mer and the base that follows it, e.g. CTTT:A); Reduce tasks merge the pairs that share a key. The individual sequences in the figure are not fully legible in this transcript.]
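The map and reduce steps on this slide can be illustrated with a minimal Python sketch. It is not the presenters' Hadoop code: the function names `map_read` and `reduce_pairs`, the k-mer length of 3, and the toy reads are all assumptions made for illustration, and the whole thing runs in memory on one machine.

```python
def map_read(read, k):
    """Map step: emit (k-mer, next base) for every position in one read.

    The last k-mer of a read has no following base, so it is emitted
    with an empty value.
    """
    pairs = []
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        next_base = read[i + k:i + k + 1]  # "" at the end of the read
        pairs.append((kmer, next_base))
    return pairs


def reduce_pairs(pairs):
    """Reduce step: merge all values that share the same k-mer key."""
    graph = {}
    for kmer, next_base in pairs:
        followers = graph.setdefault(kmer, set())
        if next_base:
            followers.add(next_base)
    # Sorted lists make the output deterministic and easy to compare.
    return {kmer: sorted(followers) for kmer, followers in graph.items()}


# Hypothetical toy reads, not taken from the slide.
reads = ["ACGTAC", "CGTACA", "CGTT"]
pairs = [pair for read in reads for pair in map_read(read, 3)]
graph = reduce_pairs(pairs)
```

In this toy input the k-mer CGT is followed by A in two reads and by T in one, so the reduce step records both followers under a single key, which is the branching information a de Bruijn graph needs.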
6 Accelerating Graph Construction with POWER8
7 Experimental Test Beds (IBM PKY Cluster vs. LSU SuperMikeII)
Processor: two 10-core IBM Power8 vs. two 8-core Intel SandyBridge Xeon
Maximum #nodes used in various experiments: (not shown)
#Physical cores/node: 20 (8 Simultaneous Multi-Threads) vs. 16 (Hyper-Threading disabled)
#vcores/node: (not shown)
RAM/node (GB): (not shown)
#Disks/node: 5 vs. 3
#Disks/node used for shuffled data: 3 vs. 1
Total storage space/node used for shuffled data: (not shown)
Network: 56 Gbps InfiniBand (non-blocking) vs. 40 Gbps InfiniBand (2:1 blocking)
8 Datasets (input size / shuffle data size / output size)
Rice genome: 12 GB / 70 GB / 50 GB
Bumble bee genome: 90 GB / 600 GB / 95 GB
Metagenome: 3.2 TB / 20 TB / 8.6 TB
9 Hadoop Configurations (IBM Power8 vs. SuperMikeII)
Yarn.nodemanager.cpu.resource.vcore: (not shown)
Yarn.nodemanager.memory.mb: (not shown)
Mapreduce.map/reduce.cpu.vcore: 4 vs. 2
Mapreduce.map/reduce.memory.mb: (not shown)
Mapreduce.map/reduce.java.opts: 6500m vs. 3000m
10 Hadoop Scalability with POWER8 SMTs. Tested with the small rice genome data set on 2 nodes. Almost linear scalability as the number of SMT threads increases.
11 Rice Genome. Analyzing the small (12 GB) data set, which eliminates the impact of network and disk I/O. 7.5x performance improvement per server.
12 Bumble Bee Genome. Analyzing the medium-size (90 GB) bumble bee genome. 7.5x improvement in performance per server.
13 Metagenome. Analyzing the huge (3.2 TB) metagenome data set. Only 6.5 hours on a 40-node IBM Power8 cluster. More than 9x improvement in performance per server.
14 Graph Simplification with Distributed NoSQL
[Figure: the merged key:value pairs from graph construction are held in a distributed NoSQL store and traversed to produce longer sequences. The individual sequences in the figure are not fully legible in this transcript.]
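One common form of graph simplification is merging every unbranched chain of k-mers into a single longer sequence. The sketch below illustrates that idea only; it is not the presenters' implementation. A plain Python dict stands in for the distributed NoSQL store, and the names `successors` and `simplify` and the toy graph are assumptions. Isolated cycles are ignored to keep the example short.

```python
def successors(graph, kmer):
    """Neighbours of a k-mer: drop its first base, append each following base."""
    return [kmer[1:] + base for base in graph.get(kmer, [])]


def simplify(graph):
    """Merge maximal non-branching paths into contigs.

    `graph` maps a k-mer to the list of bases that can follow it.
    """
    in_degree = {kmer: 0 for kmer in graph}
    for kmer in graph:
        for nxt in successors(graph, kmer):
            in_degree[nxt] = in_degree.get(nxt, 0) + 1

    def is_chain_node(kmer):
        # Exactly one way in and one way out: safe to merge through.
        return in_degree.get(kmer, 0) == 1 and len(graph.get(kmer, [])) == 1

    contigs = []
    for kmer in graph:
        if is_chain_node(kmer):
            continue  # interior of a chain, reached from its start node
        for nxt in successors(graph, kmer):
            contig = kmer + nxt[-1]
            while is_chain_node(nxt):
                nxt = successors(graph, nxt)[0]
                contig += nxt[-1]
            contigs.append(contig)
    return sorted(contigs)


# Hypothetical toy graph (k = 3): CGT branches to A and T.
toy_graph = {
    "ACG": ["T"], "CGT": ["A", "T"], "GTA": ["C"],
    "TAC": ["A"], "ACA": [], "GTT": [],
}
contigs = simplify(toy_graph)
```

Each lookup of `graph.get(kmer)` corresponds to one key read against the store, which is why key throughput matters so much for this phase.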
15 Accelerating Simplification with IBM CAPI Flash. [Charts: NoSQL I/O throughput (keys/sec); CAPI Flash I/O throughput (bytes/sec).] With only 20 Power8 cores plus CAPI: 500 GB graph traversal in 7.5 hrs.
16 Computational Challenges: The Next Step. Graph building is the most expensive phase in terms of time and resources. The obvious solutions: either use a single machine with lots of memory, or run on a cluster. Idea: use CAPI-accelerated flash instead of main memory.
17 Graph Construction on IBM CAPI Flash
[Figure: reads pass through Map tasks that emit key:value pairs; the pairs go through several Sort passes, are merged by key, and are written through the NoSQL data engine APIs. The individual sequences in the figure are not fully legible in this transcript.]
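The sort-then-merge flow on this slide resembles an external merge sort: each batch of pairs is sorted on its own, the sorted runs are merged, and pairs with the same key end up adjacent so they can be combined in one pass. The Python sketch below shows that pattern under stated assumptions. The CAPI flash and NoSQL engine APIs are not given in the slides, so a dict named `store` stands in for the key-value engine, and `sorted_run` and `build_store` are hypothetical names.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def sorted_run(read, k):
    """Emit (k-mer, next base) pairs for one read as a sorted run."""
    pairs = [(read[i:i + k], read[i + k:i + k + 1])
             for i in range(len(read) - k + 1)]
    return sorted(pairs)


def build_store(reads, k):
    """Merge the sorted runs and write one merged record per k-mer."""
    runs = [sorted_run(read, k) for read in reads]
    store = {}  # stand-in for the flash-backed key-value engine
    # heapq.merge streams the runs in key order without loading them all
    # at once; groupby then sees every pair for a key consecutively.
    for kmer, group in groupby(heapq.merge(*runs), key=itemgetter(0)):
        store[kmer] = sorted({base for _, base in group if base})
    return store


# Hypothetical toy reads, not taken from the slide.
store = build_store(["ACGTAC", "CGTACA", "CGTT"], 3)
```

Because merging only ever holds one pair per run in memory, peak memory stays small regardless of total data size, which fits the slide deck's motivation for using flash instead of main memory.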
18 Initial Results of Graph Construction. Compared the 85 GB bumblebee data set on an 8-node Hadoop cluster vs. a single node with CAPI-accelerated flash.
Hadoop cluster (20 physical cores per node): peak memory usage of 60 GB per datanode; 1 HDD per datanode; 1 hr 56 mins.
CAPI-accelerated flash server (20 physical cores): peak memory usage of 7 GB; 1 HDD and 1 CAPI card; 3 hrs 44 mins.
Peak memory usage reduced by 60 times. Execution time reduced by 3.5 times per node.
More information