Performance of Trinity RNA-seq de novo assembly on an IBM POWER8 processor-based system


Ruzhu Chen and Mark Nellen
IBM Systems and Technology Group ISV Enablement
August 2014
Copyright IBM Corporation, 2014

Table of contents
Abstract
Introduction
  Trinity on IBM Power Systems
  IBM POWER8 processor-based system for NGS analysis
Benchmark configurations
Benchmark results
  Performance comparison of RNA-seq de novo assembly using the standard version of Trinity
  Performance of RNA-seq de novo assembly using tuned Trinity
  Performance of reference assembly using huge page memory and system with maximum processor frequency options
  Benchmark comparisons using GCC and IBM XLC compilers
Acknowledgement
Resources
About the authors
Trademarks and special notices

Abstract
Trinity is an RNA-seq de novo assembly application that consists of three programs (Inchworm, Chrysalis, and Butterfly), each performing a different task. Compared with other RNA-seq assembly programs such as Trans-ABySS, Velvet, and SOAPdenovo, it performs better in recovering full-length genes or transcriptomes. It produces high-quality transcriptome assemblies but is considered to be a time-consuming program. The intent of this paper is to evaluate the performance of the POWER8 processor-based system in next generation sequencing (NGS) analysis. The benchmarks on IBM POWER8 processor-based systems showed performance that is more than two times faster than IBM System x3850 X5 and 40% to 50% faster than IBM POWER7+ in all test cases. Programs built with IBM XLC performed better than programs built with GCC. Running the Trinity benchmarks at maximum processor frequency gave a 5% performance improvement. Proper I/O configuration and mount options reduced I/O time and improved Trinity performance in the larger test cases.

Introduction
High-throughput next generation sequencing (NGS) technology has evolved into a tool for genome and gene expression analysis that generates large volumes of sequence data. This prompts us to seek computing resources with faster and more numerous processors and larger memory and storage systems to assemble and map the sequences obtained from NGS machines, as well as reliable NGS software tools.

Trinity on IBM Power Systems
With RNA sequence (RNA-seq) data being generated rapidly by NGS technology, researchers and clinicians are in severe need of fully optimized NGS analysis and data management software for sequence mapping and alignment, assembly, SNP and InDel identification, and downstream analysis (Buckingham, 2010). Trinity is such a tool for RNA-seq assembly and analysis. It has been available since 2011 and quickly became a popular program. Its components are summarized in Table 1.
Trinity components:
Inchworm: Assembles the RNA-seq data into the unique full length contigs of transcripts and then reports just the unique portions of alternatively spliced transcripts.
Chrysalis: Clusters Inchworm contigs into groups and constructs complete de Bruijn graphs for each cluster. Each cluster represents the full transcriptional complexity for a given gene.
Butterfly: Processes the individual graphs in parallel: traces read paths, reports full length spliced isoforms, and teases apart transcripts that correspond to paralogous genes.
Table 1: Overview of the Trinity components (M.G. Grabherr and others, 2011)
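To make the de Bruijn graphs mentioned in Table 1 more concrete, the following minimal Python sketch builds such a graph from the k-mers of a few reads and prints its edges. It is an illustration of the general data structure only, not Trinity's implementation; the function name build_de_bruijn, the toy reads, and k = 4 are assumptions chosen for readability.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes it points to."""
    graph = defaultdict(set)
    for read in reads:
        # Every k-mer contributes one edge: its prefix -> its suffix.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


toy_reads = ["ATGGCGT", "GGCGTGC"]  # hypothetical overlapping reads
graph = build_de_bruijn(toy_reads, k=4)
for node in sorted(graph):
    print(node, "->", ", ".join(sorted(graph[node])))
```

Because the two toy reads overlap, their shared k-mers collapse onto the same edges, which is the property an assembler exploits when it walks paths through the graph to reconstruct longer sequences.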

Trinity and its plug-ins are open source, and you can download the source code from the following locations:
Trinity-r2013-02-25: http://trinityrnaseq.sourceforge.net/
bowtie-1.0.0: http://sourceforge.net/projects/bowtie-bio/files/bowtie/1.0.0
rsem-1.2.8: http://deweylab.biostat.wisc.edu/rsem/src/rsem-1.2.8.tar.gz
jellyfish 1.1.6: http://www.cbcb.umd.edu/software/jellyfish/
R-3.0.2: http://cran.rstudio.com/src/base/r-3/r-3.0.2.tar.gz

To build and install the Trinity package on Linux on Power, code modifications had to be made to its plug-in tool jellyfish 1.1.6 for endian conversion, along with minor code changes in many places to include IBM PowerPC directives. The code in the Chrysalis module was also tuned to improve performance on IBM Power Systems. Both GCC from the IBM Advance Toolchain and the IBM XL C/C++ compiler were used to compile the program.

IBM POWER8 processor-based system for NGS analysis
The large volume of NGS data calls for computing resources with faster and more numerous processors and larger memory and storage systems to assemble and map the sequences obtained from NGS systems. IBM Power System S824 is a two-socket server whose processors are manufactured with 22 nm Silicon-On-Insulator (SOI) technology (as shown in Figure 1). Each chip is 567 mm² and consists of 1.2 billion transistors. The IBM POWER8 processor chip contains 12 cores, with each core having its own 512 KB L2 cache, 8 MB L3 eDRAM (embedded dynamic random access memory) cache, and 32 KB instruction and 64 KB data L1 caches. The chip has two memory controllers, PCIe Gen3 I/O controllers, and other features.

Figure 1: A 2-U 2-socket Power System S824 server for managing data-intensive workloads with the ability to run Linux and AIX concurrently

The POWER8 processor chip implements SMT8 mode, which supports eight hardware threads per processor core.
In the test team's benchmark environment, the POWER8 processor node runs either Red Hat Enterprise Linux (RHEL) version 6.5 or version 7.0, with 16 cores activated at maximum processor frequency.
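The endian conversion mentioned above for jellyfish matters because a binary file written on a little-endian system stores multi-byte integers in the opposite byte order from the one a big-endian system expects. The sketch below uses Python's struct module to show the effect on a single 64-bit counter. The value and the helper name swap_u64 are hypothetical, and the example does not reproduce the actual jellyfish file format or the changes made for this port.

```python
import struct


def swap_u64(value):
    """Return the 64-bit unsigned integer obtained by reversing the byte order."""
    return struct.unpack(">Q", struct.pack("<Q", value))[0]


count = 39966076                        # hypothetical counter value
raw = struct.pack("<Q", count)          # bytes as a little-endian host writes them
misread = struct.unpack(">Q", raw)[0]   # same bytes read as big-endian
fixed = swap_u64(misread)               # byte swap recovers the original value

print("written:", count)
print("misread:", misread)
print("fixed:  ", fixed)
```

Applying the swap twice returns the original value, so the same helper works in either direction.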

Benchmark configurations

Data sets
The RNA sequence reads were downloaded from public genome databases. The data set properties are listed in Table 2.

RNA-seq data set    File size   Format   Number of unique reads
Yeast               3.0 GB      fastq    39966076
Rice                17.6 GB     fasta    125228840
Wild rice leaf      111.6 GB    fastq    1211157704
Wild rice root      81.0 GB     fastq    3137436217
Table 2: Size of the data sets

Benchmark execution
The Trinity RNA-seq de novo assembly benchmarks were run with the following command:

    $TRINITY_HOME/Trinity.pl --seqType fq --SS_lib_type RF --JM 100G --left $left_input --right $right_input --CPU 16 --inchworm_cpu=8 --bflyHeapSpaceMax 50G --bflyCPU 16 --output $Output_dir

Benchmark results
To evaluate Trinity performance on the POWER8 processor-based system, the Trinity code was built with various POWER8 compiler options and the source code was tuned to improve performance. The results are summarized and analyzed in the following categories.

Performance comparison of RNA-seq de novo assembly using the standard version of Trinity
The standard version of Trinity and its supporting software were ported to and built on the IBM Power server with big-endian conversion and minor changes, without any performance tuning. A 16-core POWER8 processor-based system is about two times faster than a 32-core IBM System x3850 X5 node (2.27 GHz X7560) on the rice RNA-seq data set, which contains 125 million short reads. It takes less than 3 hours to assemble the rice RNA sequences. Profile data for Trinity runs on a POWER8 processor-based server showed poor parallelism in the OpenMP threads of Inchworm and Chrysalis.
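The limited parallelism seen in the profile can be put in perspective with Amdahl's law, which bounds the speedup on n threads when only a fraction p of the run time is parallel: S(n) = 1 / ((1 - p) + p / n). The sketch below evaluates this bound for a few thread counts. The parallel fractions are illustrative assumptions, not values measured in these benchmarks.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Upper bound on speedup when only part of the run time scales with threads."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


for p in (0.50, 0.90):        # assumed parallel fractions, for illustration only
    for n in (8, 16, 128):    # 128 = 16 cores x 8 SMT hardware threads
        bound = amdahl_speedup(p, n)
        print(f"parallel fraction {p:.2f}, {n:3d} threads: speedup <= {bound:.2f}")
```

When the parallel fraction is small, adding threads beyond a handful changes the bound very little, which is why tuning the thread count per region can matter more than raw core count.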

Figure 2: Performance of Trinity RNA-seq assembly on POWER processor-based servers relative to System x3850 servers (bar chart; series: x3850 X5, POWER740+, POWER8 RHEL 7.0; data sets: Yeast, Rice)

Performance of RNA-seq de novo assembly using tuned Trinity

RNA-seq data set    Standard   Tuned (GCC)   Tuned (XLC)
Yeast               1.00       1.05          1.09
Rice                1.00       1.08          1.22
Wild rice leaf      1.00       1.07          1.16
Wild rice root      1.00       1.18          1.29
Table 3: Performance of tuned Trinity relative to the standard version on POWER8 (performance speedup)

The Trinity source code was modified to improve performance on IBM Power Systems by optimizing the number of OpenMP threads in certain regions (refer to Table 3). The tuned code was then run on the available POWER processor-based servers using all four benchmark data sets. The results (as shown in Figure 3) indicate that POWER8 outperforms previous POWER processor-based systems in all test cases. Trinity ran 20% to 30% faster on POWER8 with RHEL 7.0, which fully supports the POWER8 architecture, than with RHEL 6.5, which runs POWER8 in IBM POWER7 compatibility mode.
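A speedup s in Table 3 corresponds to a reduction in elapsed time of 1 - 1/s. The short sketch below applies that conversion to the Tuned (XLC) column. The dictionary and function names are illustrative.

```python
# Tuned (XLC) speedups relative to the standard build, copied from Table 3.
xlc_speedup = {
    "Yeast": 1.09,
    "Rice": 1.22,
    "Wild rice leaf": 1.16,
    "Wild rice root": 1.29,
}


def time_reduction(speedup):
    """Fraction of elapsed time saved for a given speedup factor."""
    return 1.0 - 1.0 / speedup


for name, s in xlc_speedup.items():
    print(f"{name}: {time_reduction(s) * 100:.1f}% less elapsed time")
```

The conversion is useful when comparing the speedup ratios in Table 3 with the elapsed-time charts, which report minutes rather than ratios.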

Figure 3: De novo RNA-seq assembly benchmarks run on 16-core Power 740 servers based on POWER7 and POWER7+ processor technologies, and Power S824 servers running either RHEL 6.5 or RHEL 7.0 (bar charts of elapsed time in minutes; series: POWER740, POWER740+, POWER8 RHEL 6.5, POWER8 RHEL 7.0; panels: Wild Rice Leaf and Wild Rice Root, Rice, Yeast)

Performance of reference assembly using huge page memory and system with maximum processor frequency options

Figure 4: De novo RNA-seq assembly benchmark run on a 16-core Power S824 server running RHEL 7.0 that was activated with maximum CPU frequency (4.15 GHz) through firmware (bar charts of elapsed time in minutes; series: Base, Max CPU Frequency, Large page; panels: Yeast, Rice)

Figure 4 compares the performance of RNA-seq assembly on POWER8 under different system options. Base mode is the default option running RHEL 7.0, whereas maximum frequency mode runs the processor at a higher clock speed. The large page option enables the program to use 16 MB huge memory pages.
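As a rough sizing aid for the large page option, the sketch below counts how many 16 MB pages would be needed to back a given amount of memory. It assumes, purely for illustration, that one wants to cover the 100G passed to --JM in the benchmark command and that G and MB are binary units. How pages were actually reserved in these tests is not stated in this paper.

```python
import math

HUGE_PAGE_BYTES = 16 * 1024**2  # 16 MB page size named in the text


def huge_pages_needed(total_bytes, page_bytes=HUGE_PAGE_BYTES):
    """Number of whole pages required to cover total_bytes."""
    return math.ceil(total_bytes / page_bytes)


jm_bytes = 100 * 1024**3  # assumption: --JM 100G interpreted as 100 GiB
print("16 MB pages needed:", huge_pages_needed(jm_bytes))
```

On Linux, a count like this is the kind of value an administrator would typically reserve through the vm.nr_hugepages setting, although the application must also be set up to request huge pages for the reservation to have any effect.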

The POWER8 processor-based system running at maximum processor frequency gave a 5% to 10% performance improvement for RNA-seq assembly using Trinity, while using large page memory did not benefit program execution. A possible reason is that the scalability of running multiple threads is the major performance bottleneck in Trinity.

Benchmark comparisons using GCC and IBM XLC compilers
The Trinity program was built on a POWER8 processor-based system with either the GCC or the XL C/C++ compiler to evaluate application performance. The GCC used here is specifically optimized by an IBM team and packaged as the IBM Advance Toolchain for PowerLinux. The compiler options used for building the program are -Ofast -ffast-math -funroll-loops -mcpu=power8 -mtune=power8 -mabi=mass for GCC and -O3 -qhot -qarch=pwr8 -qtune=pwr8 -qcache=auto for XLC. The benchmark results (as shown in Figure 5) indicate that XLC-built programs run faster than GCC-built programs in all test cases. Tuning and optimizing the program improves the benchmark run time by 3% to 5% for the GCC builds and by 5% to 10% for the XLC builds.

Figure 5: Benchmark comparisons using GCC and IBM XLC compilers. De novo RNA-seq assembly benchmark run on a 16-core Power S824 server running RHEL 7.0 activated with maximum processor frequency (bar charts of elapsed time in minutes; series: GCC Baseline, GCC Tuned, XLC Baseline, XLC Tuned; panels: Yeast, Rice)

Acknowledgement
The test team would like to thank David A Dubetsky and Victor Liu for their system administration support.

Resources
The following websites provide useful references to supplement the information contained in this paper:
IBM Systems on PartnerWorld: ibm.com/partnerworld/systems
IBM Power Systems Information Center: http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp
IBM Redbooks: ibm.com/redbooks
IBM Publications Center: www.elink.ibmlink.ibm.com/public/applications/publications/cgibin/pbi.cgi?cty=us
S. D. Buckingham, 2010: Next generation data explosion. http://www.labtimes.org/labtimes/issues/lt2010/lt01/lt_2010_01_52_53.pdf
M.G. Grabherr and others, 2011: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology 29, 644-652. www.nature.com/nbt/journal/v29/n7/full/nbt.1883.html
B. Langmead and others, 2009: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25. genomebiology.com/2009/10/3/r25
G. Marcais and C. Kingsford, 2011: A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27(6): 764-770. bioinformatics.oxfordjournals.org/content/27/6/764

About the authors
Ruzhu Chen is a certified consultant IT specialist in IBM Systems and Technology Group, ISV Enablement organization. You can reach Ruzhu at ruzhuchen@us.ibm.com.
Mark Nellen is a Program Manager in IBM Systems and Technology Group, ISV Enablement organization. You can reach Mark at mnellen@us.ibm.com.

Trademarks and special notices
Copyright IBM Corporation 2014.

References in this document to IBM products or services do not imply that IBM intends to make them available in every country.

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.

PowerLinux uses the registered trademark Linux pursuant to a sublicense from LMI, the exclusive licensee of Linus Torvalds, owner of the Linux mark on a world-wide basis.

Other company, product, or service names may be trademarks or service marks of others.

Information is provided "AS IS" without warranty of any kind.

All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.

Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.

All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Contact your local IBM office or IBM authorized reseller for the full text of the specific Statement of Direction.

Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.

Photographs shown are of engineering prototypes. Changes may be incorporated in production models.

Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.