CrocoBLAST: Running BLAST Efficiently in the Age of Next-Generation Sequencing
Ravi José Tristão Ramos, Allan Cézar de Azevedo Martins, Gabriele da Silva Delgado, Crina-Maria Ionescu, Turán Peter Ürményi, Rosane Silva and Jaroslav Koča
Fig. S1. Parallelization strategy for improving the performance of BLAST+
The parallelization strategy involves three main processes (white elements): fragmenting the input file into temporary FASTA files, each containing a smaller number of sequences (fragments); aligning each fragment against the target database using one BLAST+ process; and composing the alignment results for each fragment into the overall output file. Portions of each process are assigned in parallel (gray elements) to the available threads. Thread assignment is periodically evaluated and optimized (black arrows) to ensure both computational efficiency and easy access to partial output.
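The three stages in Fig. S1 (fragment, align, compose) can be sketched in a few lines. This is only an illustration under stated assumptions, not CrocoBLAST's actual implementation: `split_fasta`, `align_all`, and `fake_align` are hypothetical names, the alignment step is a stand-in function rather than a real BLAST+ call, and the periodic re-evaluation of thread assignment is not modelled.

```python
from concurrent.futures import ThreadPoolExecutor


def split_fasta(text, max_bytes):
    """Group whole FASTA records into fragments of roughly max_bytes each."""
    records, current = [], []
    for line in text.splitlines():
        # A new header line closes the record collected so far.
        if line.startswith(">") and current:
            records.append("\n".join(current) + "\n")
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")

    fragments, chunk, size = [], [], 0
    for record in records:
        record_size = len(record.encode("utf-8"))
        # Close the fragment before it would exceed the limit; a single
        # oversized record still gets a fragment of its own.
        if chunk and size + record_size > max_bytes:
            fragments.append("".join(chunk))
            chunk, size = [], 0
        chunk.append(record)
        size += record_size
    if chunk:
        fragments.append("".join(chunk))
    return fragments


def align_all(fragments, align, workers):
    """Run align(fragment) on every fragment in parallel.

    pool.map returns results in input order, so the composed output
    lists hits in the same order as the original query file.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return "".join(pool.map(align, fragments))


def fake_align(fragment):
    """Hypothetical stand-in for one single-threaded BLAST+ process."""
    ids = [line[1:].split()[0]
           for line in fragment.splitlines() if line.startswith(">")]
    return "".join(f"{seq_id}\thit\n" for seq_id in ids)


if __name__ == "__main__":
    fasta = ">a\nACGT\n>b\nGGCC\n>c\nTTAA\n"
    parts = split_fasta(fasta, max_bytes=16)
    print(len(parts), "fragments")
    print(align_all(parts, fake_align, workers=2), end="")
```

In real use, `align` would presumably write the fragment to a temporary FASTA file and launch one single-threaded BLAST+ process on it; fragments never split a sequence, so each one is a valid query file on its own.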
Table S1. Summary of data files and machines used in the benchmark

Designation: Description

Input sequences
Proteins_E_coli_k12: E. coli strain K12 proteome, obtained on April 5, 2016 by searching NCBI for proteins, using the search term "Escherichia coli"[porgn: txid562] k12
S_cerevisiae_Genome: File SRR containing reads from whole-genome shotgun sequencing of Saccharomyces cerevisiae W303; available from NCBI.
Human_microbiome_metagenome: File SRS from the Human Microbiome Project, containing reads from shotgun sequencing of a human stool microbiome sample; available from
Human_microbiome_16S_RNA: File SRR from the Human Microbiome Project containing reads from 16S sequencing of multiple samples of the human microbiome; available from

Databases
E_coli_P: E. coli strain K12 proteome obtained in database format by conversion from Proteins_E_coli_k12
E_coli_G: E. coli strain K12 genome (NCBI reference sequence: NC_ ), obtained on April 5, 2016 by searching NCBI genomic databases using the search term E coli.
16S_Micro: NCBI database containing bacterial and archaeal 16S ribosomal RNA sequences; available from

Machines
Desktop: Intel Core2 Quad Q9650 (4 cores, 4 threads); 4 GB RAM (DDR2); ext4 file system on HDD; Ubuntu 16.4
Workstation: Intel Core i7-5820K CPU (6 cores, 12 threads); 32.0 GB RAM (DDR4); ext4 file system on SSD; Ubuntu 16.4
Server: 4 x AMD Opteron 6174 (12 cores, 12 threads; total: 48 cores, 48 threads); 128 GB RAM (DDR3); ext4 on HDD; Fedora 20
Cluster node: 4 x AMD Opteron 6274 (16 cores, 16 threads; total: 64 cores, 64 threads); 256 GB RAM (DDR3); nfs4; Ubuntu

The input and database files were obtained from data files available on the NCBI ( or Human Microbiome Project pages (
Abbreviations: DDR, double data rate; CPU, central processing unit; HDD, hard disk drive; ext4, fourth extended file system; NCBI, National Center for Biotechnology Information; nfs4, network file system 4; RAM, random access memory; SSD, solid-state drive
Fig. S2. Benchmarking strategy for assessing the performance of CrocoBLAST against the performance of BLAST+
The five most common BLAST programs were tested, and publicly available datasets (see also Table S1) were used as input sequences or reference databases, which resulted in 11 case studies. The performance of CrocoBLAST was compared against that of single- and multi-threaded BLAST+ (with the -num_threads option set to the maximum number of threads achievable on each machine). Each test case was run on four machines. For each case study, the tests were run in the following order: CrocoBLAST, multi-threaded BLAST+, and single-threaded BLAST+. Each test was run in triplicate, with a break of a few minutes between jobs to allow the CPU temperature to normalize and to avoid other hardware interference. All benchmark tests were run using BLAST+ version 2.4.0, released by NCBI on June 2nd. CrocoBLAST always runs BLAST+ in single-thread configuration, but assigns several such calculations to the CPU. For the benchmark tests, CrocoBLAST used a fragment size of 20 KB.
Fig. S3. Average runtimes on various machines
The columns indicate the average runtime in minutes, with shorter runtime indicating better use of available resources. Results are compared among single-threaded BLAST+ (light gray), multi-threaded BLAST+ with the -num_threads parameter set to the maximum number of threads achievable on each machine (dark gray), and CrocoBLAST with the default settings (black). CrocoBLAST always runs BLAST+ in single-thread configuration, but assigns several such calculations to the CPU. An overview of the benchmarking strategy is provided in Fig. S2, while detailed machine specifications and a full description of the case studies are available in Table S1. Each test was run in triplicate, and a maximum standard deviation of 1.8% from the mean was noted. Only average values are plotted.
* When attempting to run case 2 (a tblastn alignment of the Escherichia coli proteome against the translated E. coli genome), multi-threaded BLAST+ crashed on all machines, whereas CrocoBLAST and single-threaded BLAST+ ran successfully.
Fig. S4. Improvement of CrocoBLAST relative to multi-threaded BLAST+ on various machines
Columns indicate the speedup provided by CrocoBLAST with default settings over multi-threaded BLAST+ (dark gray) with the -num_threads parameter set to the maximum number of threads achievable on each machine (dotted lines). Each test was run in triplicate, and a maximum standard deviation of 1.8% from the mean was noted. Speedup was calculated based on average values. CrocoBLAST always runs BLAST+ in single-thread configuration, but assigns several such calculations to the CPU. An overview of the benchmarking strategy is provided in Fig. S2, while detailed machine specifications and a full description of the case studies are available in Table S1. The results for case study 2 (a tblastn alignment of the E. coli proteome against the translated E. coli genome) were not included in this assessment because multi-threaded BLAST+ crashed on all machines when attempting to run this case study.
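The arithmetic behind these two captions is simple enough to sketch. The snippet below is a hedged illustration only: it assumes speedup is the ratio of the mean multi-threaded BLAST+ runtime to the mean CrocoBLAST runtime, and that the reported deviation is the sample standard deviation expressed as a percentage of the mean; the captions do not state either choice explicitly. The function names and the runtimes in the usage example are hypothetical.

```python
from statistics import mean, stdev


def summarize(runtimes):
    """Return (mean, standard deviation as a percentage of the mean).

    Assumes the sample standard deviation (n - 1 denominator).
    """
    avg = mean(runtimes)
    return avg, 100.0 * stdev(runtimes) / avg


def speedup(baseline_runtimes, crocoblast_runtimes):
    """Ratio of mean baseline runtime to mean CrocoBLAST runtime.

    A value above 1 means CrocoBLAST finished faster on average.
    """
    return mean(baseline_runtimes) / mean(crocoblast_runtimes)


if __name__ == "__main__":
    # Hypothetical triplicate runtimes in minutes, not benchmark data.
    multi_threaded = [61.0, 60.0, 59.0]
    crocoblast = [20.5, 20.0, 19.5]
    print(summarize(multi_threaded))
    print(speedup(multi_threaded, crocoblast))
```

Computing the speedup from the means, rather than averaging per-run ratios, matches the caption's statement that speedup was calculated based on average values.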
Fig. S5. Average CPU usage on various machines
The columns indicate CPU usage as the ratio between the CPU capacity assigned to a given calculation and the total CPU capacity available for that machine, averaged over the entire duration of the calculation and expressed as a percentage, with higher CPU usage indicating reduced idle time and better use of available resources. Results are compared among single-threaded BLAST+ (light gray), multi-threaded BLAST+ with the -num_threads parameter set to the maximum number of threads achievable on each machine (dark gray), and CrocoBLAST with the default settings (black). CrocoBLAST always runs BLAST+ in single-thread configuration, but assigns several such calculations to the CPU. CPU usage was calculated from data extracted using a script that parsed and logged the output of the UNIX command top, recording the CPU utilization for each second. CPU usage was then calculated with a formula based on the following quantities. 1s_CPU_utilization is the CPU utilization given by the UNIX top command for each second (%CPU column). For CrocoBLAST, process ranges over all processes associated with CrocoBLAST, namely crocoblast (input file fragmenter, thread manager, and assembler of final output), CCblast (wrapper for each single-threaded BLAST+ process), CCblast_asbly (assembler of partial output during the alignment stage), and the corresponding BLAST+ processes for that test case (e.g. blastn). For BLAST+, process indicates the corresponding BLAST+ process for that test case (e.g. blastn). The time was measured in seconds, from the beginning to the end of each run, as registered by the script used to parse the log output. The number of CPUs accounted for both physical and virtual CPUs. An overview of the benchmarking strategy is provided in Fig. S2, while detailed machine specifications and a full description of the case studies are available in Table S1.
* When attempting to run case 2 (a tblastn alignment of the Escherichia coli proteome against the translated E. coli genome), multi-threaded BLAST+ crashed on all machines, whereas CrocoBLAST and single-threaded BLAST+ ran successfully.
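The quantities defined in this caption (per-second %CPU readings, the set of processes, the run time in seconds, and the number of CPUs) suggest a time-averaged ratio. The sketch below shows one reading that is consistent with those definitions; it is an assumption, not the authors' stated formula. It also assumes that top reports %CPU per process with 100 meaning one fully used CPU, and that the log has already been parsed into one mapping per second. The function name and data layout are hypothetical.

```python
def average_cpu_usage(samples, n_cpus):
    """Average CPU usage as a percentage of total machine capacity.

    samples: one dict per logged second, mapping a process id to the
             %CPU value reported for that process (100 = one full CPU).
    n_cpus:  number of CPUs, physical and virtual.
    """
    if not samples or n_cpus <= 0:
        raise ValueError("need at least one sample and one CPU")
    # Sum %CPU over all tracked processes and all logged seconds.
    total = sum(sum(per_process.values()) for per_process in samples)
    # Divide by run time (one sample per second) and by CPU count.
    return total / (len(samples) * n_cpus)


if __name__ == "__main__":
    # Hypothetical two-second log on a 4-CPU machine.
    log = [
        {101: 100.0, 102: 100.0},  # second 1: two busy processes
        {101: 200.0},              # second 2: one process on two CPUs
    ]
    print(average_cpu_usage(log, n_cpus=4))
```

For CrocoBLAST, each per-second mapping would include crocoblast, CCblast, CCblast_asbly, and the BLAST+ processes they launch; for plain BLAST+ it would hold only the BLAST+ process itself.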
Fig. S6. Peak memory usage on various machines
The columns indicate the highest memory requirement over the course of each calculation, expressed as a percentage of the total memory of each machine, with lower peak memory indicating reduced requirements and thus the possibility of running the calculation on a less expensive machine. Results are compared among single-threaded BLAST+ (light gray), multi-threaded BLAST+ with the -num_threads parameter set to the maximum number of threads achievable on each machine (dark gray), and CrocoBLAST with the default settings (black). CrocoBLAST always runs BLAST+ in single-thread configuration, but assigns several such calculations to the CPU. An overview of the benchmarking strategy is provided in Fig. S2, while detailed machine specifications and a full description of the case studies are available in Table S1. In most cases, CrocoBLAST increased the peak memory used during the calculation, but this increase was often negligible (typically, <10% of the total memory for each machine) and did not detrimentally affect performance even at its worst (case 9 on the server, where CrocoBLAST used up to 28% of the total memory); in some cases, CrocoBLAST reduced the peak memory used (case 9 on the desktop and workstation).
* When attempting to run case 2 (a tblastn alignment of the E. coli proteome against the translated E. coli genome), multi-threaded BLAST+ crashed on all machines, whereas CrocoBLAST and single-threaded BLAST+ ran successfully.
Table S2. I/O statistics when running CrocoBLAST or BLAST+ on each machine
[Table: read, write, and read + write statistics for CrocoBLAST, multi-threaded BLAST, and single-threaded BLAST on the workstation and on the desktop; the numerical values are not preserved in this transcription.]
Average I/O usage is computed based on the entire run. Each calculation was run in triplicate, and only mean values are shown. Only case studies 1, 7, 9, and 11 were chosen to illustrate I/O statistics, as they were expected to provide a good overview of the typical use of CrocoBLAST regarding database coverage of the query sequences and number of results for each query sequence with a hit. Detailed machine specifications and a full description of the case studies are available in Table S1 and Fig. S2.
Fig. S7. Graphical user interface of CrocoBLAST
The work space is organized into three main tabs focused on queue management, job creation, and database management. Full information regarding the running status, queue position, and setup is available for all jobs, along with information regarding the progress of the currently running job (estimated time remaining, percentage completed).
More informationCLC Server. End User USER MANUAL
CLC Server End User USER MANUAL Manual for CLC Server 10.0.1 Windows, macos and Linux March 8, 2018 This software is for research purposes only. QIAGEN Aarhus Silkeborgvej 2 Prismet DK-8000 Aarhus C Denmark
More informationCOL862 Programming Assignment-1
Submitted By: Rajesh Kedia (214CSZ8383) COL862 Programming Assignment-1 Objective: Understand the power and energy behavior of various benchmarks on different types of x86 based systems. We explore a laptop,
More informationLow Latency Evaluation of Fibre Channel, iscsi and SAS Host Interfaces
Low Latency Evaluation of Fibre Channel, iscsi and SAS Host Interfaces Evaluation report prepared under contract with LSI Corporation Introduction IT professionals see Solid State Disk (SSD) products as
More informationLecture 01: Basic Structure of Computers
CSCI2510 Computer Organization Lecture 01: Basic Structure of Computers Ming-Chang YANG mcyang@cse.cuhk.edu.hk Reading: Chap. 1.1~1.3 Outline Computer: Tools for the Information Age Basic Functional Units
More informationPARALLELIZATION OF THE NELDER-MEAD SIMPLEX ALGORITHM
PARALLELIZATION OF THE NELDER-MEAD SIMPLEX ALGORITHM Scott Wu Montgomery Blair High School Silver Spring, Maryland Paul Kienzle Center for Neutron Research, National Institute of Standards and Technology
More informationCube Base Reference Guide Cube Base CUBE BASE VERSION 6.4.4
Cube Base Reference Guide Cube Base CUBE BASE VERSION 6.4.4 1 Introduction System requirements of Cube, outlined in this section, include: Recommended workstation configuration Recommended server configuration
More information<Insert Picture Here> Boost Linux Performance with Enhancements from Oracle
Boost Linux Performance with Enhancements from Oracle Chris Mason Director of Linux Kernel Engineering Linux Performance on Large Systems Exadata Hardware How large systems are different
More informationGenome Assembly and De Novo RNAseq
Genome Assembly and De Novo RNAseq BMI 7830 Kun Huang Department of Biomedical Informatics The Ohio State University Outline Problem formulation Hamiltonian path formulation Euler path and de Bruijin graph
More informationSharePoint 2010 Technical Case Study: Microsoft SharePoint Server 2010 Enterprise Intranet Collaboration Environment
SharePoint 2010 Technical Case Study: Microsoft SharePoint Server 2010 Enterprise Intranet Collaboration Environment This document is provided as-is. Information and views expressed in this document, including
More informationUser Guide. Setup and Installation guide
User Guide Setup and Installation guide Contents 1. Getting Help... 2 2. System Requirements... 2 3. Loki Architecture and Installation Types... 3 4. Installation... 4 5. First Run... 7 6. Licencing...
More informationTutorial: De Novo Assembly of Paired Data
: De Novo Assembly of Paired Data September 20, 2013 CLC bio Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 Fax: +45 86 20 12 22 www.clcbio.com support@clcbio.com : De Novo Assembly
More informationHORIZONTAL GENE TRANSFER DETECTION
HORIZONTAL GENE TRANSFER DETECTION Sequenzanalyse und Genomik (Modul 10-202-2207) Alejandro Nabor Lozada-Chávez Before start, the user must create a new folder or directory (WORKING DIRECTORY) for all
More informationFUSION1200 Scalable x86 SMP System
FUSION1200 Scalable x86 SMP System Introduction Life Sciences Departmental System Manufacturing (CAE) Departmental System Competitive Analysis: IBM x3950 Competitive Analysis: SUN x4600 / SUN x4600 M2