SolexaLIMS: A Laboratory Information Management System for the Solexa Sequencing Platform

Brian D. O'Connor (1), Jordan Mendler (1), Ben Berman (2), Stanley F. Nelson (1)

(1) Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
(2) USC/Keck School of Medicine, University of Southern California, Los Angeles, CA, USA

Email: Brian D. O'Connor - boconnor@ucla.edu; Jordan Mendler - jmendler@ucla.edu; Ben Berman - benberman@usc.edu; Stanley F. Nelson - snelson@ucla.edu
Corresponding author

Abstract

The SolexaLIMS system is a collection of software components and web tools designed to facilitate the processing, analysis, sharing, and long-term storage of sequence data produced with the Solexa sequencing instrument. The SolexaLIMS system sits between the end user and the pipeline that Solexa provides for processing raw sequence data into final base calls and for performing simple alignments against a reference genome. It consists of three major components: a database component that tracks sample annotations, a web component for entering and editing sample annotations, and an analysis component that allows for basic reporting. The SolexaLIMS system is a free and open source project that welcomes community involvement to further enhance the LIMS components. Software and a user forum are available from the project website.

Background

The Solexa 1G genome analyzer provides the ability to perform massively parallel sequencing of 1 billion bases per run. This volume of sequence data makes it challenging to manage the acquisition, analysis, and long-term storage of results efficiently. Solexa provides software that automates many of the analysis steps involved in generating sequence data from a sample. However, a comprehensive laboratory information management system (LIMS) for controlling this data flow has previously been unavailable. Here we present SolexaLIMS, a web application and application programming interface (API) for managing Solexa experiments.

LIMS systems are designed to manage samples, instruments, and laboratory workflows in a way that maximizes sample throughput and minimizes experimental bottlenecks. The SolexaLIMS system was designed to meet these needs and provides the ability to record the basic sample information needed to describe what is being sequenced on the instrument. This functionality uses a web-based LIMS interface that is linked to analysis and reporting functions via a database back-end. This makes it possible to enter sample information once and generate many reports, such as genomic alignments with detected polymorphisms, in an automated way. The SolexaLIMS system also includes the ability to back up and validate data from Solexa datasets.

Implementation

The SolexaLIMS software is divided into three major functional categories: the LIMS user website, the Solexa pipeline wrappers, and the LIMS reports and analysis tools. These components are integrated and use a common database to track the overall progress of a Solexa run within the LIMS system. Figure 1 shows an overview of the specific steps and components that make up the processing of a typical Solexa experiment.

SolexaLIMS Web Application

The SolexaLIMS web application is the main conduit through which users interact with the LIMS system. All sample annotation entry happens here, and the progress of a sample through the various processing steps is documented here as well. The SolexaLIMS web application is built using standard Java web application APIs, including the popular Spring framework and the Hibernate data access layer. Using Java makes the web application portable, and web developers already familiar with these technologies should find it easy to customize.

The LIMS website allows for basic access restrictions and requires each user to create an account and log in to use the site. Once logged in, a user can see the list of Solexa datasets that have been, or are currently being, processed by the LIMS system. Users can add new experiments; this process begins with collecting some basic information about the experiment along with details describing the contents of each lane in the flowcell. The process is iterative, and the user may come back to edit an experiment's information multiple times as the sample is processed by the Solexa instrument. For example, as additional quality information is determined, the user may specify which sequencing cycles should be used for further analysis. This type of information is typically not available before a sample is processed by the Solexa instrument.

Once a user is satisfied with the accuracy of the entries for a given sample in the SolexaLIMS website, they can activate the processing pipeline by selecting "process now" on the experiment page. This indicates that the sample has been completely processed by the Solexa instrument and is ready for processing by both the Solexa-provided pipeline and the additional validation, analysis, and reporting tools provided by the LIMS system. Each step is documented, and its status is updated on the website in real time so users can track each experiment.

SolexaLIMS Database

The SolexaLIMS website stores information about each run in a database, along with information about the algorithms, reporting tools, and validation tools applied to the sample data. It represents these as experiment and process tables, respectively. Since Solexa uses multiple distinct lanes in its flowcell, each lane is represented as a separate lane entry that tracks information about it. The LIMS database is designed to act as a natural point of integration: all processing, reporting, and validation tools write to the database to track their progress and status.

SolexaLIMS Pipeline

Information written to the database by the LIMS web application is read by a daemon that monitors for new experiments. This daemon then spawns the Solexa pipeline and the various reporting and validation tools. The pipeline wrapper is cluster aware and integrates with the Sun Grid Engine to distribute the Solexa pipeline across cluster nodes. The pipeline code in the SolexaLIMS system acts as a wrapper for the intensity and base calling algorithms provided by Solexa.
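
The interaction between the database and the monitoring daemon described above can be illustrated with a minimal sketch. It is not the SolexaLIMS implementation (the web application itself is Java-based): the table columns, the status values, the tool names, and the function `poll_once` are all assumptions made for illustration, and an in-memory SQLite database stands in for the real back-end.

```python
import sqlite3

# Hypothetical schema: only the table names (experiment, lane, process) come
# from the text; every column and status value here is an assumption.
SCHEMA = """
CREATE TABLE experiment (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'annotating'
);
CREATE TABLE lane (
    id INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    lane_number INTEGER NOT NULL,
    sample TEXT
);
CREATE TABLE process (
    id INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    tool TEXT NOT NULL,
    status TEXT NOT NULL
);
"""


def poll_once(conn, tools=("pipeline", "validation", "report")):
    """Claim experiments marked 'ready' and queue one process row per tool."""
    ready = conn.execute(
        "SELECT id FROM experiment WHERE status = 'ready' ORDER BY id"
    ).fetchall()
    claimed = []
    for (exp_id,) in ready:
        # Mark the experiment so a later poll does not pick it up again.
        conn.execute(
            "UPDATE experiment SET status = 'processing' WHERE id = ?",
            (exp_id,),
        )
        # Each tool gets its own row so its progress can be tracked separately.
        conn.executemany(
            "INSERT INTO process (experiment_id, tool, status) "
            "VALUES (?, ?, 'queued')",
            [(exp_id, tool) for tool in tools],
        )
        claimed.append(exp_id)
    conn.commit()
    return claimed


def demo():
    """Create two experiments, mark one ready, and poll twice."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO experiment (name, status) VALUES ('run_a', 'ready')")
    conn.execute("INSERT INTO experiment (name) VALUES ('run_b')")
    # One lane row per flowcell lane (eight lanes assumed for the example).
    conn.executemany(
        "INSERT INTO lane (experiment_id, lane_number, sample) VALUES (1, ?, ?)",
        [(n, "example sample") for n in range(1, 9)],
    )
    conn.commit()
    first = poll_once(conn)
    second = poll_once(conn)
    queued = conn.execute("SELECT COUNT(*) FROM process").fetchone()[0]
    return first, second, queued
```

Calling `demo()` claims only the experiment marked ready, finds nothing new on the second poll, and leaves three queued process rows. A real daemon would call something like `poll_once` in a loop and hand each queued row to the cluster scheduler; that part is omitted here.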

SolexaLIMS Reporting and Analysis Tools

The SolexaLIMS system provides several analysis and reporting tools designed to examine sequencing experiments of several different types. The first is a wrapper for the ELAND alignment algorithm, which generates fast, ungapped alignments between a reference genome and the Solexa sequences with fewer than 3 mismatches. Alternatively, the LIMS system includes an alignment wrapper for BLAT, which allows more flexibility in the search criteria.

Depending on the experiment performed and the reference genome used, the SolexaLIMS system provides two reporting mechanisms. First, for genomic alignments, BED and WIG files are produced that represent, respectively, the polymorphisms observed in the Solexa sequences and the sequence coverage per base position. Second, alignments to a RefSeq cDNA library can be analyzed to generate a report of RefSeq counts based on the number of Solexa sequences aligned within each cDNA sequence. All reports are linked from the SolexaLIMS website.

The SolexaLIMS analysis and reporting component also includes tools for validating the various steps in the Solexa pipeline. This ensures experimental data integrity at each step in the process.

The analysis component also includes compression tools for representing Solexa data in a more compact form. This includes converting intensity information derived from the Solexa pipeline from plain text files to NetCDF binary files. This conversion preserves all information in the text files yet takes only a fraction of the storage space. Similarly, sequence data produced by the Solexa pipeline is converted from plain text output to a binary version following the 2bit format. This allows both sequence and quality information to be stored in separate tracks while reducing overall file size considerably.

Results

In our SolexaLIMS system, [number] lanes have been processed through the Solexa pipeline.
This includes the Firecrest algorithm for image processing and the Bustard base-calling algorithm. This has resulted in [number] sequenced bases. Datasets of this size require considerable storage space, with the average run taking up [number] GB of space for image files. The intensity and base calling process adds another [number] GB to that size. Each alignment using ELAND takes approximately [number] GB per lane for alignments to the human genome.

The SolexaLIMS system has been successfully used to sequence approximately [?] bases of sequence, much of it representing deep sequencing of exons. [?] bases of this total sequence were aligned to the human genome (build 18, UCSC) using the ELAND tool from the Solexa pipeline. This process was facilitated by the data entered in the SolexaLIMS system: only lanes from experiments annotated as human genomic DNA were aligned to the human genome reference sequence in this way. Of this sequence, approximately [?] novel SNPs were identified in exon regions. These were identified using the reporting components of SolexaLIMS, including tools for outputting BED and WIG file formats for display in the UCSC browser. Figure 2 shows an example region visualized in this way.

[Draft note: This section should be expanded... I could just go into the types of experiments that have been done and how many sequences/alignments have been produced.]

Conclusions

The SolexaLIMS system was designed to meet the needs of research groups, laboratories, and core facilities using the next-generation sequencing techniques provided by Solexa. While software exists to process raw data from the 1G instrument into finished sequence files, there are currently no options for managing the overall workflow from start to finish. SolexaLIMS provides this missing capability and wraps not only the raw data processing but also common analysis and archiving of the results. The interface controlling the overall flow and analysis is a simple web-based application of a kind familiar to most users. This web interface provides the ability to annotate key pieces of information about the experiment that describe the nature of the sequencing being done. It also enables users to follow the validation, backup, analysis, and reporting modules, making it easy to confirm that an experiment succeeded and to access its results.

Since the Solexa instrument and other next-generation sequencing technologies are just now coming online in core facilities and laboratories, many of the applications and common usage patterns are still to be worked out. A challenge going forward for the SolexaLIMS project is the adequate representation of experimental procedures in the LIMS system. For example, paired-end sequencing equipment and procedures will soon be available for the 1G instrument.
This will necessitate a different data processing flow in the processing pipeline wrapper script, as well as front-end changes to the SolexaLIMS web application to capture the appropriate sample annotations. The open source nature of SolexaLIMS means that many changes in response to new experimental procedures using the Solexa instrument can be modeled in the LIMS system directly by users. This community approach helps ensure that the SolexaLIMS system meets users' needs in the future.

Another major challenge for the platform, and for other similar devices, is the large amount of data produced with each run. With over 0.5 TB of data produced per experiment, the need to develop more efficient storage techniques is a top concern. This is a current focus of the SolexaLIMS project, and the development of a NetCDF-based file format for intensity scores and a 2bit format for the sequence data produced is a near-term goal.

Finally, interoperability with existing systems is another key concern for the effective distribution of sequence results from the Solexa system. Current development efforts are focused on integrating gene expression counting information, raw sequences, and sequence alignments from Solexa experiments into the Celsius pipeline. This introduces the possibility of distributing results via the ubiquitous DAS protocol, which would greatly enhance the ability to share Solexa data with other parties.

Availability and Requirements

The Nelson lab open source project provides community mailing lists where users can discuss and contribute changes to the SolexaLIMS system. The project website provides access to the source code, user forums, and documentation for setting up and using the SolexaLIMS system. We fully intend to respond to user feedback and to integrate suggestions and patches from end users who are active participants in the development process.

In order to use Solexa data, the image files need to be analyzed to produce sequence. The SolexaLIMS system uses the pipeline provided by Solexa to accomplish this. The pipeline is configurable to run on a Sun Grid Engine (SGE) cluster using standard Linux tools. Processing time depends greatly on the hardware and on the capacity of the network connecting the computers. With such a large collection of files, moving them efficiently across the network can be a major bottleneck. In our setup we use temporary processing disk space and 8 cluster nodes to run the Solexa pipeline. Each cluster node is powered by a 2.? GHz Intel processor [verify, fill in] and contains 8 GB of memory. Processing a complete Solexa dataset requires approximately 22 hours of compute time. Further parallelization of the pipeline is limited by input and output (I/O) bottlenecks at the temporary disk space storage level. Transfer times to move a complete Solexa dataset onto this temporary cluster space are approximately [?] for a complete run.
Given our real-world performance benchmarking, local disk I/O and network transfer times are the major limiting factors affecting the processing of raw images to base calls using the Solexa pipeline.

Authors' contributions

B.O. is the author of the SolexaLIMS web application and pipeline, with contributions by J.M. and B.B.

Acknowledgments

Grants...

Figures

Figure 1 - SolexaLIMS Flow Overview
SolexaLIMS is layered on top of the processing pipeline that transforms raw images from the instrument to finished sequence. It manages sample annotations via a web interface and coordinates additional analysis and reporting via a centralized database.

Figure 2 - Genomic Alignment Report
A genomic alignment produces reports on potential polymorphisms and overall sequence coverage that can easily be viewed in the UCSC browser. The use of standardized output formats and visualization streamlines analysis.
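
As an illustration of the sequence coverage report mentioned for genomic alignments, the following sketch computes per-base coverage from ungapped alignments and writes it as variableStep WIG text of the kind the UCSC browser accepts as a custom track. It is a sketch under stated assumptions, not the SolexaLIMS reporting code: the function name `coverage_wig`, the input tuple layout, and the track name are hypothetical.

```python
from collections import Counter


def coverage_wig(alignments, track_name="coverage"):
    """Build variableStep WIG text from ungapped alignments.

    alignments: iterable of (chrom, start, length) tuples, where start is a
    0-based position on the reference (an assumed input layout).
    """
    depth = {}
    for chrom, start, length in alignments:
        counts = depth.setdefault(chrom, Counter())
        # An ungapped alignment covers a contiguous run of reference bases.
        for pos in range(start, start + length):
            counts[pos] += 1

    lines = [f'track type=wiggle_0 name="{track_name}"']
    for chrom in sorted(depth):
        lines.append(f"variableStep chrom={chrom}")
        for pos in sorted(depth[chrom]):
            # WIG positions are 1-based, so shift the 0-based index by one.
            lines.append(f"{pos + 1} {depth[chrom][pos]}")
    return "\n".join(lines) + "\n"
```

For example, `coverage_wig([("chr1", 100, 3), ("chr1", 101, 3)])` reports depth 1 at positions 101 and 104 and depth 2 at positions 102 and 103. Holding a counter per covered base is simple but memory-hungry; a whole-genome run would more likely need a sorted sweep or a fixed-size array per chromosome.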

[Figure 1 diagram. Recoverable labels: LIMS Website; Solexa Sequencer; store annotations; Solexa Images; LIMS DB; Solexa pipeline; Solexa Sequence; genomic or cDNA; genomic alignment; cDNA alignment; alignment report; mRNA counts.]
Figure 1: An overview of the SolexaLIMS system.

Figure 2: An example report showing polymorphisms identified by the genomic sequencing of KCNJ12.


More information

High-throughput sequencing: Alignment and related topic. Simon Anders EMBL Heidelberg

High-throughput sequencing: Alignment and related topic. Simon Anders EMBL Heidelberg High-throughput sequencing: Alignment and related topic Simon Anders EMBL Heidelberg Established platforms HTS Platforms Illumina HiSeq, ABI SOLiD, Roche 454 Newcomers: Benchtop machines 454 GS Junior,

More information

Integrated Genome browser (IGB) installation

Integrated Genome browser (IGB) installation Integrated Genome browser (IGB) installation Navigate to the IGB download page http://bioviz.org/igb/download.html You will see three icons for download: The three icons correspond to different memory

More information

Lightweight Streaming-based Runtime for Cloud Computing. Shrideep Pallickara. Community Grids Lab, Indiana University

Lightweight Streaming-based Runtime for Cloud Computing. Shrideep Pallickara. Community Grids Lab, Indiana University Lightweight Streaming-based Runtime for Cloud Computing granules Shrideep Pallickara Community Grids Lab, Indiana University A unique confluence of factors have driven the need for cloud computing DEMAND

More information

Exercise 2: Browser-Based Annotation and RNA-Seq Data

Exercise 2: Browser-Based Annotation and RNA-Seq Data Exercise 2: Browser-Based Annotation and RNA-Seq Data Jeremy Buhler July 24, 2018 This exercise continues your introduction to practical issues in comparative annotation. You ll be annotating genomic sequence

More information

SPAR outputs and report page

SPAR outputs and report page SPAR outputs and report page Landing results page (full view) Landing results / outputs page (top) Input files are listed Job id is shown Download all tables, figures, tracks as zip Percentage of reads

More information

Welcome to GenomeView 101!

Welcome to GenomeView 101! Welcome to GenomeView 101! 1. Start your computer 2. Download and extract the example data http://www.broadinstitute.org/~tabeel/broade.zip Suggestion: - Linux, Mac: make new folder in your home directory

More information

Introduction to Genome Browsers

Introduction to Genome Browsers Introduction to Genome Browsers Rolando Garcia-Milian, MLS, AHIP (Rolando.milian@ufl.edu) Department of Biomedical and Health Information Services Health Sciences Center Libraries, University of Florida

More information

PERFORMANCE STUDY OCTOBER 2017 ORACLE MONSTER VIRTUAL MACHINE PERFORMANCE. VMware vsphere 6.5

PERFORMANCE STUDY OCTOBER 2017 ORACLE MONSTER VIRTUAL MACHINE PERFORMANCE. VMware vsphere 6.5 PERFORMANCE STUDY OCTOBER 2017 ORACLE MONSTER VIRTUAL MACHINE PERFORMANCE VMware vsphere 6.5 Table of Contents Executive Summary...3 Introduction...3 Test Environment... 4 Test Workload... 5 Virtual Machine

More information

!"#$%&$'()#$*)+,-./).01"0#,23+3,303456"6,&((46,7$+-./&((468,

!#$%&$'()#$*)+,-./).010#,23+3,3034566,&((46,7$+-./&((468, !"#$%&$'()#$*)+,-./).01"0#,23+3,303456"6,&((46,7$+-./&((468, 9"(1(02)1+(',:.;.4(*.',?9@A,!."2.4B.'#A,C(;.

More information

RNA-Seq Analysis With the Tuxedo Suite

RNA-Seq Analysis With the Tuxedo Suite June 2016 RNA-Seq Analysis With the Tuxedo Suite Dena Leshkowitz Introduction In this exercise we will learn how to analyse RNA-Seq data using the Tuxedo Suite tools: Tophat, Cuffmerge, Cufflinks and Cuffdiff.

More information

Genome Browser. Background & Strategy. Spring 2017 Faction II

Genome Browser. Background & Strategy. Spring 2017 Faction II Genome Browser Background & Strategy Spring 2017 Faction II Outline Beginning of the Last Phase Goals State of Art Applicable Genome Browsers Not So Genome Browsers Storing Data Strategy for the website

More information

NextGenMap and the impact of hhighly polymorphic regions. Arndt von Haeseler

NextGenMap and the impact of hhighly polymorphic regions. Arndt von Haeseler NextGenMap and the impact of hhighly polymorphic regions Arndt von Haeseler Joint work with: The Technological Revolution Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program

More information

Fast Fuzzy Clustering of Infrared Images. 2. brfcm

Fast Fuzzy Clustering of Infrared Images. 2. brfcm Fast Fuzzy Clustering of Infrared Images Steven Eschrich, Jingwei Ke, Lawrence O. Hall and Dmitry B. Goldgof Department of Computer Science and Engineering, ENB 118 University of South Florida 4202 E.

More information

Genomics 92 (2008) Contents lists available at ScienceDirect. Genomics. journal homepage:

Genomics 92 (2008) Contents lists available at ScienceDirect. Genomics. journal homepage: Genomics 92 (2008) 75 84 Contents lists available at ScienceDirect Genomics journal homepage: www.elsevier.com/locate/ygeno Review UCSC genome browser tutorial Ann S. Zweig a,, Donna Karolchik a, Robert

More information

UNIFIED MANAGEMENT OF CONVERGED VOICE, DATA, AND VIDEO TECHNOLOGIES WITH AUTOMATED SUBSCRIBER AND SERVICE PROVISIONING

UNIFIED MANAGEMENT OF CONVERGED VOICE, DATA, AND VIDEO TECHNOLOGIES WITH AUTOMATED SUBSCRIBER AND SERVICE PROVISIONING 01010101000101010 10001010010001001 ZMS UNIFIED MANAGEMENT OF CONVERGED VOICE, DATA, AND VIDEO TECHNOLOGIES WITH AUTOMATED SUBSCRIBER AND SERVICE PROVISIONING SINGLE MANAGEMENT SYSTEM FOR THE ENTIRE LOCAL

More information

Supplementary Figure 1. Fast read-mapping algorithm of BrowserGenome.

Supplementary Figure 1. Fast read-mapping algorithm of BrowserGenome. Supplementary Figure 1 Fast read-mapping algorithm of BrowserGenome. (a) Indexing strategy: The genome sequence of interest is divided into non-overlapping 12-mers. A Hook table is generated that contains

More information

Genetics 211 Genomics Winter 2014 Problem Set 4

Genetics 211 Genomics Winter 2014 Problem Set 4 Genomics - Part 1 due Friday, 2/21/2014 by 9:00am Part 2 due Friday, 3/7/2014 by 9:00am For this problem set, we re going to use real data from a high-throughput sequencing project to look for differential

More information

Reduced regulatory compliance violations/fines

Reduced regulatory compliance violations/fines PSS ODMS 1.1 About PSS ODMS PSS ODMS is a multi-purpose software product for electrical power transmission system planners and operators. The software is currently used by power companies around the globe

More information

How Smarter Systems Deliver Smarter Economics and Optimized Business Continuity

How Smarter Systems Deliver Smarter Economics and Optimized Business Continuity 9-November-2010 Singapore How Smarter Systems Deliver Smarter Economics and Optimized Business Continuity Shiva Anand Neiker Storage Sales Leader STG ASEAN How Smarter Systems Deliver Smarter Economics

More information

High-Performance Algorithm Engineering for Computational Phylogenetics

High-Performance Algorithm Engineering for Computational Phylogenetics High-Performance Algorithm Engineering for Computational Phylogenetics Bernard M.E. Moret moret@cs.unm.edu Department of Computer Science University of New Mexico Albuquerque, NM 87131 High-Performance

More information

Fusion Detection Using QIAseq RNAscan Panels

Fusion Detection Using QIAseq RNAscan Panels Fusion Detection Using QIAseq RNAscan Panels June 11, 2018 Sample to Insight QIAGEN Aarhus Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.qiagenbioinformatics.com ts-bioinformatics@qiagen.com

More information

Data management for Proteomics ABRF 2005

Data management for Proteomics ABRF 2005 MASCOTIntegra Data management for Proteomics 1 Mascot Integra: Data management for proteomics What is Mascot Integra? What Mascot Integra is not Security and Electronic signatures in Mascot Integra Instrument

More information

Prof. Konstantinos Krampis Office: Rm. 467F Belfer Research Building Phone: (212) Fax: (212)

Prof. Konstantinos Krampis Office: Rm. 467F Belfer Research Building Phone: (212) Fax: (212) Director: Prof. Konstantinos Krampis agbiotec@gmail.com Office: Rm. 467F Belfer Research Building Phone: (212) 396-6930 Fax: (212) 650 3565 Facility Consultant:Carlos Lijeron 1/8 carlos@carotech.com Office:

More information

Accelerating Parallel Analysis of Scientific Simulation Data via Zazen

Accelerating Parallel Analysis of Scientific Simulation Data via Zazen Accelerating Parallel Analysis of Scientific Simulation Data via Zazen Tiankai Tu, Charles A. Rendleman, Patrick J. Miller, Federico Sacerdoti, Ron O. Dror, and David E. Shaw D. E. Shaw Research Motivation

More information

LVis. Counting Laboratory Application Manager for GammaVision

LVis. Counting Laboratory Application Manager for GammaVision Counting Laboratory Application Manager for GammaVision Streamlines Counting Laboratory operation Administrator and operator modes Detector and Sample user focus Easy installation and easily learned All

More information

Practical Course in Genome Bioinformatics

Practical Course in Genome Bioinformatics Practical Course in Genome Bioinformatics 20/01/2017 Exercises - Day 1 http://ekhidna.biocenter.helsinki.fi/downloads/teaching/spring2017/ Answer questions Q1-Q3 below and include requested Figures 1-5

More information

TECH NOTE Improving the Sensitivity of Ultra Low Input mrna Seq

TECH NOTE Improving the Sensitivity of Ultra Low Input mrna Seq TECH NOTE Improving the Sensitivity of Ultra Low Input mrna Seq SMART Seq v4 Ultra Low Input RNA Kit for Sequencing Powered by SMART and LNA technologies: Locked nucleic acid technology significantly improves

More information

Analysis of high-throughput sequencing data. Simon Anders EBI

Analysis of high-throughput sequencing data. Simon Anders EBI Analysis of high-throughput sequencing data Simon Anders EBI Outline Overview on high-throughput sequencing (HTS) technologies, focusing on Solexa's GenomAnalyzer as example Software requirements to works

More information

MySQL and Virtualization Guide

MySQL and Virtualization Guide MySQL and Virtualization Guide Abstract This is the MySQL and Virtualization extract from the MySQL Reference Manual. For legal information, see the Legal Notices. For help with using MySQL, please visit

More information

Resequencing Analysis. (Pseudomonas aeruginosa MAPO1 ) Sample to Insight

Resequencing Analysis. (Pseudomonas aeruginosa MAPO1 ) Sample to Insight Resequencing Analysis (Pseudomonas aeruginosa MAPO1 ) 1 Workflow Import NGS raw data Trim reads Import Reference Sequence Reference Mapping QC on reads Variant detection Case Study Pseudomonas aeruginosa

More information

iloci software is used to calculate the gene-gene interactions from GWAS data. This software was implemented by the OpenCL framework.

iloci software is used to calculate the gene-gene interactions from GWAS data. This software was implemented by the OpenCL framework. iloci software iloci software is used to calculate the gene-gene interactions from GWAS data. This software was implemented by the OpenCL framework. Software requirements : 1. Linux or Mac operating system

More information

Oracle Utilities Smart Grid Gateway

Oracle Utilities Smart Grid Gateway Oracle Utilities Smart Grid Gateway Quick Install Guide Release 2.1.0 Service Pack 3 E41189-06 May 2015 E41189-06 Copyright 2011, 2015, Oracle and/or its affiliates. All rights reserved. This software

More information

Core Lab LIMS. Quick Reference Guide

Core Lab LIMS. Quick Reference Guide Core Lab LIMS Quick Reference Guide May 2005 Table of Contents Introduction... 1 Improved customer service... 1 Improved accountability... 1 We need your feedback... 1 Create and Maintain Projects...

More information

Data Management at CHESS

Data Management at CHESS Data Management at CHESS Marian Szebenyi 1 Outline Background Big Data at CHESS CHESS-DAQ What our users say Conclusions 2 CHESS and MacCHESS CHESS: National synchrotron facility, 11 stations (NSF $) CHESS

More information

GenomeStudio Software Release Notes

GenomeStudio Software Release Notes GenomeStudio Software 2009.2 Release Notes 1. GenomeStudio Software 2009.2 Framework... 1 2. Illumina Genome Viewer v1.5...2 3. Genotyping Module v1.5... 4 4. Gene Expression Module v1.5... 6 5. Methylation

More information

Sep. Guide. Edico Genome Corp North Torrey Pines Court, Plaza Level, La Jolla, CA 92037

Sep. Guide.  Edico Genome Corp North Torrey Pines Court, Plaza Level, La Jolla, CA 92037 Sep 2017 DRAGEN TM Quick Start Guide www.edicogenome.com info@edicogenome.com Edico Genome Corp. 3344 North Torrey Pines Court, Plaza Level, La Jolla, CA 92037 Notice Contents of this document and associated

More information

Enabling Science Through Cyber Security At 100G

Enabling Science Through Cyber Security At 100G Enabling Science Through Cyber Security At 100G Submitted by: Rosio Alvarez, Ph.D. Chief Information Officer, Berkeley Lab RAlvarez@lbl.gov Project team: IT Division, Cyber Security Team Aashish Sharma

More information

COL862 Programming Assignment-1

COL862 Programming Assignment-1 Submitted By: Rajesh Kedia (214CSZ8383) COL862 Programming Assignment-1 Objective: Understand the power and energy behavior of various benchmarks on different types of x86 based systems. We explore a laptop,

More information

User's guide to ChIP-Seq applications: command-line usage and option summary

User's guide to ChIP-Seq applications: command-line usage and option summary User's guide to ChIP-Seq applications: command-line usage and option summary 1. Basics about the ChIP-Seq Tools The ChIP-Seq software provides a set of tools performing common genome-wide ChIPseq analysis

More information

Analyzing Variant Call results using EuPathDB Galaxy, Part II

Analyzing Variant Call results using EuPathDB Galaxy, Part II Analyzing Variant Call results using EuPathDB Galaxy, Part II In this exercise, we will work in groups to examine the results from the SNP analysis workflow that we started yesterday. The first step is

More information

QIAseq Targeted RNAscan Panel Analysis Plugin USER MANUAL

QIAseq Targeted RNAscan Panel Analysis Plugin USER MANUAL QIAseq Targeted RNAscan Panel Analysis Plugin USER MANUAL User manual for QIAseq Targeted RNAscan Panel Analysis 0.5.2 beta 1 Windows, Mac OS X and Linux February 5, 2018 This software is for research

More information

Appendix to The Health of Software Engineering Research

Appendix to The Health of Software Engineering Research Appendix to The Health of Software Engineering Research David Lo School of Information Systems Singapore Management University Singapore davidlo@smu.edu.sg Nachiappan Nagappan and Thomas Zimmermann Research

More information

BovineMine Documentation

BovineMine Documentation BovineMine Documentation Release 1.0 Deepak Unni, Aditi Tayal, Colin Diesh, Christine Elsik, Darren Hag Oct 06, 2017 Contents 1 Tutorial 3 1.1 Overview.................................................

More information

Ioan Raicu. Everyone else. More information at: Background? What do you want to get out of this course?

Ioan Raicu. Everyone else. More information at: Background? What do you want to get out of this course? Ioan Raicu More information at: http://www.cs.iit.edu/~iraicu/ Everyone else Background? What do you want to get out of this course? 2 Data Intensive Computing is critical to advancing modern science Applies

More information

GAIA CU6 Bruxelles Meeting (12-13 october 2006)

GAIA CU6 Bruxelles Meeting (12-13 october 2006) GAIA CU6 Bruxelles Meeting (12-13 october 2006) Preparation of CNES DPC Infrastructure Technology studies prepared by F. Jocteur Monrozier Context: GAIA CNES Infrastructure: Functional blocks import /

More information