Hinri Kerstens. NGS pipeline using Broad's Cromwell
1 Hinri Kerstens NGS pipeline using Broad's Cromwell
2 Introduction Princess Máxima Center is an organization fully specialized in pediatric oncology. By combining the best possible research and care, we will be able to heal more children in the future. That is our mission! Currently about 150 cancer patients are treated per year. From 2018 onward we expect a total of ~700 patients a year: ~550 new cases and ~ relapses. For all patients we will perform genome sequencing on primary/metastatic tumors and controls. Targeted sequencing depths range from 50x for normal samples to 150x for tumors.
3 Genomics environment (diagram labels): External Data; End User interfaces; Curation / Morphing; Repositories/Archival; Lab Experiments; LIMS / Metadata; Clinical Data; API calls; Storage and workflow infrastructure
4 Workflow infrastructure (diagram labels): Persistent analyses and scale out to the cloud; API; Workflow execution service + database
5 Analysis workflows Until recently, analysis workflows had major shortcomings: not readable for human beings; backend-specific (not very portable); laborious to write, implement, maintain and debug. With the advent of human-readable workflow languages and their executors this has changed. Now a broader audience can build robust pipelines that understand things like parallelism and the dependencies of inputs and outputs between tasks, and that resume intelligently if they get interrupted.
6 Supported parallelisms Users do not need to understand parallelism in detail; they can simply make use of it.
7 Workflow and tasks Workflow = calls to a set of tasks. The order in which the workflow block, task definitions, and call statements are arranged does not matter. The dependencies are inferred by the executor from the tasks' inputs and outputs. Task components
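The point that declaration order does not matter can be illustrated with a small Python sketch. It is not how Cromwell is implemented; it only assumes a hypothetical table of calls (stepa, stepb, stepc) in which each call lists the outputs it consumes, and derives a valid execution order with a topological sort.

```python
from graphlib import TopologicalSorter

# Hypothetical call table: each call lists the outputs of other calls it consumes.
# The declaration order is deliberately scrambled.
calls = {
    "stepc": ["stepb.out"],
    "stepa": [],
    "stepb": ["stepa.out"],
}


def execution_order(calls):
    """Return call names in an order that satisfies every input dependency."""
    # "stepb.out" means "depends on call stepb", so keep the part before the dot.
    graph = {
        name: {ref.split(".")[0] for ref in inputs}
        for name, inputs in calls.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(calls)
print(order)
```

Because stepc needs stepb and stepb needs stepa, the only valid order starts with stepa regardless of how the calls were written down.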
8 Task level variables
A task can accept variables from a call.
Define variables in the workflow and apply them in the task call statements.
    java -jar wdltool.jar validate myworkflow.wdl
9 Workflow inputs for a particular run
    {
      "myworkflowname.my_input": "~/path/to/input.bam",
      "myworkflowname.name": "NA12878",
      "myworkflowname.my_ref": "~/path/to/grch38.fa"
    }
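Hand-written inputs files easily end up with a missing quote or comma. As a hedged sketch, the Python below generates the same inputs from a plain dictionary and round-trips the result through a JSON parser; the helper name build_inputs is hypothetical and not part of Cromwell or wdltool.

```python
import json


def build_inputs(workflow, values):
    """Prefix each input name with the workflow name, as in the inputs JSON."""
    return {f"{workflow}.{key}": value for key, value in values.items()}


inputs = build_inputs("myworkflowname", {
    "my_input": "~/path/to/input.bam",
    "name": "NA12878",
    "my_ref": "~/path/to/grch38.fa",
})

text = json.dumps(inputs, indent=2)
# Round-trip through the parser: malformed JSON (for example a missing
# closing quote) would raise json.JSONDecodeError here.
assert json.loads(text) == inputs
print(text)
```

The printed text can be saved as myworkflow_inputs.json and passed to the run command.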
10 Validate syntax and run
    java -jar wdltool.jar validate myworkflow.wdl
    java -jar Cromwell.jar run myworkflow.wdl myworkflow_inputs.json
11 Progress status log
    cromwell-executions
      3096c720-a b-a348-ea4dd99118bb
        call-getbwaversion
          rc
          script
          script.submit
          stderr
          stdout
          version
        call-getpicardversion
          rc
          script
        call-samtofastqandbwamem
12 Progress and status log: scatter
    cromwell-executions
      3096c720-a b-a348-ea4dd99118bb
        call-getbwaversion
          rc ..
        call-getpicardversion
          rc
        call-samtofastqandbwamem
          shard-0
            rc
            script
            script.submit
            stderr
            stdout
          shard-1
            rc ..
          shard-2
13 Progress and status log: inputs and outputs
    cromwell-executions
      3096c720-a b-a348-ea4dd99118bb
        call-getbwaversion
        call-getpicardversion
        call-samtofastqandbwamem
          shard-0
            rc
            script
            script.submit
            stderr
            stdout
            C2CPVACXX_0_4_none.unmerged.bam
            C2CPVACXX_0_4_none.unmerged.bwa.stderr.log
            /path/to/c2cpvacxx_0_4_none.bam (ubam input)
            Homo_sapiens_assembly38.dict
            Homo_sapiens_assembly38.fasta
            These are copies!
          shard-1
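The directory layout above lends itself to a quick status overview. The following Python sketch walks an execution directory and collects the return code of every call or shard; it assumes that each rc file holds the exit code as a plain integer, and the function names are hypothetical helpers rather than anything Cromwell provides.

```python
from pathlib import Path


def collect_return_codes(root):
    """Map each call (or shard) directory under root to the integer in its rc file."""
    root = Path(root)
    codes = {}
    for rc_file in sorted(root.rglob("rc")):
        if not rc_file.is_file():
            continue
        text = rc_file.read_text().strip()
        if text:  # skip empty rc files (assumption: the job has not finished yet)
            call_dir = rc_file.parent.relative_to(root).as_posix()
            codes[call_dir] = int(text)
    return codes


def failed_calls(codes):
    """Return the call directories whose return code is non-zero."""
    return [call for call, rc in codes.items() if rc != 0]


if __name__ == "__main__":
    root = Path("cromwell-executions")
    if root.is_dir():
        for call, rc in collect_return_codes(root).items():
            print(f"{call}\trc={rc}")
```

Calls without an rc file simply do not appear in the result, which is a cheap way to see what has not finished.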
14 Workflow options
Persistent workflow results
Enables resume of failed workflows (not working yet)
Lots of well-structured workflow logging
    database {
      // This specifies which database to use
      config = main.mysql
      main {
        mysql {
          driver = "slick.driver.MySQLDriver$"
          db {
            driver = "com.mysql.jdbc.Driver"
            url = "jdbc:mysql://localhost:3306/cromwell_m"
            user = "cromwell"
            password = "cromwell"
            connectionTimeout = 5000
          }
        }
      }
    }
15 Cromwell database (job store)
16 Adaptation to our HPC environment On big machines running the Docker image provided by the Broad Institute, the example workflows run without modification. Missing: management of field-specific software (Lmod environment module system). Missing: job scheduler resources, that is, management of requestable resources: cores, memory, runtime, scratch.
17 Software modules
    # Read unmapped BAM, convert on-the-fly to FASTQ and stream to BWA MEM for alignment
    task SamToFastqAndBwaMem {
      String module_java_version
      String module_picard_version
      String module_bwa_version
      String module_samtools_version
      File input_bam

      command <<<
        module load ${module_java_version}
        module load ${module_picard_version}
        module load ${module_bwa_version}
        module load ${module_samtools_version}
        java -Xmx4G -jar $PICARD \
          SamToFastq \
          INPUT=${input_bam} \
          FASTQ=/dev/stdout \
          INTERLEAVE=true \
          NON_PF=true | \
        ${bwa_commandline} /dev/stdin - | \
        samtools view -1 - > ${output_bam_basename}.bam
      >>>

    # WORKFLOW DEFINITION
    workflow myworkflowname {
      String module_bwa_version
      String module_java_version
      String module_picard_version
      String module_samtools_version

      # Map reads to reference
      call SamToFastqAndBwaMem {
        input:
          module_java_version = module_java_version,
          module_picard_version = module_picard_version,
          module_bwa_version = module_bwa_version,
          module_samtools_version = module_samtools_version,
18 Software versions as workflow inputs
myworkflowname.inputs.json
    {
      "##_COMMENT6": "MODULES",
      "TestFlow.module_bwa_version": "bwa/1.0",
      "TestFlow.module_java_version": "Java/1.8.0_60",
      "TestFlow.module_picard_version": "picardtools/2.5.0",
      "TestFlow.module_samtools_version": "samtools/1.3"
    }
19 Requestable resources
    # Sort BAM file by coordinate order and fix tag values for NM and UQ
    task SortAndFixSampleBam {
      String module_java_version
      String module_picard_version
      File input_bam
      Int tmp_space
      String wallclock

      command {
        module load ${module_java_version}
        module load ${module_picard_version}
        java -Djava.io.tmpdir=$TMPDIR -Xmx4G -jar $PICARD \
          SortSam \
      runtime {
        cpu: "2"
        memory: "6 GB"
        tmp_space: "${tmp_space}"
        wallclock: "${wallclock}"
      }
Values for tmp_space and wallclock are task-specific but may need adjusting as the data size changes.
20 Workflow inputs with task-specific values
    {
      "myworkflowname.MarkDuplicates.tmp_space": 200,
      "myworkflowname.SortAndFixSampleBam.tmp_space": 400,
      "myworkflowname.BaseRecalibrator.tmp_space": 4
    }
    task MarkDuplicates { Int tmp_space command {... } }
    task BaseRecalibrator { Int tmp_space command {... } }
    task SortAndFixSampleBam { Int tmp_space command {... } }
    workflow myworkflowname {
      call MarkDuplicates {}
      call BaseRecalibrator {}
      call SortAndFixSampleBam {}
    }
21 Tell the executor about these backend resources
    backend {
      default = SGE
      SGE {
        config {
          runtime-attributes = """
          Int? cpu
          Int? memory_gb
          String? tmp_space = "1"
          String? wallclock = "00:1:00"
          """
          submit = """
          qsub \
          -terse -b n -N ${job_name} \
          -wd ${cwd} \
          -o ${out} \
          -e ${err} \
          -pe threaded ${cpu} -l h_vmem=${memory_gb}g,tmpspace=${tmp_space}g,h_rt=${wallclock} \
          ${script}
          """
        }
      }
    }
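To see how the runtime attributes end up in the scheduler request, the Python sketch below fills a submit template of the same shape using string.Template, which happens to share the ${name} placeholder syntax. This only mimics the substitution step and is not Cromwell's implementation; the job values (job name, paths, the 04:00:00 wallclock) are made up for illustration.

```python
from string import Template

# Same placeholders as the submit string in the backend configuration.
SUBMIT = Template(
    "qsub -terse -b n -N ${job_name} -wd ${cwd} -o ${out} -e ${err} "
    "-pe threaded ${cpu} "
    "-l h_vmem=${memory_gb}g,tmpspace=${tmp_space}g,h_rt=${wallclock} "
    "${script}"
)

# Defaults taken from the runtime-attributes declaration.
DEFAULTS = {"tmp_space": "1", "wallclock": "00:1:00"}


def render_submit(job, runtime):
    """Merge defaults, job details and task runtime values, then fill the template."""
    values = {**DEFAULTS, **job, **runtime}
    # substitute() raises KeyError when a placeholder such as ${cpu} has no value.
    return SUBMIT.substitute(values)


# Hypothetical job details and task-level runtime values.
job = {
    "job_name": "SortAndFixSampleBam",
    "cwd": "/path/to/call-dir",
    "out": "stdout",
    "err": "stderr",
    "script": "script",
}
runtime = {"cpu": 2, "memory_gb": 6, "tmp_space": 400, "wallclock": "04:00:00"}

command = render_submit(job, runtime)
print(command)
```

Task-level values override the defaults, so a task that sets no tmp_space or wallclock would be submitted with "1" and "00:1:00".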
22 Cromwell server
23 Work in progress Successful resubmits/resumes. Testing a more recent version; tested: cromwell-0.20-ff3bb7a-snapshot
24 WDL features
25 Linear chain
    call stepb { input: in=stepa.out }
    call stepc { input: in=stepb.out }
26 Multi-input/Multi-output
    call stepc { input: in1=stepb.out1, in2=stepb.out2 }
27 Branch & Merge
    call stepb { input: in=stepa.out }
    call stepc { input: in=stepa.out }
    call stepd { input: in1=stepc.out, in2=stepb.out }
28 Scatter-Gather Parallelism
    Array[File] inputfiles  # explicit array
    scatter (onefile in inputfiles) {
      call stepa { input: in=onefile }
    }
    call stepb { input: files=stepa.out }  # implicit array
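For readers more familiar with general-purpose languages, the scatter-gather pattern can be sketched in Python. This is an analogy, not WDL semantics in full: stepa and stepb are stand-in functions, the file names are invented, and a thread pool plays the role of the executor that runs the shards in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def stepa(onefile):
    """Stand-in for the per-file task; returns the name of its output."""
    return onefile + ".out"


def stepb(files):
    """Stand-in for the gather task; receives all shard outputs as one list."""
    return ",".join(files)


inputfiles = ["a.bam", "b.bam", "c.bam"]  # the explicit array

# Scatter: one stepa call per input file, run concurrently.
with ThreadPoolExecutor() as pool:
    # map() returns results in input order, like the implicit output array.
    shard_outputs = list(pool.map(stepa, inputfiles))

# Gather: stepb sees the collected outputs as a single array.
merged = stepb(shard_outputs)
print(merged)
```

The list built from the shards corresponds to the implicit array stepa.out that stepb consumes.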
29 Task Aliasing
    call stepa as firstsample { input: in=firstinput }
    call stepa as secondsample { input: in=secondinput }
    call stepb { input: in=firstsample.out }
    call stepc { input: in=secondsample.out }
Implementing the Twelve-Factor App Methodology for Developing Cloud- Native Applications By, Janakiram MSV Executive Summary Application development has gone through a fundamental shift in the recent past.
More informationGrid Engine - A Batch System for DESY. Andreas Haupt, Peter Wegner DESY Zeuthen
Grid Engine - A Batch System for DESY Andreas Haupt, Peter Wegner 15.6.2005 DESY Zeuthen Introduction Motivations for using a batch system more effective usage of available computers (e.g. reduce idle
More informationVariation among genomes
Variation among genomes Comparing genomes The reference genome http://www.ncbi.nlm.nih.gov/nuccore/26556996 Arabidopsis thaliana, a model plant Col-0 variety is from Landsberg, Germany Ler is a mutant
More informationIntroduction to HPC Using zcluster at GACRC
Introduction to HPC Using zcluster at GACRC On-class STAT8330 Georgia Advanced Computing Resource Center University of Georgia Suchitra Pakala pakala@uga.edu Slides courtesy: Zhoufei Hou 1 Outline What
More informationSentieon Documentation
Sentieon Documentation Release 201808.03 Sentieon, Inc Dec 21, 2018 Sentieon Manual 1 Introduction 1 1.1 Description.............................................. 1 1.2 Benefits and Value..........................................
More informationX Grid Engine. Where X stands for Oracle Univa Open Son of more to come...?!?
X Grid Engine Where X stands for Oracle Univa Open Son of more to come...?!? Carsten Preuss on behalf of Scientific Computing High Performance Computing Scheduler candidates LSF too expensive PBS / Torque
More informationThe Cambridge Bio-Medical-Cloud An OpenStack platform for medical analytics and biomedical research
The Cambridge Bio-Medical-Cloud An OpenStack platform for medical analytics and biomedical research Dr Paul Calleja Director of Research Computing University of Cambridge Global leader in science & technology
More information