
Director: Prof. Konstantinos Krampis
agbiotec@gmail.com
Office: Rm. 467F Belfer Research Building
Phone: (212) 396-6930
Fax: (212) 650-3565

Facility Consultant: Carlos Lijeron
carlos@carotech.com
Office: Rm. 400 Belfer Research Building
Phone: (212) 396-7018

Bioinformatician: Baekdoo Kim
Baekdoo.Kim59@myhunter.cuny.edu
Office: Rm. 400 Belfer Research Building

Bioinformatician: Thahmina Ali
Thahmina.Ali62@myhunter.cuny.edu
Office: Rm. 400 Belfer Research Building

Network Advisor, Main Campus: Dr. Lloyd Williams
williams@genectr.hunter.cuny.edu

Description of the Facility

Background Overview

The Hunter College/CTBR Bioinformatics facility is located on the 4th floor of the Belfer

Research Building at 69th Street and York Ave. The facility gives researchers and faculty access to a high-performance computing cluster with a wide range of bioinformatics software and data-analysis pipelines, and provides cutting-edge bioinformatics technology for translational and basic research on health disparities. We also host a web-accessible bioinformatics platform based on Galaxy (http://galaxy.hunter.cuny.edu:8080) to support genomic sequencing analysis. Additionally, the facility offers Illumina sequencing on the Illumina MiSeq platform and nanopore sequencing on the Oxford Nanopore MinION sequencer. Both instruments can sequence the entire complement of DNA, or genome, of many animal, plant, and microbial species for basic biological and medical research. A detailed description of our services and available equipment is given below.

Services
- RNA-seq and variation discovery
- Small RNA sequencing
- De novo bacterial genome assembly
- RNA-seq analysis
- Targeted amplicon sequencing
- Computational capacity
- Scalable storage

Bioinformatics and Sequencing Resources and Equipment

Illumina MiSeq
The MiSeq desktop sequencer supports narrowly focused applications such as targeted gene sequencing, metagenetics, and metagenomics.

Oxford Nanopore MinION
The MinION portable sequencer provides a rapid, portable, real-time sequencing platform.

Agilent Technologies 2100 Bioanalyzer
The Agilent 2100 Bioanalyzer is a microfluidics-based platform that provides sizing, quantitation, and quality control, using two assay principles: electrophoresis and flow cytometry.

Galaxy Web-Accessible Bioinformatics Platform

We are running an installation of Galaxy, a web-based platform for data-intensive biomedical research: http://galaxy.hunter.cuny.edu:8080

Silicon Mechanics HPC Cluster System
The high-performance computing cluster provides 800 CPU cores, 3 TB of high-speed RAM, and a GPU node:
- Redundant head node: 12 CPU cores, 64 GB RAM
- 10 compute nodes: 20 CPU cores each, 128 GB RAM
- 1 medium-memory node: 32 CPU cores, 512 GB RAM
- 1 high-memory node: 32 CPU cores, 1 TB RAM
- 1 GPU node: NVIDIA Tesla K80, 2 hyper-threaded CPUs, 128 GB RAM

Seagate ClusterStor 1500 Lustre Storage
ClusterStor 1500 solutions feature scale-out storage building blocks and the Lustre parallel filesystem:
- 362 TB of parallel storage
- 5 GB/s throughput
- Seagate Enterprise Lustre
- Parallel-based storage

Belfer E-Box
The Belfer E-Box provides storage for data backup and project archiving:
- 200 TB of high-availability storage
- 5 GB/s throughput

Procedure for Submitting Jobs to the Cluster

SLURM Workload Manager
Jobs must be submitted to the cluster using SLURM, our job-management platform. Virtual environments can be loaded using Conda, our package manager. Once you have activated your environment and confirmed that your application is available, you can request resources with srun or sbatch.

Follow these steps to use SLURM. First, get a compute node assigned to you:

[username@ctbr-cluster-hn1 env]$ salloc
salloc: Granted job allocation 48

salloc requests resources from the cluster. The number 48 in the example above is the allocation number provided to you. You can submit jobs directly to your allocation or run a batch script.

Next, submit a job with the command srun, which will make use of the allocated resources. For this example we will use Bowtie2:

srun bowtie2 -x Bowtie_2/hg38 -1 RNAseq_sample_data/adrenal_1.fastq -2 RNAseq_sample_data/adrenal_2.fastq -S alignment.sam

srun = directs the job to the compute node assigned by salloc.
bowtie2 = main binary inside the virtual environment recently activated.
Bowtie_2/hg38 = the reference human genome index.
RNAseq_sample_data/adrenal_1.fastq = first mate of the paired-end input reads.
RNAseq_sample_data/adrenal_2.fastq = second mate of the paired-end input reads.
alignment.sam = the output file, placed in the current directory.

A guide on using Conda and SLURM is also available.

The Bioinformatics and Sequencing Resources facility is supported by a Research Centers in Minority Institutions Program grant from the National Institute on Minority Health and Health Disparities (MD007599) of the National Institutes of Health.
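The batch-script route mentioned above can be sketched as follows. This is a minimal illustration, not an official facility script: the resource requests, the conda environment name (ngs_tools), and the log-file name are assumptions, and no partition is specified.

```shell
# Write a minimal SLURM batch script for the Bowtie2 alignment shown above.
# Resource values and the conda environment name are placeholders.
cat > bowtie2_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=bowtie2_align
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --output=bowtie2_%j.log

# Activate the conda environment that provides bowtie2 (name is an example)
source activate ngs_tools

bowtie2 -x Bowtie_2/hg38 \
        -1 RNAseq_sample_data/adrenal_1.fastq \
        -2 RNAseq_sample_data/adrenal_2.fastq \
        -S alignment.sam
EOF

# Submit with:  sbatch bowtie2_job.sh
# SLURM replies with "Submitted batch job <jobid>"; monitor it with:
#   squeue -u $USER
```

Unlike srun under salloc, sbatch queues the script and returns immediately; the job's output goes to the file named by --output rather than to your terminal.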