Emile R. Chimusa Division of Human Genetics Department of Pathology University of Cape Town


1 Advanced Genomic data manipulation and Quality Control with plink Emile R. Chimusa Division of Human Genetics Department of Pathology University of Cape Town

2 Outline:
1. Introduction to Cluster Server
2. Introduction to plink
3. Genomics Data Quality Control

3 Introduction to Cluster Server
Opening a terminal to connect to a Linux system or PBS server:
1. Mac OS X includes a Terminal application (located in the Applications >> Utilities folder), which can be used to connect to other systems.
2. On Ubuntu, launch Terminal (Ctrl + Alt + T) to get a command prompt. Use the Dash to search for a particular piece of software.
3. On Windows systems you can use a variety of programs to connect to a Linux system. PuTTY is free and the most widely used.
By default the terminal opens in your home folder. When you connect remotely to a Linux cluster server, you likewise start in your home directory (folder).

4 Introduction to Cluster Server

5 Introduction to Cluster Server
Proxy server: a dedicated computer acting as an intermediary between an endpoint device, such as a computer, and another server from which a user or client is requesting a service.
Examples, in the form Username@Hostname, where the hostname is the domain or proxy address:
echimusa@lengau.chpc.ac.za
echimusa@scp.chpc.ac.za
echimusa@gmail.com
How to connect to the PBS server:
>$ ssh Username@proxy_address
Example (the -X option enables X11 forwarding, which graphical programs need):
>$ ssh echimusa@lengau.chpc.ac.za
>$ ssh -X echimusa@lengau.chpc.ac.za
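The Username@Hostname pattern can be assembled in a small helper before connecting. This is only a sketch: the function name cluster_target is hypothetical, and the username and host are the slide's own examples. Nothing is contacted; the commands are printed so you can run them yourself.

```shell
#!/usr/bin/env bash
# Build the "username@hostname" string that ssh expects.
# cluster_target is a hypothetical helper name, not part of ssh.
cluster_target() {
  local user="$1" host="$2"
  printf '%s@%s\n' "$user" "$host"
}

target="$(cluster_target echimusa lengau.chpc.ac.za)"

# Print the commands instead of running them, so no connection is attempted.
echo "Plain login:         ssh $target"
echo "With X11 forwarding: ssh -X $target"
```

Replace the two arguments with your own username and the login host you were given.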

6 Introduction to Cluster Server
When you sign in you will be located in your home directory. To see where this directory is located in the file system, use the pwd command. For example:
[echimusa@login2 ~]$ pwd
/home/echimusa
To see what is inside this directory, use the ls command (ls stands for list):
[echimusa@login2 ~]$ ls
get-pip.py hapfuse MarViN1 soft supportmix vcftools
To change to a different directory, use the cd command (cd means change directory):
[echimusa@login2 ~]$ cd /mnt/lustre/users/echimusa/
You can supply certain alias terms to the cd command. One of these is the ~ character, which represents your home directory (/home/echimusa/). Another is .., which represents the directory above the current directory.

7 Introduction to Cluster Server
To create your own directories, use the mkdir (make directory) command:
[echimusa@login2 ~]$ mkdir proteins
[echimusa@login2 ~]$ cd proteins/
[echimusa@login2 proteins]$ ls
[echimusa@login2 proteins]$ pwd
/home/echimusa/proteins
To create a new file, use touch or nano. To see who else is signed in to the same system, use the who command.
[echimusa@login2 proteins]$ touch my_sequence.sh
[echimusa@login2 proteins]$ ls -l my_sequence.sh
-rw-rw-r-- 1 echimusa echimusa 0 May 6 22:49 my_sequence.sh
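The navigation commands from the last two slides can be tried safely in a throwaway directory. The sketch below assumes only a standard shell with mktemp; the temporary location is an illustration choice so that nothing in your real home directory is touched.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory created by mktemp.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

pwd                      # show where we are
mkdir proteins           # make a new directory
cd proteins || exit 1    # move into it
touch my_sequence.sh     # create an empty file
ls -l my_sequence.sh     # long listing: permissions, owner, size, date
cd ..                    # ".." is the directory above the current one
ls                       # the only entry is the proteins directory
```

On the cluster you would run the same commands from your home or Lustre directory instead of a temporary one.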

8 II. Getting Started: Basic commands
[echimusa@login2 proteins]$ nano my_sequence.sh

9 Introduction to Cluster Server
CHPC uses the GNU modules utility, which manipulates your environment, to provide access to the supported software in /apps/.
For a list of available modules:
[echimusa@login2 ~]$ module avail
To see currently loaded modules:
[echimusa@login2 proteins]$ module list
To remove all modules:
[echimusa@login2 proteins]$ module purge
To load a module:
[echimusa@login2 proteins]$ module load name_module

10 Introduction to Cluster Server
Example job script, my_sequence.sh:
#!/bin/bash
#PBS -N Xchr
#PBS -q smp
#PBS -P CBBI0818
#PBS -l select=1:ncpus=24
#PBS -l walltime=48:00:00
#PBS -M
module load chpc/biomodules
module load chpc/python/
Useful PBS commands:
qstat: view queued jobs (e.g. qstat -u user_name), or see what is on each queue (qstat -Q).
qsub: submit a job to the scheduler.
qdel: delete one of your jobs from the queue (qdel ID_of_your_job).
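As a rough sketch of how such a job script is put together, the block below writes one to a temporary directory and prints the submit command rather than running it. The queue, project code and module name are copied from the slide. The cd to PBS_O_WORKDIR and the echo line are additions for illustration, and the temporary location is an assumption.

```shell
#!/usr/bin/env bash
# Write a small PBS job script to a temporary directory.
job_dir="$(mktemp -d)"
job_script="$job_dir/my_sequence.sh"

# The quoted 'EOF' keeps $PBS_O_WORKDIR and $(hostname) unexpanded in the file.
cat > "$job_script" <<'EOF'
#!/bin/bash
#PBS -N Xchr
#PBS -q smp
#PBS -P CBBI0818
#PBS -l select=1:ncpus=24
#PBS -l walltime=48:00:00
# Load the software stack, then move to the directory qsub was called from.
module load chpc/biomodules
cd "$PBS_O_WORKDIR"
echo "Job running on $(hostname)"
EOF

# Count the scheduler directives as a quick sanity check.
n_directives="$(grep -c '^#PBS' "$job_script")"
echo "Wrote $job_script with $n_directives PBS directives"

# Submission is left to you; on the login node you would run:
echo "Submit with: qsub $job_script"
echo "Check with:  qstat -u \$USER"
```

Lines starting with #PBS are comments to bash but instructions to the scheduler, which is why they must come before the first command.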

11 Introduction to Cluster Server: Transferring files
From both Mac and Ubuntu, we use the terminal to transfer data from a local to a remote computer, or from a remote to a local machine. We commonly use scp, rsync and wget (or curl).
a) Syntax: rsync options source destination
-au: update files that are newer in the original directory
b) Syntax: scp options source destination
-r: needed when copying a folder
c) From an FTP or internet source: wget options source -P destination_directory
-nc => --no-clobber
-N => turn on time-stamping
-r => turn on recursive retrieving
The destination is optional when copying into the current folder. Note that in wget the lowercase -o option names a log file, not the download destination; use -P for a target directory or -O for a target file name.
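The three tools can be compared side by side with a dry-run wrapper that prints each command instead of executing it. This is a sketch under assumptions: the run helper, the local folder names and the URL are hypothetical, while the host and remote path come from the earlier slides.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

remote="echimusa@scp.chpc.ac.za"          # transfer host shown in the slides
remote_dir="/mnt/lustre/users/echimusa"   # remote working directory
# Hypothetical local folder and URL, for illustration only.
local_dir="project_data"
url="https://example.org/Tutorial.tar.gz"

# a) rsync: -a keeps permissions and times, -u skips files newer at the destination
run rsync -au "$local_dir/" "$remote:$remote_dir/$local_dir/"

# b) scp: -r is needed because a whole folder is copied back
run scp -r "$remote:$remote_dir/results" .

# c) wget: -nc skips files that already exist, -P sets the target directory
run wget -nc -P downloads "$url"
```

Once the printed commands look right, rerun with DRY_RUN=0 to perform the transfers.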

12 Introduction to Cluster Server: Transferring files
Transferring data from Windows: we can use the WinSCP software. To use WinSCP, launch the program and enter the appropriate information into the Host name, User name and Password text areas. Click Login to connect to the remote system. Once you are connected, you should be able to transfer files and directories between systems using the simple graphical interface, by dragging files from one side to the other.

13 Introduction to Cluster Server: Transferring files
Transferring data from Windows: we can use the WinSCP software.
(Screenshot of the WinSCP window; labels: Explore folder, Explore folder, Local machine.)
Once you are connected, you should be able to transfer files and directories between systems by dragging files or folders between the two panels.

14 Connecting to CHPC and Downloading the Tutorial data

15 Connecting to CHPC and Downloading the Tutorial data
1. Connect to CHPC:
(a) Windows users: please open PuTTY and use the given CHPC login details.
(b) Linux or Mac users: open the terminal, type ssh followed by your CHPC login address, and enter your password.
2. Once connected, change directory as follows (press Enter after each command), then download and unpack the Tutorial:
> cd /mnt/lustre/users/your_username/
> wget
> tar xvf Tutorial.tar.gz
> cd Tutorial
> ls

16 Tutorial data and scripts to run jobs at CHPC
Inside the Tutorial folder:
A. SHELL: folder containing some Linux scripts to be used at the HPC.
1. For PCA: run_pca.sh. This script uses the data prepared in step 1 and calls two Python scripts to run smartpca to conduct PCA. (Again, the #PBS lines at the top of the file specify the allocation on the server, followed by the working, data and software directory variables.)
2. Admixture (population structure): qsub_admixture.sh and runcontinent2.sh. This is a clustering method that needs you to pre-specify the number of possible clusters in your data. We will run it just for K = 2, 3, 4 (see qsub_admixture.sh, which submits runcontinent2.sh to run the admixture software on the server).
B. Genesis_tutorial: this folder contains the Genesis software and the basic data that I demonstrated in the last class. Once you have your results from both PCA and admixture, you will use Genesis for plotting.

17 Tutorial data and scripts to run jobs at CHPC
Inside the Tutorial folder:
C. population_structure_data (containing Africa55K_10Pops.fam, .bed, .bim): folder containing the Africa data. Remember that our target data are HAZDA and SADAWE (Tanzania); we will investigate their population structure against the other populations in the whole dataset.
D. software: contains all your software, except smartpca.
E. GWAS_data: holds the GWAS data (GWAS.ped, .pedind, .map for ~100 cases and 874 controls). This folder also has the run_gwas.sh script, which contains the commands to run the GWAS (pre-GWAS QC, association test and some adjustment). It also contains an R script to plot the Q-Q plot (qqplot.r) and Mahanatha.py (to plot the Manhattan plot). How to run them is shown in run_gwas.sh.

18 Introduction to plink
Getting plink to run: download, install and run PLINK.
1. Windows users: download PLINK, then unzip the downloaded file. Copy the application file plink.exe and paste it in a folder called "Plink" (or whatever name you choose) anywhere on your computer (creating the folder plink on the C: drive is convenient).
2. Click Start > Run (or Start > Search Programs and Files), then type "cmd" and hit Enter to open command mode.
3. In command mode, go to the directory (folder) called Plink, where you pasted the application file plink.exe. For example, if it is C:\plink, change to C:\plink with the cd command.
4. To go back to the parent directory, type cd.. until you reach the C: drive.

19 Introduction to plink
Popular Genomics data format
Genotype data (alleles in brackets):
Individual   SNP1 (T/A)   SNP2 (G/C)   SNP3 (G/A)
emile        AA           CC           AA
Annie        AT           CG           AA
Gaston       AA           CG           AA
Jacqui       TT           GG           GG
Ephie        TT           CC           GG
Imani        TA           CG           GG
=> Encoded data (column headings A/T, C/G, A/G): each genotype is coded based on the count of the minor allele.
Annotation: a good ranking strategy would produce SNP3, SNP1, SNP2.

20 Introduction to plink
Popular Genomics data format
1. Standard format: map and ped files (the ped file is very wide when there are many more SNPs than individuals, because SNPs go in columns).
2. Binary format: bed, bim, fam files (compact files, about 1/10th the size of the original map/ped files).
3. VCF (.gz) file.
4. Oxford format gen (.gz).

21 Introduction to plink
Popular Genomics data format
Format                                Input option   Output option
PED/MAP                               --file         --recode --out
BED/BIM/FAM                           --bfile        --make-bed --out
TPED/TFAM                             --tfile        --recode --transpose
RAW (coded on count of minor allele)  None           --recodeA
LGEN/MAP/FAM                          --lfile        --recode-lgen
VCF (.gz)                             --vcf          --recode vcf
Note that for the PED format, alleles can be encoded as ACGT or 1234. The --alleleACGT and --allele1234 options (and the 12 recode modifier, --recode 12) can be used to do the conversion; you have to use --recode or --make-bed with them too:
plink --file filename --make-bed --options
More detail at

22 Introduction to plink
Popular Genomics data format: Examples
1. Convert data from bed, bim, fam files to a VCF file:
plink --bfile example --recode vcf --out example
and then from VCF back to bed, bim, fam:
plink --vcf example.vcf --double-id --vcf-require-gt --biallelic-only strict --missing-genotype 0 --allow-extra-chr --recode --make-bed --out example2
2. Convert bed, bim, fam to tped:
plink --bfile example2 --recode --transpose --out tpexample
3. Convert ped/map to bed, bim, fam, and rewrite a tped set:
plink --file example --recode --make-bed --out example3
plink --tfile tpexample --recode --transpose --out example
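The conversion chain above can be previewed as a dry run before touching any data. The sketch assumes PLINK 1.9 option spelling (for example --recode transpose) and a fileset prefix example in the current directory. The run helper and the PLINK environment variable are hypothetical conveniences, not PLINK features.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

plink_bin="${PLINK:-plink}"   # override with PLINK=/path/to/plink if needed
prefix="example"              # expects example.bed/.bim/.fam

# 1. binary fileset -> VCF
run "$plink_bin" --bfile "$prefix" --recode vcf --out "$prefix"

# 2. VCF -> binary fileset (--double-id reuses the VCF sample ID as FID and IID)
run "$plink_bin" --vcf "$prefix.vcf" --double-id --make-bed --out example2

# 3. binary fileset -> transposed text (tped/tfam)
run "$plink_bin" --bfile example2 --recode transpose --out tpexample

# 4. text ped/map -> binary fileset
run "$plink_bin" --file "$prefix" --make-bed --out example3
```

Every step names its output with --out, so no input fileset is overwritten.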

23 Introduction to plink
Popular Genomics data format: Slicing, dicing, ...
Add the plink parameters below to the previous commands.
Filtering:
--chr (extract the data of a specific chromosome)
--maf (keep SNPs whose minor allele frequency is at or above the specified value)
--mind (remove samples whose proportion of missing genotypes exceeds the specified value)
--geno (remove SNPs whose proportion of missing genotypes exceeds the specified value)
--hwe (remove SNPs whose Hardy-Weinberg equilibrium (HWE) test p-value is below the specified value)
Get a subset of SNPs:
--snps (extract a SNP or a range of SNPs)
--extract
--exclude
Get a subset of samples:
--keep sample.txt
--remove sample.txt
Example:
plink --bfile example --chr 22 --recode --out example.22

24 Introduction to plink
Popular Genomics data format: Slicing, dicing, ...
1. Subsetting the data consisting of chromosome 22:
> plink --bfile example --recode --chr 22 --out hap.chr22
2. Subsetting the data consisting of only males:
> plink --bfile example --recode --filter-males --out hap_males
3. Subsetting the data consisting of only females:
> plink --bfile example --recode --filter-females --out hap_females
4. Subsetting the data consisting of only cases:
> plink --bfile example --recode --filter-cases --out hap_cases
5. Subsetting the data consisting of only controls:
> plink --bfile example --recode --filter-controls --out hap_controls
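Because the four sample filters differ only in one word, they can be generated in a loop. This is a minimal sketch: the subset_all function and the run helper are hypothetical names, and the commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Emit one plink call per sample filter; output prefixes follow the slide.
subset_all() {
  local prefix="$1" group
  for group in males females cases controls; do
    run plink --bfile "$prefix" --recode "--filter-$group" --out "hap_$group"
  done
}

subset_all example
```

The chromosome subset is kept out of the loop because --chr takes a value, unlike the four --filter options.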

25 Introduction to plink
Popular Genomics data format: Exercise
1. Use the example data example.bed, example.bim, example.fam to convert to a VCF file, retaining only the data of chromosome 10 in the output.
2. Use the example data example.bed, example.bim, example.fam to convert to a tped file, and (a) retain only the samples in subsample_extract.txt, (b) exclude the samples in subsample.txt. Write (a) and (b) into different files, with genotypes coded as 1234 for (a) and as 12 for (b).
3. Use the example data example.bed, example.bim, example.fam to:
(a) extract data from rs to rs and write the output in ped/map format;
(b) extract the SNPs in file SNP_extract.txt and write the output in bed/bim/fam format;
(c) exclude the SNPs in file SNP_exclude.txt and write the output as VCF;
(d) write into VCF only the common SNPs (MAF threshold 0.05) of chromosome 1.

26 Genomics Data Quality Control
Removing bad SNPs and individuals: first, remove any individuals who have less than, say, 95% genotype data (--mind 0.05); then remove SNPs with a minor allele frequency below, say, 1% (--maf 0.01); then remove SNPs with a genotype call rate below, say, 90%, that is, more than 10% missing genotypes (--geno 0.1).
The command below removes individuals with more than 5% missing genotypes, SNPs with MAF below 1%, SNPs with more than 5% missing genotypes, and SNPs whose HWE test p-value is below 0.05:
> plink --bfile example --make-bed --mind 0.05 --maf 0.01 --geno 0.05 --hwe 0.05 --out Dclean
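The "individuals first, then SNPs" order described above can be made explicit by running the filters in two stages. This is a sketch only: the thresholds are the example values from the slide text, the intermediate prefix step1_samples and the run helper are hypothetical, and the commands are printed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Example thresholds taken from the slide text; adjust them for your study.
MIND=0.05   # maximum missing-genotype rate per individual
MAF=0.01    # minimum minor allele frequency per SNP
GENO=0.1    # maximum missing-genotype rate per SNP
HWE=0.05    # minimum HWE test p-value per SNP

# Stage 1: drop poorly genotyped individuals.
run plink --bfile example --mind "$MIND" --make-bed --out step1_samples

# Stage 2: drop rare, poorly called, or HWE-deviating SNPs from what remains.
run plink --bfile step1_samples --maf "$MAF" --geno "$GENO" --hwe "$HWE" \
  --make-bed --out Dclean
```

Keeping the thresholds in variables makes it easy to record which values produced a given Dclean fileset.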

27 Work is done, relax on beach?


More information

NBIC TechTrack PBS Tutorial

NBIC TechTrack PBS Tutorial NBIC TechTrack PBS Tutorial by Marcel Kempenaar, NBIC Bioinformatics Research Support group, University Medical Center Groningen Visit our webpage at: http://www.nbic.nl/support/brs 1 NBIC PBS Tutorial

More information

The Command Shell. Fundamentals of Computer Science

The Command Shell. Fundamentals of Computer Science The Command Shell Fundamentals of Computer Science Outline Starting the Command Shell Locally Remote Host Directory Structure Moving around the directories Displaying File Contents Compiling and Running

More information

Supercomputing environment TMA4280 Introduction to Supercomputing

Supercomputing environment TMA4280 Introduction to Supercomputing Supercomputing environment TMA4280 Introduction to Supercomputing NTNU, IMF February 21. 2018 1 Supercomputing environment Supercomputers use UNIX-type operating systems. Predominantly Linux. Using a shell

More information

Introduction in Unix. Linus Torvalds Ken Thompson & Dennis Ritchie

Introduction in Unix. Linus Torvalds Ken Thompson & Dennis Ritchie Introduction in Unix Linus Torvalds Ken Thompson & Dennis Ritchie My name: John Donners John.Donners@surfsara.nl Consultant at SURFsara And Cedric Nugteren Cedric.Nugteren@surfsara.nl Consultant at SURFsara

More information

CMSC 201 Spring 2018 Lab 01 Hello World

CMSC 201 Spring 2018 Lab 01 Hello World CMSC 201 Spring 2018 Lab 01 Hello World Assignment: Lab 01 Hello World Due Date: Sunday, February 4th by 8:59:59 PM Value: 10 points At UMBC, the GL system is designed to grant students the privileges

More information

Datathon 2018 Connecting to MicroStrategy on AWS Cloud

Datathon 2018 Connecting to MicroStrategy on AWS Cloud Datathon 2018 Connecting to MicroStrategy on AWS Cloud Introduction This document describes how to connect to MicroStrategy on AWS cloud. The first part will show screenshots and introduction to the MicroStrategy

More information

Introduction to Linux. Fundamentals of Computer Science

Introduction to Linux. Fundamentals of Computer Science Introduction to Linux Fundamentals of Computer Science Outline Operating Systems Linux History Linux Architecture Logging in to Linux Command Format Linux Filesystem Directory and File Commands Wildcard

More information

No Food or Drink in this room. Logon to Windows machine

No Food or Drink in this room. Logon to Windows machine While you are waiting No Food or Drink in this room Logon to Windows machine Username/password on right-hand monitor Not the username/password I gave you earlier We will walk through connecting to the

More information

Working with Basic Linux. Daniel Balagué

Working with Basic Linux. Daniel Balagué Working with Basic Linux Daniel Balagué How Linux Works? Everything in Linux is either a file or a process. A process is an executing program identified with a PID number. It runs in short or long duration

More information

Using Sapelo2 Cluster at the GACRC

Using Sapelo2 Cluster at the GACRC Using Sapelo2 Cluster at the GACRC New User Training Workshop Georgia Advanced Computing Resource Center (GACRC) EITS/University of Georgia Zhuofei Hou zhuofei@uga.edu 1 Outline GACRC Sapelo2 Cluster Diagram

More information

Step-by-Step Guide to Advanced Genetic Analysis

Step-by-Step Guide to Advanced Genetic Analysis Step-by-Step Guide to Advanced Genetic Analysis Page 1 Introduction In the previous document, 1 we covered the standard genetic analyses available in JMP Genomics. Here, we cover the more advanced options

More information

Bitnami Apache Solr for Huawei Enterprise Cloud

Bitnami Apache Solr for Huawei Enterprise Cloud Bitnami Apache Solr for Huawei Enterprise Cloud Description Apache Solr is an open source enterprise search platform from the Apache Lucene project. It includes powerful full-text search, highlighting,

More information

Ftp Command Line Commands Linux Example Windows Put

Ftp Command Line Commands Linux Example Windows Put Ftp Command Line Commands Linux Example Windows Put Examples of typical uses of the command ftp. This lists the commands that you can use to show the directory contents, transfer files, and delete files.

More information

Setting up my Dev Environment ECS 030

Setting up my Dev Environment ECS 030 Setting up my Dev Environment ECS 030 1 Command for SSHing into a CSIF Machine If you already have a terminal and already have a working ssh program (That is, you type ssh into the terminal and it doesn

More information

Introduction to the Linux Command Line

Introduction to the Linux Command Line Introduction to the Linux Command Line May, 2015 How to Connect (securely) ssh sftp scp Basic Unix or Linux Commands Files & directories Environment variables Not necessarily in this order.? Getting Connected

More information

Batch Systems. Running your jobs on an HPC machine

Batch Systems. Running your jobs on an HPC machine Batch Systems Running your jobs on an HPC machine Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us

More information

KGG: A systematic biological Knowledge-based mining system for Genomewide Genetic studies (Version 3.5) User Manual. Miao-Xin Li, Jiang Li

KGG: A systematic biological Knowledge-based mining system for Genomewide Genetic studies (Version 3.5) User Manual. Miao-Xin Li, Jiang Li KGG: A systematic biological Knowledge-based mining system for Genomewide Genetic studies (Version 3.5) User Manual Miao-Xin Li, Jiang Li Department of Psychiatry Centre for Genomic Sciences Department

More information

Using a Linux System 6

Using a Linux System 6 Canaan User Guide Connecting to the Cluster 1 SSH (Secure Shell) 1 Starting an ssh session from a Mac or Linux system 1 Starting an ssh session from a Windows PC 1 Once you're connected... 1 Ending an

More information

Introduction to GALILEO

Introduction to GALILEO Introduction to GALILEO Parallel & production environment Mirko Cestari m.cestari@cineca.it Alessandro Marani a.marani@cineca.it Domenico Guida d.guida@cineca.it Maurizio Cremonesi m.cremonesi@cineca.it

More information

Introduction: What is Unix?

Introduction: What is Unix? Introduction Introduction: What is Unix? An operating system Developed at AT&T Bell Labs in the 1960 s Command Line Interpreter GUIs (Window systems) are now available Introduction: Unix vs. Linux Unix

More information

A short manual for LFMM (command-line version)

A short manual for LFMM (command-line version) A short manual for LFMM (command-line version) Eric Frichot efrichot@gmail.com April 16, 2013 Please, print this reference manual only if it is necessary. This short manual aims to help users to run LFMM

More information

Association Analysis of Sequence Data using PLINK/SEQ (PSEQ)

Association Analysis of Sequence Data using PLINK/SEQ (PSEQ) Association Analysis of Sequence Data using PLINK/SEQ (PSEQ) Copyright (c) 2018 Stanley Hooker, Biao Li, Di Zhang and Suzanne M. Leal Purpose PLINK/SEQ (PSEQ) is an open-source C/C++ library for working

More information

Quick Guide for the Torque Cluster Manager

Quick Guide for the Torque Cluster Manager Quick Guide for the Torque Cluster Manager Introduction: One of the main purposes of the Aries Cluster is to accommodate especially long-running programs. Users who run long jobs (which take hours or days

More information

HPC Introductory Course - Exercises

HPC Introductory Course - Exercises HPC Introductory Course - Exercises The exercises in the following sections will guide you understand and become more familiar with how to use the Balena HPC service. Lines which start with $ are commands

More information

Linux for Biologists Part 2

Linux for Biologists Part 2 Linux for Biologists Part 2 Robert Bukowski Institute of Biotechnology Bioinformatics Facility (aka Computational Biology Service Unit - CBSU) http://cbsu.tc.cornell.edu/lab/doc/linux_workshop_part2.pdf

More information

Please include the following sentence in any works using center resources.

Please include the following sentence in any works using center resources. The TCU High-Performance Computing Center The TCU HPCC currently maintains a cluster environment hpcl1.chm.tcu.edu. Work on a second cluster environment is underway. This document details using hpcl1.

More information

Computing with the Moore Cluster

Computing with the Moore Cluster Computing with the Moore Cluster Edward Walter An overview of data management and job processing in the Moore compute cluster. Overview Getting access to the cluster Data management Submitting jobs (MPI

More information

WinSCP. Author A.Kishore/Sachin

WinSCP. Author A.Kishore/Sachin WinSCP WinSCP is a freeware windows client for the SCP (secure copy protocol), a way to transfer files across the network using the ssh (secure shell) encrypted protocol. It replaces other FTP programs

More information

Linux Training. for New Users of Cluster. Georgia Advanced Computing Resource Center University of Georgia Suchitra Pakala

Linux Training. for New Users of Cluster. Georgia Advanced Computing Resource Center University of Georgia Suchitra Pakala Linux Training for New Users of Cluster Georgia Advanced Computing Resource Center University of Georgia Suchitra Pakala pakala@uga.edu 1 Overview GACRC Linux Operating System Shell, Filesystem, and Common

More information

Oregon State University School of Electrical Engineering and Computer Science. CS 261 Recitation 1. Spring 2011

Oregon State University School of Electrical Engineering and Computer Science. CS 261 Recitation 1. Spring 2011 Oregon State University School of Electrical Engineering and Computer Science CS 261 Recitation 1 Spring 2011 Outline Using Secure Shell Clients GCC Some Examples Intro to C * * Windows File transfer client:

More information

CHE3935. Lecture 1. Introduction to Linux

CHE3935. Lecture 1. Introduction to Linux CHE3935 Lecture 1 Introduction to Linux 1 Logging In PuTTY is a free telnet/ssh client that can be run without installing it within Windows. It will only give you a terminal interface, but used with a

More information

Introduction to Linux Environment. Yun-Wen Chen

Introduction to Linux Environment. Yun-Wen Chen Introduction to Linux Environment Yun-Wen Chen 1 The Text (Command) Mode in Linux Environment 2 The Main Operating Systems We May Meet 1. Windows 2. Mac 3. Linux (Unix) 3 Windows Command Mode and DOS Type

More information

For Dr Landau s PHYS8602 course

For Dr Landau s PHYS8602 course For Dr Landau s PHYS8602 course Shan-Ho Tsai (shtsai@uga.edu) Georgia Advanced Computing Resource Center - GACRC January 7, 2019 You will be given a student account on the GACRC s Teaching cluster. Your

More information

Mills HPC Tutorial Series. Linux Basics II

Mills HPC Tutorial Series. Linux Basics II Mills HPC Tutorial Series Linux Basics II Objectives Bash Shell Script Basics Script Project This project is based on using the Gnuplot program which reads a command file, a data file and writes an image

More information

The cluster system. Introduction 22th February Jan Saalbach Scientific Computing Group

The cluster system. Introduction 22th February Jan Saalbach Scientific Computing Group The cluster system Introduction 22th February 2018 Jan Saalbach Scientific Computing Group cluster-help@luis.uni-hannover.de Contents 1 General information about the compute cluster 2 Available computing

More information

RUNNING MOLECULAR DYNAMICS SIMULATIONS WITH CHARMM: A BRIEF TUTORIAL

RUNNING MOLECULAR DYNAMICS SIMULATIONS WITH CHARMM: A BRIEF TUTORIAL RUNNING MOLECULAR DYNAMICS SIMULATIONS WITH CHARMM: A BRIEF TUTORIAL While you can probably write a reasonable program that carries out molecular dynamics (MD) simulations, it s sometimes more efficient

More information

Introduction to Discovery.

Introduction to Discovery. Introduction to Discovery http://discovery.dartmouth.edu The Discovery Cluster 2 Agenda What is a cluster and why use it Overview of computer hardware in cluster Help Available to Discovery Users Logging

More information

High Performance Computing (HPC) Using zcluster at GACRC

High Performance Computing (HPC) Using zcluster at GACRC High Performance Computing (HPC) Using zcluster at GACRC On-class STAT8060 Georgia Advanced Computing Resource Center University of Georgia Zhuofei Hou, HPC Trainer zhuofei@uga.edu Outline What is GACRC?

More information