Analytical Processing of Data of statistical genetics research in UNIX like Systems

Similar documents
Introduction to UNIX/Linux

UoW HPC Quick Start. Information Technology Services University of Wollongong. ( Last updated on October 10, 2011)

Introduction to UNIX command-line II

Unix/Linux Operating System. Introduction to Computational Statistics STAT 598G, Fall 2011

Unix Basics. Benjamin S. Skrainka University College London. July 17, 2010

Computer Systems and Architecture

INTRODUCTION TO BIOINFORMATICS

Introduction to UNIX command-line

Computer Systems and Architecture

Introduction to Unix The Windows User perspective. Wes Frisby Kyle Horne Todd Johansen

Linux at the Command Line Don Johnson of BU IS&T

CSE Linux VM. For Microsoft Windows. Based on opensuse Leap 42.2

Linux Command Line Primer. By: Scott Marshall

Operating Systems. Copyleft 2005, Binnur Kurt

Operating Systems 3. Operating Systems. Content. What is an Operating System? What is an Operating System? Resource Abstraction and Sharing

Introduction to the Linux Command Line

Introduction: What is Unix?

Introduction to Linux

CS 261 Recitation 1 Compiling C on UNIX

Command-line interpreters

Introduction To Linux. Rob Thomas - ACRC

LOG ON TO LINUX AND LOG OFF

System Administration

Chapter-3. Introduction to Unix: Fundamental Commands

Unix Workshop Aug 2014

UNIX. The Very 10 Short Howto for beginners. Soon-Hyung Yook. March 27, Soon-Hyung Yook UNIX March 27, / 29

Session 1: Accessing MUGrid and Command Line Basics

BIOINFORMATICS POST-DIPLOMA PROGRAM SUBJECT OUTLINE Subject Title: OPERATING SYSTEMS AND PROJECT MANAGEMENT Subject Code: BIF713 Subject Description:

Linux/Cygwin Practice Computer Architecture

Unix/Linux Basics. Cpt S 223, Fall 2007 Copyright: Washington State University

Introduction to Unix: Fundamental Commands

CS CS Tutorial 2 2 Winter 2018

Connecting to ICS Server, Shell, Vim CS238P Operating Systems fall 18

CSE 390a Lecture 3. bash shell continued: processes; multi-user systems; remote login; editors

Unix Guide. Meher Krishna Patel. Created on : Octorber, 2017 Last updated : December, More documents are freely available at PythonDSP

CS370 Operating Systems

Unix Essentials. BaRC Hot Topics Bioinformatics and Research Computing Whitehead Institute October 12 th

EECS Software Tools. Lab 2 Tutorial: Introduction to UNIX/Linux. Tilemachos Pechlivanoglou

Arkansas High Performance Computing Center at the University of Arkansas

STA 303 / 1002 Using SAS on CQUEST

Shell Programming Overview

Virtual Machine. Linux flavor : Debian. Everything (except slides) preinstalled for you.

Introduction to the Linux Command Line January Presentation Topics

Using UNIX. -rwxr--r-- 1 root sys Sep 5 14:15 good_program

CHAPTER 1 UNIX FOR NONPROGRAMMERS

Research. We make it happen. Unix Basics. User Support Group help-line: personal:

Quick Start Guide. by Burak Himmetoglu. Supercomputing Consultant. Enterprise Technology Services & Center for Scientific Computing

IMPORTANT: Logging Off LOGGING IN

Unit 10. Linux Operating System

An Introduction to Unix

Unix Basics. Systems Programming Concepts

CS4350 Unix Programming. Outline

A Brief Introduction to Unix

Outline. Structure of a UNIX command

Practical Computing-II. Programming in the Linux Environment. 0. An Introduction. B.W.Gore. March 20, 2015

Linux Shell Script. J. K. Mandal

Unix Tools / Command Line

Tutorial-1b: The Linux CLI

Introduction to UNIX. Introduction EECS l UNIX is an operating system (OS). l Our goals:

First of all, these notes will cover only a small subset of the available commands and utilities, and will cover most of those in a shallow fashion.

Read the relevant material in Sobell! If you want to follow along with the examples that follow, and you do, open a Linux terminal.

CSCE 212H, Spring 2008, Matthews Lab Assignment 1: Representation of Integers Assigned: January 17 Due: January 22

Embedded Linux Systems. Bin Li Assistant Professor Dept. of Electrical, Computer and Biomedical Engineering University of Rhode Island

CSE 303 Lecture 2. Introduction to bash shell. read Linux Pocket Guide pp , 58-59, 60, 65-70, 71-72, 77-80

Lab 1 Introduction to UNIX and C

Introduction to Linux

CS 460 Linux Tutorial

What are some common categories of system calls? What are common ways of structuring an OS? What are the principles behind OS design and

Introduction to Linux for BlueBEAR. January

Introduction to Linux. Fundamentals of Computer Science

Introduction to Linux

*nix Crash Course. Presented by: Virginia Tech Linux / Unix Users Group VTLUUG

Working with Basic Linux. Daniel Balagué

Unix Introduction to UNIX

Introduction To. Barry Grant

Carnegie Mellon. Linux Boot Camp. Jack, Matthew, Nishad, Stanley 6 Sep 2016

Introduction to UNIX. Logging in. Basic System Architecture 10/7/10. most systems have graphical login on Linux machines

An Introduction to Cluster Computing Using Newton

Chapter 4. Unix Tutorial. Unix Shell

Introduction to Linux Basics

Unit 13. Linux Operating System Debugging Programs

(7) Get the total energy from the output (1 Hartree = kcal/mol) and label the structure with the energy.

Linux Introduction Martin Dahlö Valentin Georgiev

CS Fundamentals of Programming II Fall Very Basic UNIX

Introduction of Linux

Introduction to Cygwin Operating Environment

This lab exercise is to be submitted at the end of the lab session! passwd [That is the command to change your current password to a new one]

Introduction p. 1 Who Should Read This Book? p. 1 What You Need to Know Before Reading This Book p. 2 How This Book Is Organized p.

Unix L555. Dept. of Linguistics, Indiana University Fall Unix. Unix. Directories. Files. Useful Commands. Permissions. tar.

Linux Essentials Objectives Topics:

Bash Programming for Geophysicists

New User Tutorial. OSU High Performance Computing Center

CS 3410 Intro to Unix, shell commands, etc... (slides from Hussam Abu-Libdeh and David Slater)

Part 1: Basic Commands/U3li3es

Table of contents. Our goal. Notes. Notes. Notes. Summer June 29, Our goal is to see how we can use Unix as a tool for developing programs

Perl and R Scripting for Biologists

CSE 391 Lecture 3. bash shell continued: processes; multi-user systems; remote login; editors

A Hands-On Tutorial: RNA Sequencing Using High-Performance Computing

Lezione 8. Shell command language Introduction. Sommario. Bioinformatica. Mauro Ceccanti e Alberto Paoluzzi

Introduction to Linux Part 1. Anita Orendt and Wim Cardoen Center for High Performance Computing 24 May 2017

Transcription:

Survival Skills for Analytical Processing of Data of statistical genetics research in UNIX like Systems robert yu :: March 2011

anote UNIX like? Traditional/classical UNIX, e.g. System V (Solaris), BSD (SunOS), etc. Various variants of Linux Windows based cygwin Other types, e.g. MacIntosh?

Analytical Processing of Data Exploring Data Center Descriptive data Supporting data dt Main data sets Outputting tti Analyzing Data Center Results

Analytical Processing of Data Exploring visualization descriptive stats data mining cleaning/preparing Data Center Descriptive data Supporting data dt Main data sets Outputting tti Plotting presentation etc. Analyzing Data Center Results subsetting annotation integration re analyzation

Analytical Processing of Data data of various sources and types Genotyping data Genotyping data Genotyping data Genotyping data Genotyping data Genotyping data Genotyping data Genotyping data Genotyping data Genotyping data Genotyping data Genotyping data Genotyping data Genotyping data epidemiology epidemiology data data epidemiology data epidemiology data epidemiology data epidemiology data patient data patient data patient t data dt patient data patient data patient data patient data Gene info data Pathway info data

Analytical Processing of Data Powerful computing facility Exploring Data Center Descriptive data Supporting data dt Main data sets Outputting tti Analyzing Data Center Results

Analytical Processing of Data a heterogeneous computing facility Powerful computing facility Windows based personal computer Windows based personal computer Windows based personal computer UNIX like operating systembased high performance computer (HPC), e.g. Linux clusters custes Windows based personal computer Windows based personal computer Windows based personal computer Windows based personal computer non Windows based personal computer non Windows based personal computer

Graph of supercomputer OS market share from around 1994 to 2010 according to TOP500.

Analytical Processing of Data Software oriented computing facility Windows Stata PHASE Outlook SAS SPSS Word R SAGE PowerPoint Excel gnuplot MatLab Photoshop PLINK Access Structure Adobe Acrobat HaploView GAP etc Allegro Pathway Studio etc HelixTree etc etc PedCheck FastLink ACT PBAT simwalk merlin genehunter etc etc UNIX/Linux Communication & Analytical Exploration Production extensive & intensive crunch

Analytical Processing of Data Software oriented computing facility Windows More than 75% of the time, we are not doing real analyses but cleaning and preparing data. Efficient and effective tools are essential. Besides analytical software, programming tools are critical. UNIX/Linux

Analytical Processing of Data Software oriented computing facility Windows A good analyst or research scientist invests his/her skills using integrated computing system where all tools are handy. Windows are accumulating more tools. EXPLORING! UNIX is still the major scientific computing platform! UNIX/Linux

Various Computing Platforms Personal Computer (Windows, etc) vs Server based Computer (UNIX/Linux HPC) Windows based personal computer is a more complicated system. more heterogeneous of processes (programs) more burdensome (complicated installation, memory management less tolerable to continuously stable running single user, frequently interrupted t UNIX like computing system is more powerful and efficient. extremely stable, constantly running homogeneous computing / programming extremely friendly less visual, less social (less Internet attacked) multi user, multi terminals

a case example of using UNIX collecting files in Windows Step 0 Receiving or generating data. data in Excel sheets, text files, zip packages, etc. small files: sample info, epidemiology data large files: genotype data other supporting files, e.g. matching thi information, etc. Preparation for analyses. preliminary exploration andvisualization of data descriptive statistics collection formation of basic analyses transferring data to UNIX / LINUX

a case example of using UNIX getting into UNIX, transferring data from Windows to UNIX Step 1a SSH Client Programs WinSCP Client Programs X Win Terminal Program

a case example p of usingg UNIX getting into UNIX, transferring data from Windows to UNIX Step 1b L h WinSCP Wi SCP from f Wi d D kt Launch Windows Desktop Drag files from Windows to UNIX Check transferred files in UNIX Using editor vi to check data command vi

a Windows based UNIX-cygwin obtaining cygwin

a Windows based UNIX-cygwin launching and exploring cygwin

a Windows based UNIX-cygwin compiling allegro, C++ source code

a Windows based UNIX-cygwin compiling allegro, C++ source code

a Windows based UNIX cygwin yg working and running analysis in xterm

a Windows based UNIX-cygwin viewing linkage files for allegro

a Windows based UNIX-cygwin checking and plotting linkage analysis results

a Windows based UNIX-cygwin using gnuplot to make a comprehensive plot

a Windows based UNIX-cygwin viewing result files in Windows

UNIX Commands Summary a starting set of UNIX commands exploring directories and files, running a program cd (change directory): cd../ (go back upper level) el) ls l l (listing files in a directory)./my own program & (run a program in the current directory, the program you created, in background) perl my_perl_script.pl (run a perl script) df (show disk usage) du (show dir space) nohup program_name argument (running a program in background to avoid terminated) Ctrl+c (suspendcurrentcommand) command) Ctrl+z (stopcurrentcommand) command)!! Repeat the last command viewing and editing files, changing file mode, delete files more, less, head, tail, cat filename (view a file) vi (most powerful text editor) emacs (text editor) egrep pattern *.txt (extract pattern from all *.txt files) mkdir dirname create a dir chmod 777 filename (make this file accessible to public) mv f1 f2 (rename f1 to f2) rm *.txt (delete all files of txt extension in the current directory) tar / gzip (packaging and zip files) rm rf directory_name (force delete a directory with all files in it without question) ln s file link create a shortcut (symbolic link) link to file. cp f1 f2 (copy f1 to f2) cp r d1 d2 (copy dirs) line / word counting wc lw filename(counting lines and word in the file) wc l *.dat (counting lines in each file ) merging files paste f1 f2 > file merged (merge two files (by column)) cat f1 f2 > file appended (append f2 below f1) checking upon system top (checking currenting running processes (programs)) pwd (checking current directory position) ps (view current running process (id)) kill pid (terminate running of a process with pid uname a (checking info on node name, operating system and its version, platform, etc.) which program name (checkingifthis if programisinstalled/available): installed/available): which perl most useful tools man program name (open help file of the program, to learn detailed usage of the program) GOOGLE your request of any UNIX programs and commands,, to become a UNIX guru

a hands on case a case/control association study of disease UNIX sick 1. Receive SNP data in Excel & text files 2. Check data files 3. Plan of analyses: using PLINK 4. Transfer data into UNIX 5. Data preparation 6. Run PLINK analyses 1. raw data verification 2. data cleaning 3. data assembly in PLINK format 4. preliminary run of PLINK 5. modification 7. View results 8. Get result out of UNIX 9. Result annotation and further study SNP annotation, gene pathway analysis, meta analysis, Beyond this scope.

Q & A Thank You.