Linux Introduction to Linux

Similar documents
Introduction: What is Unix?

Introduction to Linux

A Brief Introduction to the Linux Shell for Data Science

Introduction To Linux. Rob Thomas - ACRC

Cryptography - SSH. Network Security Workshop May 2017 Phnom Penh, Cambodia

Cryptography - SSH. Network Security Workshop. 3-5 October 2017 Port Moresby, Papua New Guinea

Cryptography Application : SSH. Cyber Security & Network Security March, 2017 Dhaka, Bangladesh

Unix/Linux Basics. Cpt S 223, Fall 2007 Copyright: Washington State University

Contents. Note: pay attention to where you are. Note: Plaintext version. Note: pay attention to where you are... 1 Note: Plaintext version...

Unix Tutorial Haverford Astronomy 2014/2015

EECS Software Tools. Lab 2 Tutorial: Introduction to UNIX/Linux. Tilemachos Pechlivanoglou

Introduction to UNIX. Logging in. Basic System Architecture 10/7/10. most systems have graphical login on Linux machines

Perl and R Scripting for Biologists

Linux at the Command Line Don Johnson of BU IS&T

What is Secure. Authenticated I know who I am talking to. Our communication is Encrypted

Introduction to Unix The Windows User perspective. Wes Frisby Kyle Horne Todd Johansen

CENG 334 Computer Networks. Laboratory I Linux Tutorial

Network Monitoring & Management. A few Linux basics

SSH. What is Safely 6/19/ June 2018 PacNOG 22, Honiara, Solomon Islands Supported by:

An Introduction to Cluster Computing Using Newton

Unix basics exercise MBV-INFX410

When talking about how to launch commands and other things that is to be typed into the terminal, the following syntax is used:

The Unix Shell. Pipes and Filters

Computer Systems and Architecture

Introduction to remote command line Linux. Research Computing Team University of Birmingham

Linux Essentials. Programming and Data Structures Lab M Tech CS First Year, First Semester

Carnegie Mellon. Linux Boot Camp. Jack, Matthew, Nishad, Stanley 6 Sep 2016

Chapter 4. Unix Tutorial. Unix Shell

FEPS. SSH Access with Two-Factor Authentication. RSA Key-pairs

Week 2. Exp 2 (a) (b): Introduction to LINUX OS, Installation of LINUX OS, Basic DOS commands

Introduction to UNIX command-line

Cryptography Application : SSH. 7 Sept 2017, Taichung, Taiwan

Getting Started With Linux and Fortran Part 3

Using UNIX. -rwxr--r-- 1 root sys Sep 5 14:15 good_program

CSE 303 Lecture 2. Introduction to bash shell. read Linux Pocket Guide pp , 58-59, 60, 65-70, 71-72, 77-80

Shell Programming Overview

CS197U: A Hands on Introduction to Unix

CSE 390a Lecture 2. Exploring Shell Commands, Streams, and Redirection

Using LINUX a BCMB/CHEM 8190 Tutorial Updated (1/17/12)

Intro to Linux & Command Line

Working With Unix. Scott A. Handley* September 15, *Adapted from UNIX introduction material created by Dr. Julian Catchen

Tutorial 1: Unix Basics

CSE 390a Lecture 2. Exploring Shell Commands, Streams, Redirection, and Processes

Overview LEARN. History of Linux Linux Architecture Linux File System Linux Access Linux Commands File Permission Editors Conclusion and Questions

Using the Zoo Workstations

No Food or Drink in this room. Logon to Windows machine

CSC209. Software Tools and Systems Programming.

Introduction to UNIX command-line II

This lab exercise is to be submitted at the end of the lab session! passwd [That is the command to change your current password to a new one]

Version Control with Git

UNIX. The Very 10 Short Howto for beginners. Soon-Hyung Yook. March 27, Soon-Hyung Yook UNIX March 27, / 29

Introduction to Linux Environment. Yun-Wen Chen

EECS 470 Lab 5. Linux Shell Scripting. Friday, 1 st February, 2018

1 Installation (briefly)

UNIX, GNU/Linux and simple tools for data manipulation

Laboratory 1 Semester 1 11/12

Table of contents. Our goal. Notes. Notes. Notes. Summer June 29, Our goal is to see how we can use Unix as a tool for developing programs

CS 460 Linux Tutorial

CS CS Tutorial 2 2 Winter 2018

UoW HPC Quick Start. Information Technology Services University of Wollongong. ( Last updated on October 10, 2011)

Scripting Languages Course 1. Diana Trandabăț

Introduction to the Linux Command Line

CSE115 Lab exercises for week 1 of recitations Spring 2011

Linux Command Line Interface. December 27, 2017

Bioinformatics? Reads, assembly, annotation, comparative genomics and a bit of phylogeny.

ssh and handson Matsuzaki maz Yoshinobu 1

CENG393 Computer Networks Labwork 1

Introduction... 3 Introduction... 4

Introduction to Linux for BlueBEAR. January

Introduction to Linux (Part I) BUPT/QMUL 2018/03/14

The Unix Family. Linux 101. GNU/Linux distributions. Even more. MULTICS MIT/GE/Bell Labs. UNIX AT&T (Bell Labs) Mach, Carnegie Mellon

Intro to Linux. this will open up a new terminal window for you is super convenient on the computers in the lab

Cloud Computing and Unix: An Introduction. Dr. Sophie Shaw University of Aberdeen, UK

Utilities. September 8, 2015

Useful Unix Commands Cheat Sheet

CpSc 1111 Lab 1 Introduction to Unix Systems, Editors, and C

TNM093 Practical Data Visualization and Virtual Reality Laboratory Platform

Cloud Computing and Unix: An Introduction. Dr. Sophie Shaw University of Aberdeen, UK

Introduction to Linux

Introduction. File System. Note. Achtung!

Course Wiki. Today s Topics. Web Resources. Amazon EC2. Linux. Apache PHP. Workflow and Tools. Extensible Networking Platform 1

Brief Linux Presentation. July 10th, 2006 Elan Borenstein

Session 1: Accessing MUGrid and Command Line Basics

LAB #5 Intro to Linux and Python on ENGR

Exploring UNIX: Session 3

Parts of this tutorial has been adapted from M. Stonebank s UNIX Tutorial for Beginners (

LAB 0: LINUX COMMAND LINE AND SVN

Introduction to Linux

COMPUTER FOR BEGINNERS

Helsinki 19 Jan Practical course in genome bioinformatics DAY 0

Getting Started With Linux and Fortran Part 2

Computer Architecture Lab 1 (Starting with Linux)

Modern C++ for Computer Vision and Image Processing. Igor Bogoslavskyi and Cyrill Stachniss

Introduction To. Barry Grant

The kernel is the low-level software that manages hardware, multitasks programs, etc.

Computer Systems and Architecture

Workshop on Genomics 2018

Review of Fundamentals. Todd Kelley CST8207 Todd Kelley 1

Unix Essentials. BaRC Hot Topics Bioinformatics and Research Computing Whitehead Institute October 12 th

CSE 391 Lecture 1. introduction to Linux/Unix environment

Transcription:

Linux Introduction to Linux Most computational biologists use either Apple Macs or Linux machines. There are a couple of reasons for this: * Much of the software is free * Many of the tools require a command line to use * Many of the tools require large compute resources, either a cluster or lots of memory It is probably the last reason that dominates many of the discussions: many of the software tools you will want to use require more resources than your laptop or desktop machine can provide. You can definitely run the BLAST suite of programs on your laptop, but for large data sets like a whole metagenome, you probably want to run that on a server so that you can go and do other things while the program is running. Most servers that you will access run the operating system linux, or a similar variant of the Unix operating system. Linux was started in 1991 as a PhD project by Linus Torvalds, and the initial release was not very powerful, however Linus released his software using an open source license that allows anyone to use and modify the source code, either commercially or not. This meant that other developers could pick up on his work, and several groups and companies developed operating systems based on the linux kernel (the piece that does all the work). RedHat has had the most success in the so-called enterprise sphere (selling servers to companies), and because RedHat s Fedora software is built on open-source software, it also releases all of its changes and updates. There is a free version of RedHat s operating system, called CentOS, that many servers use (especially those in academic settings!). CentOS is designed with performance in mind, and is designed to be run on servers that are very rarely rebooted. Therefore, only essential system changes are made between releases of the operating system. Another flavor of Linux is called Debian (after founder Ian Murdock and his then-girlfriend Debra) developed some elegant mechanisms to keep the software up-to-date. Again, Debian was built on free software, and a company (Canonical) built a smooth graphical user interface, and easy, graphical installation system, and released the software as Ubuntu. There are many connections between Ubuntu, Debian, CentOS, and Fedora, and most users can t really tell which operating system they are using they are all canonically called Linux! The reason that we are using Linux for the course is because it is open and free, there is lots of software available for it, and it will help you become familiar with the servers you will doubtless use for high-performance computing. In addition, we can provide you with a server that you can start and it already has everything installed for you! 1

Public and Private SSH keys You are probably used to using passwords to access websites, servers, and computers. Passwords are common because they are simple, but they are not very good. Even complex passwords can be guessed or cracked. We can, for example, look up passwords on websites like SecLists, we can use rainbow tables we can write software that attempts to crack passwords like John the Ripper, THC Hydra, Cain and Abel. A much more secure way of accessing computers is using keys. The concept is quite simple, there are two keys: a private key which is super-secret, and a public key, which you can do anything with. One of my public keys is shown a little further on in this document, but that s OK, because a public key on its own is useless. The way that the keys work is that when you open a new connection to the server, your connection has an ID and asks the server to use the specific public key you ve defined (typically, there is a default key that is used). The server generates a random string and encrypts it with the public key you are using. It sends that encrypted string back. The string can only be properly decrypted with the private key - it cannot be decrypted with just the public key. If you have the right private key, your computer can decrypt the string. Rather than sending the string back, your computer generates an MD5 hash of that random string and the session ID and returns the hash. Since both your computer and the server now have both the random string and the session ID, they can compare their notes and make sure you are allowed in. However, you don t need to worry about that part of it! Once I have made the keys, I end up with two files: 1. The private key: This is private and secret and you should never share it with anyone or put it anywhere that anyone else can see it. You put this on your laptop and leave it there. 2. The public key: You put this one on the server that you want to access. Now when I try and access the server, my computer opens up the secret private key, reads the contents, and calculates the public key based on the private key. It sends that public key and says this is what I have, do you accept it. The server compares the public key with what it has in its file of public keys, and either lets you in, or not! Once you have successfully set up public/private key security, you never need to enter a password again! Here s how to set up public and private keys. MacOS/Linux Open a terminal and type: 2

ssh-keygen Accept the default values, and you have public keys in ~/.ssh/id_rsa.pub and private keys in ~/.ssh/id_rsa You don t need to do anything to use them, they will be used by ssh by default. Windows Download PuTTY. Note: For PuTTY, there are some hacked versions of PuTTY floating around the internet and you don t want to inadvertently download one of those as it may give the bad guys access to your machine. There are also versions with advertisements, and other crap added on. I always start at https://www.putty.org/ and go from there. Do not Google for it, just go direct to the website and click the download link. You want either the 32-bit or 64-bit MSI Windows Installer, depending on whether your computer has 32- or 64-bits. Figure 1: 32 or 64 bits Note sure how many bits? This is from the PuTTY FAQ: Figure 2: how many bits Once you have downloaded PuTTY, you want to open the application called PuTTYgen. This will open a screen like the one shown below: 3

Figure 3: putty gen 4

Figure 4: randomness generator 5

Click the Generate button and create some randomness: This will create the key for you. Figure 5: the key MAKE SURE YOU SAVE THE KEYS! Save the public key as, for example, key.pub, and the private key as, for example, key.ppk. We will use them later! To use the keys, right click on the private key, and choose Load into Pageant Pageant is the PuTTY key agent, and several programs can access the keys from there. 6

Accessing Your Linux Server Windows Accessing Your Server via PuTTY Figure 6: accessing the server To access your server using PuTTY, load the private key into Pageant as shown above and then start PuTTY. Enter the IP address of your computer and click Open. On the next screen, where it says Login as: enter your username. You will be logged into your server and you can continue with the course. Usernames: If you are using the standard AWS image, the username is most likely ec2-user. If you are using an Ubuntu AWS image the username is most likely ubuntu. If you are not sure, you should contact whoever made your image. 7

Password There should be no password because we are using ssh keys! Once you log in you should see a screen like this: Figure 7: logged in Putting and Retrieving Files using Filezilla To get files onto and off of the server, you should probably use FileZilla on Windows. Start by heading to the filezilla website and click on the download link for the client: Figure 8: download filezilla On the next screen, just choose the regular (free) FileZilla. Install filezilla using the default options, and then open it. In the boxes at the top, enter your IP address, username, and password. In this example my IP address was 54.206.35.210. Enter port 22 as we will use secure copying, and then click Quickconnect. You will be asked a couple of questions that you can agree to. You probably want to save the passwords. I don t worry about a master password, but if you are accessing sensitive information you should. and we want to remember the server so we don t get asked each time: 8

Figure 9: filezilla quickconnect Figure 10: remember passwords? Figure 11: remember server? 9

And then you can drag and drop files from the left side which is your local computer and the right side which is the server: Figure 12: split screen filezilla view Accessing Your Server via VcXsrv [Note: You only need to do this if you want to run things under x-windows] We will also want to access your server using an X11 emulator (don t worry what that means, basically it means we can use Google Chrome on the server!) Before we start, make sure to right click on your private key and choose Load into Pageant. Next, we are going to install VcXsrv for windows. Go to this link, and download the latest version. Click and run the installer that downloads: Now start a program on the client: We start by entering the main information, your username (mine is robe) and the IP address of your computer (in the example here, my IP address is 129.114.17.72). Leave extra settings as the default 10

Figure 13: installer 11

Figure 14: start a program on the client 12

Figure 15: start a program2 13

Figure 16: default extra settings 14

Click Finish, and OK on the next screens, and after a few seconds you will be logged in. The screen will look something like this: Figure 17: VCXsrv running Using Linux Linux has many of the same tools that you already use in MacOS or Windows. For example, there is an office suite of applications including LibreOffice Writer (a replacement for MS Word), LibreOffice Calc (a replacement for MS Excel), and LibreOffice Impress (a replacement for MS PowerPoint). There are calculators and games. However, the real strength in using Linux comes from using the command line, as we will do during the course. Using the command line means you can work from anywhere, and you are not wasting computing effort checking email and watching movies. You can access super computers and work on remote machines around the world. You can even do it from your cellphone (e.g. connectbot for Android or Terminus for iphone. It is faster and more efficient way of working. Here we ll take a tour around the command line to get you started or to act as a refresher. To move around in the command line, you type commands and press return. 15

There is a cheat-sheet of common commands below. Linux exercises for practice working with the command line Here are some exercises that you can try: * List the files in the directory. * Copy one file to another file * Look at the content of one file from the top * Look at the content of one file from the bottom * Check if two files are identical. * Delete a file * Make a test directory * Move some files to your new directory * Remove test directory (and the files) * Make five new directories one inside the other like level1/level2/level3/level4/level5 in one command. * Move a file to the innermost directory. * Copy a file to your home directory * Search for a word in a file Linux cheat sheet Command Meaning cd Change directory [to the directory parent directory, the last [.. directory you were in, or - your home directory] ~] mkdir Make directory directory mv Move file from filepath1 filepath1 to location at filepath2 filepath2 cp Copy file from filepath1 filepath1 to location at filepath2 filepath2 pwd Show current directory rm remove a file (once removed, it is gone!) ls List contents of directory directory (leave directory blank for current) ls List contents of directory -l in long-listing format directory (leave directory blank for current) cat Print the entire contents file of a file head Print out first 10 lines of file file 16

Command Meaning tail Print out last 10 lines of file file less View file in Terminal. file Use arrows to browse. Press q to quit. df Show disk space usage -h information wc Display the number of -l lines in the file. file diff file1 file2 grep word file sort uniq file cut -f 1 file xargs -n 1 command Display the differences between file1 and file2 Search for word in file Sort the input Look for duplicate words in the file Split the contents of a file, e.g. if it is in columns Use each entry as a parameter to pass to command Pipes One of the strengths of unix is the ability to pipe commands together. You can take the output of one command, and use it as the input to another command. This means that you can join arbitrary things together to generate new results. This is one of the reasons that unix is so popular among data analysts. The method for doing this is called piping, and we use the vertical line symbol to pipe the input. For example, to sort the lines in a file we can combine two commands, cat and sort. The first command prints the entire contents of a file and the second sorts the lines on the input. For example, if we have a file called poem.txt, we can sort the lines using this command: cat poem.txt sort 17

You can sort things either alphabetically or numerically by adding a -n to the command line. There is another command, uniq, that looks for duplicated lines, and has an option -c to count the duplicates, and a command cut that allows you to print one column of a file. We can tie all these commands together to sort the contents of a file, count recurrent lines, and sort them numerically to generate a list of lines in the file. For example, if we have a text file containing subsystems in metagenomes, we can split the list into a specific column using a tab as a separator (the shortcut for a tab is ), sort those lines, identify and count duplicates, and then sort the output numerically. cut -f 2 -d$ \t subsystems.txt sort uniq -c sort -nr less stdout and stderr Linux machines have two output streams, places where they print things. One is designed for normal output and is called stdout (standard output), and the other is designed to print error messages and is called stderr. Normally, both of these print to the screen, however we can redirect one or both of them to print to a file. To redirect stdout, we use a greater than sign ( > ). For example, we can take the output from the command above and write it to the file subsystem_counts.txt: cut -f 2 -d$ \t subsystems.txt sort uniq -c sort -nr > subsystem_counts.txt Warning: This will overwrite any existing file called subsystem_counts.txt and you will loose all the content. Counting fasta sequences One of the challenges that we often face with bioinformatics is to count the number of sequences in a fasta file. You might be tempted to use a command like this: grep -c > dna.fna but remember the stdout section above! This will overwrite the file called dna.fna with an empty file and you will loose your DNA sequences! If you want to do that, you can escape the greater than sign. That means it is not really recognized at a redirection of stdout. grep -c \> dna.fna When we look at Python, later in the course, we ll write some (safer) programs for counting fasta sequences. 18

Getting Help There are lots of places that you can get help, but you should consider these sources first: 1. The applications help menu. If you are running a command line application, usually (but not always) adding -h after the name will print a help menu. For example, one of the commands that we are going to use later is called bowtie2. You can access the help menu by typing: bowtie2 -h Notice, however, that prints a lot of information and the top part goes off the top of the screen. We can control that by piping it to one of the commands that we have already seen, less: bowtie2 -h less This will paginate the output, press space to see the next page or q to quit. Sometimes developers write the help pages to standard error. This is generally bad practice, but they do it any way. As we saw above, there are two outputs on linux (and all machines), one for error codes and one for normal output. You can join those two outputs so that they appear in the same place with this weird construct: 2>&1. Therefore you can redirect the output from the standard error (stderr) to standard output (stdout). For example, compare the output of asking help from metabat using these two approaches: metabat -h less and metabat -h 2>&1 less 2. Man pages. Most applications come with a detailed manual that is already installed on your computer. For example, to see the manual for less, the application we use above, type: man less you can try man for any application, most have manual pages, but not all. 3. Google searches Google for any application with tutorial or how-to after the name and you will usually find a helpful description of what is wrong. The Stack Overflow website is particularly useful for linux help. 19