Intro to UNIX
Using Linux as a Virtual Machine

We will use the VMware Player to run a Virtual Machine, which is a way of having more than one Operating System (OS) running at once. Your Virtual OS (Linux) will run within your primary OS (Windows). Your primary OS and your Virtual Machine will share resources such as memory and CPUs. We will use the Linux distribution Ubuntu 10.04 as our Virtual OS.

Start VMware and load your Virtual Machine from the USB drive.

User Name: genomics
Password:  genomics7620
Finding your way around the UNIX directory structure

Directory tree:

/
    root
    tmp
    etc
    home
        sarah
        genomics
            scripts
            genomics_lab
    var
        log
        www

The same directories written as full paths:

/
/root
/tmp
/etc
/home
/home/sarah
/home/genomics
/home/genomics/scripts
/home/genomics/genomics_lab
/var
/var/log
/var/www
Unix Commands

pwd              # show the name of your present working directory
ls               # list files and directories in the present working directory
mkdir scripts    # make a directory called scripts

File and directory names are case sensitive.

Suggestions:
Use underscores instead of spaces in file and directory names.
Use lower case characters for file and directory names.
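To see case sensitivity in practice, here is a minimal sketch that assumes a Linux file system (such as the one in the Ubuntu Virtual Machine). It works in a temporary directory, and the names workdir and dir_count are only illustrative.

```shell
# Work in a throwaway directory so nothing in your home directory changes.
workdir=$(mktemp -d)
cd "$workdir"

pwd                      # print the present working directory
mkdir scripts            # make a directory called scripts
mkdir Scripts            # a different directory: names are case sensitive
ls                       # list both directories

dir_count=$(ls | wc -l)  # number of entries in the working directory
echo "directories created: $dir_count"
```

On a case-insensitive file system the second mkdir would fail instead, which is one more reason to stick to lower case names.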
Gedit text editor

From the UNIX terminal command line, enter this command to start gedit:

gedit my_favorite_food.txt &

The & will detach gedit from the terminal so you can continue to use the terminal.

Naming files:
use all lower case characters
separate words with an underscore
make the file name very descriptive of what is in the file, even if it's rather long

Common file extensions:
.pl       a Perl script
.fa       FASTA formatted sequences
.fastq    genome sequencing data files
.gz       a file that has been compressed (zipped) to reduce file size (like .zip)
.txt      a generic text file
.tsv      values are separated by a tab (tab separated values)
.csv      comma separated values
.bed      genomic data file format for genome browsers
Types of File Paths

pwd                    # show the present working directory
cd scripts             # change pwd to the scripts directory
mkdir ws{1..15}        # make 15 directories, one for each workshop
cd ../genomics_lab     # type: cd ../gen then hit the tab key

Relative path
cd ../downloads/               # relative paths do not begin with /

Absolute path
cd /home/genomics/downloads    # absolute paths begin with /
                               # type: cd /ho<tab>gen<tab>down<tab>

cd /home/genomics
cd ~/
cd

All of these will take you to your home directory.
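The difference between relative and absolute paths can be sketched as follows. This is an illustration only: it builds a small stand-in for /home/genomics inside a temporary directory, and the variable names (base, relative_result, absolute_result) are assumptions, not part of the workshop setup.

```shell
# Build a small stand-in for /home/genomics inside a temporary directory.
base=$(mktemp -d)
mkdir -p "$base/genomics/scripts" "$base/genomics/genomics_lab"

cd "$base/genomics/scripts"         # absolute path: begins with /
cd ../genomics_lab                  # relative path: up one level, then down
relative_result=$(pwd)

cd "$base/genomics/genomics_lab"    # same destination by absolute path
absolute_result=$(pwd)

echo "$relative_result"
echo "$absolute_result"
```

Both echo lines should print the same directory: a relative path depends on where you currently are, while an absolute path works from anywhere.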
Unix Commands have Arguments

Arguments are like options, not verbal conflicts.

Search for arguments in the ls manual page:

man ls    # browse the manual page for ls
/all      # search for the text all
n         # next found text
N         # previous found text
q         # quit

Double dash means there is a single argument, which is usually a descriptive word:

ls --all    # has one argument; show hidden files

Hidden files begin with .
.     current directory
..    back one directory

Single dash means each character is an argument:

ls -al      # has two arguments; show hidden files and show details
Search the man page of the ls command

ls -lh    # list file details as human readable
ls -lt    # list file details and sort by modification time

Getting around in the man page:

/human             # search for the term 'human'
n                  # next found search term
up-arrow key       # move up in the man page
down-arrow key     # move down in the man page
page-up key        # move up one page
page-down key      # move down one page
q                  # quit
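The ls arguments covered so far can be tried safely in a temporary directory. This is a sketch that assumes GNU ls (as on Ubuntu), since the long form --all is a GNU option; the file names are made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"

touch -t 202001010000 older.txt   # set modification time to 1 Jan 2020
touch -t 202101010000 newer.txt   # set modification time to 1 Jan 2021
touch .hidden_file                # names that begin with . are hidden

ls          # hidden files are not listed
ls --all    # long option: one descriptive word (GNU ls)
ls -al      # two short options: -a (all) and -l (details)
ls -lh      # details with human-readable sizes
ls -lt      # details sorted by modification time, newest first

visible_count=$(ls | wc -l)       # entries shown without -a
all_count=$(ls -a | wc -l)        # entries shown with -a, including . and ..
newest=$(ls -t | head -n 1)       # first entry when sorted by time
```

Note that ls -a also lists . and .. because both names begin with a dot.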
Downloading, moving, copying and removing files

wget ftp://ftp.ncbi.nih.gov/genomes/h_sapiens/protein/protein.fa.gz
wget ftp://ftp.ncbi.nih.gov/genomes/h_sapiens/rna/rna.fa.gz

mkdir genomes/human                 # fails: the parent directory genomes does not exist yet
man mkdir                           # search how to make parent directories as necessary
mkdir -p genomes/human              # make genomes and genomes/human in one step

mv protein.fa.gz genomes/human/     # move file to a different directory
cd genomes/human/
ls ../..                            # list files two directories back
cp ../../rna.fa.gz .                # copy file to current directory
rm ../../rna.fa.gz                  # remove a file using relative path
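The same sequence can be rehearsed without a network connection. In this sketch the two downloads are replaced by small placeholder text files with the same names (they are not real gzip files), and everything happens inside a temporary directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Placeholder files standing in for the two downloads (no network needed).
echo "placeholder protein data" > protein.fa.gz
echo "placeholder rna data" > rna.fa.gz

mkdir -p genomes/human             # -p makes parent directories as needed
mv protein.fa.gz genomes/human/    # move a file into the new directory
cd genomes/human/
ls ../..                           # list files two directories back
cp ../../rna.fa.gz .               # copy to the current directory (.)
rm ../../rna.fa.gz                 # remove the original by relative path
ls                                 # both files are now in genomes/human
```

Copying and then removing the original has the same end result as mv; the two-step version is shown here only to practice cp and rm with relative paths.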
What is the size of the file protein.fa.gz?

Preview a zipped file without opening the entire file into memory:

zmore protein.fa.gz
    <spacebar> = next screen full
    q = quit

gunzip protein.fa.gz    # unzip a zipped file
more protein.fa         # preview a non-zipped file
less protein.fa         # searchable file preview
    <spacebar> = next screen full
    u = up one screen full
    / = search
    n = find next
    N = find previous
    q = quit

gzip protein.fa         # zip 'compress' a file
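A small round trip shows what gzip and gunzip do to a file and its size. This is a sketch on generated data: the FASTA records below are made up for the example, and gzip -dc is used as a non-interactive way to preview a compressed file.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical FASTA records repeated so there is something to compress.
i=0
while [ "$i" -lt 200 ]; do
    echo ">example_protein_$i"
    echo "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV"
    i=$((i + 1))
done > sample.fa

cp sample.fa original_copy.fa         # keep a copy to compare against later
size_before=$(wc -c < sample.fa)      # size in bytes before compression

gzip sample.fa                        # replaces sample.fa with sample.fa.gz
size_after=$(wc -c < sample.fa.gz)    # size in bytes after compression
ls -lh sample.fa.gz                   # file size in human-readable form

gzip -dc sample.fa.gz | head -n 4     # preview without unzipping on disk
gunzip sample.fa.gz                   # restores sample.fa
```

After gunzip, sample.fa should be identical to original_copy.fa; compression changes the size on disk, not the content.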
Viewing file contents

Make a directory in your home directory called states and go to the states directory.
Move the states.csv from your Downloads directory to your states directory.
.csv means all lines in the file have comma separated values.

wc states.csv            # show number of lines, words and bytes
                         # for non-zipped files only
wc -l states.csv         # show just the number of lines in a file
cat states.csv           # output the entire file to screen
clear                    # clear the terminal screen: Ctrl+l
head states.csv          # show first 10 lines of file
head -n 25 states.csv    # show first 25 lines of file
tail states.csv          # show last 10 lines of file
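If you do not have states.csv at hand, these commands can be tried on a small stand-in file. The content below (state,capital pairs) is an assumption made for the example; the workshop file may have different columns.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for states.csv: state,capital
cat > states.csv <<'EOF'
Alabama,Montgomery
Alaska,Juneau
Arizona,Phoenix
Arkansas,Little Rock
California,Sacramento
Colorado,Denver
Connecticut,Hartford
Delaware,Dover
Florida,Tallahassee
Georgia,Atlanta
Hawaii,Honolulu
Idaho,Boise
EOF

wc states.csv           # lines, words and bytes
wc -l states.csv        # just the number of lines
head -n 3 states.csv    # first 3 lines
tail -n 2 states.csv    # last 2 lines

line_count=$(wc -l < states.csv)     # reading from < omits the file name
first_line=$(head -n 1 states.csv)
last_line=$(tail -n 1 states.csv)
```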
The grep command: print lines matching a pattern

grep New states.csv       # show all lines matching the text 'New'
grep -c New states.csv    # count the number of lines matching New

>     # save output to a file; overwrite if it already exists
>>    # append output to an existing file; create file if it doesn't exist
|     # pipe used to process output again

grep New states.csv > new_states.csv    # create a new file that has all New states
grep New states.csv | grep Pine         # show all lines that match the text 'New', then take
                                        # the results and show all lines that match the text 'Pine'
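The following sketch puts grep, both redirection operators and the pipe together on a made-up six-line states.csv. The data and the second search term (York rather than Pine) are assumptions chosen so the example is self-contained.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for states.csv: state,capital
cat > states.csv <<'EOF'
Maine,Augusta
New Hampshire,Concord
New Jersey,Trenton
New Mexico,Santa Fe
New York,Albany
Texas,Austin
EOF

grep New states.csv                        # lines matching 'New'
grep -c New states.csv                     # count of matching lines
grep New states.csv > new_states.csv       # > creates or overwrites the file
grep Texas states.csv >> new_states.csv    # >> appends to the file
grep New states.csv | grep York            # filter the first result again

new_count=$(grep -c New states.csv)
saved_lines=$(wc -l < new_states.csv)
piped=$(grep New states.csv | grep York)
```

Because the second redirect uses >> instead of >, new_states.csv ends up holding the four 'New' lines plus the Texas line.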
Deleting (removing) files

Always double check what you are deleting.

rm ../my_favorite_food.txt         # remove 'delete' a file
                                   # hint: type: rm ../my<tab>
rm states.csv new_states.csv       # remove multiple files at once
cd ..                              # go back one directory
rmdir states                       # remove an empty directory
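rmdir only removes empty directories, which acts as a small safety net. A minimal sketch in a temporary directory (the variable first_attempt is only there to record what happened):

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir states
touch states/states.csv states/new_states.csv

# rmdir refuses to remove a directory that still contains files.
if rmdir states 2>/dev/null; then
    first_attempt="removed"
else
    first_attempt="refused"
fi
echo "first rmdir attempt: $first_attempt"

ls states                                     # double check before deleting
rm states/states.csv states/new_states.csv    # remove multiple files at once
rmdir states                                  # succeeds now that it is empty
```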
View your command history list

Your commands are saved even when you exit your terminal session.

history                # show all your past commands used
history | grep wget    # show all commands where you used wget
exit # exit the terminal session; Ctrl+d
Tape Archive Files (.tar)

Consists of a single file which contains a group of files and/or directories.

ncbi-blast-2.2.25+-x64-linux.tar    346 MB

    ncbi-blast-2.2.25+/
    ncbi-blast-2.2.25+/changelog
    ncbi-blast-2.2.25+/license
    ncbi-blast-2.2.25+/ncbi_package_info
    ncbi-blast-2.2.25+/readme
    ncbi-blast-2.2.25+/doc/
    ncbi-blast-2.2.25+/doc/readme.txt
    ncbi-blast-2.2.25+/bin/
    ...

.tgz files (.tar.gz)

Consists of a tar file that has been zipped (compressed) to make it smaller.

ncbi-blast-2.2.25+-x64-linux.tar.gz    110 MB
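To get a feel for how a .tar.gz file bundles a directory, here is a sketch that creates, lists and extracts a tiny archive. The directory example-package and its files are hypothetical, loosely modelled on the listing above.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical package directory with a couple of small files.
mkdir -p example-package/doc example-package/bin
echo "read me first" > example-package/README
echo "documentation" > example-package/doc/README.txt

tar czf example-package.tar.gz example-package    # c = create, z = gzip, f = file name
tar tzf example-package.tar.gz                    # t = list contents without extracting

mkdir unpacked
cd unpacked
tar xzvf ../example-package.tar.gz                # x = extract, v = verbose
cd ..

listing=$(tar tzf example-package.tar.gz)         # archive contents, one path per line
```

Listing an archive with t before extracting it is a useful habit: it shows whether the files will land in their own directory or spill into the current one.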
Download Blast (command line version)

1. Go to the Downloads directory in your home directory.

2. Use Firefox to search the web for the Linux version of Blast, then use wget to download the file from the command line.
   Right click the link and select: Copy Link Location, then paste it in your terminal at the command line.

   wget ftp://ftp.../ncbi-blast-2.2.25+-x64-linux.tar.gz

3. Decompress and open the tar file.

   tar xzvf ncbi<tab>
   ls
   cd ncbi<tab>
   ls                          # look for files: INSTALL or README and read

4. View the README file.

   gedit README &

5. View the user manual pdf.

   xpdf doc/user_manual.pdf    # q=quit
Review for next week

Make sure you understand the following UNIX commands:

pwd ls cd grep head tail wget cp mv rm mkdir rmdir cat more less man

and the following directories:

.  ..  ~/

and the following UNIX redirection operators:

>  >>

This is the main list of UNIX commands that you will most likely use.