Introduction to Linux
Lecture 1
Johannes Werner
WS 2017

Table of contents
- Introduction
- Operating systems
- Command line
- Programming
- Take home messages

Introduction

Lecturers
- Johannes Werner (j.werner@dkfz-heidelberg.de)
- Matthias Bieg (m.bieg@dkfz-heidelberg.de)
- Stephen Krämer (s.kraemer@dkfz-heidelberg.de)
- Dr. Matthias Schlesner (m.schlesner@dkfz-heidelberg.de)

Computational Oncology Group
Division of Theoretical Bioinformatics
German Cancer Research Center (DKFZ)

Course procedure
- Lecture: 9 units, Oct 23 - Nov 17, 2 pm - 3 pm
- Exercises: subsequent to lecture, 3 pm - 6 pm
- Exam: duration 30 minutes, date will be announced

Lecture content
1) Introduction to Linux
2) Introduction to Python
3) Lists and loops
4) File handling
5) Functions
6) Dictionaries & sorting
7) Regular expressions I
8) Modules I
9) Modules II

Operating systems

What is an operating system?

    An operating system (OS) is system software that manages computer
    hardware and software resources and provides common services for
    computer programs. The operating system is a component of the system
    software in a computer system. Application programs usually require
    an operating system to function. (...)
    Wikipedia

Linux
- free operating system
- primarily oriented toward the command line
- 498 of the TOP500 supercomputers run Linux
- around 600 Linux distributions

Command line

Insights

Characteristics
- stable and independent from GUI
- fast
- lots of tasks can only be performed from the command line

Functionality
- directory navigation, search and filter files, extract columns, file conversion, ...
- working with large files
- combination of different commands
- simple design of reproducible workflows
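
The point about combining different commands can be illustrated with a small pipeline. This is only a minimal sketch: the sample data (a few organism names written to a temporary file) is made up for illustration and is not part of the lecture material. Each command does one simple job, and the pipe symbol passes the output of one command to the next.

```shell
# Hypothetical sample data: one organism name per line (not from the lecture).
tmpfile=$(mktemp)
printf '%s\n' human mouse human yeast mouse human > "$tmpfile"

# Count how often each name occurs and keep the two most frequent ones:
#   sort       puts identical lines next to each other
#   uniq -c    collapses each run of identical lines and prefixes its count
#   sort -rn   orders the result numerically, largest count first
#   head -n 2  keeps only the first two lines
top_two=$(sort "$tmpfile" | uniq -c | sort -rn | head -n 2)

# human (3 occurrences) is listed first, then mouse (2 occurrences)
echo "$top_two"

rm -f "$tmpfile"
```

Because every step reads plain text and writes plain text, single steps can be swapped or extended without touching the rest, which is what makes such command chains easy to rerun as a reproducible workflow.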

Getting used to the command line

    $ pwd
    $ ls -la
    $ tail -n 50 /var/log/apache2/error_log
    $ grep -c '^>' sequence.fasta
    $ blastn -db nt -query sequence.fasta \
          -out results.out
    $ grep -v '^>' sequence.fasta | while read line; \
          do echo -n "$line" | wc -c; done | paste -sd+ | bc

Directory tree

Folders below the root directory (/):

    bin    boot   dev    etc
    home   lib    media  mnt
    opt    proc   root   sbin
    srv    tmp    usr    var

Working with directories
- changing directories
- listing content of directory
- showing path of directory

    $ cd /home/user/data
    $ cd ..
    $ cd               # no arg -> home
    $ pwd
    /home/user
    $ cd data
    $ pwd
    /home/user/data
    $ ls
    $ ls -a
    $ ls -R

Creating and deleting directories
- creating directories
- deleting directories

    # create folder sequences in current dir
    $ mkdir sequences
    # remove empty_folder (if empty)
    $ rmdir empty_folder
    # delete non_empty_folder
    # DANGEROUS! There is no un-rm!
    $ rm -rf non_empty_folder

Getting help
- built-in help
- manual pages
- searching manual pages

    $ man gzip
    $ gzip --help
    $ gzip -h
    $ gzip
    $ apropos python
    ipython python python2 python2.7 python3 python3.5

Creating, copying and deleting files
- creating files
- copying and moving files
- deleting files

    $ touch empty_file
    $ cp sequence.fastq sequence_copy.fastq
    $ mv sequence_copy.fastq backup/
    $ cd backup
    $ mv sequence_copy.fastq sequence.fastq.old
    $ rm old_file

Permissions

    -rwxr-xr-x 1 sheldon nerd 269 14. Sep 09:30 start.py
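
The first column of the listing above can be read as a file type character followed by three rwx triplets for user, group and others. In each triplet r counts 4, w counts 2 and x counts 1, and the sum gives one octal digit. The following is a minimal sketch of that arithmetic. perm_to_octal is a hypothetical helper written for this illustration, not a standard command, and it ignores special bits such as setuid or sticky (s, t).

```shell
# perm_to_octal: hypothetical helper (not a standard command).
# Converts a 10-character mode string such as -rwxr-xr-x into octal digits.
perm_to_octal() {
    # Drop the first character (file type: -, d, l, ...).
    mode=$(printf '%s' "$1" | cut -c2-10)
    result=""
    for start in 1 4 7; do    # triplets for user, group, others
        triplet=$(printf '%s' "$mode" | cut -c"$start"-"$((start + 2))")
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac   # read
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac   # write
        case $triplet in ??x) digit=$((digit + 1)) ;; esac   # execute
        result="$result$digit"
    done
    echo "$result"
}

perm_to_octal -rwxr-xr-x    # mode of start.py, prints 755
```

The resulting number is the form that chmod accepts as a numeric mode.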

Permissions

    -rw-r--r--  ->  644
    drwxrwxr-x  ->  775
    lrwxr-xr-x  ->  755
    -rw-------  ->  600

    $ chmod 600 secret.cfg
    $ chmod -R 755 data_dir
    $ chmod u=rw file
    $ chmod g-rx file
    $ chmod o+r file

Output redirections
- redirection into files
- output as input for following commands

    $ ./analyze_data.sh data.txt > results.txt
    $ ./analyze_weather.sh >> weather_stats.txt
    $ cat unsorted_results.txt | sort
    $ ./analyze_data.sh data.txt | tee results.txt

Investigating files
- concatenate (cat, tac)
- pager (less, more)
- view top/end of files

    $ cat file
    $ cat file1 file2 > largefile
    $ tac file
    $ cat file | less
    $ cat file | more
    $ head -n 50 long_file
    $ tail -n 50 /var/log/apache2/messages.log

Getting enhanced information about files
- character and line counts (wc)
- powerful search tool (grep)
- advanced tools (sed, awk)

    $ wc -l file
    $ wc -c sequence.fasta
    $ grep '>' sequence.fasta
    $ grep -c '^@PANCAN' reads.fastq
    $ sed '/^#/d' script.sh | wc -l
    $ history | awk '{print $2}' | sort | uniq -c | sort -rn | head

Bash programming

    #!/bin/bash
    # Empty the log files messages and wtmp in /var/log.
    LOG_DIR=/var/log
    cd "$LOG_DIR" || exit 1     # stop if the directory cannot be entered
    cat /dev/null > messages    # truncate the file to zero length
    cat /dev/null > wtmp
    echo "Logs cleaned up."
    exit 0

Programming

Why programming?

Programming in bioinformatics
- development of bioinformatic software
- data analysis
- sequence analysis (SNVs, indels, structural variations)
- assembly, alignment
- taxonomic and functional profiling (metagenomics)
- 16S analysis
- statistical evaluation

Python
- relatively easy to learn
- source code readability
- wide range of modules
- interactive shell
- widely used in bioinformatics applications

Take home messages
- why Linux is used in bioinformatics
- benefits of the command line
- get familiar with the command line
    - working with directories
    - getting help
    - copying/moving/deleting files and directories
    - permissions
    - redirections
    - pagers
    - investigating files (cat, head, tail, wc)
    - search tools with regular expressions (grep)