UNIX, GNU/Linux and simple tools for data manipulation

Dr Jean-Baka DOMELEVO ENTFELLNER, BecA-ILRI Hub
Basic Bioinformatics Training Workshop @ILRI Addis Ababa
Wednesday December 13th 2017

Outline
1 UNIX & GNU/Linux: brief history and introduction
2 Using the Bash shell
    Your first commands
    Filesystems and permissions
    Bash special characters and features
    Quoting in Bash
3 So many tools
    You CANNOT live without your man
    Data manipulation commandline tools

1 UNIX & GNU/Linux: brief history and introduction

UNIX & GNU/Linux: introduction
GNU/Linux is an operating system (OS). GNU/Linux fully belongs to a broad family of OSes, the UNIX family.
Operating system: definition
- the unique interface between the computer (hardware) and the different programs (software) users run on it
- allows different programs and different users to use the same machine concurrently
- implements a filesystem, a console environment, a graphical environment, drivers for keyboard and mouse, etc.
- examples of operating systems: Windows (Microsoft), Mac OS X (Apple), Android (Google), GNU/Linux, FreeBSD, etc.
Linux is only the kernel of GNU/Linux systems, responsible for granting access to the resources on the host and for time-sharing between processes.

UNIX & GNU/Linux systems: timeline
GNU/Linux: a fairly recent member of an old and huge family (see http://www.levenez.com/unix/)
- 1969: UNICS
- 1971: UNIX Time-Sharing System V1
- 1982: SunOS 1.0
- 1983: UNIX System V
- 1991: GNU project (GNU/Hurd); Linux 0.01
- 1994: Linux 1.0
- 1999: Darwin 0.1; Mac OS X Server 1.0
- 2008: Android 1.0 (derived from Linux 2.6.23)
- 2013: Linux 3.9

Linux distributions: different flavours of the same OS
The GNU/Linux operating system comes in different distributions. Three distributions have long been true beacons and have produced many offspring:
1 Debian (1993), parent of Ubuntu (2004) and Linux Mint (2010)
2 Slackware (1993), itself derived from SLS (1992), parent of SuSE (1998)
3 RedHat (late 1994), parent of CentOS and Fedora (both 2003)
For a full account, see http://futurist.se/gldt

What makes UNIX systems superior to the Windows family
- UNIX gives you more control over your computer (no hidden actions, no undesired pieces of software).
- UNIX environments are largely free from viruses.
- UNIX enables you to harness the full computational power of your machine.
- UNIX systems have been designed from their origin to be massively multi-user and multi-process systems.
- UNIX systems are much more secure than any version of Windows.
Take-home message
The true power of UNIX (and so of GNU/Linux) lies in its commandline interface.

2 Using the Bash shell

Bash: a shell environment
Bash is the most popular shell environment on GNU/Linux systems. It stands for "Bourne Again Shell".
Shell environments are designed to:
- interact with the host filesystem (browse and create directories, see the content of files, etc.),
- interact with the installed software (install, run, etc.),
- log in to remote hosts (telnet, ssh),
- perform all of the above through automated processes: scripts.
Shells are at the same time commandline environments (run one command at a time) and scripting environments (write and run scripts).
On most GNU/Linux distributions, Bash is accessible through the "Terminal" icon.

2 Using the Bash shell: Your first commands

Standard structure of a UNIX command
Synopsis of a command:
    <command> <options> <objects>
For example:
    ls                    (only the command)
    ls -l                 (command plus an option)
    ls -l -h h3a*         (command, two options and one object)
    ls -lh h3a*           (single-letter options can be concatenated)
    cp one two            (command and two objects)
    man head              (command and one object)
    head -n 2 one         (an option with a value)
    head --lines=2 one    (same command, GNU-style long option)
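
The command/options/objects pattern can be tried safely in a scratch directory. The following is only a sketch: the directory comes from mktemp and the file contents are made up for the illustration; the file names one and two simply mirror the examples on the slide.

```shell
# Scratch directory so that nothing in your own files is touched (assumption: mktemp is available)
workdir=$(mktemp -d)

# Create a three-line file named "one" (made-up content)
printf 'line 1\nline 2\nline 3\n' > "$workdir/one"

# Command and two objects: copy "one" to "two"
cp "$workdir/one" "$workdir/two"

# Two separate options, then the same options concatenated: both listings are identical
ls -l -h "$workdir"
ls -lh "$workdir"

# An option with a value: keep only the first two lines
first_two=$(head -n 2 "$workdir/one")
echo "$first_two"
```

Here ls receives one object (the directory), cp receives two, and head receives the option -n together with its value 2.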

2 Using the Bash shell: Filesystems and permissions

UNIX filesystems
Filesystems are hierarchies. The filesystem of a UNIX machine is standardized. Under the root (/) are:
    /bin      essential command binaries
    /boot     static files of the boot loader
    /dev      device files (special files to access your devices)
    /etc      host-specific system configuration files
    /home     user home directories (e.g. /home/peter, /home/sarah, etc.)
    /lib      essential shared libraries and kernel modules
    /media    mount point for removable media (e.g. CD-ROMs & flash disks)
    /mnt      old-style mount point for any media
    /tmp      system-wide temporary folder, writable by anyone
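
As a quick check, one can ask the shell which of these standard directories exist on the machine at hand. This is a minimal sketch; the selection of directories and the counter variable present are choices made for the illustration, and some systems (containers, macOS) may lack a few of them.

```shell
present=0
for dir in / /bin /etc /home /tmp; do
    if [ -d "$dir" ]; then
        # -d tests that the path exists and is a directory
        echo "$dir is present"
        present=$((present + 1))
    else
        echo "$dir is missing on this system"
    fi
done
echo "$present of 5 standard directories found"
```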

File permissions
Three (four) types of rights:
- right to read from a file (r)
- right to write to it (w)
- right to execute a binary file or a script (x)
- right to traverse a directory (x)
Three types of people:
- the owner of a file (u)
- the other members of the user's group (g)
- the rest of the world, the others (o)
Typical line of output from ls -l:
    -rw-r--r-- 1 jbde jbde 171104 juil. 6 12:48 awk.dvi

File permissions explained (figure)
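
The first ten characters of an ls -l line can be read as one character for the file type followed by three rwx triplets, for u, g and o in that order. A small sketch, assuming a throwaway file created with mktemp (the variable names are hypothetical), shows how chmod changes those triplets:

```shell
# Throwaway file used only for this illustration
demo=$(mktemp)

# Owner may read and write; group and others may only read
chmod u=rw,g=r,o=r "$demo"
before=$(ls -l "$demo" | cut -c 1-10)   # keep the type character and the three triplets
echo "$before"

# Additionally give the owner the right to execute
chmod u+x "$demo"
after=$(ls -l "$demo" | cut -c 1-10)
echo "$after"
```

The first listing should match the permission string in the sample ls -l line above; after chmod u+x only the owner triplet changes.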

2 Using the Bash shell: Bash special characters and features

Why it is often necessary to quote strings or escape chars
Some characters have a special meaning for the tools you use, e.g. the commandline interpreter Bash:
- spaces or tabs are logical separators between elements on the commandline: cd /tmp
- a dollar sign introduces Bash variables: echo $PATH
- a star means all the files (wildcard): cat *
- the greater-than sign is interpreted as a redirection: cat * > listing.txt
- the vertical bar pipes the output of some command into the input of another: grep h3a long_course.htm | wc -l
- ...
Escaping or quoting prevents these characters from being interpreted by the shell.
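
The wildcard, the redirection and the pipe can be seen working together in a short sketch. The scratch directory, the file names a.txt, b.txt and listing.out, and their contents are all invented for the illustration.

```shell
workdir=$(mktemp -d)

# Two small made-up input files
printf 'h3a first\nother\n' > "$workdir/a.txt"
printf 'h3a second\n' > "$workdir/b.txt"

# The star expands to every matching file name; > sends the output into a file
cat "$workdir"/*.txt > "$workdir/listing.out"

# The vertical bar feeds the output of grep into wc -l, which counts the matching lines
matches=$(grep h3a "$workdir/listing.out" | wc -l)
echo "lines containing h3a: $matches"
```

The output file deliberately uses a different extension, so that it is not itself matched by the *.txt wildcard.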

Escaping a single character
In Unix, prepending a backslash (\) escapes the character following the backslash.
    > echo $PATH
    /home/jbde/bin:/usr/local/bin:/usr/bin:/bin
    > echo \$PATH
    $PATH
And if a filename contains spaces, e.g. named with spaces.txt:
    > cat named with spaces.txt
    cat: named: No such file or directory
    cat: with: No such file or directory
    cat: spaces.txt: No such file or directory
    > cat named\ with\ spaces.txt
    <produces the content of the file>
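
The same two situations can be reproduced in a scratch directory. This is a sketch under the assumption that mktemp is available; the file content and the variable names are invented.

```shell
workdir=$(mktemp -d)

# Create a file whose name contains spaces (quoted here so that it is one single name)
printf 'hello\n' > "$workdir/named with spaces.txt"

# Each backslash escapes the space that follows it, so cat receives one file name
escaped=$(cat "$workdir"/named\ with\ spaces.txt)
echo "$escaped"

# The backslash also stops the dollar sign from introducing a variable
literal=$(echo \$HOME)
echo "$literal"
```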

2 Using the Bash shell: Quoting in Bash

Strong quoting with single quotes
You can also quote a string to prevent the spaces it contains from being interpreted:
    > cat 'named with spaces.txt'
    <produces the content of the file>
Generally speaking, single quotes do not allow any kind of interpretation/substitution/expansion.
    > echo 'Your PATH variable contains $PATH'
    Your PATH variable contains $PATH

Weak quoting with double quotes
While preventing the spaces they contain from being interpreted, double quotes allow expansion of Bash variables:
    > cat "named with spaces.txt"
    <produces the content of the file>
    > echo "Your PATH variable contains $PATH"
    Your PATH variable contains /home/jbde/bin:/usr/local/bin:/usr/b
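
Strong and weak quoting can be compared side by side. In this sketch the variable fruit and its value are hypothetical; only the quoting behaviour matters.

```shell
fruit="mango"   # hypothetical variable, used only for this comparison

# Single quotes: the text is kept exactly as typed, $fruit is not expanded
strong='I like $fruit'

# Double quotes: spaces are protected, but $fruit is replaced by its value
weak="I like $fruit"

echo "$strong"
echo "$weak"
```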

Using Bash every day
Bash has nice features you should use to work efficiently:
- the history of previous commands (browse with the up and down arrow keys, search with Ctrl+R)
- autocompletion with the <TAB> key everywhere you can (commands, filenames, etc.)
- wildcards and regexps
- use quoting appropriately
- pipe commands into each other (|)
- redirect output (> erases previous file, >> appends)
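
The difference between the two redirections is easy to observe on a scratch file. A minimal sketch, in which the file (created with mktemp) and the logged messages are made up:

```shell
log=$(mktemp)

echo "first run" > "$log"      # > creates the file content
echo "second run" >> "$log"    # >> adds a line at the end

echo "fresh start" > "$log"    # > again: the two previous lines are erased
echo "appended" >> "$log"

cat "$log"
```

After the last command, only the lines written since the final > remain in the file.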

3 So many tools: You CANNOT live without your man

Asking for help on a command: man
This is the most basic command, and the first one to learn!
    man ls
To browse within the manpage:
    <Space>: next page
    b: previous page
    G: go to the bottom
    g: go to the beginning
    /: search for an expression (type a pattern or string and press <Enter>)
    q: quit and return to the commandline

Sectioning of a manpage
Manpages are all written using the same format/sectioning:
1 NAME: the name of the command
2 SYNOPSIS: the syntax of the command (sometimes several lines to describe several ways of using the command)
    - square brackets ([...]) indicate optional components
    - pipes (|) within a construct separate alternatives
    - an ellipsis (...) usually indicates that the previous object is repeatable
3 DESCRIPTION and OPTIONS: meaning and behaviour of the different options and objects to give on the commandline
4 EXAMPLES: the most useful section, provides real-world examples along with some explanation of what they do
5 EXIT STATUS: useful in scripts, to monitor automatically whether the command execution produced an error
6 SEE ALSO: also useful when you don't know the exact name of a command but know a similar/sister one (e.g. uniq and join are cross-referenced)
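
To illustrate what the EXIT STATUS section is for, here is a sketch using grep, whose manpage documents status 0 when a line matches and 1 when none does. The sample file and the variable names are invented for the illustration.

```shell
sample=$(mktemp)
printf 'alpha\nbeta\n' > "$sample"

# $? holds the exit status of the command that has just run
grep -q alpha "$sample"
found_status=$?

# When grep fails, the part after || runs and records the failing status
missing_status=0
grep -q gamma "$sample" || missing_status=$?

echo "status when found: $found_status, status when not found: $missing_status"

# In scripts the status is usually tested directly
if grep -q beta "$sample"; then
    echo "beta is present"
fi
```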

3 So many tools: Data manipulation commandline tools

Reading files: cat and less
cat
- produces the full content of file(s) to the standard output
- can concatenate several files: cat FILE1 FILE2 > FILE3
- is non-interactive: prints all and quits
less is a pager
- produces the full content of file(s) to the standard output, one page at a time
- several files are processed one after the other: less FILE1 FILE2 and then :n (next) and :p (previous) to browse
- is fully interactive: <space> for next page, b for the previous, / to search, q to quit, etc.
- useful option: -S not to have your lines automatically wrapped (preserves column alignment on long lines)
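
Concatenation with cat can be tried on two tiny files. A sketch only: the scratch directory and the file contents are made up, while FILE1, FILE2 and FILE3 follow the names used above.

```shell
workdir=$(mktemp -d)
printf 'line from file 1\n' > "$workdir/FILE1"
printf 'line from file 2\n' > "$workdir/FILE2"

# cat writes both files, in order, to standard output; > stores the result in FILE3
cat "$workdir/FILE1" "$workdir/FILE2" > "$workdir/FILE3"

combined=$(cat "$workdir/FILE3")
echo "$combined"
```

less is interactive, so it is left out of this sketch; less -S "$workdir/FILE3" would open the result in the pager.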

Count the number of chars, words or lines: wc
wc stands for "word count"
    wc -l FILE    number of lines
    wc -c FILE    number of bytes (approximately the number of chars)
    wc -w FILE    number of words
    wc -L FILE    length of longest line in file
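
A short sketch on a made-up two-line file shows the three most common counts. Reading the file through < keeps the file name out of the output, so each result is just a number.

```shell
sample=$(mktemp)
printf 'one two three\nfour five\n' > "$sample"

n_lines=$(wc -l < "$sample")   # counts newline characters
n_words=$(wc -w < "$sample")   # counts whitespace-separated words
n_bytes=$(wc -c < "$sample")   # counts bytes, newlines included

echo "lines: $n_lines, words: $n_words, bytes: $n_bytes"
```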

Select columns from a file: cut
Simplified syntax
    cut -f <fields> -d <delimiter> FILE
- be sure you quote the delimiter, e.g. ';'
- <fields> can be a comma-separated list (ranges indicated with hyphens)
Example: select fields 2 and 5 from a semicolon-separated file
    cut -f 2,5 -d ';' cut_example.csv
Example: specify output delimiter
    cut -f 1-3 -d ';' --output-delimiter=$'\t' cut_example.csv
Example: extract only the first three characters of each line
    cut -c 1-3 cut_example.csv
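
Since cut_example.csv is not reproduced here, the sketch below builds a small stand-in file; its column names and values are invented for the illustration.

```shell
data=$(mktemp)
# Made-up semicolon-separated table with five fields per line
printf 'id;gene;chrom;start;end\n1;abcA;chr1;100;200\n2;xyzB;chr2;300;400\n' > "$data"

# Fields 2 and 5; the delimiter is quoted so the shell does not treat ; as a command separator
picked=$(cut -f 2,5 -d ';' "$data")
echo "$picked"

# Character positions instead of fields: the first three characters of each line
first_three=$(cut -c 1-3 "$data")
echo "$first_three"
```

Note that cut keeps the input delimiter between the selected fields unless told otherwise.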

Sort a file according to some rules: sort
sort sorts text files according to the content of some fields, called keys.
Example: sorting lines alphabetically
    sort cut_example.csv
But it is usually a bad idea to leave the way sort sorts uncontrolled.
Example: sort according to the 2nd and then the 3rd field (semicolon-separated fields)
    sort -t ';' -k 2,3 cut_example.csv
Example: sort numerically (-n) according to the 9th field only
    sort -t ';' -n -k 9,9 cut_example.csv
    # to check results:
    sort -t ';' -n -k 9,9 cut_example.csv | cut -f 9 -d ';'
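
Why the -n option matters can be seen on a tiny made-up file. In this sketch LC_ALL=C is set so that the result does not depend on the locale of the machine; the data and variable names are hypothetical.

```shell
data=$(mktemp)
# Made-up table: a label and a number, separated by a semicolon
printf 'b;10\na;9\nc;100\n' > "$data"

# Without -n the second field is compared as text, character by character
as_text=$(LC_ALL=C sort -t ';' -k 2,2 "$data" | cut -f 2 -d ';')
echo "$as_text"

# With -n the second field is compared as a number
as_number=$(LC_ALL=C sort -t ';' -n -k 2,2 "$data" | cut -f 2 -d ';')
echo "$as_number"
```

As text, 100 comes before 9 because the character 1 sorts before 9; as numbers, the order is the expected one.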

sort, continued

-g option to sort numerical fields containing scientific notation:

    sort -k 2,2 -n with_sci_notation   # unexpected result
    sort -k 2,2 -g with_sci_notation   # GOOD!

WARNING!! sort relies heavily on your locale setting! Try:

    LC_ALL=fr_FR.utf8 sort -k 2,2 -g with_sci_notation

One-letter sorting options can be attached to each key as flags, and several keys can be specified.

Example: ascending order on the 5th field, descending on the 6th and then alphabetically on the 1st field

    sort -k 5,5g -k 6,6nr -k 1,1 hmmsearch_raw_output | less -S
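The difference between -n and -g, and the use of per-key flags, can be checked on tiny inputs. The sketch below assumes a sort that supports -g (GNU sort does); the file names and values are made up, and LC_ALL=C keeps the decimal point and collation fixed.

```shell
workdir=$(mktemp -d)

# Hypothetical values in scientific notation: 1000, 0.2 and 5
printf 'x 1e3\ny 2e-1\nz 5\n' > "$workdir/sci.txt"

# -n stops reading at the "e", so it sees 1, 2 and 5
n_order=$(LC_ALL=C sort -k 2,2 -n "$workdir/sci.txt" | cut -d ' ' -f 1)
# -g parses the full floating-point value: 0.2, 5, 1000
g_order=$(LC_ALL=C sort -k 2,2 -g "$workdir/sci.txt" | cut -d ' ' -f 1)
printf 'with -n: %s\n' "$(printf '%s' "$n_order" | tr '\n' ' ')"
printf 'with -g: %s\n' "$(printf '%s' "$g_order" | tr '\n' ' ')"

# Per-key flags: field 1 alphabetically, then field 2 numerically, descending
printf 'b 2\na 2\na 10\n' > "$workdir/pairs.txt"
multi_key=$(LC_ALL=C sort -k 1,1 -k 2,2nr "$workdir/pairs.txt")
printf '%s\n' "$multi_key"

rm -r "$workdir"
```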

sort, some caveats

WARNING!! By default, sort separates fields on blank to non-blank transitions. Be careful with empty fields! One should specify the delimiter.

Example: a precise delimiter to prevent sort from merging delimiters

    sort -k 11,11 -t $'\t' CDS_top_100.txt | cut -f 11 | less
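The empty-field caveat is easy to reproduce. In this sketch (hypothetical file genes.tsv, tab-separated, with an empty second field on its first line), sorting on the third field gives a different order depending on whether the delimiter is stated. The tab is built with printf so the snippet does not depend on the $'\t' syntax.

```shell
workdir=$(mktemp -d)
tab=$(printf '\t')

# Line g1 has an empty 2nd field: two tabs in a row
printf 'g1\t\t30\ng2\tx\t10\ng3\ty\t20\n' > "$workdir/genes.tsv"

# Default splitting merges the two tabs, so g1 has no 3rd field at all
# and its empty key counts as 0
default_order=$(LC_ALL=C sort -n -k 3,3 "$workdir/genes.tsv" | cut -f 1)

# With an explicit delimiter the 3rd field is 30, 10, 20 as intended
explicit_order=$(LC_ALL=C sort -t "$tab" -n -k 3,3 "$workdir/genes.tsv" | cut -f 1)

printf 'default:  %s\n' "$(printf '%s' "$default_order" | tr '\n' ' ')"
printf 'explicit: %s\n' "$(printf '%s' "$explicit_order" | tr '\n' ' ')"

rm -r "$workdir"
```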

join lines of two files sharing a common field

join allows you to perform the relational join operation on two files.

Example: I want to select the lines of FILE2 whose 11th field corresponds to an entry in FILE1.

    join -1 1 -2 11 -t $'\t' dg_top_100.txt CDS_top_100.txt

WARNING!! join operates on files already sorted on the join field!
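Since join needs both inputs sorted on the join field, a typical workflow sorts first and joins second. The following is a sketch on invented data (ids.txt plays the role of FILE1, table.tsv the role of FILE2, with the join field in column 2 rather than 11); the same locale is used for sort and join so that both agree on the ordering.

```shell
workdir=$(mktemp -d)
tab=$(printf '\t')

# Hypothetical inputs: a list of identifiers and a tab-separated table
printf 'geneB\ngeneA\n' > "$workdir/ids.txt"
printf 'x1\tgeneC\t5\nx2\tgeneA\t7\nx3\tgeneB\t9\n' > "$workdir/table.tsv"

# Sort each file on its own join field
LC_ALL=C sort "$workdir/ids.txt" > "$workdir/ids.sorted"
LC_ALL=C sort -t "$tab" -k 2,2 "$workdir/table.tsv" > "$workdir/table.sorted"

# Field 1 of the first file against field 2 of the second;
# the join field is printed first, followed by the remaining fields
joined=$(LC_ALL=C join -1 1 -2 2 -t "$tab" "$workdir/ids.sorted" "$workdir/table.sorted")
printf '%s\n' "$joined"

rm -r "$workdir"
```

The line with geneC is dropped because it has no match in the identifier list.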

Produce only the last n lines of a file: tail

Convenient to cut parts you are not interested in, for instance because:
- the final lines of a log file contain the error that matters to you
- the header (first few lines) of the file is of no interest for the next tool in the pipeline
- the file is sorted and the last lines contain the samples of interest: you set a cutoff

Produce the last 30 lines of a file

    tail -n 30 input_file

or simply:

    tail -30 input_file

Produce all the lines from the 30th onwards

    tail -n +30 input_file
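A short sketch of the two forms of tail on a made-up ten-line file (numbers.txt is hypothetical, with a header line added to show the header-skipping use case):

```shell
workdir=$(mktemp -d)

# Hypothetical file: one header line followed by the numbers 1 to 9
printf '%s\n' header 1 2 3 4 5 6 7 8 9 > "$workdir/numbers.txt"

# The last 3 lines
last_three=$(tail -n 3 "$workdir/numbers.txt")
printf 'last three: %s\n' "$(printf '%s' "$last_three" | tr '\n' ' ')"

# Everything from line 2 onwards, i.e. the file without its header
without_header=$(tail -n +2 "$workdir/numbers.txt")
printf 'no header:  %s\n' "$(printf '%s' "$without_header" | tr '\n' ' ')"

rm -r "$workdir"
```

The +K form counts from the top of the file, so tail -n +2 drops exactly one line.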

Symmetrical to tail: head

Produce the first 30 lines of a file

    head -n 30 input_file

or simply:

    head -30 input_file

Produce all but the last 30 lines (GNU head)

    head -n -30 input_file
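head and tail are often chained to extract a block of lines from the middle of a file. A minimal sketch, on a made-up ten-line file:

```shell
workdir=$(mktemp -d)

# Hypothetical file containing the numbers 1 to 10, one per line
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$workdir/numbers.txt"

# The first 3 lines
first_three=$(head -n 3 "$workdir/numbers.txt")
printf 'first three: %s\n' "$(printf '%s' "$first_three" | tr '\n' ' ')"

# Lines 4 to 6: keep the first 6 lines, then the last 3 of those
lines_4_to_6=$(head -n 6 "$workdir/numbers.txt" | tail -n 3)
printf 'lines 4-6:   %s\n' "$(printf '%s' "$lines_4_to_6" | tr '\n' ' ')"

rm -r "$workdir"
```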

Translate chars with tr

tr helps you change any occurrence of a character into another, or delete it (-d).

Example: translating Windows end-of-lines into UNIX ones. Windows lines end with '\r\n', so the '\r' must be deleted; translating it into '\n' would double every line break.

    cat Win_formatted_file | tr -d '\r' > UNIX_formatted_file

Warning! tr only processes its standard input! It does not accept a file name as argument.

But tr also comes in handy to change separators in a CSV file.

Example: translating semicolons into tabulations

    cat example_mj.txt | tr ';' '\t'
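Both uses of tr can be verified on a small made-up input. The sketch below writes a hypothetical two-line file with Windows line endings (CR LF), deletes the carriage returns with tr -d, then turns the semicolons into tabs. Counting the remaining CR characters is just one way to check the result.

```shell
workdir=$(mktemp -d)

# Hypothetical Windows-formatted file: each line ends with CR LF
printf 'a;b\r\nc;d\r\n' > "$workdir/win.txt"

# Count the carriage returns before conversion
cr_before=$(tr -cd '\r' < "$workdir/win.txt" | wc -c | tr -d ' ')

# Delete every CR; tr reads standard input only, hence the redirection
tr -d '\r' < "$workdir/win.txt" > "$workdir/unix.txt"
cr_after=$(tr -cd '\r' < "$workdir/unix.txt" | wc -c | tr -d ' ')
printf 'CR before: %s, after: %s\n' "$cr_before" "$cr_after"

# Change the separator from semicolon to tab
tabbed=$(tr ';' '\t' < "$workdir/unix.txt")
printf '%s\n' "$tabbed"

rm -r "$workdir"
```

Redirecting with < does the same job as cat file | tr, without the extra process.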