Finding files and directories (advanced), standard streams, piping
Laboratory of Genomics & Bioinformatics in Parasitology
Department of Parasitology, ICB, USP
Finding files or directories
- When you have lots of files (potentially thousands!) in your system, finding that one file that you know you have, but can't remember where, can be a daunting task.
- There are two different commands for finding files in Linux systems: locate and find.
- The find command is probably always present, while locate might or might not be installed (although it is common).
- As The Linux Command Line book says:
    locate: Find Files The Easy Way
    find: Find Files The Hard Way
locate
- locate finds files exclusively by name.
- The locate program performs a rapid database search of path names, and then outputs every name that matches a given query.
- Let's say we want to find every file that contains .zip in the name (including directories in its path); the command for locate would be:
    locate .zip
- locate, as implied above, uses a pre-made database to look up names.
- The problem with that is that a newly created file will not be found before the database is updated. Want proof? List the contents like this:
    ls ~jmalves
locate
- As you can see, there is a file (testtest.zip) with .zip in the name that was not found by locate. (This was last week! How about now?)
- That is because the database used by locate is only updated at certain intervals, typically once a day.
- Thus, any changes (files created, deleted, renamed, etc.) made before the next update to the database will not be seen by locate.
- That, of course, can be a disadvantage.
- The advantage of using the database is the speed of the lookup.
find
- The expression that find uses to select files consists of one or more primaries, each of which is a separate command line argument.
- find evaluates the expression each time it processes a file.
- An expression can contain any of the following types of primaries: options, tests, actions, and operators.
- For example:
    find ~ -maxdepth 3 -name '*.pdf' -and -perm 777 -delete
find
    find ~ -maxdepth 3 -name '*.pdf' -and -perm 777 -delete
- command: find
- path where the search should start: ~ (the user's home)
- option: -maxdepth 3
- tests: -name '*.pdf' and -perm 777
- logical operator: -and
- action: -delete
find
    find ~ -maxdepth 3 -name '*.pdf' -and -perm 777 -delete
This search will:
- Look for files and directories in the user's home directory (and its subdirectories, but...)
- Descend at most three levels below the starting point (e.g., it will test ~/dir1/subdir1/ and ~/dir1/subdir1/subsub2/, but not ~/dir1/subdir2/subsub1/subsubsubx/ or anything else four or more levels down)
- Look for files whose names end in .pdf AND whose permissions are exactly 777
- Finally, delete the files that satisfy those conditions
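The same search can be tried safely on a throwaway directory tree. This is a minimal sketch, not part of the original exercise: all directory and file names are made up for illustration, and -print is swapped in for -delete so that nothing is removed.

```shell
# Build a scratch tree (every name below is hypothetical).
top=$(mktemp -d)
mkdir -p "$top/dir1/subdir1/subsub2"
touch "$top/a.pdf" "$top/dir1/b.pdf" "$top/dir1/d.pdf" \
      "$top/dir1/c.txt" "$top/dir1/subdir1/subsub2/deep.pdf"
chmod 777 "$top/a.pdf" "$top/dir1/b.pdf" "$top/dir1/subdir1/subsub2/deep.pdf"
chmod 644 "$top/dir1/d.pdf"

# Same shape as the command on the slide, with -print instead of -delete.
# deep.pdf sits four levels below $top, so -maxdepth 3 never reaches it;
# d.pdf matches the name test but fails -perm 777; c.txt fails -name.
found=$(find "$top" -maxdepth 3 -name '*.pdf' -and -perm 777 -print | sort)
echo "$found"

rm -rf "$top"
```

Only a.pdf and dir1/b.pdf are reported. Running with -print first and switching to -delete once the list looks right is a cautious habit, since -delete cannot be undone.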
find: some tests
-amin, -cmin, -mmin
-anewer, -cnewer, -mnewer
-atime, -ctime, -mtime
-empty
-executable
-group
-name, -iname
-inum
-newer
-nogroup, -nouser
-path
-perm
-readable
-regex
-samefile
-size
-type
-user
-writable
etc. etc. etc.
find: actions
-delete
-exec, -execdir
-fls, -ls
-print, -fprint
-print0, -fprint0
-printf, -fprintf
-ok, -okdir
-prune
-quit
find: operators
\( \)
-not, !
-a, -and
-o, -or
find
Let's try!
- First, log into the remote server (200.144.244.172).
- Use find to perform the following searches:
    find /data/genomas -name '*contigs*'
    find /data/genomas -iname '*contigs*'
    find /data/genomas -name 'Try*'
    find /data/genomas -iname 'Try*' -type f
    find /data/genomas -iname 'Try*' -type f -exec ls -l {} \;
    find /data/genomas -iname 'Try*' -type f -exec ls -l {} +
- Characters like ; and ( and ) have special meaning to the shell, so they must be escaped with a preceding \ (backslash): \; \( \)
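The last two searches differ only in how -exec is terminated. The sketch below, run on a scratch directory with hypothetical file names, uses echo in place of ls -l so the difference can be counted: with \; the command runs once per match, while with + the matches are gathered onto a single command line.

```shell
# Scratch directory with three made-up files.
top=$(mktemp -d)
touch "$top/Try1.fa" "$top/Try2.fa" "$top/try3.fa"

# \; runs echo once per matching file: three separate lines of output.
per_file=$(find "$top" -iname 'Try*' -type f -exec echo {} \; | wc -l | tr -d ' ')

# + appends all matches to one echo invocation: a single line of output.
batched=$(find "$top" -iname 'Try*' -type f -exec echo {} + | wc -l | tr -d ' ')

# -name is case-sensitive, so it misses try3.fa; -iname above does not.
case_sensitive=$(find "$top" -name 'Try*' -type f | wc -l | tr -d ' ')

echo "per file: $per_file, batched: $batched, case-sensitive: $case_sensitive"
rm -rf "$top"
```

For a command like ls -l, the + form usually gives the same listing with far fewer processes started, which matters when thousands of files match.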
find
- We can group different tests in a search command.
- To do that, we use the logical operators mentioned earlier (-and, -or, -not).
- If no operator is given, -and is applied by default in most cases.
- Grouping is performed with parentheses, which have to be escaped with a backslash (since they have special meaning for the shell).
- Examples:
    find /data/ -name '*contigs*' -and -type d
    find /data/ -name '*contigs*' -type d (same as previous one)
    find /data/ \( -name 'Try*' -or -name 'Lc*' \) -and -type f
    find /data/ -iname 'Try*' -not -type d
https://www.gnu.org/software/findutils/manual/html_mono/find.html
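Why the parentheses matter can be seen by running the third example with and without them. This is a small sketch on a scratch directory (all names are hypothetical); it relies on the fact that find's -and binds more tightly than -or.

```shell
# Scratch tree: two directories and three files, names made up.
top=$(mktemp -d)
mkdir "$top/Try_dir" "$top/Lc_dir"
touch "$top/Try_a.fa" "$top/Lc_b.fa" "$top/other.fa"

# Grouped: (name is Try* OR Lc*) AND it is a regular file.
grouped=$(find "$top" \( -name 'Try*' -or -name 'Lc*' \) -and -type f | wc -l | tr -d ' ')

# Ungrouped: read as  Try*  OR  (Lc* AND regular file),
# so the directory Try_dir also gets through.
ungrouped=$(find "$top" -name 'Try*' -or -name 'Lc*' -and -type f | wc -l | tr -d ' ')

echo "grouped: $grouped, ungrouped: $ungrouped"
rm -rf "$top"
```

The grouped search reports the two files; the ungrouped one reports three entries, because Try_dir satisfies the left side of the -or on its own.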
Quiz time! Go to the Moodle site and choose Quiz 17 (beware time limits!)
Now you do it!
- Go to the Moodle site, Practical Exercise 17.
- Follow the instructions to answer the questions in the exercise (and beware time limits!).
- Remember: in the PE, you should do things in practice before answering the question!
Standard streams
- Standard streams (for data flow) are automatically connected input and output communication channels between a computer program and its environment, available when the program begins execution.
- They are considered a special kind of virtual text file.
- One of the most important concepts in the use of the command line!
- There are three such streams:
    Standard input (stdin)
    Standard output (stdout)
    Standard error (stderr)
Standard streams
- In most operating systems before UNIX, programs had to be explicitly connected (by the programmer) to the appropriate input and output devices.
- One of the advances introduced by UNIX was abstract devices, which eliminated the need for the program to know where data was coming from (or going to).
- UNIX also implemented automatic connection of each running program to the standard data streams (which tie the program to actual physical devices):
    Standard input (stdin): where data comes from to enter the program
    Standard output (stdout): where data goes when it leaves the program
    Standard error (stderr): where program errors (or warnings or diagnostic messages) go when issued
Standard streams
- Some programs do not require standard input, e.g., ls, pwd.
- Some others do not require standard output, e.g., mkdir, cd.
- Standard input is represented by number 0.
- Standard output is represented by number 1.
- Standard error is represented by number 2.
Standard streams
By default:
- Standard input (stdin): keyboard
- Standard output (stdout): screen
- Standard error (stderr): screen
[Figure: a text terminal and a process. The keyboard feeds the process through #0 stdin; the process writes to the display through #1 stdout and #2 stderr.]
Standard streams
Let's try!
- Log into the remote server, in case you were not there already.
- If you were in class previously, you must have a program called average whose file has been placed in your $HOME/bin directory.
- If you don't have that file, copy it from ~dummy/bin/ to your home directory.
- The average program accepts data from the standard input and sends its output to standard output.
- Start the program (remember: if it's not in a directory from your $PATH, you must give the relative or absolute path to be able to run it! Also make sure your copy of the program has execute permissions for your user):
    average (or ./average or bin/average etc.)
- Notice that the program started, and it is now waiting for data!
- Data is coming from the standard input, which is the keyboard, by default.
- So, type a number, and then ENTER.
- Keep doing that until you are done.
- To signal the end of file (after all, STDIN is a virtual file), press Ctrl+d by itself, at the start of a new line.
- Since the average program waits for the whole input before performing any calculation, no output appears until the end of input.
- Other programs could behave differently; for example:
    head
- This program returns the first 10 (by default) lines of a file. Try it with STDIN! Enter lines until the program exits.
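The read-until-end-of-input behaviour can be sketched with a hypothetical stand-in for average. The course's actual program is not shown in these slides; the function below only assumes that it computes the arithmetic mean of one number per line, and it uses awk to do so.

```shell
# Hypothetical stand-in for the course's `average` program.
# It reads one number per line from STDIN and prints the mean only in
# awk's END block, i.e. after end of input, which is why nothing
# appears on screen until Ctrl+d is pressed.
average_sketch() {
    awk '{ sum += $1; n++ } END { if (n > 0) print sum / n }'
}

# Interactive use: run `average_sketch`, type numbers, finish with Ctrl+d.
# Non-interactive use: let STDIN come from a file instead of the keyboard.
nums=$(mktemp)
printf '10\n20\n60\n' > "$nums"
result=$(average_sketch < "$nums")
echo "$result"
rm -f "$nums"
```

head behaves differently because it can stop as soon as it has seen its tenth line, without waiting for the end of input.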
Standard streams
- Just by existing, standard streams are already very useful.
- But the capability of redirection makes them even more versatile.
- That way, data can come from (or go to) different places than just the keyboard (or the screen).
- Redirection can be done between a program and input and output files, or between different programs.
- This is the main enabler of the modularity displayed by UNIX, especially at the command line!
- Remember the first lecture? The Unix philosophy: combining small programs that each do only one thing (but do it well), instead of having large programs that do a lot of things (but not as well):
    "the power of a system comes more from the relationships among programs than from the programs themselves"
    Brian W. Kernighan and Rob Pike, 1984
Quiz time! Go to the Moodle site and choose Quiz 18 (beware time limits!)
Redirecting to (and from) files
- Typing a lot of data for the program would not be practical.
- Reading a large amount of output on the screen wouldn't be either; actually, it would be impossible in many cases.
- To redirect the streams, we use redirection operators.
- The redirection operators are: > >> < << <<<
- To determine which stream you are redirecting, you can prepend its number to the operator; e.g., 2> (will redirect STDERR).
Redirecting to (and from) files
The redirection operators are:
    >    redirect STDOUT to the file named on the right
    2>   redirect STDERR to the file named on the right
    >>   redirect, appending STDOUT to the file named on the right
    2>>  redirect, appending STDERR to the file named on the right
    <    redirect STDIN from the file named on the right
    <<   redirect STDIN as a here-document
    <<<  redirect STDIN as a here-string
- The operators that redirect STDOUT and STDERR create the file if it does not exist.
- Careful! The single version of the operator (e.g., 2>) will always overwrite the file named on the right (if it exists)!
Redirecting to (and from) files
- Notice that it is not necessary to use the file handle numbers for the STDIN and STDOUT streams.
- If nothing is given, these are the default choices for the < and > redirection operators.
- It is possible to merge STDOUT and STDERR and send them to the same file by using either construct:
    command > file 2>&1
    command &> file
- Both versions do the same thing: send the two streams to the same file (overwriting the file!).
- To add to the file without overwriting, use the >> file 2>&1 and &>> versions.
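A minimal sketch of splitting and merging the two output streams follows. The talk function is a hypothetical helper that writes one line to each stream; it also shows a detail the constructs above depend on, namely that redirections are processed left to right, so 2>&1 must come after > file.

```shell
out=$(mktemp); err=$(mktemp); both=$(mktemp); wrong=$(mktemp)

# Hypothetical helper: one line to STDOUT, one line to STDERR.
talk() { echo "to stdout"; echo "to stderr" >&2; }

talk > "$out" 2> "$err"     # each stream goes to its own file
talk > "$both" 2>&1         # both streams go to the same file

# Wrong order: 2>&1 first copies STDERR to wherever STDOUT points *now*
# (here, the command substitution), and only then is STDOUT sent to the file.
leaked=$(talk 2>&1 > "$wrong")

out_text=$(cat "$out")
err_text=$(cat "$err")
both_lines=$(wc -l < "$both" | tr -d ' ')
wrong_text=$(cat "$wrong")
echo "leaked past the file: $leaked"
rm -f "$out" "$err" "$both" "$wrong"
```

With the wrong order, the file receives only the STDOUT line while the STDERR line escapes to the original destination.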
Redirecting to (and from) files
Let's try!
- In the remote server, run the following:
    ls -l /usr/local/bin/ /usr/blah
- If you typed correctly, you should see one error and the listing of a directory.
- Now run:
    ls -l /usr/local/bin/ /usr/blah > ls_f1 2> ls_f2
- Where did all the stuff go?
- List the contents of your directory and see that you now have two new files:
    ls -ltr
  (That is: list with long format, sorting by time, with newest files last)
Redirecting to (and from) files
- Now, try:
    ls -l /usr/local/bin/ > ls_f3
- Use the more command to see what is inside the file ls_f3 that you just created.
- Now run:
    ls -l /usr/local/lib/ > ls_f3
- Look again at the contents of file ls_f3.
- Where did all the data from the first run go?
- We actually wanted to append the results from the second run to the file!
    ls -l /usr/local/bin > ls_f3
    ls -l /usr/local/lib >> ls_f3
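The difference between overwriting and appending can be reproduced without the remote server. This is a small sketch on a temporary file, with echo standing in for the ls -l runs.

```shell
log=$(mktemp)

echo "first run"  > "$log"     # creates (or truncates) the file
echo "second run" > "$log"     # overwrites: "first run" is gone
after_overwrite=$(cat "$log")

echo "first run"  > "$log"
echo "second run" >> "$log"    # appends: both lines survive
after_append=$(cat "$log")

echo "$after_append"
rm -f "$log"
```

If accidental overwrites are a concern, bash offers `set -o noclobber`, which makes > refuse to overwrite an existing file.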
Here-documents
- Here-documents are multi-line string literals.
- That is, they are a way of passing multiple lines of text to standard input.
- The << operator specifies that a here-document is about to start.
- Here-docs have the following general format:
    command << MARK
    (lines of text)
    MARK
- Everything between the two instances of the word MARK (or whatever you choose) will be redirected to the STDIN of the command to the left of <<.
- Variables can be expanded inside the block of text, or not (depending on whether we use quotes around the delimiter word).
Here-documents
- Example:
    wc << EOF
    A quick brown fox jumps over the lazy dog $PATH
    EOF
- Since the delimiting identifier (in this case, EOF) appearing by itself on a line marks the end of the here-doc, it is a good idea to choose something that is not a real word.
- The output of the command above will be something like:
    1 10 300
- Now, put quotation marks (single or double, it does not matter) around the first EOF and see what happens.
- No more expansion!
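The effect of quoting the delimiter can be checked with a shorter variable than $PATH. In this sketch, name is a hypothetical variable introduced only to make the expansion visible, and cat is used instead of wc so the text itself can be inspected.

```shell
name="fox"          # hypothetical variable, used only for illustration
tmp=$(mktemp)

# Unquoted delimiter: $name is expanded before the text reaches cat.
cat << EOF > "$tmp"
The $name jumps
EOF
expanded=$(cat "$tmp")

# Quoted delimiter: the text is passed through literally.
cat << 'EOF' > "$tmp"
The $name jumps
EOF
literal=$(cat "$tmp")

echo "$expanded"
echo "$literal"
rm -f "$tmp"
```

Only the opening delimiter is quoted; the closing EOF is always written bare, alone on its line.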
Here-strings
- Here-strings are a shortened version of a here-doc.
- Here-strings are limited to one line (containing one or more words).
- The <<< operator specifies that a here-string follows.
- Here-strings have a very simple format:
    command <<< STRING
- Variables can be expanded inside the string, or not (depending on what kind of quotes, single or double, we use around the string).
- For example:
    wc <<< "A quick brown fox jumps over the lazy dog $PATH"
- Run the command like that, with double quotes, and then with single quotes. Different output? Why?
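A sketch of the quoting question, assuming the shell is bash (here-strings are not part of plain POSIX sh). As before, name is a hypothetical variable chosen so the expansion is easy to see.

```shell
name="fox"    # hypothetical variable, used only for illustration

# Double quotes: the shell expands $name before passing the string on.
double=$(cat <<< "The $name jumps")

# Single quotes: the string reaches the command exactly as typed.
single=$(cat <<< 'The $name jumps')

echo "$double"
echo "$single"
```

With wc and $PATH the same thing happens: the double-quoted version counts the characters of the expanded path, while the single-quoted version counts the five characters of the literal text $PATH.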
Quiz time! Go to the Moodle site and choose Quiz 19 (beware time limits!)
Now you do it!
- Go to the Moodle site, Practical Exercise 18.
- Follow the instructions to answer the questions in the exercise (and beware time limits!).
- Remember: in the PE, you should do things in practice before answering the question!
Piping
- Another pioneering UNIX concept, the pipeline is a sequence of processes chained together by their standard streams.
- The standard output of the first process goes directly into the standard input of the second process, then the STDOUT of the second goes into the STDIN of the third, and so on and so forth.
Piping
[Figure: the output of one process becomes the input to another.]
Piping
- The operator for the pipe is the vertical bar (|):
    command1 | command2 | command3
- STDERR does not go into the pipe by default.
- To have STDERR go along with STDOUT in the pipe, use the |& construct:
    command1 |& command2 |& command3
- No space between | and & there!
- This construct is not used much, though.
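Which streams enter a pipe can be checked by counting lines on the receiving side. In this sketch, talk is a hypothetical helper that writes one line to each stream, and the portable spelling 2>&1 | is used; in bash, |& is shorthand for it.

```shell
# Hypothetical helper: one line to STDOUT, one line to STDERR.
talk() { echo "to stdout"; echo "to stderr" >&2; }

# Plain pipe: only STDOUT reaches wc. STDERR bypasses the pipe
# (it is sent to /dev/null here just to keep the screen clean).
plain=$(talk 2> /dev/null | wc -l | tr -d ' ')

# 2>&1 before the pipe merges STDERR into STDOUT, so wc sees both lines.
# In bash, `talk |& wc -l` means the same thing.
merged=$(talk 2>&1 | wc -l | tr -d ' ')

echo "plain: $plain, merged: $merged"
```

The plain pipe delivers one line to wc and the merged one delivers two.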
Piping
- The pipeline is the crucial feature enabling the UNIX philosophy.
- The chaining allowed by standard streams and pipe redirection is what leads to the combination of small, generic, single-purpose command line tools into very specific, sophisticated commands.
Piping
- The two different kinds of redirection can be used in the same chained command, of course.
- For example:
    ls -l /usr/bin | wc -l > out_file
- This command will:
    List all files from /usr/bin (left side of the pipe)
    Count how many lines that listing has (right side of the pipe); note that ls -l also prints a leading "total" line, so the count is one more than the number of files
    Save the result in file out_file (redirection of STDOUT on the right)
- Right now, we haven't seen enough data-munching commands to be able to explore the full power of piping.
- That's for after the midterm exam!
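The pipe-plus-redirection pattern can be tried on a scratch directory where the number of files is known in advance. This is a sketch with hypothetical file names; it also shows the extra "total" line that ls -l adds to the count.

```shell
# Scratch directory holding exactly three made-up files.
dir=$(mktemp -d)
touch "$dir/one" "$dir/two" "$dir/three"
out_file=$(mktemp)

# Long listing: a "total" header line precedes the three entries.
ls -l "$dir" | wc -l > "$out_file"
with_header=$(tr -d ' ' < "$out_file")

# Plain ls prints one name per line when its output is not a terminal.
ls "$dir" | wc -l > "$out_file"
names_only=$(tr -d ' ' < "$out_file")

echo "ls -l lines: $with_header, ls lines: $names_only"
rm -rf "$dir" "$out_file"
```

Nothing from the count appears on screen, because the > at the end of the pipeline redirects the STDOUT of the last command, wc, into the file.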
Now you do it!
- Go to the Moodle site, Practical Exercise 19.
- Follow the instructions to answer the questions in the exercise (and beware time limits!).
- Remember: in the PE, you should do things in practice before answering the question!
Recap
- The find program is very powerful and can find files based on a large number of criteria; it also includes logical operators for greater flexibility.
- Standard streams (STDIN, STDOUT, and STDERR) are an essential feature of UNIX, and make it very easy to redirect data flows between programs and files, or between programs and other programs.
- The main redirection operators are <, >, 2>, >>, and 2>>.
- Redirection of standard streams between programs, called piping, allows us to chain different programs to create more specific ones.
- The pipe character, |, redirects STDOUT from the command to its left to STDIN of the command to its right (STDERR by default goes to the screen).
- Standard streams and piping underpin most of the UNIX philosophy.