Mastering Modern Linux by Paul S. Wang Appendix: Pattern Processing with awk

Similar documents
Language Reference Manual

Control Flow Statements. Execute all the statements grouped in the brackets. Execute statement with variable set to each subscript in array in turn

5/8/2012. Exploring Utilities Chapter 5

Awk. 1 What is AWK? 2 Why use AWK instead of Perl? 3 Uses Of AWK. 4 Basic Structure Of AWK Programs. 5 Running AWK programs

Awk A Pattern Scanning and Processing Language (Second Edition)

sed Stream Editor Checks for address match, one line at a time, and performs instruction if address matched

Session: Shell Programming Topic: Advanced Commands

Lecture #13 AWK Part I (Chapter 6)

44 - Text formating with awk Syntax: awk -FFieldSeperator /SearchPatern/ {command} File z.b. awk '/ftp/ {print $0}' /etc/services

Full file at

NAME nawk pattern-directed scanning and processing language. SYNOPSIS nawk [ F fs ][ v var=value ][ prog f progfile ][file... ]

Features of C. Portable Procedural / Modular Structured Language Statically typed Middle level language

Overview of C, Part 2. CSE 130: Introduction to Programming in C Stony Brook University

UNIX II:grep, awk, sed. October 30, 2017

Chapter 2, Part I Introduction to C Programming

BASH SHELL SCRIPT 1- Introduction to Shell

CSC Web Programming. Introduction to JavaScript

UNIX Shell Programming

York University Faculty Science and Engineering Fall 2008

Intro to Computer Programming (ICP) Rab Nawaz Jadoon

MAWK(1) USER COMMANDS MAWK(1)

Flow Control. CSC215 Lecture

Ordinary Differential Equation Solver Language (ODESL) Reference Manual

The e switch allows Perl to execute Perl statements at the command line instead of from a script.

SSOL Language Reference Manual

The Warhol Language Reference Manual

Getting to grips with Unix and the Linux family

Awk APattern Scanning and Processing Language (Second Edition)

Introduction to C Programming

Object oriented programming. Instructor: Masoud Asghari Web page: Ch: 3

1 Lexical Considerations

Contents. Jairo Pava COMS W4115 June 28, 2013 LEARN: Language Reference Manual

Programming for Engineers Introduction to C

User Commands sed ( 1 )

Data Analysis in Geophysics ESCI Bob Smalley Room 103 in 3892 (long building), x Tu/Th - 13:00-14:30 CERI MAC (or STUDENT) LAB

Creating a Shell or Command Interperter Program CSCI411 Lab

x = 3 * y + 1; // x becomes 3 * y + 1 a = b = 0; // multiple assignment: a and b both get the value 0

Typescript on LLVM Language Reference Manual

C How to Program, 6/e by Pearson Education, Inc. All Rights Reserved.

There are algorithms, however, that need to execute statements in some other kind of ordering depending on certain conditions.

Essentials for Scientific Computing: Bash Shell Scripting Day 3

IB047. Unix Text Tools. Pavel Rychlý Mar 3.

Appendix A GLOSSARY. SYS-ED/ Computer Education Techniques, Inc.

Essentials for Scientific Computing: Stream editing with sed and awk

Chapter 1 Summary. Chapter 2 Summary. end of a string, in which case the string can span multiple lines.

EECS2301. Example. Testing 3/22/2017. Linux/Unix Part 3. for SCRIPT in /path/to/scripts/dir/* do if [ -f $SCRIPT -a -x $SCRIPT ] then $SCRIPT fi done

Mastering Linux by Paul S. Wang Appendix: The emacs Editor

Fall 2006 Shell programming, part 3. touch

8. Control statements

Sprite an animation manipulation language Language Reference Manual

Fundamentals of Programming Session 4

Lecture 5. Essential skills for bioinformatics: Unix/Linux

Awk & Regular Expressions

JME Language Reference Manual

Unleashing the Shell Hands-On UNIX System Administration DeCal Week 6 28 February 2011

A Linux Shell Script to Automatically Compare Results of Statistical Analyses

Shells & Shell Programming (Part B)

MELODY. Language Reference Manual. Music Programming Language

Mirage. Language Reference Manual. Image drawn using Mirage 1.1. Columbia University COMS W4115 Programming Languages and Translators Fall 2006

CLIP - A Crytographic Language with Irritating Parentheses

Introduction to Supercomputing

AN OVERVIEW OF C. CSE 130: Introduction to Programming in C Stony Brook University

Linux shell & shell scripting - II

Spoke. Language Reference Manual* CS4118 PROGRAMMING LANGUAGES AND TRANSLATORS. William Yang Wang, Chia-che Tsai, Zhou Yu, Xin Chen 2010/11/03

Manual Shell Script Linux If Not Equals. Statement >>>CLICK HERE<<<

CS434 Compiler Construction. Lecture 3 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Lexical Considerations

BASH and command line utilities Variables Conditional Commands Loop Commands BASH scripts


\n is used in a string to indicate the newline character. An expression produces data. The simplest expression

12/22/11. Java How to Program, 9/e. Help you get started with Eclipse and NetBeans integrated development environments.

UNIT - I. Introduction to C Programming. BY A. Vijay Bharath

INTRODUCTION 1 AND REVIEW

Conditional Control Structures. Dr.T.Logeswari

c) Comments do not cause any machine language object code to be generated. d) Lengthy comments can cause poor execution-time performance.

Part III. Shell Config. Tobias Neckel: Scripting with Bash and Python Compact Max-Planck, February 16-26,

Use of AWK / SED Prof. Navrati Saxena TA: Rochak Sachan

3. Except for strings, double quotes, identifiers, and keywords, C++ ignores all white space.

Lexical Considerations

Objectives. In this chapter, you will:

String Computation Program

JVM (java) compiler. A Java program is either a library of static methods (functions) or a data type definition

Petros: A Multi-purpose Text File Manipulation Language

CS4120/4121/5120/5121 Spring 2016 Xi Language Specification Cornell University Version of May 11, 2016

Exam 1 Prep. Dr. Demetrios Glinos University of Central Florida. COP3330 Object Oriented Programming

Introduction to Programming in C Department of Computer Science and Engineering. Lecture No. #43. Multidimensional Arrays

Should you know scanf and printf?

Scripting Languages Course 1. Diana Trandabăț

Lecture Programming in C++ PART 1. By Assistant Professor Dr. Ali Kattan

Practical Linux examples: Exercises

Cisco IOS Shell. Finding Feature Information. Prerequisites for Cisco IOS.sh. Last Updated: December 14, 2012

Java Bytecode (binary file)

Basic Linux (Bash) Commands

Pace University. Fundamental Concepts of CS121 1

Language Reference Manual

BASIC ELEMENTS OF A COMPUTER PROGRAM

Final Exam 1 /12 2 /12 3 /10 4 /7 5 /4 6 /10 7 /8 8 /9 9 /8 10 /11 11 /8 12 /10 13 /9 14 /13 15 /10 16 /10 17 /12. Faculty of Computer Science

Shell scripting and system variables. HORT Lecture 5 Instructor: Kranthi Varala

DaMPL. Language Reference Manual. Henrique Grando

Chapter 4. Unix Tutorial. Unix Shell

Transcription:

Mastering Modern Linux by Paul S. Wang Appendix: Pattern Processing with awk The awk program is a powerful yet simple filter. It processes its input one line at a time, applying user-specified awk pattern actions to each line. The awk program is similar to, but more powerful than, sed. The awk mechanisms are based more on the C programming language than on a text editor, allowing for variables, arrays, conditionals, expressions, iteration controls, formatted output, and so on. The awk program can perform operations not possible with sed, such as joining adjacent lines and comparing parts of different lines. The general form of the awk command is awk [-Fc] script [file]... The -F option specifies a character c to be the field separator (default white space). The argument script is an awk script given on the command line or in a file with the -f filename convention. The files are are processed in the order given. If no files are given, standard input is used. If a dash (-) is given as a file name, it is taken to mean standard input. The awk processing cycle is as follows: 1. If there are no more input lines, terminate. Otherwise, read the next input line. 2. Apply all awk pattern commands sequentially as specified in script to the current line. 3. Go to step (1). Note that unlike sed, awk does not write lines to the standard output automatically. An awk script consists of one or more pattern actions given on different lines or separated by semicolons. Each pattern action takes the form pattern {action} If the current line matches the pattern, the action is taken. A missing pattern matches every line, and a missing action outputs the line. Thus, ls -l awk is the same as ls -l sed -n /Linux/ /Linux/p Pattern and action are described more fully in the following subsections. The concept of a field here is the same as that used for sort: awk delineates each of its input lines into fields separated by white space or by a field separator character specified with the -F option. In an awk action, the fields are denoted $1, $2, and so on. The entire line is denoted by $0. While it is hard to rearrange the order of fields using sed, it is easy with awk. For instance, the output of ls -l has eight fields: -rw-rw---- 2 jsmith 512 Apr 23 21:44 report.tex -rw-rw---- 1 jsmith 79 Feb 9 15:13 Makefile -rw-rw---- 2 jsmith 1024 Feb 25 00:13 pipe.c When the preceding lines are piped through awk, 1

ls -l awk {print $8,$4,$5,$6} The following output is produced: report.tex 512 Apr 23 Makefile 79 Feb 9 pipe.c 1024 Feb 25 Note that, in this example, the pattern action contains no pattern. Also, actions are always enclosed between braces ({ and }). awk Patterns As with sed, the pattern determines whether or not awk takes an action on the current line. In fact, a sed address, specified with one or two match expressions, also will work as an awk pattern. If you are familiar with sed, you already know many useful patterns. For instance, /^first/ /last$/ /^$/ /[ ][ ]*/ /begin/,/end/ (first at the beginning of a line) (last at the end of a line) (an empty line) (a line with a string of one or more blanks) (all lines between a begin match and an end match) are valid patterns in both sed and awk. In awk, a pattern is an arbitrary Boolean expression involving regular expressions and relational expressions. Boolean expressions are formed with && (and), (or),! (not), and parentheses. A regular expression in awk must begin and end with a slash (/) and otherwise is defined the same as that for egrep (Table??). Relational expressions are formed using C-like operators >, >=, <, <=, == (equal), and!= (not equal). In addition, a relational expression can be: expression ~ re expression!~ re where ~ means contains and!~ means does not contain. For example, the pattern $1 ~ /GNU/ && $2 ~ /Linux/ is true if the first field contains the string GNU and the second field contains the string Linux. A pattern may contain two patterns separated by a comma, in which case the action is applied to all lines beginning with a line matching the first pattern up to and including the line matching the second pattern (the same as in sed). Thus, awk NR==14,NR==30 file outputs lines 14-30 of file, because awk keeps a running line count in the built-in variable NR. Other useful built-in variables are listed in Table 0.2. The special patterns BEGIN and END in BEGIN {action} END {action} specify actions executed before the first input line and after the last input line, respectively. They are used for initialization and postprocessing when needed. 2

TABLE 0.2 Built-in awk Variables Variable Meaning NF Total number of fields on current line NR Sequence number of current line FS Input field separator character (default blanks) RS Input record separator (default newline) OFS Output field separator string (default space) ORS Output record separator string (default newline) OFMT Output format for numbers (default %g as in printf) awk Actions Now let s turn to the question of how actions are specified. An action contains a sequence of statements given on different lines or separated by semicolons. Possible statements are as follows: Assignment: var = expression Output: print expression [, expression]... printf(... ) (as in C) Flow control if ( conditional ) statement [else statement] (as in C): for(expression; conditional; expression) statement while (conditional) statement, break, continue Additional flow control: next (skip remaining commands, start next awk cycle) exit (exit awk) In the preceding definitions, a statement can be a compound statement in the form {statement, statement,... } The output statements use the standard output. However, they can be followed by > "filename" to redirect the output into a file. awk Expressions Expressions in awk statements can be constants, variables, arrays, fields, or any combinations of these using the following C operators: +, -, *, /, %, ++, --, +=, -=, *=, %= Numerical constants in awk statements are the same as in C. String constants are placed in double quotation marks ("string") and variables are initialized to the null string. An array element is denoted as a[i], where i can be an integer or any string. A blank between two expressions concatenates them into a string. Thus, for example, awk {print $2 ":" $1} file outputs field2:field1 of each line from the given file. Built-in functions (Table 0.3 lists a few) can also be used in expressions. In awk, conditional expressions use C notation and may involve awk-defined relational expressions. In Table 0.3, e is an expression, c is a character, s is a string, and i and j are integers. 3

Index Preparation: An Example The awk pattern processing program is powerful and involved. The best way to learn it is through use and experimentation. In this section, we present an example of awk usage to prepare an index for a document (Ex: ex04/index.awk). Suppose you have several index files, each containing entries such as (Ex: ex04/index.data) bash:99 regular expression:155 bash:123 pipe:101 gnome:163 socket:415 pipe:23 where each line has two fields: an index item and a page number separated by the :. Your goal is to produce an overall index file in alphabetical order with lines such as (Ex: ex04/index.file) bash 99,123 gnome 163 pipe 23,101 regular expression 155 socket 415 The first step is to order the entries alphabetically and by page number, which can be done with sort -t: --key=1,2.0f --key=2n index.data > index.tmp in which the following sort keys are used: 1,2.0f 2n (first key, field one ignoring case) (second key, field two with numerical comparison) It then remains to collect repeated index items to form lines with multiple page numbers. Since repeated items will be on consecutive lines, the awk script index.awk (Figure 4) can be used. To apply the script use awk -f index.awk index.tmp > index.file There are four pattern commands in index.awk. The first command sets the variable i (used for initialization) to zero. The second command compares $1 with the variable pre, TABLE 0.3 Built-in awk Functions Function Meaning int(e), length(s) Integer (floor), length of string s gsub(re, s, t) Replaces matches of re in t with s index(s1,s2) Position of string s2 in s1, zero if s2 not in s1 sprintf(...) Format conversion, same as in the C language substr(s,i,j) Substring of s of length j from position i split(s,a,c) Cuts s into substrings a[1] to a[i] at char c; returns i getline() Inputs next line, returns 0 on end of file, otherwise 1 4

which stands for the previous index item and is initially null. If $1 is equal to pre (field one is the same as the previous index item), then output the page number ($2), preceded by a comma. If $1 is not equal to pre (a new index item), then output newline, $1, space, and $2 except for the very first line where the leading newline is not needed. The conditional output is performed in the if of the third command which also records the index item ($1) in the variable pre. At the end of the input file, a final newline is output. FIGURE 4 Program index.awk for Index Processing BEGIN { i = 0; } $1 == pre { printf(",%s", $2); } $1!= pre { if (i > 0) { printf("\n%s %s",$1,$2); } else { printf("%s %s",$1,$2); i = 1; } pre = $1; } END { printf("\n"); } 5