Awk. 1 What is AWK? 2 Why use AWK instead of Perl? 3 Uses Of AWK. 4 Basic Structure Of AWK Programs. 5 Running AWK programs

Similar documents
Pathologically Eclectic Rubbish Lister

Control Flow Statements. Execute all the statements grouped in the brackets. Execute statement with variable set to each subscript in array in turn

44 - Text formating with awk Syntax: awk -FFieldSeperator /SearchPatern/ {command} File z.b. awk '/ftp/ {print $0}' /etc/services

Language Reference Manual

Mastering Modern Linux by Paul S. Wang Appendix: Pattern Processing with awk

UNIX II:grep, awk, sed. October 30, 2017

UNIX Shell Programming

Variables and literals

Introduction to Perl. c Sanjiv K. Bhatia. Department of Mathematics & Computer Science University of Missouri St. Louis St.

Will introduce various operators supported by C language Identify supported operations Present some of terms characterizing operators

UNIT - I. Introduction to C Programming. BY A. Vijay Bharath

Fundamental Data Types. CSE 130: Introduction to Programming in C Stony Brook University

C OVERVIEW BASIC C PROGRAM STRUCTURE. C Overview. Basic C Program Structure

C OVERVIEW. C Overview. Goals speed portability allow access to features of the architecture speed

1 Lexical Considerations

GE PROBLEM SOVING AND PYTHON PROGRAMMING. Question Bank UNIT 1 - ALGORITHMIC PROBLEM SOLVING

Unit 3. Operators. School of Science and Technology INTRODUCTION

CS313D: ADVANCED PROGRAMMING LANGUAGE

Computers Programming Course 6. Iulian Năstac

A complex expression to evaluate we need to reduce it to a series of simple expressions. E.g * 7 =>2+ 35 => 37. E.g.

SSOL Language Reference Manual

Computer Programming

Lecture 3. More About C

GridLang: Grid Based Game Development Language Language Reference Manual. Programming Language and Translators - Spring 2017 Prof.

Kurt Schmidt. October 19, 2017

GNU ccscript Scripting Guide IV

Java Language Basics: Introduction To Java, Basic Features, Java Virtual Machine Concepts, Primitive Data Type And Variables, Java Operators,

Chapter 1 Summary. Chapter 2 Summary. end of a string, in which case the string can span multiple lines.

Language Reference Manual simplicity

CSCI 4152/6509 Natural Language Processing. Perl Tutorial CSCI 4152/6509. CSCI 4152/6509, Perl Tutorial 1

Conversion Between Number Bases

Getting Started. Office Hours. CSE 231, Rich Enbody. After class By appointment send an . Michigan State University CSE 231, Fall 2013

EL2310 Scientific Programming

Sir Muhammad Naveed. Arslan Ahmed Shaad ( ) Muhammad Bilal ( )

B.V. Patel Institute of Business Management, Computer & Information Technology, Uka Tarsadia University

Introduction to C Language

DETAILED SYLLABUS INTRODUCTION TO C LANGUAGE

Awk A Pattern Scanning and Processing Language (Second Edition)

2. Numbers In, Numbers Out

Babu Madhav Institute of Information Technology, UTU 2015

ls /data/atrnaseq/ egrep "(fastq fasta fq fa)\.gz" ls /data/atrnaseq/ egrep "(cn ts)[1-3]ln[^3a-za-z]\."

Room 3P16 Telephone: extension ~irjohnson/uqc146s1.html

C-LANGUAGE CURRICULAM

VARIABLES AND CONSTANTS

Overview of C, Part 2. CSE 130: Introduction to Programming in C Stony Brook University

Review of the C Programming Language for Principles of Operating Systems

CS 25200: Systems Programming. Lecture 10: Shell Scripting in Bash

Variables Data types Variable I/O. C introduction. Variables. Variables 1 / 14

Work relative to other classes

Lexical Considerations

cis20.1 design and implementation of software applications I fall 2007 lecture # I.2 topics: introduction to java, part 1

LESSON 1. A C program is constructed as a sequence of characters. Among the characters that can be used in a program are:

Binghamton University. CS-211 Fall Syntax. What the Compiler needs to understand your program

C Programming


COMP519 Web Programming Lecture 17: Python (Part 1) Handouts

CS4120/4121/5120/5121 Spring 2016 Xi Language Specification Cornell University Version of May 11, 2016

ME 461 C review Session Fall 2009 S. Keres

Fundamental of Programming (C)

CSE 374 Programming Concepts & Tools. Laura Campbell (thanks to Hal Perkins) Winter 2014 Lecture 6 sed, command-line tools wrapup

Lecture 5. Essential skills for bioinformatics: Unix/Linux

Introduction to Programming in C Department of Computer Science and Engineering. Lecture No. #29 Arrays in C

COSC2031 Software Tools Introduction to C

C Language Part 1 Digital Computer Concept and Practice Copyright 2012 by Jaejin Lee

2. Numbers In, Numbers Out

Preface... (vii) CHAPTER 1 INTRODUCTION TO COMPUTERS

Review of the C Programming Language

CS & IT Conversions. Magnitude 10,000 1,

12/22/11. Java How to Program, 9/e. Help you get started with Eclipse and NetBeans integrated development environments.

A Fast Review of C Essentials Part I

3. Types of Algorithmic and Program Instructions

Language Reference Manual

Expressions and Precedence. Last updated 12/10/18

UNIT- 3 Introduction to C++

Contents. Jairo Pava COMS W4115 June 28, 2013 LEARN: Language Reference Manual

Ordinary Differential Equation Solver Language (ODESL) Reference Manual

Awk APattern Scanning and Processing Language (Second Edition)

Course Outline Introduction to C-Programming

Linux shell & shell scripting - II

c++ keywords: ( all lowercase ) Note: cin and cout are NOT keywords.

VLC : Language Reference Manual

Assignment: 1. (Unit-1 Flowchart and Algorithm)

bc an arbitrary precision calculator language version 1.06

AN OVERVIEW OF C. CSE 130: Introduction to Programming in C Stony Brook University

Expressions. Arithmetic expressions. Logical expressions. Assignment expression. n Variables and constants linked with operators

The Arithmetic Operators. Unary Operators. Relational Operators. Examples of use of ++ and

The Arithmetic Operators

Day06 A. Young W. Lim Wed. Young W. Lim Day06 A Wed 1 / 26

AWK: The Duct Tape of Computer Science Research. Tim Sherwood UC San Diego

Flow Control. CSC215 Lecture

Simple Java Reference

Intro to Programming. Unit 7. What is Programming? What is Programming? Intro to Programming

Computers Programming Course 5. Iulian Năstac

C Programming for Electronic Engineers


sed Stream Editor Checks for address match, one line at a time, and performs instruction if address matched

FRAC: Language Reference Manual

Unit 1: Introduction to C Language. Saurabh Khatri Lecturer Department of Computer Technology VIT, Pune

CISC-124. This week we continued to look at some aspects of Java and how they relate to building reliable software.

DECLARATIONS. Character Set, Keywords, Identifiers, Constants, Variables. Designed by Parul Khurana, LIECA.

Transcription:

Awk Author: Reuben Francis Cornel 1 What is AWK? AWK is a programable filter developed with the aim of using it to generate formatted reports from data. Althought is uses addresses like sed to perform operations it differs from sed in the way data is handled. 2 Why use AWK instead of Perl? Perl can be used to handle most of the operations which AWK can do. To be precise AWK is a subset of perl. The main reason why AWK comes in more handy than perl is because of its ease of use and clean syntax. 3 Uses Of AWK Apart from report generation, AWK can be used as a pseudo C interpreter. As a result it can used to prototype small tools. A less known aspect of AWK is that it can be used in AI programming, specially because of its ability to act on patterns. 4 Basic Structure Of AWK Programs The basic organisation of awk programs follows the pattern shown below. /pattern/ {actions} The pattern could be a regular expression or a numeric comparison. Either the pattern or the action can be left out. If the pattern is left out. The action is applied to every line Can you guess what happens if the pattern is left out? There are two special cases for patterns BEGIN { }: This statement is executed at the start of processing END { }: This statement is executed at end of processing 5 Running AWK programs There are 3 major ways of running AWK programs. One Shot Programs: Example shown before was a one shot program Long AWK programs: Here we use the -f option of awk. Executable AWK programs: We set executable permission of the awk program and add the interpreter line. 1

6 Syntax Elements Of Awk Programs 6.1 Awk Variables Awk gives us three types of variables User Defined Variables Positional Variables Special Variables 6.1.1 User Defined Variables This is the variable you create. Creating these variables require no prior declaration like C for example you want to create a variable all you have to do is assign a value to that variable name. For example test=1 But what happens if you refer to a variable which you have not defined previously? Now what would happen if we define a variable which has a name of a inbuilt function 6.1.2 Positional Variables A positional variable is one which is prefixed with the $ sign. This represents a particular field in the record. Whitespaces are used as delimiters for fields There is a special positional variable $0. This represents the entire line read in by awk. When you do a print $0 you print the whole line. But is there another way to print the same? 6.2 s Awk provides us with three types of operators Math s Relational s Regular Expression s The table below is the list of mathematical operators + Addition - Subtraction * Multiplication / Division % Modulo ++ Auto increment -- Auto decrement += Add result to variable 2 continued on the next page

Math s (continued) -= Subtract result from variable *= Multiply variable by result /= Divide variable by result %= Apply modulo to variable The table below is a list of relational operators == Is equal to!= Is not equal to > Greater than >= Greater than < Lesser than <= Lesser than Finally we have operators for regular expressions 6.3 Awk Keywords ~ Pattern matches String!~ Pattern Does not match Most awk commands have been taken over from C. Therefore most of you would be familiar with most of the basic syntax. Nonetheless the table below is a reference to the syntax. Syntax if(conditional) statement [ else statment ] while (conditional) statement for ( expression ; conditional ; expression) statement for (variable in array) statement If Else Construct While Construct Typical For Construct This is a variation of the above for, similar to the shell for command Break a loop Get on with the next iteration break continue { [ statement ]...} Blocks variable=expression Assignment print [ expression-list ] Standard Print printf format [, expression-list ] formatted output exit Exits the interpreter 3

7 Special Variables Awk provides us with special variables using which we can change certian properties of awk, such as the delimiter that seperates fields or records. 7.1 FS Field Seperator This variable represents the string which seperates fields in a record. By default this is a single space. Since not all records use spaces for delimiters changing this field lets you change the delimiter. 7.2 RS Record Seperator This by default is a newline character. Sometime we have records which span across multiple lines. So in a we have to change the string which acts like a delimiter. The RS varible helps us do this. 7.3 NF Number of Fields This is variable represents the number of fields. This comes in helpful when you have to change operation of the program based on the number of fields avaliable. 7.4 NR Number of Records This variable represents the number of records that have been processed. 7.5 OFS Output Field Seperator & ORS Output Record Seperator Sometimes a program has to provide a input to a filter and that filter which uses a different output field and record seperator. In suc a case you can use awk to act like an adaptor. OFS - is the character which is printed as the field delimiter when you print a number of fields ORS - is the character which is printed as the record delimiter. 8 Associative Arrays Instead of the regular arrays, awk provides us with associative arrays. Associative arrays unlike conventional arrays can use anything as subscripts. This comes in helpful when you have to perform count operations to generate reports. 9 Formatted Output using printf Awk provides us with printf so that our output can be formatted to fit, certain specifications. Below shown is a list of format specifiers 4

Format Specifier %c ASCII Character %d Decimal integer %e Floating Point number (engineering format) %f Floating Point number (fixed point format) %o Octal %s String %x Hexadecimal % Literal % The basic format specifier syntax is as shown below %[+-]?[0-9]*.?[0-9]*[specifier] Below are some example how to use the specifiers Statement x = Baryshnikov printf( [%16s],x) [ Baryshnikov] printf( [%-16s],x) [Baryshnikov ] printf( [%.3s],x) [Bar] printf( [%16.3s],x) [ Bar] printf( [%-16.3s],x) [Bar ] printf( [%016s],x) [00000Baryshnikov] printf( [%-016s],x) [Baryshnikov ] x = 312 printf( [%8d],x) [ 312] printf( [%-8d],x) [312 ] printf( [%08d],x) [00000312] printf( [%-08d],x) [312 ] x = 251.673209 printf( [%16f],x) [ 251.67309] printf( [%-16f],x) [251.67309 ] printf( [%.3f],x) [251.673] printf( [%16.3f],x) [ 251.673] printf( [%016.3f],x) [00000000251.673] printf and print also allow you to write the output into a file using the > and the >> symbol. 10 String Functions It is important to remember that strings in awk are handled as a full data type and not like a array of characters in C. Below is a list of commonly used string functions in awk 5

Function index(string,search) length(string) split(string,array,separator) substr(string,position,max) sub(regex,replacement,string) gsub(regex,replacement,string) Returns the location of search string in the given string else 0 Prints the lenght of the string Splits a given string into an array based on supplied seperator extracts substring from the given string starting at mentioned position and for a given lenght of characters substitutes one regular expression instance in string with replacement substitutes all regular expression instances in string with replacement 11 Command Line Options Awk allows you to perform command line initialization of variables, using arguments to the awk command. Some of the most common arguments are given below Option -F Lets you can specify the feild seperator here -f File from which the program is to be executed -v var=val Lets you can initialise a specific variable from the command line The original copy of this document can be obtained from www.geocities.com/reuben cornel. This document is released under GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. 6