Awk & Regular Expressions


CSCI-620, Dr. Bill Mihajlovic

awk Text Editor

awk is named after its developers: Aho, Weinberger, and Kernighan. awk is a UNIX utility. The awk command uses an awk program to scan text files or standard input in order to display specific data, change the data format, and add text to existing data.

Awk

awk is a pattern scanning and processing language. It searches one or more input files for lines that match specified patterns and then performs the associated actions, such as writing the line to standard output or incrementing a counter each time it finds a match. awk is a programming language that permits easy manipulation of structured data and the generation of formatted reports.

Awk Syntax

The awk utility may receive its instructions either as a string on the command line or from a program file (read with the -f option):

    # awk 'pattern {action}' infile

The pattern selects lines from the input file, and awk performs the action on every line that the pattern selects. Braces must enclose the action so that awk can differentiate it from the pattern.
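The pattern {action} form can be tried without any file at all. The following is a minimal sketch that assumes a made-up three-line input (the fruit names and numbers are not from the slides) supplied on standard input.

```shell
# Hypothetical three-line input, fed to awk on standard input.
# Pattern: $2 > 5 selects lines whose second field exceeds 5.
# Action:  { print $1 } prints the first field of each selected line.
printf 'apple 3\nbanana 12\ncherry 7\n' | awk '$2 > 5 { print $1 }'
```

With this input the command should print banana and cherry, each on its own line, because only those two lines satisfy the pattern.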

Awk

If a program line does not contain a pattern, awk selects all lines in the input file. If a program line does not contain an action, awk copies the selected lines to standard output (usually the display, unless you have redirected the output to another program or to a file). These two rules correspond to the two shortened forms of the command:

    # awk '{action}' infile
    # awk 'pattern' infile

awk as Programming Language

The capabilities of awk extend the idea of text editing into computation, making it possible to perform a variety of data processing tasks, including analysis, extraction, and reporting of data. These are, indeed, the most common uses of awk.
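The two omission rules can be compared side by side. This sketch reuses a hypothetical three-line input held in a shell variable named data (an illustrative name, not from the slides).

```shell
# Hypothetical sample input kept in a shell variable.
data='apple 3
banana 12
cherry 7'

# No pattern: the action runs for every input line.
printf '%s\n' "$data" | awk '{ print $1 }'

# No action: every line selected by the pattern is copied unchanged.
printf '%s\n' "$data" | awk '$2 > 5'
```

The first command should print all three names; the second should print the complete banana and cherry lines.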

Regular Expression (RE)

Searching text for patterns that match exactly or closely is a common problem. Regular expressions make finding character patterns much easier. Metacharacters allow a position in a regular expression to match a range of characters.

A regular expression is a character pattern that can match numerous similar strings, because it can contain metacharacters that expand the scope of the search beyond a literal string. Metacharacters are special characters that represent more than their literal meanings. Quoting is the means of turning off the special meaning of a metacharacter.
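As an illustration of metacharacters and quoting, the sketch below uses the dot, which in an awk regular expression matches any single character, and a backslash to quote it. The three input strings are invented for the example.

```shell
# Unquoted dot: matches any single character, so all three lines are selected.
printf 'a.c\nabc\naXc\n' | awk '/a.c/'

# Quoted dot (backslash): matches only a literal dot, so one line is selected.
printf 'a.c\nabc\naXc\n' | awk '/a\.c/'
```

The single quotes around the awk program serve a separate purpose: they stop the shell from interpreting characters such as $ and * before awk sees them.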

Editor REs

Editor regular expressions are used in editors and shell commands to find character patterns within files. Typical editors and other programs that use metacharacters are vi, ex, edit, view, ed, red, sed, grep, egrep, and expr.

Patterns

An awk pattern is used to conditionally pass control to an action. An action executes only if its pattern matches. You can use a regular expression, enclosed within slashes, as a pattern. The ~ operator tests whether a field or variable matches a regular expression. The !~ operator tests for no match.
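A short sketch of the ~ and !~ operators follows. The user names and shell paths are made up for illustration; the regular expression /bash$/ matches a field that ends in "bash".

```shell
# Hypothetical input: a user name and a login shell on each line.
printf 'root /bin/sh\nalice /bin/bash\nbob /bin/sh\n' |
awk '$2 ~ /bash$/  { print $1, "uses bash" }
     $2 !~ /bash$/ { print $1, "does not use bash" }'
```

Each line is tested against both rules, so exactly one message is printed per user.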

Patterns & Operators

You can process arithmetic and character relational expressions with the following relational operators.

    Operator   Meaning
    <          less than
    <=         less than or equal to
    ==         equal to
    !=         not equal to
    >=         greater than or equal to
    >          greater than

awk Operators

You can combine any of the patterns using the Boolean operators || (OR) and && (AND). The comma is the range operator. If you separate two patterns with a comma on a single awk program line, awk selects a range of lines beginning with the first line that contains the first pattern. The last line awk selects is the next subsequent line that contains the second pattern. After awk finds the second pattern, it starts the process over by looking for the first pattern again.
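The range operator and the Boolean operators can be sketched as follows. The START and STOP marker lines and the numeric input are hypothetical.

```shell
# Range operator: select each block from a START line through the next STOP line.
# After STOP is found, awk begins looking for START again, so both blocks print.
printf 'one\nSTART\ntwo\nSTOP\nthree\nSTART\nfour\nSTOP\n' | awk '/START/,/STOP/'

# Boolean OR: select lines whose first field is below 2 or above 8.
printf '1\n5\n9\n' | awk '$1 < 2 || $1 > 8'

# Boolean AND: select lines whose first field lies strictly between 2 and 8.
printf '1\n5\n9\n' | awk '$1 > 2 && $1 < 8'
```

Lines outside a range, such as "one" and "three" in the first command, are not selected.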

Use of Pattern-Matching Metacharacters

Pattern-matching metacharacters are used for:

Matching file names within directories: File Name Generation (FNG)
Matching strings within text: editor regular expressions, full regular expressions, and awk regular expressions

? Metacharacter

In file name generation, the ? matches any single character, except that a leading dot in a file name must be matched explicitly.

    # echo ?
    F A f b
    #

* Metacharacter

In file name generation, the * matches any string of zero or more characters, except that a leading dot in a file name must be matched explicitly.

    # echo *
    F a f b F11 alexf a1
    # ls -x a?*
    alexf  a1
    #

Character Class Expression

A character class is a group of characters, enclosed in square brackets, any one of which may be matched. A leading ! negates the class.

    # ls -x f[abc,+123]
    fb  f+  f1  f2
    # ls -x f[!a-zA-Z0-9]
    f+  f-
    #

Ranges can be included in character classes by listing two characters, separated by a dash, that define the bounds of the range.
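The file name generation rules above can be tried safely in a scratch directory. This is a sketch under stated assumptions: the file names are hypothetical (chosen to resemble the ones in the slides), the variable name dir is illustrative, and the C locale is set so that sorting and ranges follow plain ASCII order.

```shell
# Work in a throwaway directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical file names; .hidden starts with a dot.
touch F a b f a1 alexf f+ f- f1 .hidden

# Use the C locale so sorting and ranges follow plain ASCII order.
LC_ALL=C
export LC_ALL

echo *               # every name except .hidden
echo ?               # names that are exactly one character long
echo a?*             # names starting with a, at least two characters long
echo f[!a-zA-Z0-9]   # f followed by one character that is not a letter or digit
```

Note that .hidden never appears in the output of echo *, because a leading dot must be matched explicitly. The scratch directory can be deleted afterwards with rm -rf "$dir".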

Basic awk Command Format

The basic format of this command consists of the awk command, the instructions enclosed in quotes and curly braces, and the name of the input file. If an input file is not specified, awk reads standard input, for example the keyboard or a pipe.

The following is a basic awk command. The output of the ls -l command is piped to awk. For each line received by awk, the print action is executed, which prints the line to the screen.

    $ ls -l | awk '{print $0}'

awk Arguments

When awk reads a line, it automatically breaks the line into fields. Each field is assigned a variable name. Spaces and tabs are the default delimiters between fields. The variable name of a field is a dollar sign ($) followed by the number of the field, counting from left to right: $1 represents the contents of field 1, $2 represents the contents of field 2, and so on. The entire line is represented by $0.
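Field splitting can be observed on a single hypothetical line. The sketch also prints NF, a built-in awk variable holding the number of fields on the current line; NF is not mentioned in the slides and is included only as a convenient check.

```shell
# One made-up input line with three space-separated fields.
# $1 and $3 are individual fields, $0 is the whole line, NF is the field count.
printf 'alpha beta gamma\n' | awk '{ print $1; print $3; print $0; print NF }'
```

This should print alpha, gamma, the full line, and 3, each on its own line.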

awk Displays Specific Data

To instruct awk to display specific data (for example, the file owner, file size, and file name), the field variable names are used in the action. Because the fields in this print statement are not separated by commas, awk prints them joined together with no spaces in between.

    # ls -l | awk '{print $3 $5 $9}'
    user154120dante
    user1368dante_1
    user1176dat
    user1512dir1
    user1512dir2
    user1512dir3
    user1512dir4
    user1235file1
    user1105file2
    user1218file3
    user1137file4
    #

awk Example: Selecting Data

The fstats file contains the data for players:

PPG - points per game
RPG - rebounds per game
APG - assists per game

Consider this example:

    $ cat fstats
    Smith 26.4 5.5 7.2
    Jones 23.7 5.2 6.0
    Davis 21.8 9.4 3.7
    Johnson 20.8 10.5 3.0
    Williams 18.8 6.1 9.9
    $ awk '$2 > 20 { print $1 }' fstats
    Smith
    Jones
    Davis
    Johnson
    $

This command says: "Read each line from fstats and, if the second field is more than 20, print the first field." This awk command prints the names of the players who have more than 20 PPG.
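The run-together output of the ls -l example comes from how print treats its arguments. The sketch below uses one hypothetical line with the same three values to contrast the two forms.

```shell
# Without commas, awk concatenates the fields into one string.
printf 'user1 512 dir1\n' | awk '{ print $1 $2 $3 }'

# With commas, awk separates the fields with a single space by default.
printf 'user1 512 dir1\n' | awk '{ print $1, $2, $3 }'
```

The first command should print user1512dir1 and the second user1 512 dir1, so the comma form is usually the one wanted for readable reports.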

awk Example: Selecting Data

The fstats file contains the data for players:

PPG - points per game is $2
RPG - rebounds per game is $3
APG - assists per game is $4

    $ cat fstats
    Smith 26.4 5.5 7.2
    Jones 23.7 5.2 6.0
    Davis 21.8 9.4 3.7
    Johnson 20.8 10.5 3.0
    Williams 18.8 6.1 9.9
    $ awk '($2 > 20 && $3 < 7) { print $1 }' fstats
    Smith
    Jones
    $

This command says: "Read each line from fstats; if the second field is more than 20 and the third field is less than 7, print the first field."

awk Example: Selecting Data

The first command below prints the names of players that begin with 'J'. The second command prints the names of players that start with 'J' and have more than 5 APG.

    $ cat fstats
    Smith 26.4 5.5 7.2
    Jones 23.7 5.2 6.0
    Davis 21.8 9.4 3.7
    Johnson 20.8 10.5 3.0
    Williams 18.8 6.1 9.9
    $ awk '$1 ~ /^J/ { print $1 }' fstats
    Jones
    Johnson
    $ awk '($1 ~ /^J/ && $4 > 5) { print $1 }' fstats
    Jones
    $

awk Operators: BEGIN & END

Two unique patterns, BEGIN and END, allow you to execute commands before awk starts its processing and after it finishes. The awk utility executes the actions associated with the BEGIN pattern before, and with the END pattern after, it processes all the input files.

awk Example: BEGIN

You can do things before or after all lines have been read with BEGIN and END! This command prints a header and all of the data.

    $ cat fstats
    Smith 26.4 5.5 7.2
    Jones 23.7 5.2 6.0
    Davis 21.8 9.4 3.7
    Johnson 20.8 10.5 3.0
    Williams 18.8 6.1 9.9
    $ awk 'BEGIN { print "Name PPG RPG APG" } { print }' fstats
    Name PPG RPG APG
    Smith 26.4 5.5 7.2
    Jones 23.7 5.2 6.0
    Davis 21.8 9.4 3.7
    Johnson 20.8 10.5 3.0
    Williams 18.8 6.1 9.9
    $
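BEGIN and END can also be combined in one program. The following sketch assumes a two-line excerpt of the player data on standard input and uses NR, a built-in awk variable holding the number of records read so far; NR does not appear in the slides.

```shell
# BEGIN runs once before any input, the middle rule runs for every line,
# and END runs once after the last line, when NR equals the total line count.
printf 'Smith 26.4\nJones 23.7\n' |
awk 'BEGIN { print "Name PPG" }
           { print }
     END   { print NR, "players listed" }'
```

The output should be the header, the two data lines, and then "2 players listed".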

awk Example: END

You can do things before or after all lines have been read with BEGIN and END! This command prints all of the data followed by a closing line. It says: "Read each line from fstats, print the whole line, and after the last line, print 'That is all, folks!'."

    $ cat fstats
    Smith 26.4 5.5 7.2
    Jones 23.7 5.2 6.0
    Davis 21.8 9.4 3.7
    Johnson 20.8 10.5 3.0
    Williams 18.8 6.1 9.9
    $ awk '{ print } END { print "That is all, folks!" }' fstats
    Smith 26.4 5.5 7.2
    Jones 23.7 5.2 6.0
    Davis 21.8 9.4 3.7
    Johnson 20.8 10.5 3.0
    Williams 18.8 6.1 9.9
    That is all, folks!
    $

awk Example: Math

awk even does math! The first command below counts the number of players with more than 20 PPG. The second command says: "Read each line from fstats; if the second field is more than maxppg, make maxppg the second field and make player the first field. After all lines have been read, print the line 'player had the most PPG, with maxppg'."

    $ cat fstats
    Smith 26.4 5.5 7.2
    Jones 23.7 5.2 6.0
    Davis 21.8 9.4 3.7
    Johnson 20.8 10.5 3.0
    Williams 18.8 6.1 9.9
    $ awk '$2 > 20 { total = total + 1 } \
    END { print total, "players had more than 20 PPG" }' fstats
    4 players had more than 20 PPG
    $ awk '$2 > maxppg { maxppg = $2; player = $1 }
    END { print player, "had the most PPG, with", maxppg }' fstats
    Smith had the most PPG, with 26.4
    $
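The counting example relies on awk treating a variable that has never been assigned as zero in arithmetic. One consequence worth seeing: if no line matches, the variable is still unassigned in END and prints as an empty string. The sketch below, using a hypothetical two-line input and a threshold of 30 that no player reaches, adds 0 to force a numeric result.

```shell
# No line has a second field above 30, so total is never assigned.
# Printing total + 0 instead of total yields 0 rather than an empty string.
printf 'Smith 26.4\nJones 23.7\n' |
awk '$2 > 30 { total = total + 1 }
     END     { print total + 0, "players had more than 30 PPG" }'
```

This should print "0 players had more than 30 PPG".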

awk Example: Math

awk even does math! This example reads each line from fstats and adds up the values to calculate the average RPG for all players. The second command below says: "Read each line from fstats; add the third field to totrpg and add 1 to count. After all lines have been read, print the line 'Average RPG is totrpg / count'."

    $ cat fstats
    Smith 26.4 5.5 7.2
    Jones 23.7 5.2 6.0
    Davis 21.8 9.4 3.7
    Johnson 20.8 10.5 3.0
    Williams 18.8 6.1 9.9
    $ awk '$2 > 20 { total = total + 1 } \
    END { print total, "players had more than 20 PPG" }' fstats
    4 players had more than 20 PPG
    $ awk '{ totrpg = totrpg + $3; count = count + 1 }
    END { print "Average RPG is", totrpg/count }' fstats
    Average RPG is 7.34
    $

Running an awk Program from a File

Finally, you can store awk programs in files, so you do not have to re-enter long awk commands. For instance, to run the previous command from a file, you would create a file (let's call it ex1.awk) containing the program and pass it to awk with the -f option:

    $ cat ex1.awk
    { totrpg = totrpg + $3; count = count + 1 }
    END { print "Average RPG is", totrpg/count }
    $ awk -f ex1.awk fstats
    Average RPG is 7.34
    $
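The whole workflow of storing a program in a file and running it with -f can be reproduced in a scratch directory. This is a sketch, assuming the fstats data and the ex1.awk program shown in the slides; the variable name tmp and the use of mktemp are illustrative choices.

```shell
# Create a throwaway directory holding the data file and the program file.
tmp=$(mktemp -d)

cat > "$tmp/fstats" <<'EOF'
Smith 26.4 5.5 7.2
Jones 23.7 5.2 6.0
Davis 21.8 9.4 3.7
Johnson 20.8 10.5 3.0
Williams 18.8 6.1 9.9
EOF

cat > "$tmp/ex1.awk" <<'EOF'
# Add up the third field (RPG) and count the lines read.
{ totrpg = totrpg + $3; count = count + 1 }
# After the last line, print the average.
END { print "Average RPG is", totrpg / count }
EOF

# -f tells awk to read its program from the named file.
awk -f "$tmp/ex1.awk" "$tmp/fstats"
```

A program file may contain comments and span several lines without any shell quoting, which is the main convenience over the command-line form. The scratch directory can be deleted afterwards with rm -rf "$tmp".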

Homework

Edit and save the file fstats. Add two more lines with your name or your friend's name. Repeat all examples shown in the slide presentation. Enclose your screenshots.

Cygwin Shell and Awk Utility

You may use the Cygwin Linux shell to demonstrate the awk examples. Place the file fstats in the Cygwin home directory. Check the version of your awk utility.

Where Is Your Root Directory?

Be careful with directories. Use your home directory. Place the fstats file in your home directory.

Awk in Janoshell Differs a Bit

Different versions of the awk utility may have different option flags and may behave differently. However, all versions must handle regular expressions in the same way. Place the input file fstats in the Administrator directory.

Do All Slide Examples

Do all of the awk examples from the slide presentation in both shells, Cygwin and the Janotech shell. Observe all differences.

The End