
Unix/Regex Lab
CS 341: Natural Language Processing
Heather Pon-Barry
Based on Unix for Poets (by Ken Church)

Overview
1. Setup & Unix review
2. Count words in a text
3. Sort a list of words in various ways
4. Search with grep
5. Two-minute response

1. Setup & Unix Review

Setting Up
In your home directory, make a cs341 folder.
Make a directory called unixforpoets for today's lab activity.
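A minimal sketch of the setup step. It assumes the unixforpoets directory is meant to live inside cs341, which the slide does not state explicitly.

```shell
# Assumption: unixforpoets goes inside ~/cs341.
# mkdir -p creates both directories and does not complain if they already exist.
mkdir -p "$HOME/cs341/unixforpoets"
cd "$HOME/cs341/unixforpoets"
pwd   # print the current directory to confirm where you are
```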

Unix Tools
    pwd
    ls
    cd <dirname>
    cd ../
    less <filename>
    head <filename>
    tail <filename>
    man <command>
    piping and redirection: | > <
    CTRL-C
    grep: search for a pattern (regular expression)
    sort
    uniq -c (count duplicates)
    tr (translate characters)
    wc (word or line count)
    cat (send file(s) in stream)
    sed (edit string -- replacement)
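As a rough illustration of piping and redirection from the list above, here is a small sketch. The file names fruits.txt and sorted.txt are hypothetical and live in a temporary directory.

```shell
# Hypothetical scratch files, created in a temporary directory.
demo_dir=$(mktemp -d)
printf 'pear\napple\npear\nfig\n' > "$demo_dir/fruits.txt"   # > writes output to a file

# < feeds a file to a command's standard input; > saves the result.
sort < "$demo_dir/fruits.txt" > "$demo_dir/sorted.txt"
cat "$demo_dir/sorted.txt"

# | sends one command's output straight into the next command.
sort "$demo_dir/fruits.txt" | uniq -c
```

If a command runs longer than expected, CTRL-C interrupts it.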

2. Count words in a text

Counting lines, words, characters
    wc alice.txt
    1601 27336 135029 alice.txt
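The three numbers printed by wc are the line, word, and byte counts, in that order. The sketch below shows how to ask for each one separately, using a hypothetical two-line sample file in place of alice.txt.

```shell
# Hypothetical two-line sample standing in for alice.txt.
wc_dir=$(mktemp -d)
printf 'down the rabbit hole\nalice was beginning\n' > "$wc_dir/sample.txt"

wc "$wc_dir/sample.txt"        # lines, words, and bytes, then the file name
wc -l < "$wc_dir/sample.txt"   # lines only
wc -w < "$wc_dir/sample.txt"   # words only
wc -c < "$wc_dir/sample.txt"   # bytes only (same as characters for plain ASCII)
```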

tr command
NAME
    tr - translate or delete characters
SYNOPSIS
    tr [OPTION]... SET1 [SET2]
DESCRIPTION
    Translate, squeeze, and/or delete characters from standard input, writing to standard output.
    -c        use the complement of SET1
    -s        if SET2 is specified, squeeze repeated SET2 characters to a single character
    --help    display this help and exit

Counting Words
Input: mini-alice.txt; alice.txt
Output: list of words with frequency counts
Algorithm
1. Create a file with one token per line (tr -sc)
2. Sort (sort)
3. Count duplicates (uniq -c)
Practice using tr, sort, and uniq incrementally on mini-alice.txt.
Once you understand each step, run your command on alice.txt.
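To see what the -c and -s options described above do, it can help to run tr on a single made-up line before touching the lab files. The input strings here are invented for illustration.

```shell
# -c takes the complement of SET1 (everything that is NOT a letter),
# and -s squeezes each run of replaced characters into a single newline.
printf 'Hello,  world -- hello!\n' | tr -sc 'A-Za-z' '\n'

# Without -s, every non-letter becomes its own newline, leaving blank lines.
printf 'Hello,  world -- hello!\n' | tr -c 'A-Za-z' '\n'

# tr can also translate one set into another, e.g. to lowercase the text.
printf 'Hello World\n' | tr 'A-Z' 'a-z'
```

Note the quotes around '\n': without them the shell drops the backslash and tr receives a plain n.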

Output
    632 a
    1 abide
    1 able
    94 about
    3 above
    1 absence
    2 absurd
    1 acceptance
    2 accident
    1 accidentally
    ...
Solution:
    tr -sc 'A-Za-z' '\n' < alice.txt | sort | uniq -c

head and tail
head gives you the first n lines (n=10 by default; you can specify n with the flag -n)
    tr -sc 'A-Za-z' '\n' < alice.txt | sort | uniq -c | head -n 5
    632 a
    1 abide
    1 able
    94 about
    3 above
What do you think tail does?
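The pipeline can be built up one stage at a time, as the lab suggests. The sketch below does so on a hypothetical two-line text (already lowercase and without punctuation) standing in for mini-alice.txt.

```shell
# Hypothetical mini text standing in for mini-alice.txt.
count_dir=$(mktemp -d)
printf 'the cat saw the other cat\nthe end\n' > "$count_dir/mini.txt"

# Step 1: one token per line.
tr -sc 'A-Za-z' '\n' < "$count_dir/mini.txt"

# Step 2: sort so that identical words end up next to each other.
tr -sc 'A-Za-z' '\n' < "$count_dir/mini.txt" | sort

# Step 3: collapse each run of identical lines and prefix it with its count.
tr -sc 'A-Za-z' '\n' < "$count_dir/mini.txt" | sort | uniq -c > "$count_dir/counts.txt"

head -n 2 "$count_dir/counts.txt"   # first two entries
tail -n 2 "$count_dir/counts.txt"   # try it and compare with head
```

The sort step matters because uniq only merges duplicates that sit on adjacent lines.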

3. Sort a list of words in various ways

Most Frequent Words Exercise
Find the 50 most common words in alice.txt.
Hint: Use sort a second time, then head.
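One possible approach to the hint, sketched on a hypothetical mini text rather than alice.txt. The helper name top_words is made up for this example.

```shell
freq_dir=$(mktemp -d)
printf 'the cat saw the other cat\nthe end\n' > "$freq_dir/mini.txt"

# Hypothetical helper: top_words FILE N prints the N most frequent words in FILE.
# The first sort groups identical words and uniq -c counts them.
# The second sort orders by count: -n compares numerically, -r puts the largest first.
top_words() {
    tr -sc 'A-Za-z' '\n' < "$1" | sort | uniq -c | sort -rn | head -n "$2"
}

top_words "$freq_dir/mini.txt" 2
```

Under the same assumptions, top_words alice.txt 50 would list the 50 most common words in the lab file.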

4. Search with grep

grep
Grep finds patterns specified as regular expressions.
The name stands for: globally search for regular expression and print.

grep
Try this:
    grep cheshire alice.txt
    it s a cheshire cat said the duchess and that s why pig she said the last word with such sudden violence that alice quite jumped but she saw in another moment that it was addressed to the baby and not to her so she took courage and went on again i didn t know that cheshire cats always grinned in fact i didn t know that cats could grin
Next, try grepping other phrases.

grep
Make an intermediary words file:
    tr -sc 'A-Za-z' '\n' < alice.txt > alice.words
Finding words ending in ing:
    grep 'ing$' alice.words | sort | uniq -c
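The same two steps can be tried on a small scale first. This sketch uses a hypothetical one-line text, and the file names mini.txt and mini.words are invented for the example.

```shell
ing_dir=$(mktemp -d)
printf 'alice was sitting and thinking and sitting\n' > "$ing_dir/mini.txt"

# One word per line, saved as an intermediary file.
tr -sc 'A-Za-z' '\n' < "$ing_dir/mini.txt" > "$ing_dir/mini.words"

# Keep only lines ending in ing, then count how often each one occurs.
grep 'ing$' "$ing_dir/mini.words" | sort | uniq -c
```

The single quotes keep the shell from interpreting the $ before grep sees it.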

grep Take-home Message
grep is a filter: you keep only some lines of the input.
Try these on alice.words:
    grep gh                                  keep lines containing gh
    grep '^con'                              keep lines beginning with con
    grep 'ing$'                              keep lines ending with ing
    grep -v gh                               keep lines NOT containing gh
    grep -i '[aeiou].*[aeiou]'               keep lines with two or more vowels
    grep -i '^[^aeiou]*[aeiou][^aeiou]*$'    keep lines with exactly one vowel
Piping commands together can be simple yet powerful in Unix.
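To check each filter against a known answer, the patterns above can be run on a tiny hand-made word list. The seven words below are invented for this sketch and are not taken from alice.words.

```shell
words_dir=$(mktemp -d)
printf '%s\n' night contain singing cat tree sky A > "$words_dir/mini.words"

grep gh "$words_dir/mini.words"                                 # night
grep '^con' "$words_dir/mini.words"                             # contain
grep 'ing$' "$words_dir/mini.words"                             # singing
grep -v gh "$words_dir/mini.words"                              # every word except night
grep -i '[aeiou].*[aeiou]' "$words_dir/mini.words"              # contain, singing, tree
grep -i '^[^aeiou]*[aeiou][^aeiou]*$' "$words_dir/mini.words"   # night, cat, A
```

The word sky matches neither vowel pattern because it contains none of a, e, i, o, u.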

https://xkcd.com/208/

5. Two-minute response

Two-minute Response
In Piazza, post a Note to Instructor only:
1. What is one thing you understand better after today's activity?
2. What is something that's still unclear, or a question you have?

Extra Exercises

Sorting exercises
In alice.txt, find the words that end in ling using sorting (and not using grep).
Hint: what does this do?
    tr -sc 'A-Za-z' '\n' < alice.txt | sort | uniq | head | rev

Exercises on grep & wc
How many 4-letter words?
How many different words are there with no vowels? What subtypes do they belong to?
How many 1-syllable words are there? That is, ones with exactly one vowel.
Answer these with respect to word types, not word tokens.
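One way to read the hint: rev reverses the characters of each line, so sorting reversed words groups them by ending. The sketch below tries this on four made-up words. It is one possible route to the exercise, not necessarily the intended one.

```shell
rev_dir=$(mktemp -d)
printf '%s\n' darling king apple duckling > "$rev_dir/mini.words"

# rev reverses the characters of each line.
rev < "$rev_dir/mini.words"

# Reverse, sort, reverse back: words sharing an ending become neighbours.
rev < "$rev_dir/mini.words" | sort | rev
```

In the final output the two words ending in ling sit next to each other, after king.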

grep
We used the following to keep lines with exactly one vowel:
    grep -i '^[^aeiou]*[aeiou][^aeiou]*$'
What would happen if we instead used the command below? In what contexts is this important?
    grep -i '[^aeiou]*[aeiou][^aeiou]*'
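A quick experiment can help with this question. The sketch runs both patterns on three made-up words so the outputs can be compared side by side.

```shell
anchor_dir=$(mktemp -d)
printf '%s\n' cat queen sky > "$anchor_dir/mini.words"

# Anchored: the whole line must be consonants, one vowel, consonants.
grep -i '^[^aeiou]*[aeiou][^aeiou]*$' "$anchor_dir/mini.words"

# Unanchored: the pattern only has to match somewhere inside the line.
grep -i '[^aeiou]*[aeiou][^aeiou]*' "$anchor_dir/mini.words"
```

Without ^ and $, grep accepts any line that contains at least one vowel, so the two commands print different sets of words.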