Introduction to Unix

Similar documents
Introduction to Unix

Operating System Interaction via bash

When talking about how to launch commands and other things that is to be typed into the terminal, the following syntax is used:

Command Line Interface The basics

ITST Searching, Extracting & Archiving Data

Getting Started With UNIX Lab Exercises

CMPS 12A Introduction to Programming Lab Assignment 7

Unix/Linux Basics. Cpt S 223, Fall 2007 Copyright: Washington State University

Table of contents. Our goal. Notes. Notes. Notes. Summer June 29, Our goal is to see how we can use Unix as a tool for developing programs

DATA 301 Introduction to Data Analytics Command Line. Dr. Ramon Lawrence University of British Columbia Okanagan

Why learn the Command Line? The command line is the text interface to the computer. DATA 301 Introduction to Data Analytics Command Line

Introduction to Linux

My Favorite bash Tips and Tricks

Linux Bootcamp Fall 2015

Essential Linux Shell Commands

A Brief Introduction to the Linux Shell for Data Science

Introduction. File System. Note. Achtung!

This lab exercise is to be submitted at the end of the lab session! passwd [That is the command to change your current password to a new one]

Introduction to UNIX. Logging in. Basic System Architecture 10/7/10. most systems have graphical login on Linux machines

15-122: Principles of Imperative Computation

CENG 334 Computer Networks. Laboratory I Linux Tutorial

1. Open VirtualBox and start your linux VM. Boot the machine and log in with the user account you created in Lab #1. Open the Terminal application.

COL100 Lab 2. I semester Week 2, Open the web-browser and visit the page and visit the COL100 course page.

Crash Course in Unix. For more info check out the Unix man pages -orhttp:// -or- Unix in a Nutshell (an O Reilly book).

Linux File System and Basic Commands

Contents. Note: pay attention to where you are. Note: Plaintext version. Note: pay attention to where you are... 1 Note: Plaintext version...

CS 307: UNIX PROGRAMMING ENVIRONMENT FIND COMMAND

6 Stephanie Well. It s six, because there s six towers.

Overview of the UNIX File System. Navigating and Viewing Directories

Linux Operating System Environment Computadors Grau en Ciència i Enginyeria de Dades Q2

Lab Working with Linux Command Line

Working with Basic Linux. Daniel Balagué

Practical Session 0 Introduction to Linux

The input can also be taken from a file and similarly the output can be redirected to another file.

Tutorial 1: Unix Basics

Unix/Linux Primer. Taras V. Pogorelov and Mike Hallock School of Chemical Sciences, University of Illinois

Overview of the UNIX File System

Shell Programming Overview

Using UNIX. -rwxr--r-- 1 root sys Sep 5 14:15 good_program

Essentials for Scientific Computing: Bash Shell Scripting Day 3

UNIX files searching, and other interrogation techniques

Digital Humanities. Tutorial Regular Expressions. March 10, 2014

CSC209. Software Tools and Systems Programming.

CS CS Tutorial 2 2 Winter 2018

Carnegie Mellon. Linux Boot Camp. Jack, Matthew, Nishad, Stanley 6 Sep 2016

Linux Command Line Interface. December 27, 2017

CS 460 Linux Tutorial

Unix tutorial. Thanks to Michael Wood-Vasey (UPitt) and Beth Willman (Haverford) for providing Unix tutorials on which this is based.

Introduction to UNIX Command Line

Exploring UNIX: Session 5 (optional)

A Big Step. Shell Scripts, I/O Redirection, Ownership and Permission Concepts, and Binary Numbers

bash, part 3 Chris GauthierDickey

CS102: Standard I/O. %<flag(s)><width><precision><size>conversion-code

Utilities. September 8, 2015

Introduction p. 1 Who Should Read This Book? p. 1 What You Need to Know Before Reading This Book p. 2 How This Book Is Organized p.

Review of Fundamentals

Module 8 Pipes, Redirection and REGEX

Laboratory 1 Semester 1 11/12

UNIX COMMANDS AND SHELLS. UNIX Programming 2015 Fall by Euiseong Seo

CSC209. Software Tools and Systems Programming.

Lab: Supplying Inputs to Programs

A Brief Introduction to the Command Line. Hautahi Kingi

1. What statistic did the wc -l command show? (do man wc to get the answer) A. The number of bytes B. The number of lines C. The number of words

Subcontractors. bc math help for the shell. interactive or programatic can accept its commands from stdin can accept an entire bc program s worth

Scripting Languages Course 1. Diana Trandabăț

The Directory Structure

Bioinformatics? Reads, assembly, annotation, comparative genomics and a bit of phylogeny.

An introduction to Linux Part 4

commands exercises Linux System Administration and IP Services AfNOG 2015 Linux Commands # Notes

Chapter 2 Text Processing with the Command Line Interface

Intro to Linux. this will open up a new terminal window for you is super convenient on the computers in the lab

Review of Fundamentals. Todd Kelley CST8207 Todd Kelley 1

Exploring UNIX: Session 3

STATS Data Analysis using Python. Lecture 15: Advanced Command Line

An Illustrated Guide to Shell Magic: Standard I/O & Redirection

Lecture 4. Log into Linux Reminder: Homework 1 due today, 4:30pm Homework 2 out, due next Tuesday Project 1 out, due next Thursday Questions?

Unix Introduction to UNIX

Introduction to the UNIX command line

Reading and manipulating files

Getting to grips with Unix and the Linux family

UNIX, GNU/Linux and simple tools for data manipulation

Lesson 3 Transcript: Part 2 of 2 Tools & Scripting

1 Installation (briefly)

COMS 6100 Class Notes 3

Physics REU Unix Tutorial

Unix as a Platform Exercises + Solutions. Course Code: OS 01 UNXPLAT

Text Search & Auto Coding

CSE 390a Lecture 2. Exploring Shell Commands, Streams, Redirection, and Processes

Scripting. Shell Scripts, I/O Redirection, Ownership and Permission Concepts, and Binary Numbers

CSE 391 Lecture 3. bash shell continued: processes; multi-user systems; remote login; editors

I/O and Shell Scripting

THE HONG KONG POLYTECHNIC UNIVERSITY Department of Electronic and Information Engineering

Operating Systems and Using Linux. Topics What is an Operating System? Linux Overview Frequently Used Linux Commands

Helsinki 19 Jan Practical course in genome bioinformatics DAY 0

CSC209H Lecture 1. Dan Zingaro. January 7, 2015

TNM093 Practical Data Visualization and Virtual Reality Laboratory Platform

Read the relevant material in Sobell! If you want to follow along with the examples that follow, and you do, open a Linux terminal.

Chapter 4. Unix Tutorial. Unix Shell

Unix basics exercise MBV-INFX410

Sub-Topic 1: Quoting. Topic 2: More Shell Skills. Sub-Topic 2: Shell Variables. Referring to Shell Variables: More

Transcription:

Introduction to Unix Part 1: Navigating directories First we download the directory called "Fisher" from Carmen. This directory contains a sample from the Fisher corpus. The Fisher corpus is a collection of short phone conversations between people who typically do not know each other. Participants are assigned a random topic to talk about. We will explore the data at the command line. To do so, we open a shell. Open the Terminal application Go to "Shell" > "New Window" > "New Window with Settings - Basic" This gives you a shell in which we can type. Here is the prompt I see (yours will be different): dhcp48:~ mcdm$ Let's explore some Unix commands. ls What does this command do? Try it. You need to type it and hit "return". It lists the contents of the directory we are in. By default, the Unix login session begins in your home directory. man If you want more information about a command, you can look in the manual. For example, to get information about the "ls" command, just type $ man ls NAME ls -- list directory contents SYNOPSIS ls [-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1] [file...] DESCRIPTION For each operand that names a file of a type other than directory, ls displays its name as well as any requested, associated information. For each operand that names a file of type directory, ls displays the names of files contained within that directory, as well as any requested, associated information. If no operands are given, the contents of the current directory are displayed. To quit the manual, type "q" (for quit). cd This command changes your current directory location. Using it, we can navigate the directory hierarchy. Our Fisher directory has been downloaded into the "Downloads" folder, so let's go there. $ cd Downloads $ ls

Remember, "ls" lists the directory content. You should see the Fisher directory there. What's in it? $ cd Fisher $ ls What do we see? We see that "Fisher" contains two directories, named "058" and "065". Go to one of these directories. $ cd 058 $ ls We see a bunch of files in there. What's in the other directory? We can go there directly. We know that the directory hierarchy looks like this: Downloads Fisher 058 065 fe_03_05851.txt fe_03_05852.txt etc. Right now we are in the directory called "058". To go to the other directory ("065"), we need to go back up to "Fisher" and then go into "065". Conveniently "cd.." moves to the parent of the current directory. So we can do $ cd.. $ cd 065 Or in one command: $ cd../065 What follows the command is the path. "pwd" will give you the whole path of the directory you are currently in. Just typing "cd" makes you go back to your home directory. Go into the "065" directory again (if you are not there anymore). How do we go back to the Downloads folder from there? $ cd../../ Part 2: Looking into a file Now we want to see how the files are structured. Let's look into one.

more $ more fe_03_06596.txt 0.59 1.92 A-f: hello 1.96 2.97 B-m: (( hello )) 2.95 3.98 A-f: hello 3.71 5.43 B-m: my name is kevin gonzales 5.49 7.30 A-f: hi this is carol 6.95 8.35 B-m: carol okay carol 8.42 9.44 A-f: [laughter] 9.56 11.27 A-f: well that was fast 12.48 13.79 B-m: uh hello 13.47 14.85 A-f: yeah i'm here 14.58 15.53 B-m: okay 16.81 25.87 A-f: um so do you think that public or private school To see more of the file, hit the "return" key. It seems that we have one utterance per line. Participants are coded as A and B, and we have gender information (as -f or -m tagged to the participant code). The numbers at the beginning of the lines are the start and end time of the utterance. Ok, we have a pretty decent understanding of how the file is structured. How do we quit this? "q" -- easy :-) Let's look at the second file. People are lazy... Using the "tab" key, Unix will fill the rest of the command for you. So let's try this. $ more f [Then hit "tab"] Magical, isn't it? Now we just type "01", and hit "tab" again, and we are seeing the content of the file "fe_03_06501.txt". Another small trick: Using the up and down arrows, we can repeat commands we just typed (there is a history of the commands we typed). less We can also use the "less" command to look into the file. $ less fe_03_06596.txt What's the difference between "more" and "less"? How can I find the answer to this question? Look at the manual! $ man less DESCRIPTION Less is a program similar to more (1), but which allows backward movement in the file as well as forward movement. Also, less does not have to read the entire input file before starting, so with large input files it starts up faster

wc Can we roughly quantify the amount of data that's available here? E.g., how many files do we have? How many words? We can count stuff easily with the "wc" command. I. What happens when we type the "wc" command? $ wc fe_03_06501.txt 173 2729 14055 fe_03_06501.txt Tell me what these numbers mean. How do you find the answer? That's right: we look at the manual. $ man wc NAME wc -- word, line, character, and byte count SYNOPSIS wc [-clmw] [file...] DESCRIPTION The wc utility displays the number of lines, words, and bytes contained in each input file, or standard input (if no file is specified) to the standard output. A line is defined as a string of characters delimited by a <newline> character. Characters beyond the final <newline> character will not be included in the line count. So what are the different numbers we see? 173 is the number of lines in the file "fe_03_06501.txt" 2,729 is the number of words in the file "fe_03_06501.txt" 14,055 is the number of bytes in the file "fe_03_06501.txt" Now I want to know how many words I have in this whole directory "065", and not only for this particular file. We will use * which serves as a wildcard character. So *.txt will mean "anything that ends with.txt" How is a word defined? $ wc *.txt 19178 252415 1324595 total II. Can we know how many files we have in this "065" directory? We can list the directory content and count the output. We use the vertical bar " " to connect two commands together so that the output from one command becomes the input of the next command. Two (or more) commands connected in this way form what's called a pipe. $ ls wc 100 100 1600 So we have 100 files in the "065" directory. Now count the number of files in the "058" directory.

Redirection What if we want to capture the output of a command in a more permanent way? We can use the command line to create a file with our results using the ">" character. This is just like the pipe, except that the output goes to a file instead of another command. Try the command: $ wc *.txt > linecounts.txt What is in the linecounts.txt file now? (Which command do you use to view the file contents?) Be very careful with output redirection... if the file after the ">" already exists, its contents are overwritten with no warning! Exercise How many files and how many words do we have in total in the Fisher directory? Part 3: Going further There are other useful Unix commands, and you might want to take a look at Unix for Poets. But for our purpose, we just want to be able to navigate the directory tree and have quick peeks into files. Sometimes a file might be really big, and you don't want to open it in Word ;-) The "grep" command might be quite useful to you but we will not insist too much on this. Regular expressions are quite handy but tricky too (see xkcd comic). Just for the fun of it, I will quickly illustrate what you can do with regular expressions. "grep" stands for Globally search a Regular Expression and Print. It searches plain-text data for lines matching a regular expression. The syntax is the following: grep regular_expression filename We can first try regular expressions in an interactive mode. We will type "grep" followed by the regular expression we want to look for, then "enter". We can then type some text (one line), and hit enter. If the regular expression matches the text, the line will be echoed back to us. If not, nothing will happen. grep "hello" hello hello #echoed back to us hello dave hello dave #echoed back to us dave #nothing is happening Use ctrl-c to quit. It can be useful to use some colors, to see what exactly gets matched. Do to this, we add the option "--color=auto": grep --color=auto "hello" Using "grep", we can have an idea of how much people laugh in these conversations. How would we do this?

Remember the first file we looked at: $ cd $ cd Downloads/Fisher/065/ $ more fe_03_06596.txt 0.59 1.92 A-f: hello 1.96 2.97 B-m: (( hello )) 2.95 3.98 A-f: hello 3.71 5.43 B-m: my name is kevin gonzales 5.49 7.30 A-f: hi this is carol 6.95 8.35 B-m: carol okay carol 8.42 9.44 A-f: [laughter] 9.56 11.27 A-f: well that was fast 12.48 13.79 B-m: uh hello 13.47 14.85 A-f: yeah i'm here 14.58 15.53 B-m: okay 16.81 25.87 A-f: um so do you think that public or private school Laughter is coded explicitly in the transcriptions as [laughter]. Using "wc", we know how many lines a file contains. If we can count how often [laughter] appears in these lines, then we will have an idea of the proportion of utterances containing laughter. Let's make sure we are working on the same directory here. Let's start with 065. $ grep "\[laughter\]" *.txt We need to use the backslash character to escape the bracket character which has a special meaning in regular expressions. But here we want its literal meaning as a character. (If we type [xyz] without escaping the brackets, it will match any line that contains the letter x, y or z.) Now how do we get the line count? $ grep "\[laughter\]" *.txt wc 1482 23609 161959 How many total lines do we have in the "065" directory? $ wc *.txt 19178 252415 1324595 total So roughly 8% of the utterances in the files from the "065" directory contain laughter. Perhaps we want to do this on all the files in all the directories. $ cd../ $ grep -R "\[laughter\]"./ wc 1681 26608 186119 -R stands for recursive./ refers to the current directory

Getting the number of lines of all the files in all the directories is a bit trickier, and I will not enter into the details, but here is a command to do it. $ find. -name "*.txt" xargs wc 29734 362020 1902592 total