Computer Science 2500 Computer Organization Rensselaer Polytechnic Institute Spring Topic Notes: C and Unix Overview

Similar documents
Computer Science 322 Operating Systems Mount Holyoke College Spring Topic Notes: C and Unix Overview

CMPS 105 Systems Programming. Prof. Darrell Long E2.371

Lecture 03 Bits, Bytes and Data Types

Crash Course in Unix. For more info check out the Unix man pages -orhttp:// -or- Unix in a Nutshell (an O Reilly book).

Hello, World! in C. Johann Myrkraverk Oskarsson October 23, The Quintessential Example Program 1. I Printing Text 2. II The Main Function 3

C Compilation Model. Comp-206 : Introduction to Software Systems Lecture 9. Alexandre Denault Computer Science McGill University Fall 2006

Oregon State University School of Electrical Engineering and Computer Science. CS 261 Recitation 1. Spring 2011

gcc hello.c a.out Hello, world gcc -o hello hello.c hello Hello, world

CSE 124 Discussion (10/3) C/C++ Basics

CS Programming In C

Beyond this course. Machine code. Readings: CP:AMA 2.1, 15.4

Introduction to Supercomputing

INF322 Operating Systems

Program Organization and Comments

PRINCIPLES OF OPERATING SYSTEMS

CptS 360 (System Programming) Unit 2: Introduction to UNIX and Linux

Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

CSCI-243 Exam 2 Review February 22, 2015 Presented by the RIT Computer Science Community

COSC 2P91. Introduction Part Deux. Week 1b. Brock University. Brock University (Week 1b) Introduction Part Deux 1 / 14

Laboratory 1 Semester 1 11/12

CSci 4061 Introduction to Operating Systems. Programs in C/Unix

377 Student Guide to C++

are all acceptable. With the right compiler flags, Java/C++ style comments are also acceptable.

Layers in a UNIX System. Create a new process. Processes in UNIX. fildescriptors streams pipe(2) labinstructions

THE C STANDARD LIBRARY & MAKING YOUR OWN LIBRARY. ISA 563: Fundamentals of Systems Programming

C How to Program, 6/e by Pearson Education, Inc. All Rights Reserved.

CSE 374 Programming Concepts & Tools

211: Computer Architecture Summer 2016

Topic 6: A Quick Intro To C. Reading. "goto Considered Harmful" History

CS 326 Operating Systems C Programming. Greg Benson Department of Computer Science University of San Francisco

Reviewing gcc, make, gdb, and Linux Editors 1

C for C++ Programmers

UNIX input and output

EL2310 Scientific Programming

Computers and Computation. The Modern Computer. The Operating System. The Operating System

Lecture 2, September 4

Advanced C Programming Topics

Basic C Programming. Bin Li Assistant Professor Dept. of Electrical, Computer and Biomedical Engineering University of Rhode Island

Introduction to C An overview of the programming language C, syntax, data types and input/output

CSC111 Computer Science II

cs3157: another C lecture (mon-21-feb-2005) C pre-processor (3).

CS4023 Week04 Lab Exercise

Lecture 12 CSE July Today we ll cover the things that you still don t know that you need to know in order to do the assignment.

UNIVERSITY OF NEBRASKA AT OMAHA Computer Science 4500/8506 Operating Systems Summer 2016 Programming Assignment 1 Introduction The purpose of this

The C standard library

C BOOTCAMP DAY 2. CS3600, Northeastern University. Alan Mislove. Slides adapted from Anandha Gopalan s CS132 course at Univ.

Saint Louis University. Intro to Linux and C. CSCI 2400/ ECE 3217: Computer Architecture. Instructors: David Ferry

CSCI 171 Chapter Outlines

Tutorial 1 C Tutorial: Pointers, Strings, Exec

Topic 6: A Quick Intro To C

Introduction to Linux

Bash command shell language interpreter

CS 3113 Introduction to Operating Systems Midterm October 11, 2018

CS 3113 Introduction to Operating Systems Midterm October 11, 2018

Problem Set 1: Unix Commands 1

Programs. Function main. C Refresher. CSCI 4061 Introduction to Operating Systems

Kurt Schmidt. October 30, 2018

System Programming. Introduction to Unix

APT Session 4: C. Software Development Team Laurence Tratt. 1 / 14

COMP26120 Academic Session: Lab Exercise 2: Input/Output; Strings and Program Parameters; Error Handling

CS3157: Advanced Programming. Outline

Exercise 1: Basic Tools

CS 220: Introduction to Parallel Computing. Beginning C. Lecture 2

CSC209: Software tools. Unix files and directories permissions utilities/commands Shell programming quoting wild cards files

CSC209: Software tools. Unix files and directories permissions utilities/commands Shell programming quoting wild cards files. Compiler vs.

CSC209 Review. Yeah! We made it!

CE221 Programming in C++ Part 1 Introduction

ITCS 4145/5145 Assignment 2

Computer Science 330 Operating Systems Siena College Spring Lab 5: Unix Systems Programming Due: 4:00 PM, Wednesday, February 29, 2012

Operating System Interaction via bash

Unix and C Program Development SEEM

CS4023 Week06 Lab Exercise

Operator overloading

Introduction to Programming in C Department of Computer Science and Engineering. Lecture No. #29 Arrays in C

CSE 333 Midterm Exam 7/29/13

Lab 4: Tracery Recursion in C with Linked Lists

SWEN-250 Personal SE. Introduction to C

Programming for Engineers Arrays

COMP s1 Lecture 1

Getting started with UNIX/Linux for G51PRG and G51CSA

Computer Science 322 Operating Systems Mount Holyoke College Spring Topic Notes: Processes and Threads

Operating Systems 3. Operating Systems. Content. What is an Operating System? What is an Operating System? Resource Abstraction and Sharing

Operating Systems. Copyleft 2005, Binnur Kurt

Final CSE 131B Spring 2004

Writing to and reading from files

C introduction: part 1

Creating a Shell or Command Interperter Program CSCI411 Lab

Programming Tools. Venkatanatha Sarma Y. Lecture delivered by: Assistant Professor MSRSAS-Bangalore

File IO and command line input CSE 2451

GDB Tutorial. A Walkthrough with Examples. CMSC Spring Last modified March 22, GDB Tutorial

Carnegie Mellon. Cache Lab. Recitation 7: Oct 11 th, 2016

EL2310 Scientific Programming

Room 3P16 Telephone: extension ~irjohnson/uqc146s1.html

CS 361 Computer Systems Fall 2017 Homework Assignment 1 Linking - From Source Code to Executable Binary

assembler Machine Code Object Files linker Executable File

Exploring the file system. Johan Montelius HT2016

Introduction to Linux

CS 300 Data Structures

Basic Survival UNIX.

CS 220: Introduction to Parallel Computing. Input/Output. Lecture 7

Transcription:

Computer Science 2500 Computer Organization Rensselaer Polytechnic Institute Spring 2009 Topic Notes: C and Unix Overview This course is about computer organization, but since most of our programming is in C in a Unix environment, we ll spend some time getting you up to speed on C and Unix. C for C++ Programmers C is a subset of C++ so much of the familiar syntax if you re a C++ programmer is the same in C. built-in base types like char, short, int, long, float, double preprocessor for #include, #define, etc. familiar control structures: if, while, do, switch, for operators: +, -, *, /, =, ==,!=, <, >, +=, ++,... execution begins by calling a function named main() function definition syntax is the same multiple files and headers handled similarly comment blocks with /* */, lines with // Main difference: no classes. no constructors/destructors no inheritance no operator overloading no new or delete memory is allocated through system calls no templates no String or Boolean types no cout, cin use stdio library functions such as printf, scanf, fopen, fclose..

Object oriented programming is still possible, it s just not directly supported by the language. A Very Simple C Program We begin by compiling and running a very simple C program (hello.c) in a Unix environment. More about some of these steps later, but for now we will just look at the example. See: /cs/terescoj/shared/cs2500/examples/hello Things to note from this simple example: We run a program named gcc, which is a free C compiler. gcc, in its simplest form, can be used to compile a C program in a single file: gcc hello.c In this case, we re asking gcc to compile a C program found in the file hello.c. Since we didn t specify what to call the executable program produced, gcc produces a file a.out. The name is a.out for historical reasons. When we want to run a program located in our current directory in a Unix shell, we type its name. For example, when we wanted to run gcc, we typed its name, and the Unix shell found a program on the system in a file named gcc. How does it know where to find it? The shell searches for programs in a sequence of directories known as the search path. Try: env. So if we want to run a.out, we should be able to type its name. But our current directory, always referred to in a Unix shell by., is not in the search path. We need to specify the. as part of the command to run:./a.out Of course, we probably don t want to compile up a bunch of programs all named a.out, so we usually ask gcc to put its output in a file named as one of the parameters to gcc: gcc -o hello hello.c Here, the executable file produced is called hello. And in the program itself, let s make sure we understand everything: At the top of the file, we have a big comment describing what the program does, who wrote it, and when. Your programs should have something similar in each C file. 2

We are going to use a C library function called printf to print a message to the screen. Before we can use this function, we need to tell the C compiler about it. For C library functions, the needed information is provided in header files, which usually end in.h. In this case, we need to include stdio.h. Why? See man 3 printf. (More on the Unix manual later.) A C program starts its execution by calling the function main. Any command-line parameters are provided to main through the first two arguments to main, traditionally declared as argc, the number of command-line parameters (including the name of the program itself), and argv, an array of pointers to character strings, each of which represents one of the command-line parameters. In this case, we don t use them, but there they are. Our call to printf results in the string passed as a parameter to be printed to the screen. The \n results in a new line. Our main function returns an int value. A value of 0 returned from main generally indicates a successful execution, while a non-zero return indicates an error condition. So we return a 0. A Bit More Complex Example We next consider an overly complicated C program that simply computes the greatest common denominator of two integer values. See: /cs/terescoj/shared/cs2500/examples/gcd-c Lots of things to notice here: We have four files: gcd.c: the implementation of the gcd function gcd.h: a header file with a prototype for the gcd function gcdmain.c: a main program that determines the input numbers, computes the GCD, and prints the answer, and Makefile: a make file that gives a set of rules for compiling these files into the executable program gcdmain. When executing, functions from both gcdmain.c (main) and gcd.c (gcd) will be used. Both of these are included in our executable file gcdmain. Start with gcd.c: This is a very simple recursive function to compute the greatest common denominator using the Euclidean Algorithm. 3

There is no main function here, so if we try to compile this by itself as we did with hello.c, we will get an error. Instead, we have gcc use compile only mode to generate an object file gcd.o from gcd.c: gcc -c gcd.c gcd.o is a compiled version of gcd.c, but it cannot be executed. C (and many other languages) require a two steps for source code to be converted into an executable. The first step compiles source code into object code, the second takes a collection of object code files and links together the references in those files into an executable file. (There s much more to discuss here, but this should suffice for now.) Next up, gcd.h: Much like stdio.h tells the compiler what it needs to know about printf (among other things), we have gcd.h to tell other C functions what they need to know about the function gcd. Namely, that it s a function that takes two ints as parameters and returns an int. Any C file that contains a function that calls gcd should #include "gcd.h". The driver program, gcdmain.c: We include several header files to tell the compiler what it needs to know about C library functions (and our gcd function) that are called by functions defined here. This is where our main function is defined. We can define local variables to functions, just like local variables in a Java or C++ method. In this case, we look at the arguments to main that provide the command-line parameters of our program: argc and argv. If we have fewer than three command-line parameters, including the program name itself (which is always there), we prompt the user for two numbers (with printf), then read in two numbers from the terminal with scanf. scanf is a very strange thing. It will make a bit more sense when you are more familiar with printf, but for now we can summarize what we see there as read in two integer values (represented by the %d s in the format string), and put them into the place pointed at by the address of a and the address of b, then return the number of values that matched the input with the correct format. Right. The scanf call forces us to think a bit about pointers, which are the key to understanding so much of how C works. scanf s parameters after the format string are always a list of pointers to a place in memory where there is room to put the values being read in. In this case, we want the two int values to end up in the local variables a and b, so we have to take the address of those variables with the & operator. Don t worry, it will make better sense when you see more examples. 4

Next, we check to make sure that the input to scanf did, in fact, represent two int values. If not, we print an error message and exit. Otherwise, we continue. Some things to notice in the error condition: We use fprintf instead of printf. This is because we want to give this output special significance. Rather than sending it to the standard output, which is what printf would do, we send it to standard error, by using fprintf and specifying stderr as the first parameter. Other than that, it works just like printf. We give it a format string. In this case, it includes one specifier, a %s, which means to expect an additional parameter which is a character string. Here, the string is argv[0], the first command-line parameter, which is always the name of the program. This labels the error message with the program name. Once we have detected the error, we don t want to continue, so we call the exit function with an error code of 1 to terminate execution. We could also use the call return 1;. In the case where at least two command-line parameters were provided, we try to convert them (argv[1] and argv[2]) to integer values. This is done with the overly complicated strtol function, which we use then check error conditions. The man page for strtol tells us we need to include two additional header files, stdlib.h and limits.h. It also tells us about the parameters to strtol, which are the string which we would like to convert to a number, a pointer into the string at the point beyond which we matched a number (which we don t care about, so we pass in NULL), and the base to use for the conversion. We also see that the number is the return value. Error checking for strtol is messy we need to check the variable errno, defined in errno.h, to see if an error condition was encountered. If so, errno will be a non-zero value and we print an error message and exit. Note that the error check here has two %s s, so we have two additional parameters to fprintf. Finally, we re ready to check that the numbers entered are non-negative, and if so, we print out the answer (obtained by the gcd function call inside of a printf parameter). This file includes a main function, so we might think we could compile it to an executable as we did with hello.c, but if we try, we ll find that it doesn t know how to find the gcd function. Again, we ll have to compile but not link: gcc -c gcdmain.c This produces the object file gcdmain.o. We need to link together our two object files, which, together, have the function definitions we need: gcc -o gcdmain gcdmain.o gcd.o 5

This gives is gcdmain, which we can run. The Makefile, which you will learn about as part of Lab 1, contains rules to generate a sequence of calls to gcc that will correctly compile and link the gcdmain executable. The bad news: that was a lot of trouble just to write a simple program. The good news: you will have a lot of examples to go on and you can ask a lot of questions. Example: Matrix Multiplication We ll get started by using a matrix-matrix multiply as a running example. We start with a simple implementation of a matrix-matrix multiply and use it to learn/review a bit about C and Unix: See: /cs/terescoj/shared/cs2500/examples/matmult This is another example of separate compilation The function in timer.c will be useful throughout the semester. We tell matmult.c about it with the line #include "timer.h" This provides a prototype of the function in timer.c. In many cases, this file would also define any data structures or constants/macros used by the functions it defines. This is a good model to use as you move forward and develop more complicated C programs. Group functions as you would group methods in a Java class or member functions in a C++ class. Along those same lines, the include files in angle brackets #include <stdio.h> #include <stdlib.h> #include <sys/time.h> specify system-wide header files. By convention (though most compilers don t really make a distinction) system-wide header files are in angle brackets, while your own header files are in double quotes. Each file can then be compiled separately to create an object file (.o file) from the C source. These object files are all listed at the linking step. What happens for function diffgettime() at compile time? Link time? The program uses two system calls: printf() and gettimeofday(). To see how these work, we can look at their man pages: man printf 6

to see everything we wanted to know about a particular system call. But if you do this, you might get a man page for a command-line utility called printf instead of the system call printf(). Not what we were looking for. The Unix manual is divided up into sections. The most important of these sections, for our purposes, are Section 1: User Commands, and Section 3: Library Functions. If we don t ask for a section, we get section 1. Since section 1 contains an entry for printf, that s what it produced. To force it to give you the system call manual page, you can use (in Linux, on a Mac, or with FreeBSD) man 3 printf This tells it to look in section 3, which contains system calls in the C library. How did I know to look in section 3? Mainly because the printf man page in section 1 told me so, at the bottom under the See Also section. In Solaris, the syntax is a bit more complex: man -s 3C printf Again, I knew to request section 3C of the manual by looking at the bottom of the printf(1) man page. Fortunately, you only need to concern yourself with what section of the manual to use when you look something up that it in more than one section. For example, man gettimeofday brings up the man page we want, for the gettimeofday() system call in section 3C when requested in Solaris, section 2 (the system calls section) under FreeBSD, Mac OS X, or Linux. If you see a reference to something like ctime(3c) in the See Also section of a man page, such as that in gettimeofday() s man page, that means the ctime() man page is in section 3C. I will use this notation frequently throughout the semester. You will find the Unix manual very helpful as we move forward. So what does gettimeofday(3c) do? See the man page and look at the usage in the example program. what s going on with memory management? what would happen if we declared struct timeval * variables instead of struct timeval? gettimeofday(3c) returns wall clock times. This is the amount of elapsed real time. So if our process is taking turns on the CPU with other processes (see the Operating Systems course) and it is not always running, it continues to accumulate wall clock time, but not CPU usage time. There are also system calls to examine CPU usage time which we may consider later. The Makefile is using the GNU compiler (gcc) with the option -O for optimization. If you want to run this with a different compiler or optimization flags, you can change the CC= line in the Makefile. 7

If we compile and run this program, it reports initialization and matrix multiplication times. Initialization is just filling in matrices a and b. Then we compute the value of each element of c using the dot product of the corresponding row of a and column of b. Remember your data structures and algorithms: what is the complexity of matrix-matrix multiply? How long does it take you to run this program? Example: Reversing a String Solution to be developed in class. Focus on: strings dynamic and stack memory allocation More Unix Basics To access a Unix system you need to have an account. A Unix account includes: username and password userid and groupid home directory shell About the username: A username is (typically) a sequence of alphanumeric characters of length no more than 8 the username is the primary identifying attribute of an account username is often used as the basis of an email address the name of your home directory is usually related to your username About the password: a password is a secret string that only the user knows not even the system knows, since the password is stored in an encrypted form and there is no method to decrypt 8

when you enter your password the system encrypts it and compares to a stored string it s a good idea to include numbers and/or special characters (don t use an English word!) About the userid: a userid (uid) is a number (an integer) that identifies a Unix account each userid is unique within a system it is easier (and more efficient) for the system to use a number than a string like the username You don t need to know your userid, but you can find it by typing id at the command prompt Groups and the groupid: Unix includes the notion of a group of users members of a Unix group can share files and active processes each account is assigned a primary group the groupid (gid) is a number that corresponds to this primary group a single account can belong to many groups (but has only one primary group) Home directory: a home directory is a place in the file system where files related to an account are stored a directory is like a Windows or Mac folder many Unix commands and applications make use of the account home directory (as a place to look for customization files, for example) The Unix shell: a shell is a unix program that provides an interactive session - a text-based user interface when you log in to a Unix system, the program you initially interact with is your shell there are a number of popular shells that are available bash, csh, tcsh, ksh, etc. the shell presents you with a (customizable) prompt 9

this prompt (and much more) can be changed by startup files or dotfiles every Unix process has a current working directory the shell starts with the home directory as the current working directory the shell runs programs on your behalf Files and file names: a file is a basic unit of storage (usually storage on a disk) every file has a name Unix file names can contain any characters (although some make it difficult to access the file) Unix file names can be long! how long depends on your specific flavor of Unix (1024 on my Mac) each file can hold some raw data. Unix does not impose any structure on files files can hold any sequence of bytes many programs interpret the contents of a file as having some special structure text file, sequence of integers, database records, etc. Directories: a directory is a special kind of file - Unix uses a directory to hold information about other files we often think of a directory as a container that holds other files (or directories) Mac and Windows users: A directory is the same idea as a folder folders are used as a GUI interface to directories and not unique to Unix/Linux/FreeBSD each file in the same directory must have a unique name files that are in different directories can have the same name Unix File System: 10

the filesystem is a hierarchical system of organizing files and directories the top level in the hierarchy is called the root and holds all files and directories the name of the root directory is / File paths: the pathname of a file includes the file name and the name of the directory that holds the file, and the name of the directory that holds the directory that holds the file, and the name of the,..., up to the root the pathname of every file in a Unix filesystem is unique to create a pathname you start at the root (so you start with /), then follow the path down the hierarchy (including each directory name) and you end with the filename in between every directory name you put a / absolute pathnames start with a /, and fully specify the file relative pathnames do not, and are relative to the current working directory. is a relative pathname that always refers to the current working directory.. is a relative pathname that always refers to the parent of the current directory File attributes: access/modification times size ownership (owner, group) permissions hey, it s octal! 11