Computing Seminar Introduction Oct 6 2010
Outline. Today: programming/computing basics, covering terminology and high-level concepts (variables, control flow, input/output). Before next week: make sure you can log in (Windows or Linux), and make sure MATLAB runs (Windows desktop icon, or type 'matlab' in a Linux console window).
What is a 'computer language'? A way to describe the solution to a computational problem, using a specific format and syntax. Computers are dumb: the languages they natively understand are extremely primitive. Higher-level languages contain more complicated operations and have less restricted formatting. The actual written text is called source code, code, a piece of code, ...
Language types (in order of increasing complexity): Lowest level = machine code (what the CPU actually understands); assembly; compiled languages (C, C++, Fortran); highest level = interpreted languages (MATLAB, IDL, Python, Perl, etc.).
We won't talk about machine code or assembly; they are far too low level for our purposes. Compiled languages: Source code must be translated into machine code by a program called a compiler. This translation process is non-interactive. Note that machine code is never portable, and the source code is sometimes portable (e.g., between a Windows machine and a Linux machine). Errors can be very confusing (both at compile time and at run time). But compiled code can have fast execution times, which is needed for simulation and modeling.
Interpreted languages: All interpreted languages have a program (the interpreter, or the shell) which does 3 things: (1) reads a line of text, input by the user; (2) parses the line and runs the associated programs; (3) collects output from the programs, and displays or stores it somewhere. This process is inherently interactive, and much more suitable for many scientific tasks (e.g., exploratory data analysis). Running a program in an interpreted language loads the text into the interpreter (no compilation). The same program could be run identically, line by line, at the interactive shell. Errors are much easier to diagnose and correct.
Summary points: Compiled language advantages: speed. Interpreted language advantages: easier to debug; (almost) always portable; much more suitable for data analysis; most interpreters include huge numbers of useful additional programs (e.g., numerical algorithms, graphical plotting); the interactive shell is powerful in its own right, even without writing any programs.
Generic Language Concepts. Variables: a text name containing a piece of data. Program controls: source code is always processed linearly, one line at a time; controls allow branching or repetition. I/O (input/output): how to get data in or out of the program. We'll use pseudocode for examples of some of these, i.e., not the specific syntax of any particular language.
Variables. A variable is a piece of data accessible by the shell and usable in a program. Variables have a name and a type. Specific languages have different restrictions on names (e.g., must start with an alphabetic character). Variable types: Numeric: integers, floating point. Text: characters (or chars). Logical: 0 or 1, true or false, also called booleans. Structures: compound data types containing several of any of the above. Assignment: MyVar = 2.01
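As a rough illustration of the types listed above, here is a minimal Python sketch. Only MyVar comes from the slide; the other names are hypothetical, and other languages (including MATLAB) spell these types differently.

```python
# Assignment binds a name to a value; in Python the type follows from the value.
MyVar = 2.01        # numeric: floating point (from the slide)
count = 3           # numeric: integer (hypothetical name)
letter = "A"        # text: Python has no separate char type, so this is a 1-character string
is_valid = True     # logical (boolean)

# Print each variable's name, value, and type.
for name, value in [("MyVar", MyVar), ("count", count),
                    ("letter", letter), ("is_valid", is_valid)]:
    print(name, "=", value, "has type", type(value).__name__)
```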
Variable Types. Numeric types: Integers can be signed or unsigned. Integers have a limited range, fixed by the size: 16 bits for a short integer = -32768 to 32767; 16 bits for an unsigned short integer = 0 to 65535; 32 bits for a long integer = -2.1 billion to +2.1 billion, etc. Calculations that produce fractions are truncated if performed with integer variables: e.g., the fraction 19/10 will evaluate to 1. In general, integers are best used for variables representing counts of things, and not for data values.
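The ranges above follow directly from the number of bits. The sketch below computes them with two hypothetical helper functions. One caveat: Python's own integers are not fixed-size, so this only reproduces the arithmetic and does not demonstrate overflow.

```python
def signed_range(bits):
    """Smallest and largest value of a signed (two's complement) integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def unsigned_range(bits):
    """Smallest and largest value of an unsigned integer."""
    return 0, 2 ** bits - 1

print("16-bit signed:  ", signed_range(16))
print("16-bit unsigned:", unsigned_range(16))
print("32-bit signed:  ", signed_range(32))

# Integer division discards the fractional part.
truncated = 19 // 10
print("19 // 10 =", truncated)
```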
Variable Types. Floating point has a much larger range: single precision (32 bits) = ±3.40282 × 10^38; double precision (64 bits) = ±1.79769 × 10^308. Can represent fractions: 19/10 will evaluate to 1.9. Measurement data is (almost) always floating point. Although floating point numbers represent fractions, the actual internal representation still uses integers. For example, in single precision, the value of pi is: (1 + 4788187/8388608) × 2^(128 - 127). This is the reason for floating point round-off error. Single precision round-off error ~ 10^-7, double ~ 10^-16.
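The integers in the pi example can be recovered by inspecting the stored bits. This sketch uses Python's standard struct module and assumes the usual IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits); the variable names are illustrative.

```python
import math
import struct

# Pack pi as a 32-bit float, then reinterpret the same 4 bytes as an unsigned integer.
bits = struct.unpack(">I", struct.pack(">f", math.pi))[0]

sign = bits >> 31                 # 1 bit
exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
mantissa = bits & 0x7FFFFF        # 23 bits, the numerator over 2**23 = 8388608

# Rebuild the value from the three integers.
pi_single = (1 + mantissa / 2 ** 23) * 2 ** (exponent - 127)

print("exponent =", exponent, " mantissa =", mantissa)
print("single-precision pi =", pi_single)
print("round-off vs. double-precision pi =", abs(pi_single - math.pi))
```

The printed difference is the single-precision round-off error for this one value.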
Variable Types. Characters are stored as 8-bit unsigned integers (ASCII): 65 = 'A', 66 = 'B', 48 = '0'. The computer translates the integer values back to the coded characters for display. More complicated text representations exist (Unicode), but they are not used much in scientific settings.
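A short sketch of the character/integer mapping, using Python's built-in ord and chr. The second example string is an illustrative choice, not from the slide.

```python
# Character -> integer code
codes = [ord(c) for c in "AB0"]
print(codes)

# Integer code -> character
text = "".join(chr(n) for n in [72, 105])
print(text)
```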
Variable Types. Arrays: variables with multiple elements of the same type, in 1-D, 2-D, ... An array of chars is usually called a string. Arrays must be rectangular (like a matrix, which in 2-D has a specific number of rows and columns). Refer to an array element with an integer index: MyArrayName(i), MyArrayName(i,j,k). Different languages have different syntax here: (k) vs. [k], indices that start at 0 vs. indices that start at 1, etc.
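To make the indexing differences concrete, here is a small Python sketch using a nested list as a stand-in for a 2-D array. Python uses square brackets and starts counting at 0, whereas MATLAB uses parentheses and starts at 1. The names and values are hypothetical.

```python
# A 2-row, 3-column array stored as a list of rows.
grid = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]

n_rows = len(grid)
n_cols = len(grid[0])

# Rectangular means every row has the same number of columns.
is_rectangular = all(len(row) == n_cols for row in grid)

first = grid[0][0]   # first row, first column (index 0, not 1)
last = grid[1][2]    # second row, third column

# A string behaves like a 1-D array of characters.
word = "MATLAB"
first_char = word[0]

print(n_rows, n_cols, is_rectangular, first, last, first_char)
```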
Variable Types. Structures: a compound data type, where each element has a name and an arbitrary data type. Each element is usually called a field, and its name is the fieldname, separated from the variable name by a period: MyStruct.MyFieldName. Example: a variable describing a surface station might have 3 fields: Station_Name (a string), Altitude (meters above sea level, a floating pt. number), Lat_Lon (degrees, a 2-element floating pt. array).
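One possible rendering of the surface-station example is a Python dataclass. The field names follow the slide, while the class name and the station values are made up for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SurfaceStation:
    Station_Name: str        # a string
    Altitude: float          # meters above sea level
    Lat_Lon: List[float]     # degrees, 2 elements: [latitude, longitude]

# Hypothetical station values, for illustration only.
station = SurfaceStation("Example Station", 263.0, [43.07, -89.40])

# Fields are accessed with a period, as in MyStruct.MyFieldName.
print(station.Station_Name, station.Altitude, station.Lat_Lon[0])
```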
Program Control - Conditionals. If / else: Execute a section of code if a certain condition is true, or execute a different section if it is false. Conditional tests: equal, greater than, less than, ...
    If MyVariable is equal to 5,
        Run some commands
    Else
        Run some other commands
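The if/else pseudocode above might look like this in Python. The function name and the returned messages are placeholders standing in for "some commands".

```python
def describe(MyVariable):
    """Return a message that depends on whether MyVariable equals 5."""
    if MyVariable == 5:          # conditional test: equal
        return "equal to 5"      # stands in for "run some commands"
    else:
        return "not equal to 5"  # stands in for "run some other commands"

print(describe(5))
print(describe(7))
```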
Program Control - Loops. for: Repeat a section of code a specified number of times. while: Repeat a section of code until a certain condition is met.
    For each element in MyArray
        Compute a number based on the element.
    While there are outlier elements in MyArray
        Refit model and remove outlier(s)
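Both loops can be sketched in Python as follows. The slide does not say what the "model" or the outlier rule is, so this sketch assumes the simplest stand-ins: the model is the median, and an outlier is any value more than a fixed tolerance away from it. The data, the tolerance, and the helper name are all hypothetical.

```python
import statistics

MyArray = [10.0, 11.0, 9.0, 10.5, 50.0]   # made-up data with one obvious outlier

# for: compute a number based on each element (here, its square).
squares = []
for element in MyArray:
    squares.append(element ** 2)

TOLERANCE = 5.0   # assumed outlier threshold, in the same units as the data

def find_outliers(values):
    """'Fit' the model (the median) and return values far from it."""
    if not values:
        return []
    center = statistics.median(values)
    return [v for v in values if abs(v - center) > TOLERANCE]

# while: keep refitting and removing until no outliers remain.
cleaned = list(MyArray)
outliers = find_outliers(cleaned)
while outliers:
    cleaned = [v for v in cleaned if v not in outliers]
    outliers = find_outliers(cleaned)

print("squares:", squares)
print("cleaned:", cleaned)
```

The for loop runs a known number of times (once per element), while the while loop runs an unknown number of times that depends on the data.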
Program Control - I/O. I/O directly from the shell: Input in this case means reading values typed by the user; this is automatic for interpreted languages. Output means printing values back to the shell's console window. I/O from files: Read values from a file on disk and put them into variables, or write the contents of variables into files.
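A minimal sketch of file and console I/O in Python. The file name and values are hypothetical, and a temporary directory is used so the example does not touch existing files. Interactive input (Python's input function) is left out so the example runs unattended.

```python
import os
import tempfile

values = [1.5, 2.5, 3.5]   # made-up data

# Write the contents of a variable into a file, one value per line.
path = os.path.join(tempfile.mkdtemp(), "values.txt")
with open(path, "w") as f:
    for v in values:
        f.write(f"{v}\n")

# Read the values from the file on disk back into a new variable.
with open(path) as f:
    read_back = [float(line) for line in f]

# Output to the console.
print("read from", path, ":", read_back)
```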
Program Control - Functions. A function is a piece of code that can be called: it has a name, a list of input variables, and a list of output variables. Similar to a mathematical function, z = F(x,y). In general, input variables should not be changed (be careful about this one: different languages may or may not enforce it). Variables inside functions have local scope.
Program Flow with functions
    Caller:
        Define var1
        Define var2
        Running some commands...
        newvar = MyFunc(var1,var2)
        Running more commands...
        Creating some files
        Printing to console...
    Inside MyFunc:
        Gets var1 and var2 from the caller...
        Define localvar
        Run an algorithm using var1, var2, and localvar, results in newvar.
        Send newvar back to caller
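The flow above maps onto Python roughly as follows. The "algorithm" inside MyFunc is an arbitrary stand-in, since the slide does not specify one, and the input values are made up.

```python
def MyFunc(var1, var2):
    # Gets var1 and var2 from the caller.
    localvar = 2.0                        # define a local variable
    newvar = (var1 + var2) * localvar     # stand-in algorithm (an assumption)
    return newvar                         # send newvar back to the caller

# Caller
var1 = 1.0
var2 = 3.0
newvar = MyFunc(var1, var2)   # execution jumps into MyFunc, then resumes here
print("newvar =", newvar)     # printing to console
```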
Variable Scope: Variables inside MyFunc are local, and do not exist in the caller. MyFunc should not modify var1 or var2 (some languages automatically prevent that behavior).
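Here is a sketch of how these scope rules play out in Python specifically; other languages behave differently. Reassigning an input name inside the function does not affect the caller, but Python does not prevent in-place changes to mutable inputs such as lists. The helper append_in_place and all values are hypothetical.

```python
def MyFunc(var1, var2):
    localvar = var1 + var2   # exists only while MyFunc runs
    var1 = 0                 # rebinds the local name only; the caller's var1 is untouched
    return localvar

var1 = 1
var2 = 2
newvar = MyFunc(var1, var2)

# localvar does not exist in the caller.
try:
    localvar
    caller_sees_localvar = True
except NameError:
    caller_sees_localvar = False

def append_in_place(items):
    items.append(99)         # modifies the caller's list: Python does not prevent this

data = [1, 2]
append_in_place(data)

print(var1, newvar, caller_sees_localvar, data)
```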
Why use functions? Isn't that more complicated than one single, linear program? Yes, a little, but: it is sometimes useful to hide local variables; functions can be re-used; locating bugs is easier with shorter, well-designed functions (rather than one monolithic program); well-documented functions can be easily used by others; functions break large implementation tasks into smaller, more manageable ones.
Really simple example: Convert Fahrenheit to Celsius. Think about how this is translated into pseudocode:
    Get input values in degrees F
    Convert from F to C (subtract 32, then multiply by 5/9)
    Send degrees C values to output
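The three pseudocode steps could be written in Python like this. The function name and the sample inputs are illustrative.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert from F to C: subtract 32, then multiply by 5/9."""
    return (deg_f - 32.0) * 5.0 / 9.0

# Get input values in degrees F (hard-coded here instead of read from the user).
inputs_f = [32.0, 212.0, -40.0]

# Convert each value.
outputs_c = [fahrenheit_to_celsius(f) for f in inputs_f]

# Send degrees C values to output.
for f, c in zip(inputs_f, outputs_c):
    print(f, "F =", c, "C")
```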
Less simple example: With a single atmospheric sounding (temperature and dewpoint temperature as a function of pressure), compute the relative humidity at 800 hPa.
Less simple example:
    Read sounding data from file
    If needed, convert units
    If needed, interpolate T and DP to 800 hPa
    Compute saturation vapor pressure at T and DP
    RH = vapor pressure at DP / vapor pressure at T
    Send output to return variable (or print to console, output file...?)
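A possible Python sketch of these steps, under several assumptions not stated in the slides: the saturation vapor pressure uses the Bolton (1980) approximation (temperature in degrees C, result in hPa), the interpolation is linear in pressure, RH is reported as a percentage, and the sounding values are made up and hard-coded rather than read from a file. All function names are hypothetical.

```python
import math

def saturation_vapor_pressure(temp_c):
    """Bolton (1980) approximation: hPa, for temperature in deg C (an assumption)."""
    return 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))

def interpolate(p_target, p_levels, values):
    """Linear interpolation in pressure between the two bracketing levels."""
    for i in range(len(p_levels) - 1):
        p_a, p_b = p_levels[i], p_levels[i + 1]
        if min(p_a, p_b) <= p_target <= max(p_a, p_b):
            frac = (p_target - p_a) / (p_b - p_a)
            return values[i] + frac * (values[i + 1] - values[i])
    raise ValueError("target pressure is outside the sounding")

def relative_humidity(p_target, pressure, temperature, dewpoint):
    """RH (percent) = vapor pressure at DP / vapor pressure at T, times 100."""
    t = interpolate(p_target, pressure, temperature)
    dp = interpolate(p_target, pressure, dewpoint)
    return 100.0 * saturation_vapor_pressure(dp) / saturation_vapor_pressure(t)

# Hypothetical sounding: pressure in hPa, temperatures in deg C (no unit conversion needed).
pressure = [1000.0, 850.0, 700.0, 500.0]
temperature = [20.0, 10.0, 0.0, -20.0]
dewpoint = [15.0, 5.0, -10.0, -35.0]

rh_800 = relative_humidity(800.0, pressure, temperature, dewpoint)
print("RH at 800 hPa: %.1f %%" % rh_800)
```

Splitting the work into three small functions mirrors the pseudocode steps, so each step can be checked on its own.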