Ways to Fail this Class. Welcome! Questions? Course Policies. CSC 9010: Natural Language Processing. Who / what / where / when / why / how 1/9/2014

Transcription:

Welcome!

About me
  Max Peysakhov, Adjunct Faculty, CS
  Email: mpeysakhov@gmail.com
  Office: N/A
  Office hours: Wed 5-6 PM, or email for an appointment.

About this course
  Syllabus, timeline, and resources are online:
  http://edge.cs.drexel.edu/people/peysakhov/classes/cs260/

Ways to Fail this Class
  Fail to hand in more than 50% of the homeworks (regardless of the exam grades).
  Receive a failing grade for both exams (regardless of the homework grades).
  Falsify any results.
  Misrepresent another's work as your own (i.e., plagiarism).

Who / what / where / when / why / how
  Prerequisites
  Lectures
  Readings
  Assignments
  Exams
  Grading
  Communication
  Policies

Questions?
  If you do not understand, ask questions. Otherwise I will continue the lecture, and we end up like this:

Course Policies
  Late policy: NO late homeworks, no excuses. (Some consideration may be given for a valid medical reason.)
  NO cell phones! Cell phone ring = extra homework.
  NO cheating. If you are caught, you get an F. No mercy (http://drexel.edu/cs/academics/undergrad/policies/academic-integrity/).
  No extra homeworks, projects, or tests to improve the grade will be given after the final.
  Makeup exams: advance notice only, medical reasons only. Midterm/final grade will be assigned to the final/midterm test.

CSC 9010: Natural Language Processing
Python Intro

What is Python?
  A programming language with strong similarities to Perl, but with powerful typing and object-oriented features.
  Commonly used for producing HTML content on websites. Great for text files.
  Useful built-in types (lists, dictionaries).
  Clean syntax, powerful extensions.

Python Tutorials
  Things to read through
    Dive into Python (Chapters 2 to 4): http://diveintopython.org/
    Python 101, Beginning Python: http://www.rexx.com/~dkuhlman/python_101/python_101.html
  Things to refer to
    The Official Python Tutorial: http://www.python.org/doc/current/tut/tut.html
    The Python Quick Reference: http://rgruet.free.fr/pqr2..html

Why Python?
  Natural Language ToolKit
  Ease of use; interpreter
  AI Processing: Symbolic
    Python's built-in datatypes for strings, lists, and more.
    Java or C++ require the use of special classes for this.
  AI Processing: Statistical
    Python has strong numeric processing capabilities: matrix operations, etc.
    Suitable for probability and machine learning code.

Installing Python
  Python is installed on the PCs in 156.
  Python for Win/Mac/Unix/Linux is available from www.python.org. Generally an easy install.
  On Macs, it is already part of OS X.
  For NLTK you need Python 2. or higher.
  GUI development environment: IDLE.
  Credits: http://hkn.eecs.berkeley.edu/~dyoo/python/idle_intro/index.ht

Learning Python
  Unfortunately, we won't have time to cover all of Python in class, so we're just going to go over some highlights. You'll need to learn more on your own.
  Homework 0 asks you to install Python or get comfortable using it here in the lab. It also asks you to read some online Python tutorials.
  Later homeworks will include Python programming exercises to help you practice.
  We will fairly rapidly move to using the Natural Language ToolKit.

IDLE Development Environment
  Shell for interactive evaluation.
  Text editor with color-coding and smart indenting for creating Python files.
  Menu commands for changing system settings and running files.
  We will use IDLE in class.

Look at a sample of code

    x = 4-2               # A comment.
    y = "Hello"           # Another one.
    z = .45
    if z == .45 or y == "Hello":
        x = x + 1
        y = y + " World"  # String concat.
    print x
    print y

Basic Datatypes
  Integers (default for numbers)
    z = 5 / 2    # Answer is 2, integer division.
  Floats
    x = .456
  Strings
    Can use "" or '' to specify: "abc" 'abc' (Same thing.)
    Unmatched ones can occur within the string: "matt's"
    Use triple double-quotes for multi-line strings or strings that contain both ' and " inside of them: """a'b"c"""

Whitespace
  Whitespace is meaningful in Python, especially indentation and placement of newlines.
  Use a newline to end a line of code. (Not a semicolon like in C++ or Java.)
  (Use \ when a statement must continue onto the next line.)
  No braces { } to mark blocks of code in Python. Use consistent indentation instead.
  The first line with less indentation is considered outside of the block.
  Often a colon appears at the start of a new block. (We'll see this later for function and class definitions.)

Enough to Understand the Code
  Assignment uses = and comparison uses ==.
  For numbers, + - * / % are as expected.
    Special use of + for string concatenation.
    Special use of % for string formatting.
  Logical operators are words (and, or, not), not symbols (&&, ||, !).
  The basic printing command is print.
  First assignment to a variable will create it.
    Variable types don't need to be declared.
    Python figures out the variable types on its own.

Comments
  Start comments with #; the rest of the line is ignored.
  Can include a documentation string as the first line of any new function or class that you define.
  The development environment, debugger, and other tools use it: it's good style to include one.

    def my_function(x, y):
        """This is the docstring. This
        function does blah blah blah."""
        # The code would go here...
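
The slides above use Python 2 syntax. As a minimal sketch, assuming a Python 3 interpreter (where print is a function and / on two integers returns a float), the sample code, the integer-division remark, and the docstring convention might look like the following. The body given to my_function is an assumption made only so the example runs; the slide leaves it empty.

```python
def my_function(x, y):
    """Return x joined to y with a space; this first string is the docstring."""
    # Indentation, not braces, marks the function body.
    return str(x) + " " + y


x = 4 - 2              # A comment.
y = "Hello"            # Another one.
z = 0.45
if z == 0.45 or y == "Hello":
    x = x + 1          # Indented lines form the body of the if block.
    y = y + " World"   # String concatenation.

print(x)               # 3
print(y)               # Hello World
print(5 // 2)          # 2 (floor division; in Python 3, 5 / 2 gives 2.5)
print(my_function(x, y))
print(my_function.__doc__)
```

In Python 2, as on the slide, `5 / 2` already yields 2 for integers; the `//` operator gives the same floor result in both versions.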

Accessing Non-existent Name
  If you try to access a name before it's been properly created (by placing it on the left side of an assignment), you'll get an error.

    >>> y
    Traceback (most recent call last):
      File "<pyshell#16>", line 1, in -toplevel-
        y
    NameError: name 'y' is not defined
    >>> y = 3
    >>> y
    3

Python and Types
  Dynamic typing: Python determines the data types in a program automatically.
  Strong typing: But Python is not casual about types; it enforces them after it figures them out.
  So, for example, you can't just append an integer to a string. You must first convert the integer to a string itself.

    x = "the answer is "   # Decides x is string.
    y = 2                  # Decides y is integer.
    print x + y            # Python will complain about this.

Multiple Assignment
  You can also assign to multiple names at the same time.

    >>> x, y = 2, 3
    >>> x
    2
    >>> y
    3

Naming Rules
  Names are case sensitive and cannot start with a number. They can contain letters, numbers, and underscores.
    bob Bob _bob _2_bob_ bob_2 BoB
  There are some reserved words:
    and, assert, break, class, continue, def, del, elif, else, except, exec, finally, for, from, global, if, import, in, is, lambda, not, or, pass, print, raise, return, try, while

String Operations
  We can use some methods built in to the string data type to perform some formatting operations on strings:

    >>> "hello".upper()
    'HELLO'

  There are many other handy string operations available. Check the Python documentation for more.
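
To make the strong-typing point concrete, here is a small sketch, assuming Python 3 (the slides use Python 2). It shows the error raised when a string and an integer are added, the explicit conversion with str(), multiple assignment, and a string method. The name message and the swap on the second assignment line are illustrative additions, not part of the slides.

```python
x = "the answer is "    # x refers to a string.
y = 2                   # y refers to an integer.

try:
    message = x + y     # Strong typing: str + int is rejected.
except TypeError as error:
    message = "TypeError: " + str(error)
print(message)

message = x + str(y)    # Convert the integer explicitly, then concatenate.
print(message)          # the answer is 2

a, b = 2, 3             # Multiple assignment.
a, b = b, a             # The same form can swap two names.
print(a, b)             # 3 2

print("hello".upper())  # HELLO
```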

Printing with Python
  You can print a string to the screen using print.
  Using the % string operator in combination with the print command, we can format our output text.

    >>> print "%s xyz %d" % ("abc", 4)
    abc xyz 4

  Print automatically adds a newline to the end of the string. If you include a list of strings, it will concatenate them with a space between them.

    >>> print "abc"
    abc
    >>> print "abc", "def"
    abc def

Hands On
  Okay, let's try it.
  www.python.org/doc/current/tut/tut.html
  Or on these PCs under Documentation.
  First two sections you should read.
  For now: Start tonight with Section 3, An Informal Introduction to Python, and work through the examples given.

Algorithm Analysis (Big O)
CS-41
Dick Steflik

Complexity
  In examining algorithm efficiency we must understand the idea of complexity:
    Space complexity
    Time complexity

Space Complexity
  When memory was expensive we focused on making programs as space efficient as possible and developed schemes to make memory appear larger than it really was (virtual memory and memory paging schemes).
  Space complexity is still important in the field of embedded computing (handheld computer-based equipment like cell phones, palm devices, etc.).

Time Complexity
  Is the algorithm fast enough for my needs?
  How much longer will the algorithm take if I increase the amount of data it must process?
  Given a set of algorithms that accomplish the same thing, which is the right one to choose?
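
One of the questions above, how much longer an algorithm takes when the amount of data grows, can be explored by counting steps instead of timing. The sketch below is only an illustration, assuming Python 3; the function names linear_steps and quadratic_steps are hypothetical. It also formats its output with the % operator from the printing slide.

```python
def linear_steps(n):
    """Count the loop-body executions of a single pass over n items."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def quadratic_steps(n):
    """Count the loop-body executions of one pass nested inside another."""
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps


for n in (10, 20, 40):
    print("n=%d linear=%d quadratic=%d" % (n, linear_steps(n), quadratic_steps(n)))
```

Each time n doubles, the linear count doubles while the quadratic count grows by a factor of four.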

Algorithm Efficiency
  A measure of the amount of resources consumed in solving a problem of size n:
    time
    space
  Benchmarking: implement the algorithm, run it with some specific input, and measure the time taken.
    Better for comparing performance of processors than for comparing performance of algorithms.
  Big Oh (asymptotic analysis)
    Associates n, the problem size, with t, the processing time required to solve the problem.

Frequency Count
  Examine a piece of code and predict the number of instructions to be executed.
  For each instruction, predict how many times it will be encountered as the code runs.
  Totaling the counts produces the F.C. (frequency count).

    Inst #   Code                            F.C.
    1        for (int i=0; i < n; i++)       $n+1$
    2        { cout << i;                    $n$
    3          p = p + i; }                  $n$
                                   total:    $3n+1$

Cases to examine
  Best case: if the algorithm is executed, the fewest number of instructions are executed.
  Average case: executing the algorithm produces path lengths that will on average be the same.
  Worst case: executing the algorithm produces path lengths that are always a maximum.

Order of magnitude
  In the previous example: best_case = avg_case = worst_case. The example is based on fixed iteration n.
  By itself, the frequency count is relatively meaningless.
  Order of magnitude -> estimate of performance vs. amount of data.
  To convert F.C. to order of magnitude:
    discard constant terms
    disregard coefficients
    pick the most significant term
  Worst case path through algorithm -> order of magnitude will be Big O (i.e. $O(n)$).

Worst case analysis
  Of the three cases, the only useful one (from the standpoint of program design) is the worst case.
  Worst case helps answer the software lifecycle question: if it's good enough today, will it be good enough tomorrow?

Another example

    Inst #   Code                            F.C.        F.C.
    1        for (int i=0; i < n; i++)       $n+1$       $n+1$
    2          for (int j=0; j < n; j++)     $n(n+1)$    $n^2+n$
    3          { cout << i;                  $n \cdot n$ $n^2$
    4            p = p + i; }                $n \cdot n$ $n^2$
                                   total:                $3n^2+2n+1$

  Discarding constant terms produces: $3n^2+2n$
  Clearing coefficients: $n^2+n$
  Picking the most significant term: $n^2$
  Big O = $O(n^2)$
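
The frequency counts above can be checked by instrumenting the loops. The following is a sketch in Python 3 (an assumption; the slide code is C++), with the for loops unrolled into explicit tests so that each loop test is counted, including the final failing one. The function names and dictionary keys are hypothetical.

```python
def count_single_loop(n):
    """Frequency count for: for (i = 0; i < n; i++) { cout << i; p = p + i; }"""
    counts = {"loop_test": 0, "output": 0, "add": 0}
    i = 0
    while True:
        counts["loop_test"] += 1       # the test i < n runs n + 1 times
        if not i < n:
            break
        counts["output"] += 1          # cout << i runs n times
        counts["add"] += 1             # p = p + i runs n times
        i += 1
    return counts


def count_nested_loop(n):
    """Frequency count for the same body inside two nested loops over n."""
    counts = {"outer_test": 0, "inner_test": 0, "output": 0, "add": 0}
    i = 0
    while True:
        counts["outer_test"] += 1      # n + 1 times
        if not i < n:
            break
        j = 0
        while True:
            counts["inner_test"] += 1  # n * (n + 1) times
            if not j < n:
                break
            counts["output"] += 1      # n * n times
            counts["add"] += 1         # n * n times
            j += 1
        i += 1
    return counts


n = 5
single = count_single_loop(n)
nested = count_nested_loop(n)
print(single, sum(single.values()))    # total should equal 3n + 1
print(nested, sum(nested.values()))    # total should equal 3n^2 + 2n + 1
```

For n = 5 the totals are 16 and 86, matching 3n+1 and 3n^2+2n+1.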

What is Big O
  Big O: the rate at which algorithm performance degrades as a function of the amount of data it is asked to handle.
  For example:
    $O(n)$ -> performance degrades at a linear rate
    $O(n^2)$ -> quadratic degradation

Asymptotic Growth Rates
  Big-O (upper bound)
  $f(n) = O(g(n))$ [f grows at the same rate or slower than g] iff:
    There exist positive constants $c$ and $n_0$ such that $f(n) \le c\,g(n)$ for all $n \ge n_0$.
  f is bound above by g.
  Note: Big-O does not imply a tight bound.
  Ignore constants and low order terms.

Common growth rates

Big-O, Examples
  E.G. 1: $5n^2 = O(n^3)$
    $c = 1$, $n_0 = 5$: $5n^2 \le n \cdot n^2 = n^3$
  E.G. 2: $100n^2 = O(n^2)$
    $c = 100$, $n_0 = 1$
  E.G. 3: $n^3 = O(2^n)$
    $c = 1$, $n_0 = 12$: $n^3 \le (2^{n/3})^3$, since $n \le 2^{n/3}$ for $n \ge 12$ [use induction]

Big Oh - Formal Definition
  Definition of "big oh": $f(n) = O(g(n))$ iff there exist positive constants $c$ and $n_0$ such that $f(n) \le c\,g(n)$ for all $n \ge n_0$.
  Thus, $g(n)$ is an upper bound on $f(n)$.
  Note: $f(n) = O(g(n))$ is NOT the same as $O(g(n)) = f(n)$.
  The '=' is not the usual mathematical operator "=" (it is not symmetric).

Little-o
  Loose upper bound
  $f(n) = o(g(n))$ [f grows strictly slower than g]
    $f(n) = O(g(n))$ and $g(n) \ne O(f(n))$
    $\lim_{n \to \infty} f(n)/g(n) = 0$
  f is bound above by g, but not tightly.
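
The witnesses c and n0 in the three Big-O examples can be spot-checked numerically. This is a sketch in Python 3 with a hypothetical helper, holds_from; it tests the inequality only over a finite range, so it illustrates the definition and is not a proof.

```python
def holds_from(f, g, c, n0, n_max=200):
    """Check f(n) <= c * g(n) for every integer n with n0 <= n <= n_max."""
    return all(f(n) <= c * g(n) for n in range(n0, n_max + 1))


# E.G. 1: 5n^2 = O(n^3) with c = 1, n0 = 5
print(holds_from(lambda n: 5 * n**2, lambda n: n**3, c=1, n0=5))      # True
# E.G. 2: 100n^2 = O(n^2) with c = 100, n0 = 1
print(holds_from(lambda n: 100 * n**2, lambda n: n**2, c=100, n0=1))  # True
# E.G. 3: n^3 = O(2^n) with c = 1, n0 = 12
print(holds_from(lambda n: n**3, lambda n: 2**n, c=1, n0=12))         # True
# The same c with a smaller n0 fails for E.G. 1, because 5n^2 > n^3 when n < 5.
print(holds_from(lambda n: 5 * n**2, lambda n: n**3, c=1, n0=1))      # False
```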

Little-o, restatement
  $\lim_{n \to \infty} f(n)/g(n) = 0 \iff f(n) = o(g(n))$
  $\forall \varepsilon > 0,\ \exists n_0$ s.t. $\forall n \ge n_0,\ f(n)/g(n) < \varepsilon$

Common Results
  [$c > 1$, $k$ an integer]
    $\lim_{n \to \infty} n^k / c^n = \infty/\infty$
    $\lim_{n \to \infty} k n^{k-1} / (c^n \ln c)$
    $\lim_{n \to \infty} k(k-1) n^{k-2} / (c^n (\ln c)^2)$
    ...
    $\lim_{n \to \infty} k(k-1) \cdots 1 / (c^n (\ln c)^k) = 0$
    $n^k = o(c^n)$

Equivalence - Theta
  $f(n) = \Theta(g(n))$ [grows at the same rate]
    $f(n) = O(g(n))$ and $g(n) = O(f(n))$
    $g(n) = \Theta(f(n))$
    $\lim_{n \to \infty} f(n)/g(n) = c$, $0 < c < \infty$ $\implies f(n) = \Theta(g(n))$
  f is bound above by g, and below by g.

Asymptotic Growth Rates
  $\Theta(\log n)$  logarithmic [$\log(2n)/\log(n) = 1 + \log(2)/\log(n)$]
  $\Theta(n)$       linear [double input -> double output]
  $\Theta(n^2)$     quadratic [double input -> quadruple output]
  $\Theta(n^3)$     cubic [double input -> output increases by factor of 8]
  $\Theta(n^k)$     polynomial of degree k
  $\Theta(c^n)$     exponential [double input -> square output]

Common Results
  [$j < k$]
    $\lim_{n \to \infty} n^j / n^k = \lim_{n \to \infty} 1/n^{k-j} = 0$
    $n^j = o(n^k)$, if $j < k$
  [$c < d$]
    $\lim_{n \to \infty} c^n / d^n = \lim_{n \to \infty} (c/d)^n = 0$
    $c^n = o(d^n)$, if $c < d$
  $\lim_{n \to \infty} \ln(n)/n = \infty/\infty$
    $\lim_{n \to \infty} \ln(n)/n = \lim_{n \to \infty} (1/n)/1 = 0$ [L'Hôpital's Rule]
    $\ln(n) = o(n)$
  [$\varepsilon > 0$]
    $\ln(n) = o(n^{\varepsilon})$ [similar calculation]

Asymptotic Manipulation
  $\Theta(c f(n)) = \Theta(f(n))$
  $\Theta(f(n) + g(n)) = \Theta(f(n))$ if $g(n) = O(f(n))$
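
The doubling remarks and the little-o limits above can be illustrated numerically. The sketch below assumes Python 3; the names GROWTH and doubling_factor are hypothetical, and the sample values of n are arbitrary. Shrinking ratios are consistent with the limits but do not prove them.

```python
import math

# A few of the growth rates from the table, as plain functions of n.
GROWTH = {
    "n": lambda n: n,
    "n^2": lambda n: n**2,
    "n^3": lambda n: n**3,
}


def doubling_factor(f, n):
    """Return f(2n) / f(n): how much the output grows when the input doubles."""
    return f(2 * n) / f(n)


for name, f in GROWTH.items():
    print(name, doubling_factor(f, 1000))   # 2.0, 4.0, 8.0

# Ratios f(n) / g(n) that shrink toward 0 illustrate f(n) = o(g(n)):
# ln(n) = o(n) and n^3 = o(2^n).
for n in (10, 100, 1000):
    print(n, math.log(n) / n, n**3 / 2.0**n)
```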