TEXT MINING INTRO TO PYTHON

Similar documents
Course Introduction and Python Basics

CIS192: Python Programming

Python. Executive Summary

Introduction to Python

ENGR 102 Engineering Lab I - Computation

Python. Karin Lagesen.

Data Structures. Dictionaries - stores a series of unsorted key/value pairs that are indexed using the keys and return the value.

LISTS WITH PYTHON. José M. Garrido Department of Computer Science. May College of Computing and Software Engineering Kennesaw State University

Introduction to Python - Part I CNV Lab

Python. Jae-Gil Lee Based on the slides by K. Naik, M. Raju, and S. Bhatkar. December 28, Outline

Python: common syntax

The current topic: Python. Announcements. Python. Python

CS 349 / SE 382 Scripting. Professor Michael Terry March 18, 2009

Sequence types. str and bytes are sequence types Sequence types have several operations defined for them. Sequence Types. Python

LECTURE 3 Python Basics Part 2

Part I. Wei Tianwen. A Brief Introduction to Python. Part I. Wei Tianwen. Basics. Object Oriented Programming

Chapter 1 Summary. Chapter 2 Summary. end of a string, in which case the string can span multiple lines.

Python Intro GIS Week 1. Jake K. Carr

Data Structures. Lists, Tuples, Sets, Dictionaries

: Intro Programming for Scientists and Engineers Final Exam

Python I. Some material adapted from Upenn cmpe391 slides and other sources

Table of Contents. Preface... xxi

Extending Jython. with SIM, SPARQL and SQL

Rapid Application Development with

Lecture 4. while and for loops if else test Tuples Functions. Let us start Python Ssh (putty) to UNIX/Linux computer puccini.che.pitt.

The Practice of Computing Using PYTHON

CS2304: Python for Java Programmers. CS2304: Sequences and Collections

Python Tutorial. CS/CME/BioE/Biophys/BMI 279 Oct. 17, 2017 Rishi Bedi

Python Crash-Course. C. Basso. Dipartimento di Informatica e Scienze dell Informazione Università di Genova. December 11, 2007

COMP519 Web Programming Lecture 17: Python (Part 1) Handouts

Python Class-Lesson1 Instructor: Yao

Unit E Step-by-Step: Programming with Python

COMP 204: Dictionaries Recap & Sets

DSC 201: Data Analysis & Visualization

Python review. 1 Python basics. References. CS 234 Naomi Nishimura

Accelerating Information Technology Innovation

Some material adapted from Upenn cmpe391 slides and other sources

Python Programming, bridging course 2011

tutorial.txt Introduction

Table of Contents EVALUATION COPY

Midterm 1 Review. Important control structures. Important things to review. Functions Loops Conditionals

CSCE 110 Programming I Basics of Python: Variables, Expressions, Input/Output

Pace University. Fundamental Concepts of CS121 1

Python Review IPRE

Introduction to Python! Lecture 2

CSC148, Lab #4. General rules. Overview. Tracing recursion. Greatest Common Denominator GCD

Introduction to Python. Fang (Cherry) Liu Ph.D. Scien5fic Compu5ng Consultant PACE GATECH

Advanced Algorithms and Computational Models (module A)

GIS 4653/5653: Spatial Programming and GIS. More Python: Statements, Types, Functions, Modules, Classes

Senthil Kumaran S

Introduction to Python for Plone developers

Accelerating Information Technology Innovation

Introduction to Python Part 1. Brian Gregor Research Computing Services Information Services & Technology

Basic Python Revision Notes With help from Nitish Mittal

SOFT 161. Class Meeting 1.6

ARTIFICIAL INTELLIGENCE AND PYTHON

More Examples Using Lists Tuples and Dictionaries in Python CS 8: Introduction to Computer Science, Winter 2018 Lecture #11

Flow Control: Branches and loops

PYTHON FOR MEDICAL PHYSICISTS. Radiation Oncology Medical Physics Cancer Care Services, Royal Brisbane & Women s Hospital

Introduction to Python: Data types. HORT Lecture 8 Instructor: Kranthi Varala

Intro to Programming. Unit 7. What is Programming? What is Programming? Intro to Programming

Python Review IPRE

CSCE 110 Programming I

PYTHON CONTENT NOTE: Almost every task is explained with an example

LECTURE 1. Getting Started with Python

CHAPTER 2: Introduction to Python COMPUTER PROGRAMMING SKILLS

DSC 201: Data Analysis & Visualization

Numerical Methods. Centre for Mathematical Sciences Lund University. Spring 2015

Introduction to Python. Didzis Gosko

Python: Short Overview and Recap

GE PROBLEM SOVING AND PYTHON PROGRAMMING. Question Bank UNIT 1 - ALGORITHMIC PROBLEM SOLVING

Topic 7: Lists, Dictionaries and Strings

Python Lists 2 CS 8: Introduction to Computer Science Lecture #9

History Installing & Running Python Names & Assignment Sequences types: Lists, Tuples, and Strings Mutability

Python a modern scripting PL. Python

Python Tutorial. CSE 3461: Computer Networking

\n is used in a string to indicate the newline character. An expression produces data. The simplest expression

MEMOIZATION, RECURSIVE DATA, AND SETS

Python for Scientists

Python Tutorial for CSE 446

Introduction to Python

Jarek Szlichta

Crash Dive into Python

Python. Department of Computer Science And Engineering. European University Cyprus

VLC : Language Reference Manual

[Software Development] Python (Part A) Davide Balzarotti. Eurecom Sophia Antipolis, France

Python BASICS. Introduction to Python programming, basic concepts: formatting, naming conventions, variables, etc.

Introduction to Python. Prof. Steven Ludtke

Webgurukul Programming Language Course

SD314 Outils pour le Big Data

Introduction to Scientific Python, CME 193 Jan. 9, web.stanford.edu/~ermartin/teaching/cme193-winter15

python 01 September 16, 2016

Exam 3. 1 True or False. CSCE 110. Introduction to Programming Exam 3

Python Basics. Nakul Gopalan With help from Cam Allen-Lloyd

age = 23 age = age + 1 data types Integers Floating-point numbers Strings Booleans loosely typed age = In my 20s

Python. ECE 650 Methods & Tools for Software Engineering (MTSE) Fall Prof. Arie Gurfinkel

STEAM Clown & Productions Copyright 2017 STEAM Clown. Page 1

Introduction to Python

More Data Structures. What is a Dictionary? Dictionaries. Python Dictionary. Key Value Pairs 10/21/2010

Python Training. Complete Practical & Real-time Trainings. A Unit of SequelGate Innovative Technologies Pvt. Ltd.

Transcription:

TEXT MINING INTRO TO PYTHON Johan Falkenjack (based on slides by Mattias Villani) NLPLAB Dept. of Computer and Information Science Linköping University JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 1 / 23

OVERVIEW What is Python? How is it special? Python's objects If-else, loops and list comprehensions Functions Classes Modules JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 2 / 23

WHAT IS PYTHON? First version in 1991 High-level language Emphasizes readability Interpreted (bytecode.py and.pyc) [can be compiled via C/Java] Automatic memory management Strongly dynamically typed Functional and/or object-oriented Glue to other programs (interface to C/C++ or Java etc) Popular in data science (Prototype in R, implement in Python) Two currently developed versions, 2.x and 3.x This course uses Python 3 JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 3 / 23

THE BENEVOLENT(?) DICTATOR FOR LIFE (BDFL) GUIDO VAN ROSSUM JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 4 / 23

PYTHON PECULIARITES (COMPARED TO R/MATLAB) Not primarily a numerical language. Indexing begins at 0, as indexes refer to breakpoints between elements. It follows that myvector[0:2] returns the rst and second element, but not the third. Indentation matters! Can import specic functions from a module. Assignment by object, NOT by copy or by reference. Approximately, assignment by copy of reference. a = b = 1 assigns 1 to both a and b. JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 5 / 23

PYTHON S OBJECTS Built-in types: numbers, strings, lists, dictionaries, tuples and les. Vectors, arrays and functions to manipulate them are available in the numpy/scipy modules. Python is a strongly typed language. 'johan' + 3 gives an error. Python is a dynamically typed language. No need to declare a variables type before it is used. Python gures out the object's type. Implication: Polymorphism by default: In other words, don't check whether it IS-a duck: check whether it QUACKS-like-a duck, WALKS-like-a duck, etc, etc, depending on exactly what subset of duck-like behaviour you need to play your language-games with. - Alex Martelli JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 6 / 23

STRINGS No character type, strings are strings, even if they are length 1. s = 'Spam' s[0] returns rst character, s[-2] return next to last character. s[0:2] returns rst two characters (slicing). len(s) returns the number of letters. s.lower(), s.upper(), s.count('m'), s.endswith('am'),... Which methods are available for my object? Try in Jupyter: type s. followed by TAB. + operator concatenates strings by creating a new string (computationally expensive if you do it often). sentence = 'Guido is the benevolent dictator for life'.sentence.split() s*3 returns 'SpamSpamSpam' JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 7 / 23

INDEXING Strings are always read from left to right Indexes are the positions between characters and start at 0, i.e. the position before the rst character. s[0] starts at index 0 and reads once character forward, i.e. returns the rst character in s. Slicing are when you take a slice from a string based on indexes, for instance s[1:2] returns the substring from position 1 to 3, i.e. the second and third characters. s[:3] returns the string upto position 3, s[3:] returns the string from position 3 to the end. s[2] is equivalent to s[2:3] (with strings). Indexes can be negative and are then counted from the end of the sequence (they are NOT excluders as in R). s[-2] returns next to last character, s[-3:] returns the substring containing the last three characters. JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 8 / 23

THE LIST OBJECT A list is an ordered container of several values, possibly of dierent types. mylist = ['spam','spam','bacon',2] The list object has several associated methods mylist.append('egg') mylist.count('spam') mylist.sort() + operator concatenates lists: mylist + myotherlist merges the two lists as one list. Elements are not named and lists are not vectors in a mathematical sense. JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 9 / 23

THE LIST OBJECT Extract elements from a list: mylist[1] Lists inside lists (nested lists): myotherlist = ['monty','python'] mylist[1] = myotherlist mylist[1] returns the list ['monty','python'] mylist[1][1] returns the string 'Python' Indexing and slicing works as with strings... EXCEPT a bit more generally, in a string, all elements are strings, but not all elements of lists are necessarily lists. myotherlist[1] returns an element (the string 'Python') while myotherlist[1:2] returns a list (['Python']). JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 10 / 23

STRINGS AGAIN Strings are immutable, i.e. can't be changed after creation. Every change creates a new string (remember concatenation). Try to avoid creating more strings than necessary: Avoid: my_string = 'Python' + ' ' + 'is' + ' ' + 'fun!' Instead: my_string = ' '.join(['python', 'is', 'fun!']) In loops where you construct strings, add constituents to a list and join after loop nishes. Remember that you have to explicitly change the type of, for instance, numbers, when concatenating: str(1) JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 11 / 23

DICTIONARIES Unordered collection of pairs, often names and some value. mydict = {'Sarah':29, 'Erik':28, 'Evelina':26} Elements are accessed by keyword not by index: mydict['evelina'] returns 26. Values can contain any object: mydict = {'Marcus':[23,14], 'Cassandra':17, 'Per':[12,29]}. returns 14. mydict['marcus'][1] Keys must be immutable: mydict = {2:'contents of box2', (3, 'a'):'content of box 4', 'blackbox':10} mydict.keys() mydict.values() mydict.items() JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 12 / 23

TUPLES mytuple = (3,4,'johan') Like lists, but immutable Why? Faster than lists Protected from change Can be used as keys in dictionaries Multiple return object from function Swapping variable content (a, b) = (b, a) Sequence unpacking a, b, c = mytuple list(mytuple) returns mytuple as a list. tuple(mylist) does the opposite. JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 13 / 23

SETS Set. Contains objects in no order with no identication. With a list, elements are ordered and identied by position. mylist[2] With a dictionary, elements are unordered but identied by some key. mydict['mykey'] With a set, elements stand for themselves. No indexing, no key-reference. Declaration: fib=set( (1,1,2,3,5,8,13) ) returns the set {1, 2, 3, 5, 8, 13}. Supported operations: len(s), x in s, set1 < set2, union, intersection, add, remove, pop... JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 14 / 23

VECTORS AND ARRAYS (AND MATRICES) from scipy import * x = array([1,7,3]) 2-dimensional array (matrix): X = array([[2,3],[4,5]]) Indexing arrays First row: X[0,] Second column: X[,1] Element in position 1,2: X[0,1] Array multiplication (*) is element-wise, for matrix multiplication use dot(). There is also a matrix object: X = matrix([[2,3],[4,5]]) Arrays are preferred (not matrices). Submodule scipy.linalg contains a lot of matrix-functions applicable to arrays (det(), inv(), eig() etc). I recommend: from scipy.linalg import * JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 15 / 23

BOOLEAN OPERATORS True/False and or not a = True; b = False; a and b [returns False]. JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 16 / 23

IF-ELSE CONSTRUCTS IF-ELSE STATEMENT a =1 if a==1: print('a is one') elif a==2: print('a is two) else: print('a is not one or two') Switch statements via dictionaries (see Jackson's Python book) if absolutely necessary. JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 17 / 23

WHILE LOOPS WHILE LOOP a =10 while a>1: print('bigger than one') a = a - 1 else: print('smaller than one') JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 18 / 23

FOR LOOPS for loops can iterate over any iterable. iterables: strings, lists, tuples, ranges FOR LOOP word = 'johan' for letter in word: print(letter) mylist = []*10 for i in range(10): mylist = 'johan' + str(i) JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 19 / 23

LIST COMPREHENSIONS As in R, loops can be slow. For small loops executed many times to generate many results, try comprehensions: Set denition in mathematics where X is some a nite set. Comprehensions in Python: {x for x X } {f (x) for x X } mylist = [x for x in range(10)] myset = {sqrt(x) for x in range(10)} (don't forget from math import sqrt) mydict = {x : sqrt(x) for x in sample if x >= 0} JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 20 / 23

DEFINING FUNCTIONS AND CLASSES def mysquare(x): return x**2 class myclass(object): def init (): self.count = 0 def method(self, x): self.count += x If a is instance of myclass, a.method(b) can be thought of as myclass.method(a, b) where self becomes the object a itself Make you own module by putting several classes or functions in a.py le. Then import what you need from that le. JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 21 / 23

STRINGS, A THIRD TIME String formatting is often used to create recurrent string injections: "My name is {0}".format('Fred') "The story of {0}, {1}, and {c}".format(a, b, c=d) Regular expressions. Used to nd and manipulate patterns in strings, extremely powerful stu. Available in Python through the module re. Introduction: https://regexone.com/lesson/introduction_abcs Practice: https://alf.nu/regexgolf JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 22 / 23

MISC Comments on individual lines starts with # Doc-strings can be used as comments spanning over multiple lines but this should be avoided """This is a looooong comment""" JOHAN FALKENJACK (NLPLAB, LIU) TEXT MINING 23 / 23