Play with Python: An intro to Data Science Ignacio Larrú Instituto de Empresa
Who am I?
- Passionate about technology: from iPhone apps to algorithmic programming, I love innovative technology
- Former entrepreneur: founded several companies, from online memorabilia e-commerce to structural civil engineering calculations
- Investment banker: I advise Spanish companies in M&A and IPO processes
- Venture capital and bootcamp: CFO and investment director at K Fund, and Academic Director of the IE Data Science Bootcamp
Big Data and Data Science Big Data technologies Data Science
Why is Data Science so difficult?
Overview of the Data Science process
Diagram labels: Validation, Framing the Problem, Solving the Problem, Action
Problem recognition
- Business comes first: think about what moves the needle
- Focus specifically on the decisions that will be made as a result of the analysis
  - Helps everyone realize the reason for the analysis
  - Makes identifying key stakeholders easier
  - No decision. No analytics?
- Plan the objective for your problem: investigation, exploration, A/B testing, survey, prediction, past performance (reporting)
- The scope of the problem should start out expansive, but by the end of problem framing you should have a clear statement of the problem
Exploratory Data Analysis
- Use descriptive statistics (median, mode, variance, frequency tables, correlation lines, etc.) to understand the important characteristics of a dataset
- Identify trends and outliers
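A minimal sketch of a few of these descriptive statistics, using only the standard library. The sales figures are invented for illustration, and the "more than two standard deviations from the mean" outlier rule is an assumption, one of several common conventions.

```python
import statistics
from collections import Counter

# Hypothetical daily sales figures, invented for illustration
sales = [12, 15, 15, 18, 20, 22, 15, 95]

median = statistics.median(sales)      # middle value of the sorted data
mode = statistics.mode(sales)          # most frequent value
variance = statistics.variance(sales)  # sample variance
frequency = Counter(sales)             # frequency table: value -> count

# Assumed outlier rule: more than 2 sample standard deviations from the mean
mean = statistics.mean(sales)
stdev = statistics.stdev(sales)
outliers = [x for x in sales if abs(x - mean) > 2 * stdev]
```

Here the single large value pulls the mean well above the median, which is often a first hint that outliers are present.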
Overview of data processing algorithms (i)
1. Classification: for each individual in a population, predict which of a set of classes the individual belongs to
   Example: among all the customers of ACME, which are likely to respond to a given offer?
2. Regression: estimate or predict, for each individual, the numerical value of some variable
   Example: how much will a customer use the service?
Overview of data processing algorithms (ii)
3. Similarity matching: identify similar individuals based on data known about them
   Example: "Other customers also bought"
4. Clustering: group individuals in a population together by their similarity, but not driven by any specific purpose
   Example: do our customers form natural groups or segments?
5. Co-occurrence: find associations between entities based on transactions involving them
   Example: what items are commonly purchased together?
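As a rough illustration of the co-occurrence idea only, the sketch below counts how often pairs of items appear in the same basket. The baskets are hypothetical, and real association-rule mining uses more than raw pair counts (support, confidence, lift), so this is a starting point rather than a full method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

pair_counts = Counter()
for basket in transactions:
    # sorted() gives every pair a canonical order, so (a, b) and (b, a) match
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

most_common_pair, count = pair_counts.most_common(1)[0]
```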
Can data visualizations hurt your analysis?
Lying with graphs Source: Hbr.com
Python Data Science Stack
Hello World!
print("hello World")
Python is interpreted
Programming Python
Comments in Python
- # for a single-line comment
- Triple quotes (""" ... """) for a multiple-line comment
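A minimal sketch of both comment styles. Strictly speaking, a triple-quoted block is a string literal that Python evaluates and discards, which is why it is commonly used as a multi-line comment. The variable name is hypothetical.

```python
# This is a single-line comment: everything after the # is ignored

"""
This triple-quoted block spans several lines.
Python treats it as a string literal that is never used,
so it behaves like a multi-line comment.
"""

greeting = "hello"  # a comment can also follow code on the same line
```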
Variables in Python
Variables in Python
- Variables don't need a pre-defined type; they take the type of the value they point to
- Four main types:
  - String (str): holds text-based values, e.g. name = "Ignacio"
  - int: integer numbers, e.g. name = 10
  - float: floating-point decimal numbers, e.g. name = 10.4
  - Boolean (bool): True or False
- More variable types (lists, tuples, dictionaries) will be reviewed during the course
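The four types can be checked with the built-in type() function. This short sketch reuses the values from the slide; the type_of_* names are hypothetical.

```python
# The same name is bound in turn to values of the four main types
name = "Ignacio"
type_of_text = type(name)      # str

name = 10
type_of_integer = type(name)   # int

name = 10.4
type_of_float = type(name)     # float

name = True
type_of_boolean = type(name)   # bool
```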
Variables in Python
- A variable has a name (identifier), a type, a scope and a value
- A valid identifier is a non-empty sequence of characters in which:
  - The start character is an underscore "_" or an upper- or lowercase letter
  - The characters after the first can be anything permitted as a start character, plus the digits
- Identifiers are case-sensitive!
- Python keywords are not allowed as identifier names!
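One way to check these rules programmatically is sketched below, using str.isidentifier() and the standard keyword module. The helper is_valid_name and the candidate names are hypothetical. Note that isidentifier() also accepts non-ASCII letters, so it is slightly more permissive than the rules as stated above.

```python
import keyword

# Hypothetical candidate names
candidates = ["sales_amount", "_hidden", "2fast", "class", "Total"]

def is_valid_name(candidate):
    """Return True if candidate is a legal identifier and not a keyword."""
    return candidate.isidentifier() and not keyword.iskeyword(candidate)

results = {candidate: is_valid_name(candidate) for candidate in candidates}

# Identifiers are case-sensitive: these are two distinct variables
total = 1
Total = 2
```

"2fast" fails because it starts with a digit, and "class" fails because it is a keyword.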
Python vs. other languages
Python:
- Variable type is determined at runtime
- A variable is bound to one object, and the object has only one type
- A variable can change type when it is bound to an object of a different type
Statically typed languages:
- A variable is bound to a type at compile time
- A variable is bound to an object at runtime
- Variables must be declared before use
Python is a dynamically typed language
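A small sketch of what dynamic typing means in practice: the hypothetical function double declares no type for its argument, and the meaning of the * operator is decided at runtime from the object passed in.

```python
def double(x):
    # No type is declared for x; what * does depends on the runtime type
    return x * 2

doubled_number = double(21)      # 42
doubled_text = double("ab")      # "abab"
doubled_list = double([1, 2])    # [1, 2, 1, 2]
```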
Python is a strongly typed language
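Strong typing can be illustrated as follows: Python does not silently convert between unrelated types, so mixing a string and an integer raises TypeError until one of them is converted explicitly. Variable names are hypothetical.

```python
count = 1
label = "1"

try:
    result = label + count          # str + int is not allowed
except TypeError:
    result = None                   # Python refuses to guess a conversion

explicit_sum = int(label) + count   # 2
explicit_text = label + str(count)  # "11"
```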
With every great power
Guidelines:
- Use descriptive names (sales_amount rather than x)
- Be consistent (user_name or username?)
- Follow the traditions of the language: in Python, variable names usually start with a lowercase letter and avoid a leading underscore
- Keep the length in check: no user_total_sales_month_report
Mathematical operators
Source: http://www.emcu.it/
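For reference, a short sketch of Python's arithmetic operators on two hypothetical integers:

```python
a, b = 7, 2

addition = a + b          # 9
subtraction = a - b       # 5
multiplication = a * b    # 14
division = a / b          # 3.5 (true division always returns a float)
floor_division = a // b   # 3
remainder = a % b         # 1
power = a ** b            # 49
```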
Converting values
- float(x): returns a floating-point value by converting x
- int(x): returns an integer value by converting x
- str(x): returns a string value by converting x
- bool(x): returns a boolean value by converting x
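A brief sketch of the four conversions, including two cases that often surprise beginners: int() truncates rather than rounds, and any non-empty string converts to True.

```python
as_float = float("10.4")      # 10.4
as_int = int(10.9)            # 10: truncates toward zero, does not round
as_text = str(10)             # "10"
empty_is_false = bool("")     # False: empty values convert to False
text_is_true = bool("False")  # True: any non-empty string is True
```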
If else - elif
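A minimal sketch of an if / elif / else chain. The function describe_temperature and its thresholds are invented for illustration; only the first branch whose condition is true runs.

```python
def describe_temperature(celsius):
    """Hypothetical helper with invented thresholds."""
    if celsius < 10:
        return "cold"
    elif celsius < 25:
        return "mild"
    else:
        return "hot"
```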
Logical operators
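Python's logical operators are and, or and not. The sketch below uses hypothetical variables and also shows short-circuit evaluation, where the right-hand side is evaluated only when needed.

```python
age = 30
has_ticket = True

can_enter = age >= 18 and has_ticket                    # True
is_discounted = age < 12 or age >= 65                   # False
is_adult_without_ticket = age >= 18 and not has_ticket  # False

# Short-circuit: the division runs only because age != 0 is True
safe_ratio = age != 0 and 60 / age                      # 2.0
```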
While Loop
While and if
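A minimal sketch combining a while loop with an if statement: it sums the even numbers from 1 to 10. Variable names are hypothetical.

```python
number = 1
even_total = 0
while number <= 10:
    if number % 2 == 0:      # only even numbers are added
        even_total += number
    number += 1              # without this update the loop would never end
```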
For Loops
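A for loop visits each item of a sequence in order. The sketch below uses a hypothetical list of cities and also shows enumerate(), which yields each item together with its position.

```python
cities = ["Madrid", "Lisbon", "Paris"]  # hypothetical data

lengths = []
for city in cities:
    lengths.append(len(city))

# enumerate() yields the position together with the item
labels = []
for position, city in enumerate(cities, start=1):
    labels.append(f"{position}. {city}")
```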
range() function
range([start], stop[, step])
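The three forms of the signature can be sketched as follows. The stop value is always excluded, and list() is used here only to make the generated numbers visible.

```python
only_stop = list(range(5))             # [0, 1, 2, 3, 4]
start_and_stop = list(range(2, 6))     # [2, 3, 4, 5]
with_step = list(range(0, 10, 3))      # [0, 3, 6, 9]
counting_down = list(range(5, 0, -1))  # [5, 4, 3, 2, 1]
```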
Break, continue and pass, with else
- break: ends the loop
- continue: ends the current iteration and moves on to the next one
- pass: null statement, used as a placeholder
- else at the end of a loop:
  - for: runs when the loop ended normally (no break)
  - while: runs when the loop condition becomes false (no break)
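The sketch below puts all four keywords in one loop. The function first_negative is hypothetical; the else branch runs only when the loop finishes without reaching break.

```python
def first_negative(numbers):
    """Return a message describing the first negative number, if any."""
    for number in numbers:
        if number == 0:
            continue          # skip zeros and go to the next iteration
        if number < 0:
            message = f"found {number}"
            break             # leave the loop early
        pass                  # placeholder: nothing else to do yet
    else:
        # runs only when the loop finished without hitting break
        message = "no negatives"
    return message
```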
Python simple data structures
Sequences
Strings are sequences
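Because a string is a sequence of characters, it supports indexing and iteration like any other sequence. A minimal sketch with a hypothetical word:

```python
word = "Python"
first_letter = word[0]     # "P": indexes start at 0
last_letter = word[-1]     # "n": negative indexes count from the end

letters = []
for letter in word:        # a string can be iterated like any sequence
    letters.append(letter)
```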
Using len() and in
- The len() function returns the length of a sequence
- The in operator checks whether an element is a member of a sequence: the condition is true if it is, and false otherwise
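A short sketch of len() and in on a string and a list, with hypothetical values. For strings, in also matches substrings, not just single characters.

```python
name = "Ignacio"
numbers = [10, 20, 30]

name_length = len(name)          # 7
numbers_length = len(numbers)    # 3

has_twenty = 20 in numbers       # True
has_fifty = 50 in numbers        # False
has_substring = "nac" in name    # True
```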
Programming exercise
Slicing Sequences
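Slicing uses the form sequence[start:stop:step], where the stop index is excluded and any part can be omitted. A minimal sketch on a hypothetical string (the same syntax works on lists and tuples):

```python
letters = "abcdefgh"

first_three = letters[:3]      # "abc"
middle = letters[2:5]          # "cde" (index 5 is excluded)
last_two = letters[-2:]        # "gh"
every_other = letters[::2]     # "aceg"
reversed_text = letters[::-1]  # "hgfedcba"
```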
Programming exercise
Lists: mutable sequences
Lists: adding new items
- append(value) adds at the end of the list
- insert(index, value) inserts at a given index
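A minimal sketch of both methods on a hypothetical list. Each call modifies the list in place.

```python
fruits = ["apple", "banana"]

fruits.append("cherry")      # ["apple", "banana", "cherry"]
fruits.insert(1, "orange")   # ["apple", "orange", "banana", "cherry"]
```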
Lists: removing items
- remove(value) deletes the first item equal to value
- del is a statement, not a function: it deletes an item by index
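The difference between removing by value and removing by position can be sketched as follows, on a hypothetical list. remove() raises ValueError when the value is not present.

```python
colors = ["red", "green", "blue", "green"]

colors.remove("green")   # removes only the first match
after_remove = list(colors)

del colors[0]            # removes the item at position 0
after_del = list(colors)

try:
    colors.remove("pink")
    missing_raised = False
except ValueError:
    missing_raised = True
```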
sort() vs. sorted()
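The key difference is that sorted() returns a new list and leaves the original untouched, while the list method sort() reorders the list in place and returns None. A minimal sketch with hypothetical data:

```python
scores = [3, 1, 2]

ordered_copy = sorted(scores)   # new list; scores is unchanged
snapshot = list(scores)         # still [3, 1, 2] at this point

return_value = scores.sort()    # sorts in place and returns None
```

A common mistake is writing scores = scores.sort(), which replaces the list with None.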
Tuples
- Immutable sequences that can contain elements of different types, including mutable ones
- If the contents do not need to change, use tuples rather than lists
- Generally somewhat faster than lists
Tuples are immutable, but their elements are not necessarily
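A short sketch of this distinction, using a hypothetical record: the tuple itself cannot be reassigned, but a mutable object stored inside it can still change.

```python
record = ("Ignacio", [1, 2])

try:
    record[0] = "Ana"      # replacing an element of a tuple is not allowed
    reassigned = True
except TypeError:
    reassigned = False

record[1].append(3)        # the list inside the tuple can still be modified
```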
Sets: unordered collections with no duplicates
Sets: operations
Sets: math operations
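A minimal sketch of basic set operations and the four mathematical ones, with hypothetical data. Because sets are unordered, results are compared as sets rather than by position.

```python
# Basic operations
visitors = {"ana", "luis", "ana"}   # the duplicate "ana" is dropped
visitors.add("eva")
visitors.discard("luis")            # discard() does not fail if the item is missing

# Math operations
a = {1, 2, 3}
b = {3, 4}

union = a | b                  # {1, 2, 3, 4}
intersection = a & b           # {3}
difference = a - b             # {1, 2}
symmetric_difference = a ^ b   # {1, 2, 4}
```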
Dictionaries
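A dictionary maps keys to values. The sketch below shows the basic operations on a hypothetical price table: reading, adding, updating and deleting entries.

```python
prices = {"apple": 1.2, "banana": 0.5}  # hypothetical data

apple_price = prices["apple"]   # read a value by key
prices["cherry"] = 3.0          # add a new key
prices["banana"] = 0.6          # update an existing key
del prices["apple"]             # delete a key

items = sorted(prices.items())  # list of (key, value) pairs, sorted by key
```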
Dictionaries - operations
- update(d) joins dictionary d into the current one (or use {**x, **y} to build a new one)
- copy() creates a shallow copy
- get(key) returns None if the key doesn't exist
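A minimal sketch of these three operations, with hypothetical dictionaries. Note that {**x, **y} builds a new dictionary while update() modifies the existing one, and that a shallow copy still shares nested objects with the original.

```python
x = {"a": 1, "b": 2}
y = {"b": 20, "c": 30}

merged = {**x, **y}     # new dict; on duplicate keys the right-hand value wins
x.update(y)             # same result, but x itself is modified in place

original = {"tags": ["new"]}
shallow = original.copy()
shallow["tags"].append("sale")   # the inner list is shared by both dicts

missing = x.get("z")             # None instead of a KeyError
fallback = x.get("z", 0)         # an optional default can be supplied
```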
Crossfit coding
Session Wrap-up