Ch.1 Introduction. Why Machine Learning (ML)?

Size: px
Start display at page:

Download "Ch.1 Introduction. Why Machine Learning (ML)?"

Transcription

1 Syllabus, prerequisites Ch.1 Introduction Notation: Means pencil-and-paper QUIZ Means coding QUIZ Why Machine Learning (ML)? Two problems with conventional if - else decision systems: brittleness: The logic is specific to a single task or domain. Changing the task slightly may require massive changes in the code. Example: Learning chess, go, and shogi - Google s AlphaZero. manual designing of rules requires knowing how humans do it. Two big classes of ML algorithms: supervised: Learning from examples. Easy to understand and evaluate. E.g. Google s Cat project unsupervised: No examples, the algorithm has to figure out the output. E.g. find out topics in blog posts. ML lingo: Rows = samples = data points Columns = features

2 ML is only one part of the Data Science flowchart: [Source:

3 Why Python? Installing scikit-learn It is prepackaged in the Anaconda distribution After installing Anaconda, the simplest way to update all packages we need is by updating Scikit-learn: Essential Libraries and Tools Skip Jupyter Notebook instead, we shall use the Spyder editor + ipython console combination, that comes pre-packaged in Anaconda. Review problems for Numpy: Numpy Create the following Numpy arrays:. Hint: Use np.eye( ) and np.diag( ) Change the diagonals into antidiagonals, E.g. the last one from above should be. Hint: Use np.fliplr( ) Create this block-diagonal array: Hint: Use np.hstack( ) and vstack( )

4 Matplotlib Review problems for Matplotlib: Plot the function f(x) = x 2-0.1*x 3 for values of x between -1 and 8. Use only Python lists. Hint: Creating the list of x values like this does not work! Why? Same as above, but using Numpy arrays. Which solution is easier? Plot the two plots above as subplots of the same figure. covered pp.1-10 (skip Scipy, we shall cover it next time)

5 Solutions Create the following Numpy arrays: Create this Numpy array: Plot the function f(x) = x 2-0.1*x 3 for values of x between -1 and 8. Use only Python lists.

6 Same as above, but using Numpy arrays. Plot the two plots above as subplots of the same figure.

7 Scipy - Sparse Matrices Our text briefly shows how to use the CSR and COO representations (or forms, for short) for sparse matrices. Here we explain the two in more detail. First, let us visualize the difference between sparse and dense matrices using matplotlib s spy( ) function, which plots the sparsity pattern on a 2-D array: Sparse matrix - COO (COOrdinate list) form We use for a first example the dense matrix from the Wikipedia page 1 shown here: The COO form is: 1

8 If needed, the three members of sm can be accessed individually, and even modified: Note well: In COO matrices, it is not possible to assign values directly using square brackets: According to the documentation 2, the COO format is used mainly for fast conversion to other sparse formats, like CSR (explained below), and for fast operations on the underlying data array of non-zero values. How would you add 42 to all non-zero elements of a COO matrix? It is also possible to perform more complex operations, like matrix-vector multiplications, with a COO matrix, but we are going to only do it with CSR matrices. We can find the total size of a matrix using Numpy s nbytes attribute (No, the sparse matrix object does not have its own nbytes): Create a COO representation for the identity matrix of size 10x10. Find the compression ratio. Create a COO representation for this matrix of size NxN, where N is entered by the user. Hint: Use the solution to the problem in the Numpy section above. Calculate the compression ratio. Write a loop that iterates N from 2 to 20. What do you notice about the compression ratio? 2

9 Sparse matrix - CSR (Compressed Sparse Row) form For the same example dense matrix, the CSR form is: The array indptr always has a number of elements equal to the number of rows plus one. Its first element is always zero, and its last is always equal to the number of nonzero elements; this is redundant, but it makes access easier, and, for realistic-sized matrices, the overhead is negligible 3. The entry with index i stores how many non-zero elements are in the matrix up to and including the previous row, i - 1; it is a cumulative count of nonzero elements. What is the dense matrix represented here? enough information to specify the matrix completely? Do we have Note well: Unlike the COO case, all three arrays are part of the same tuple! indices is placed before indptr! Note: The constructor csr_matrix( ) accepts other arguments as well, e.g. a dense 2D array, or any of the other sparse representations, like COO. As with all dense matrices, the three members of sm can be accessed individually, and modified: Find the size in Bytes of the CSR matrix sm above. Calculate the compression ratio. We covered pp Scipy, with a lot of material that s not in the text 3 In Big Oh notation, the overhead is O(1).

10 Solutions Find the size in Bytes of the CSR matrix above. Calculate the compression ratio.

11 Lab 1 problems: Unlike COO, in CSR, it is possible to assign values directly using square brackets:, but here is a more challenging and insightful way: Change the top-right corner to 43 by manipulating the data, indptr and indices attributes. Is this a O(1) operation? What is the complexity? Create a CSR representation for the identity matrix of size 10x10. Find the compression ratio. (a) Create a CSR representation for this matrix of size NxN, where N is entered by the user. N is any integer > 1 (2, 3,...) Hint: Use the solution to the problem in the COO section above. (b) Calculate the compression ratio. (c) Write a loop that iterates N from 2 to 20. (d) Plot with Matplotlib the compression ratios obtained above. What do you notice? Let us demonstrate a useful application of sparse matrices, matrix-vector multiplication. Enter and execute this code: We can see the improvement in the case of a large matrix: Scale up the example above to a matrix size of 30,000x30,000, and do it two ways: with a dense and with a sparse matrix. Time each multiplication and compare the results. Calculate the compression ratio of the sparse matrix when compared to the dense one. the lab report containing screenshots of the code and output to the instructor: agapie@tarleton.edu. Please write 4086 Lab 1 in the subject line.

12 Solutions: Change the top-right corner to 43 by manipulating the data, indptr and indices attributes. Is this a O(1) operation? What is the real complexity? Solution: Here are the representations before and after the change:. We can see that data and indices require an insertion, which is inefficient for Numpy arrays (but it could in principle be made O(1) with a linked list implementation). When it comes to indptr, however, the counts of nonzero elements for all subsequent rows must be changed, so this operation is O(m) in the worst case. Scale up the example above to a matrix size of 30,000x30,000, and do it two ways: with a dense and with a sparse matrix. Time each multiplication and compare the results. Also calculate the compression ratio between the matrices.

13 pandas The pandas module has two main objects: the Series and the (Data)Frame. The latter is more useful in machine learning. Among many other options, the constructor for pandas Frame accepts a dictionary: Notes: For compactness, the variable name data_pandas used in the text was changed to df. In the ipython console, the function display produces the same output as print. One advantage of pandas over Numpy is that each column can have a different data type. Even inside a column, different data types may be present. We say that Frames are heterogeneous. Replace London with the integer 42 in the example above the column Location are of different types. Write code to show that the element of The construction above, based on a dictionary is the most logical, since Frames are best thought of as dictionaries of Series; this is confirmed by the syntax for selecting a column, shown below. Multiple columns are selected by providing the column names as a list: Extract the Age column and find the average age. Another advantage of pandas is that it supports SQL-style queries:

14 The query capability is, however, severely limited, i.e. this does not work: mglearn Python 2 vs. Python 3 See a list of the major differences between P2 and P3 at: Versions Used in this Book All non-standard modules have a version attribute enclosed in between double underscores: Then why and not? PEP Module Version Numbers: [ ] modules in the standard library SHOULD NOT have version numbers. They implicitly carry the version number of the Python release they are included in. See what the Python doc considers to be the Standard Library here: Conclusion: sys.version is not the version number of the (Standard Library) sys module, but that of the entire Python release!

15

16 Formatting Two ways to print with formatting: Old way, using C-style printf formatting specifiers: New way, using the method format( ): Same output. One of the many Monthy Python references from the documentation 4 : Two capabilities of the new style that are not available in the old style: By numbering the placeholders, it is possible to change the print order without physically changing the arguments of the format method: New style placeholders accept data structures: For a complete list of options and differences, see here: 4

17 Another example with integers 5 : Print a table similar to the one above, with the square and cubic roots instead of the squares and cubes. The numbers are the integers from 0 to 500, in steps of 50. Allow for 3 decimal places. Do it using both formatting styles. The table should also have column headers. 5 Ibid.

18 Solutions: Replace London with the integer 42 in the example above the column Location are of different types. Write code to show that the element of Extract the Age column and find the average age. Print a table similar to the one above, with the square and cubic roots instead of the squares and cubes. The numbers are the integers from 0 to 500, in steps of 50. Allow for 3 decimal places. Do it using both formatting styles. The table should also have column headers.

19 A First application: Classifying Iris Species 6 The constructor load_iris( ) returns an object of class Bunch, which is essentially a dictionary whose keys can also be accesses like attributes: Here are the keys (ignoring the two extra ones from the example above): Read, enter and execute the code on pp (In[11] to In[19]) that explains the five keys/attributes outlined above. Conclusion: The inportant attributes are data and target. Explain them in your own words: data --> target --> 6

20 The two parts of any ML algorithm: training and testing Two dangers: underfitting and overfitting. Split the dataset into two parts: train set and test set. Try the code above with different ratios of train, e.g. 0.8, which is the rule or the Pareto principle. Note: A third part, cross-validation will be introduced in Ch.5.

21 Visualizing the data For 2 (or 3) features, one 2D (or 3D) plot is sufficient. For higher numbers of features, we can plot each pair. pandas scatter_matrix is very handy: What do you notice in the matrix above? Explain!

22 K-Nearest-Neighbors (KNN or knn) Explaining KNN on a simple example (not in text): Training or fitting the classifier Normally, we don t print the classifier options: Making predictions

23 We have a second flower, with the four features, respectively, 4, 1.9, 3, 0.7. Use the 2D array to predict both in one pass. Evaluating the model Note: fit( ), predict( ), and score( ) are common to all supervised algorithms in scikit-learn. Homework for Ch.1 is assigned, due next Wednesday!

24 Solutions We have a second flower, with the four features, respectively, 4, 1.9, 3, 0.7. Use the 2D array to predict both in one pass.

Ch.1 Introduction. Why Machine Learning (ML)? manual designing of rules requires knowing how humans do it.

Ch.1 Introduction. Why Machine Learning (ML)? manual designing of rules requires knowing how humans do it. Ch.1 Introduction Syllabus, prerequisites Notation: Means pencil-and-paper QUIZ Means coding QUIZ Code respository for our text: https://github.com/amueller/introduction_to_ml_with_python Why Machine Learning

More information

Python With Data Science

Python With Data Science Course Overview This course covers theoretical and technical aspects of using Python in Applied Data Science projects and Data Logistics use cases. Who Should Attend Data Scientists, Software Developers,

More information

ARTIFICIAL INTELLIGENCE AND PYTHON

ARTIFICIAL INTELLIGENCE AND PYTHON ARTIFICIAL INTELLIGENCE AND PYTHON DAY 1 STANLEY LIANG, LASSONDE SCHOOL OF ENGINEERING, YORK UNIVERSITY WHAT IS PYTHON An interpreted high-level programming language for general-purpose programming. Python

More information

Certified Data Science with Python Professional VS-1442

Certified Data Science with Python Professional VS-1442 Certified Data Science with Python Professional VS-1442 Certified Data Science with Python Professional Certified Data Science with Python Professional Certification Code VS-1442 Data science has become

More information

HANDS ON DATA MINING. By Amit Somech. Workshop in Data-science, March 2016

HANDS ON DATA MINING. By Amit Somech. Workshop in Data-science, March 2016 HANDS ON DATA MINING By Amit Somech Workshop in Data-science, March 2016 AGENDA Before you start TextEditors Some Excel Recap Setting up Python environment PIP ipython Scientific computation in Python

More information

Introduction to Data Science. Introduction to Data Science with Python. Python Basics: Basic Syntax, Data Structures. Python Concepts (Core)

Introduction to Data Science. Introduction to Data Science with Python. Python Basics: Basic Syntax, Data Structures. Python Concepts (Core) Introduction to Data Science What is Analytics and Data Science? Overview of Data Science and Analytics Why Analytics is is becoming popular now? Application of Analytics in business Analytics Vs Data

More information

Data Science with Python Course Catalog

Data Science with Python Course Catalog Enhance Your Contribution to the Business, Earn Industry-recognized Accreditations, and Develop Skills that Help You Advance in Your Career March 2018 www.iotintercon.com Table of Contents Syllabus Overview

More information

Introducing Categorical Data/Variables (pp )

Introducing Categorical Data/Variables (pp ) Notation: Means pencil-and-paper QUIZ Means coding QUIZ Definition: Feature Engineering (FE) = the process of transforming the data to an optimal representation for a given application. Scaling (see Chs.

More information

Python Certification Training

Python Certification Training Introduction To Python Python Certification Training Goal : Give brief idea of what Python is and touch on basics. Define Python Know why Python is popular Setup Python environment Discuss flow control

More information

CME 193: Introduction to Scientific Python Lecture 1: Introduction

CME 193: Introduction to Scientific Python Lecture 1: Introduction CME 193: Introduction to Scientific Python Lecture 1: Introduction Nolan Skochdopole stanford.edu/class/cme193 1: Introduction 1-1 Contents Administration Introduction Basics Variables Control statements

More information

Scientific Computing using Python

Scientific Computing using Python Scientific Computing using Python Swaprava Nath Dept. of CSE IIT Kanpur mini-course webpage: https://swaprava.wordpress.com/a-short-course-on-python/ Disclaimer: the contents of this lecture series are

More information

Accelerated Machine Learning Algorithms in Python

Accelerated Machine Learning Algorithms in Python Accelerated Machine Learning Algorithms in Python Patrick Reilly, Leiming Yu, David Kaeli reilly.pa@husky.neu.edu Northeastern University Computer Architecture Research Lab Outline Motivation and Goals

More information

Webgurukul Programming Language Course

Webgurukul Programming Language Course Webgurukul Programming Language Course Take One step towards IT profession with us Python Syllabus Python Training Overview > What are the Python Course Pre-requisites > Objectives of the Course > Who

More information

DATA SCIENCE INTRODUCTION QSHORE TECHNOLOGIES. About the Course:

DATA SCIENCE INTRODUCTION QSHORE TECHNOLOGIES. About the Course: DATA SCIENCE About the Course: In this course you will get an introduction to the main tools and ideas which are required for Data Scientist/Business Analyst/Data Analyst/Analytics Manager/Actuarial Scientist/Business

More information

k-means Clustering (pp )

k-means Clustering (pp ) Notation: Means pencil-and-paper QUIZ Means coding QUIZ k-means Clustering (pp. 170-183) Explaining the intialization and iterations of k-means clustering algorithm: Let us understand the mechanics of

More information

Intermediate/Advanced Python. Michael Weinstein (Day 2)

Intermediate/Advanced Python. Michael Weinstein (Day 2) Intermediate/Advanced Python Michael Weinstein (Day 2) Topics Review of basic data structures Accessing and working with objects in python Numpy How python actually stores data in memory Why numpy can

More information

CIS192 Python Programming

CIS192 Python Programming CIS192 Python Programming Machine Learning in Python Robert Rand University of Pennsylvania October 22, 2015 Robert Rand (University of Pennsylvania) CIS 192 October 22, 2015 1 / 18 Outline 1 Machine Learning

More information

Numerical Methods. Centre for Mathematical Sciences Lund University. Spring 2015

Numerical Methods. Centre for Mathematical Sciences Lund University. Spring 2015 Numerical Methods Claus Führer Alexandros Sopasakis Centre for Mathematical Sciences Lund University Spring 2015 Preface These notes serve as a skeleton for the course. They document together with the

More information

Data Acquisition and Processing

Data Acquisition and Processing Data Acquisition and Processing Adisak Sukul, Ph.D., Lecturer,, adisak@iastate.edu http://web.cs.iastate.edu/~adisak/bigdata/ Topics http://web.cs.iastate.edu/~adisak/bigdata/ Data Acquisition Data Processing

More information

Neural Networks (pp )

Neural Networks (pp ) Notation: Means pencil-and-paper QUIZ Means coding QUIZ Neural Networks (pp. 106-121) The first artificial neural network (ANN) was the (single-layer) perceptron, a simplified model of a biological neuron.

More information

Episode 8 Matplotlib, SciPy, and Pandas. We will start with Matplotlib. The following code makes a sample plot.

Episode 8 Matplotlib, SciPy, and Pandas. We will start with Matplotlib. The following code makes a sample plot. Episode 8 Matplotlib, SciPy, and Pandas Now that we understand ndarrays, we can start using other packages that utilize them. In particular, we're going to look at Matplotlib, SciPy, and Pandas. Matplotlib

More information

Index. bfs() function, 225 Big data characteristics, 2 variety, 3 velocity, 3 veracity, 3 volume, 2 Breadth-first search algorithm, 220, 225

Index. bfs() function, 225 Big data characteristics, 2 variety, 3 velocity, 3 veracity, 3 volume, 2 Breadth-first search algorithm, 220, 225 Index A Anonymous function, 66 Apache Hadoop, 1 Apache HBase, 42 44 Apache Hive, 6 7, 230 Apache Kafka, 8, 178 Apache License, 7 Apache Mahout, 5 Apache Mesos, 38 42 Apache Pig, 7 Apache Spark, 9 Apache

More information

SQL Server Machine Learning Marek Chmel & Vladimir Muzny

SQL Server Machine Learning Marek Chmel & Vladimir Muzny SQL Server Machine Learning Marek Chmel & Vladimir Muzny @VladimirMuzny & @MarekChmel MCTs, MVPs, MCSEs Data Enthusiasts! vladimir@datascienceteam.cz marek@datascienceteam.cz Session Agenda Machine learning

More information

Python Training. Complete Practical & Real-time Trainings. A Unit of SequelGate Innovative Technologies Pvt. Ltd.

Python Training. Complete Practical & Real-time Trainings. A Unit of SequelGate Innovative Technologies Pvt. Ltd. Python Training Complete Practical & Real-time Trainings A Unit of. ISO Certified Training Institute Microsoft Certified Partner Training Highlights : Complete Practical and Real-time Scenarios Session

More information

CPSC 67 Lab #5: Clustering Due Thursday, March 19 (8:00 a.m.)

CPSC 67 Lab #5: Clustering Due Thursday, March 19 (8:00 a.m.) CPSC 67 Lab #5: Clustering Due Thursday, March 19 (8:00 a.m.) The goal of this lab is to use hierarchical clustering to group artists together. Once the artists have been clustered, you will calculate

More information

1 Document Classification [60 points]

1 Document Classification [60 points] CIS519: Applied Machine Learning Spring 2018 Homework 4 Handed Out: April 3 rd, 2018 Due: April 14 th, 2018, 11:59 PM 1 Document Classification [60 points] In this problem, you will implement several text

More information

ECE 202 LAB 3 ADVANCED MATLAB

ECE 202 LAB 3 ADVANCED MATLAB Version 1.2 1 of 13 BEFORE YOU BEGIN PREREQUISITE LABS ECE 201 Labs EXPECTED KNOWLEDGE ECE 202 LAB 3 ADVANCED MATLAB Understanding of the Laplace transform and transfer functions EQUIPMENT Intel PC with

More information

Introduction to Python Part 2

Introduction to Python Part 2 Introduction to Python Part 2 v0.2 Brian Gregor Research Computing Services Information Services & Technology Tutorial Outline Part 2 Functions Tuples and dictionaries Modules numpy and matplotlib modules

More information

PYTHON FOR MEDICAL PHYSICISTS. Radiation Oncology Medical Physics Cancer Care Services, Royal Brisbane & Women s Hospital

PYTHON FOR MEDICAL PHYSICISTS. Radiation Oncology Medical Physics Cancer Care Services, Royal Brisbane & Women s Hospital PYTHON FOR MEDICAL PHYSICISTS Radiation Oncology Medical Physics Cancer Care Services, Royal Brisbane & Women s Hospital TUTORIAL 1: INTRODUCTION Thursday 1 st October, 2015 AGENDA 1. Reference list 2.

More information

ENGG1811 Computing for Engineers Week 1 Introduction to Programming and Python

ENGG1811 Computing for Engineers Week 1 Introduction to Programming and Python ENGG1811 Computing for Engineers Week 1 Introduction to Programming and Python ENGG1811 UNSW, CRICOS Provider No: 00098G W4 Computers have changed engineering http://www.noendexport.com/en/contents/48/410.html

More information

Course May 18, Advanced Computational Physics. Course Hartmut Ruhl, LMU, Munich. People involved. SP in Python: 3 basic points

Course May 18, Advanced Computational Physics. Course Hartmut Ruhl, LMU, Munich. People involved. SP in Python: 3 basic points May 18, 2017 3 I/O 3 I/O 3 I/O 3 ASC, room A 238, phone 089-21804210, email hartmut.ruhl@lmu.de Patrick Böhl, ASC, room A205, phone 089-21804640, email patrick.boehl@physik.uni-muenchen.de. I/O Scientific

More information

A brief introduction to coding in Python with Anatella

A brief introduction to coding in Python with Anatella A brief introduction to coding in Python with Anatella Before using the Python engine within Anatella, you must first: 1. Install & download a Python engine that support the Pandas Data Frame library.

More information

Command Line and Python Introduction. Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016

Command Line and Python Introduction. Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016 Command Line and Python Introduction Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016 Today Assignment #1! Computer architecture Basic command line skills Python fundamentals

More information

Short Introduction to Python Machine Learning Course Laboratory

Short Introduction to Python Machine Learning Course Laboratory Pattern Recognition and Applications Lab Short Introduction to Python Machine Learning Course Laboratory Battista Biggio battista.biggio@diee.unica.it Luca Didaci didaci@diee.unica.it Dept. Of Electrical

More information

LECTURE 7: STUDENT REQUESTED TOPICS

LECTURE 7: STUDENT REQUESTED TOPICS 1 LECTURE 7: STUDENT REQUESTED TOPICS Introduction to Scientific Python, CME 193 Feb. 20, 2014 Please download today s exercises from: web.stanford.edu/~ermartin/teaching/cme193-winter15 Eileen Martin

More information

Data Analytics Training Program using

Data Analytics Training Program using Data Analytics Training Program using In exclusive association with 1200+ Trainings 20,000+ Participants 10,000+ Brands 45+ Countries [Since 2009] Training partner for Who Is This Course For? Programers

More information

Python Advance Course via Astronomy street. Sérgio Sousa (CAUP) ExoEarths Team (http://www.astro.up.pt/exoearths/)

Python Advance Course via Astronomy street. Sérgio Sousa (CAUP) ExoEarths Team (http://www.astro.up.pt/exoearths/) Python Advance Course via Astronomy street Sérgio Sousa (CAUP) ExoEarths Team (http://www.astro.up.pt/exoearths/) Advance Course Outline: Python Advance Course via Astronomy street Lesson 1: Python basics

More information

David J. Pine. Introduction to Python for Science & Engineering

David J. Pine. Introduction to Python for Science & Engineering David J. Pine Introduction to Python for Science & Engineering To Alex Pine who introduced me to Python Contents Preface About the Author xi xv 1 Introduction 1 1.1 Introduction to Python for Science and

More information

SCIENCE. An Introduction to Python Brief History Why Python Where to use

SCIENCE. An Introduction to Python Brief History Why Python Where to use DATA SCIENCE Python is a general-purpose interpreted, interactive, object-oriented and high-level programming language. Currently Python is the most popular Language in IT. Python adopted as a language

More information

Introduction to Python Part 1. Brian Gregor Research Computing Services Information Services & Technology

Introduction to Python Part 1. Brian Gregor Research Computing Services Information Services & Technology Introduction to Python Part 1 Brian Gregor Research Computing Services Information Services & Technology RCS Team and Expertise Our Team Scientific Programmers Systems Administrators Graphics/Visualization

More information

Homework 01 : Deep learning Tutorial

Homework 01 : Deep learning Tutorial Homework 01 : Deep learning Tutorial Introduction to TensorFlow and MLP 1. Introduction You are going to install TensorFlow as a tutorial of deep learning implementation. This instruction will provide

More information

SQL Server 2017: Data Science with Python or R?

SQL Server 2017: Data Science with Python or R? SQL Server 2017: Data Science with Python or R? Dejan Sarka Sponsor Introduction Dejan Sarka (dsarka@solidq.com, dsarka@siol.net, @DejanSarka) 30 years of experience SQL Server MVP, MCT, 16 books 20+ courses,

More information

Introduction to Computer Vision Laboratories

Introduction to Computer Vision Laboratories Introduction to Computer Vision Laboratories Antonino Furnari furnari@dmi.unict.it www.dmi.unict.it/~furnari/ Computer Vision Laboratories Format: practical session + questions and homeworks. Material

More information

Search. The Nearest Neighbor Problem

Search. The Nearest Neighbor Problem 3 Nearest Neighbor Search Lab Objective: The nearest neighbor problem is an optimization problem that arises in applications such as computer vision, pattern recognition, internet marketing, and data compression.

More information

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer i About the Tutorial Project is a comprehensive software suite for interactive computing, that includes various packages such as Notebook, QtConsole, nbviewer, Lab. This tutorial gives you an exhaustive

More information

Week Two. Arrays, packages, and writing programs

Week Two. Arrays, packages, and writing programs Week Two Arrays, packages, and writing programs Review UNIX is the OS/environment in which we work We store files in directories, and we can use commands in the terminal to navigate around, make and delete

More information

Python for Analytics. Python Fundamentals RSI Chapters 1 and 2

Python for Analytics. Python Fundamentals RSI Chapters 1 and 2 Python for Analytics Python Fundamentals RSI Chapters 1 and 2 Learning Objectives Theory: You should be able to explain... General programming terms like source code, interpreter, compiler, object code,

More information

Supervised Learning Classification Algorithms Comparison

Supervised Learning Classification Algorithms Comparison Supervised Learning Classification Algorithms Comparison Aditya Singh Rathore B.Tech, J.K. Lakshmipat University -------------------------------------------------------------***---------------------------------------------------------

More information

ENGR 102 Engineering Lab I - Computation

ENGR 102 Engineering Lab I - Computation ENGR 102 Engineering Lab I - Computation Learning Objectives by Week 1 ENGR 102 Engineering Lab I Computation 2 Credits 2. Introduction to the design and development of computer applications for engineers;

More information

: Intro Programming for Scientists and Engineers Final Exam

: Intro Programming for Scientists and Engineers Final Exam Final Exam Page 1 of 6 600.112: Intro Programming for Scientists and Engineers Final Exam Peter H. Fröhlich phf@cs.jhu.edu December 20, 2012 Time: 40 Minutes Start here: Please fill in the following important

More information

Data Science and Machine Learning Essentials

Data Science and Machine Learning Essentials Data Science and Machine Learning Essentials Lab 2B Transforming Data with Scripts By Graeme Malcolm and Stephen Elston Overview In this lab, you will learn how to use Python or R to manipulate and analyze

More information

Python for Data Analysis. Prof.Sushila Aghav-Palwe Assistant Professor MIT

Python for Data Analysis. Prof.Sushila Aghav-Palwe Assistant Professor MIT Python for Data Analysis Prof.Sushila Aghav-Palwe Assistant Professor MIT Four steps to apply data analytics: 1. Define your Objective What are you trying to achieve? What could the result look like? 2.

More information

MATH 1MP3 Homework #4 Due: 11:59pm, Wednesday, March 6.

MATH 1MP3 Homework #4 Due: 11:59pm, Wednesday, March 6. MATH 1MP3 Homework #4 Due: 11:59pm, Wednesday, March 6. Important notes: To start the assignment, download the Jupyter notebook file assignment 4 template.ipynb found here: https://ms.mcmaster.ca/~matt/1mp3/homework/assignment_4_template.

More information

Scientific computing platforms at PGI / JCNS

Scientific computing platforms at PGI / JCNS Member of the Helmholtz Association Scientific computing platforms at PGI / JCNS PGI-1 / IAS-1 Scientific Visualization Workshop Josef Heinen Outline Introduction Python distributions The SciPy stack Julia

More information

pandas: Rich Data Analysis Tools for Quant Finance

pandas: Rich Data Analysis Tools for Quant Finance pandas: Rich Data Analysis Tools for Quant Finance Wes McKinney April 24, 2012, QWAFAFEW Boston about me MIT 07 AQR Capital: 2007-2010 Global Macro and Credit Research WES MCKINNEY pandas: 2008 - Present

More information

Programming with Python

Programming with Python Stefan Güttel Programming with Python Getting started for Programming with Python A little bit of terminology Python A programming language, the language you write computer programs in. IPython A Python

More information

PYTHON NUMPY TUTORIAL CIS 581

PYTHON NUMPY TUTORIAL CIS 581 PYTHON NUMPY TUTORIAL CIS 581 VARIABLES AND SPYDER WORKSPACE Spyder is a Python IDE that s a part of the Anaconda distribution. Spyder has a Python console useful to run commands quickly and variables

More information

Spreadsheet View and Basic Statistics Concepts

Spreadsheet View and Basic Statistics Concepts Spreadsheet View and Basic Statistics Concepts GeoGebra 3.2 Workshop Handout 9 Judith and Markus Hohenwarter www.geogebra.org Table of Contents 1. Introduction to GeoGebra s Spreadsheet View 2 2. Record

More information

Converting categorical data into numbers with Pandas and Scikit-learn -...

Converting categorical data into numbers with Pandas and Scikit-learn -... 1 of 6 11/17/2016 11:02 AM FastML Machine learning made easy RSS Home Contents Popular Links Backgrounds About Converting categorical data into numbers with Pandas and Scikit-learn 2014-04-30 Many machine

More information

File Input/Output in Python. October 9, 2017

File Input/Output in Python. October 9, 2017 File Input/Output in Python October 9, 2017 Moving beyond simple analysis Use real data Most of you will have datasets that you want to do some analysis with (from simple statistics on few hundred sample

More information

EPL451: Data Mining on the Web Lab 5

EPL451: Data Mining on the Web Lab 5 EPL451: Data Mining on the Web Lab 5 Παύλος Αντωνίου Γραφείο: B109, ΘΕΕ01 University of Cyprus Department of Computer Science Predictive modeling techniques IBM reported in June 2012 that 90% of data available

More information

Webinar Series. Introduction To Python For Data Analysis March 19, With Interactive Brokers

Webinar Series. Introduction To Python For Data Analysis March 19, With Interactive Brokers Learning Bytes By Byte Academy Webinar Series Introduction To Python For Data Analysis March 19, 2019 With Interactive Brokers Introduction to Byte Academy Industry focused coding school headquartered

More information

Data Science Bootcamp Curriculum. NYC Data Science Academy

Data Science Bootcamp Curriculum. NYC Data Science Academy Data Science Bootcamp Curriculum NYC Data Science Academy 100+ hours free, self-paced online course. Access to part-time in-person courses hosted at NYC campus Machine Learning with R and Python Foundations

More information

Automation.

Automation. Automation www.austech.edu.au WHAT IS AUTOMATION? Automation testing is a technique uses an application to implement entire life cycle of the software in less time and provides efficiency and effectiveness

More information

Frameworks in Python for Numeric Computation / ML

Frameworks in Python for Numeric Computation / ML Frameworks in Python for Numeric Computation / ML Why use a framework? Why not use the built-in data structures? Why not write our own matrix multiplication function? Frameworks are needed not only because

More information

Introduction to Python. Didzis Gosko

Introduction to Python. Didzis Gosko Introduction to Python Didzis Gosko Scripting language From Wikipedia: A scripting language or script language is a programming language that supports scripts, programs written for a special run-time environment

More information

Scientific Python. 1 of 10 23/11/ :00

Scientific Python.   1 of 10 23/11/ :00 Scientific Python Neelofer Banglawala Kevin Stratford nbanglaw@epcc.ed.ac.uk kevin@epcc.ed.ac.uk Original course authors: Andy Turner Arno Proeme 1 of 10 23/11/2015 00:00 www.archer.ac.uk support@archer.ac.uk

More information

MS6021 Scientific Computing. TOPICS: Python BASICS, INTRO to PYTHON for Scientific Computing

MS6021 Scientific Computing. TOPICS: Python BASICS, INTRO to PYTHON for Scientific Computing MS6021 Scientific Computing TOPICS: Python BASICS, INTRO to PYTHON for Scientific Computing Preliminary Notes on Python (v MatLab + other languages) When you enter Spyder (available on installing Anaconda),

More information

CS1 Lecture 2 Jan. 16, 2019

CS1 Lecture 2 Jan. 16, 2019 CS1 Lecture 2 Jan. 16, 2019 Contacting me/tas by email You may send questions/comments to me/tas by email. For discussion section issues, sent to TA and me For homework or other issues send to me (your

More information

Ch.5. Loops. (a.k.a. repetition or iteration)

Ch.5. Loops. (a.k.a. repetition or iteration) Ch.5 Loops (a.k.a. repetition or iteration) 5.1 The FOR loop End of for loop End of function 5.1 The FOR loop What is the answer for 100? QUIZ Modify the code to calculate the factorial of N: N! Modify

More information

1. BASICS OF PYTHON. JHU Physics & Astronomy Python Workshop Lecturer: Mubdi Rahman

1. BASICS OF PYTHON. JHU Physics & Astronomy Python Workshop Lecturer: Mubdi Rahman 1. BASICS OF PYTHON JHU Physics & Astronomy Python Workshop 2017 Lecturer: Mubdi Rahman HOW IS THIS WORKSHOP GOING TO WORK? We will be going over all the basics you need to get started and get productive

More information

A. Python Crash Course

A. Python Crash Course A. Python Crash Course Agenda A.1 Installing Python & Co A.2 Basics A.3 Data Types A.4 Conditions A.5 Loops A.6 Functions A.7 I/O A.8 OLS with Python 2 A.1 Installing Python & Co You can download and install

More information

Copyright 2006 Melanie Butler Chapter 1: Review. Chapter 1: Review

Copyright 2006 Melanie Butler Chapter 1: Review. Chapter 1: Review QUIZ AND TEST INFORMATION: The material in this chapter is on Quiz 1 and Exam 1. You should complete at least one attempt of Quiz 1 before taking Exam 1. This material is also on the final exam. TEXT INFORMATION:

More information

KNIME Python Integration Installation Guide. KNIME AG, Zurich, Switzerland Version 3.7 (last updated on )

KNIME Python Integration Installation Guide. KNIME AG, Zurich, Switzerland Version 3.7 (last updated on ) KNIME Python Integration Installation Guide KNIME AG, Zurich, Switzerland Version 3.7 (last updated on 2019-02-05) Table of Contents Introduction.....................................................................

More information

About Intellipaat. About the Course. Why Take This Course?

About Intellipaat. About the Course. Why Take This Course? About Intellipaat Intellipaat is a fast growing professional training provider that is offering training in over 150 most sought-after tools and technologies. We have a learner base of 700,000 in over

More information

STATS 507 Data Analysis in Python. Lecture 0: Introduction and Administrivia

STATS 507 Data Analysis in Python. Lecture 0: Introduction and Administrivia STATS 507 Data Analysis in Python Lecture 0: Introduction and Administrivia Data science has completely changed our world Course goals Establish a broad background in Python programming Prepare you for

More information

Specialist ICT Learning

Specialist ICT Learning Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.

More information

Overview. Linear Algebra Notation. MATLAB Data Types Data Visualization. Probability Review Exercises. Asymptotics (Big-O) Review

Overview. Linear Algebra Notation. MATLAB Data Types Data Visualization. Probability Review Exercises. Asymptotics (Big-O) Review Tutorial 1 1 / 21 Overview Linear Algebra Notation Data Types Data Visualization Probability Review Exercises Asymptotics (Big-O) Review 2 / 21 Linear Algebra Notation Notation and Convention 3 / 21 Linear

More information

Image Compression With Haar Discrete Wavelet Transform

Image Compression With Haar Discrete Wavelet Transform Image Compression With Haar Discrete Wavelet Transform Cory Cox ME 535: Computational Techniques in Mech. Eng. Figure 1 : An example of the 2D discrete wavelet transform that is used in JPEG2000. Source:

More information

Science One CS : Getting Started

Science One CS : Getting Started Science One CS 2018-2019: Getting Started Note: if you are having trouble with any of the steps here, do not panic! Ask on Piazza! We will resolve them this Friday when we meet from 10am-noon. You can

More information

Rapid growth of massive datasets

Rapid growth of massive datasets Overview Rapid growth of massive datasets E.g., Online activity, Science, Sensor networks Data Distributed Clusters are Pervasive Data Distributed Computing Mature Methods for Common Problems e.g., classification,

More information

Data Analytics Training Program

Data Analytics Training Program Data Analytics Training Program In exclusive association with 1200+ Trainings 20,000+ Participants 10,000+ Brands 45+ Countries [Since 2009] Training partner for Who Is This Course For? Programers Willing

More information

Homework 2: HMM, Viterbi, CRF/Perceptron

Homework 2: HMM, Viterbi, CRF/Perceptron Homework 2: HMM, Viterbi, CRF/Perceptron CS 585, UMass Amherst, Fall 2015 Version: Oct5 Overview Due Tuesday, Oct 13 at midnight. Get starter code from the course website s schedule page. You should submit

More information

Scientific Computing: Lecture 1

Scientific Computing: Lecture 1 Scientific Computing: Lecture 1 Introduction to course, syllabus, software Getting started Enthought Canopy, TextWrangler editor, python environment, ipython, unix shell Data structures in Python Integers,

More information

Semester 2, 2018: Lab 1

Semester 2, 2018: Lab 1 Semester 2, 2018: Lab 1 S2 2018 Lab 1 This lab has two parts. Part A is intended to help you familiarise yourself with the computing environment found on the CSIT lab computers which you will be using

More information

The Python interpreter

The Python interpreter The Python interpreter Daniel Winklehner, Remi Lehe US Particle Accelerator School (USPAS) Summer Session Self-Consistent Simulations of Beam and Plasma Systems S. M. Lund, J.-L. Vay, D. Bruhwiler, R.

More information

AMath 483/583 Lecture 2. Notes: Notes: Homework #1. Class Virtual Machine. Notes: Outline:

AMath 483/583 Lecture 2. Notes: Notes: Homework #1. Class Virtual Machine. Notes: Outline: AMath 483/583 Lecture 2 Outline: Binary storage, floating point numbers Version control main ideas Client-server version control, e.g., CVS, Subversion Distributed version control, e.g., git, Mercurial

More information

15-388/688 - Practical Data Science: Relational Data. J. Zico Kolter Carnegie Mellon University Spring 2018

15-388/688 - Practical Data Science: Relational Data. J. Zico Kolter Carnegie Mellon University Spring 2018 15-388/688 - Practical Data Science: Relational Data J. Zico Kolter Carnegie Mellon University Spring 2018 1 Announcements Piazza etiquette: Changing organization of threads to be easier to search (starting

More information

AMath 483/583 Lecture 2

AMath 483/583 Lecture 2 AMath 483/583 Lecture 2 Outline: Binary storage, floating point numbers Version control main ideas Client-server version control, e.g., CVS, Subversion Distributed version control, e.g., git, Mercurial

More information

Introduction to Scientific Python, CME 193 Jan. 9, web.stanford.edu/~ermartin/teaching/cme193-winter15

Introduction to Scientific Python, CME 193 Jan. 9, web.stanford.edu/~ermartin/teaching/cme193-winter15 1 LECTURE 1: INTRO Introduction to Scientific Python, CME 193 Jan. 9, 2014 web.stanford.edu/~ermartin/teaching/cme193-winter15 Eileen Martin Some slides are from Sven Schmit s Fall 14 slides 2 Course Details

More information

Lecture 8: Grid Search and Model Validation Continued

Lecture 8: Grid Search and Model Validation Continued Lecture 8: Grid Search and Model Validation Continued Mat Kallada STAT2450 - Introduction to Data Mining with R Outline for Today Model Validation Grid Search Some Preliminary Notes Thank you for submitting

More information

S206E Lecture 19, 5/24/2016, Python an overview

S206E Lecture 19, 5/24/2016, Python an overview S206E057 Spring 2016 Copyright 2016, Chiu-Shui Chan. All Rights Reserved. Global and local variables: differences between the two Global variable is usually declared at the start of the program, their

More information

Data Science and Machine Learning Essentials

Data Science and Machine Learning Essentials Data Science and Machine Learning Essentials Lab 3C Evaluating Models in Azure ML By Stephen Elston and Graeme Malcolm Overview In this lab, you will learn how to evaluate and improve the performance of

More information

QUIZ Friends class Y;

QUIZ Friends class Y; QUIZ Friends class Y; Is a forward declaration neeed here? QUIZ Friends QUIZ Friends - CONCLUSION Forward (a.k.a. incomplete) declarations are needed only when we declare member functions as friends. They

More information

Classifying Building Energy Consumption Behavior Using an Ensemble of Machine Learning Methods

Classifying Building Energy Consumption Behavior Using an Ensemble of Machine Learning Methods Classifying Building Energy Consumption Behavior Using an Ensemble of Machine Learning Methods Kunal Sharma, Nov 26 th 2018 Dr. Lewe, Dr. Duncan Areospace Design Lab Georgia Institute of Technology Objective

More information

Connecting ArcGIS with R and Conda. Shaun Walbridge

Connecting ArcGIS with R and Conda. Shaun Walbridge Connecting ArcGIS with R and Conda Shaun Walbridge https://github.com/sc w/nyc-r-ws High Quality PDF ArcGIS Today: R and Conda Conda Introduction Optional demo R and the R-ArcGIS Bridge Introduction Demo

More information

COSC 490 Computational Topology

COSC 490 Computational Topology COSC 490 Computational Topology Dr. Joe Anderson Fall 2018 Salisbury University Course Structure Weeks 1-2: Python and Basic Data Processing Python commonly used in industry & academia Weeks 3-6: Group

More information

Go to the Python download page:

Go to the Python download page: Prof. Dr.-Ing. Jürgen Brauer Page 1 / 6 Exercise: First steps with Python Go to the Python download page: https://www.python.org/downloads/ You will see that there are two different versions of Python

More information

ESRI User Conference 2016 WITH

ESRI User Conference 2016 WITH ESRI User Conference 2016 WITH INTRODUCTION Currently Planning Analyst Helena-Lewis and Clark National Forest GIS Program Manager Bighorn National Forest Systems Analyst Georgia-Pacific and Olympic Resource

More information

Intro to Programming. Unit 7. What is Programming? What is Programming? Intro to Programming

Intro to Programming. Unit 7. What is Programming? What is Programming? Intro to Programming Intro to Programming Unit 7 Intro to Programming 1 What is Programming? 1. Programming Languages 2. Markup vs. Programming 1. Introduction 2. Print Statement 3. Strings 4. Types and Values 5. Math Externals

More information