CS4618: Artificial Intelligence I
Gradient Descent
Derek Bridge
School of Computer Science and Information Technology
University College Cork

Initialization

In [1]: %load_ext autoreload
        %autoreload 2
        %matplotlib inline

In [2]: import pandas as pd
        import numpy as np
        import matplotlib.pyplot as plt

In [45]: from sklearn.preprocessing import StandardScaler
         from sklearn.preprocessing import add_dummy_feature
         from sklearn.linear_model import SGDRegressor

Gradient Descent for OLS Regression

We saw the basic idea; now, the details. In fact, there are three variants:
- Batch Gradient Descent
- Stochastic Gradient Descent
- Mini-batch Gradient Descent
Partial Derivatives

We need the gradient of the loss function with respect to each $\beta_j$. In other words, how much the loss will change if we change $\beta_j$ a little. With respect to a particular $\beta_j$, this is called the partial derivative.

Without doing the calculus, the partial derivative of $J(X, y, \beta)$ with respect to $\beta_j$ is

$$\frac{\partial J(X, y, \beta)}{\partial \beta_j} = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)}\beta - y^{(i)})\, x_j^{(i)}$$

The gradient vector, $\nabla_{\beta} J(X, y, \beta)$, is a vector of each partial derivative:

$$\nabla_{\beta} J(X, y, \beta) = \begin{bmatrix} \frac{\partial J(X, y, \beta)}{\partial \beta_0} \\ \frac{\partial J(X, y, \beta)}{\partial \beta_1} \\ \vdots \\ \frac{\partial J(X, y, \beta)}{\partial \beta_n} \end{bmatrix}$$

And there is a vectorized way to compute it:

$$\nabla_{\beta} J(X, y, \beta) = \frac{1}{m} X^T (X\beta - y)$$

Gradient Descent, Again

Recap: it starts with an initial guess for the values of the parameters; then it repeatedly updates the parameter values, hopefully to reduce the loss.

But now we know how to update the parameter values to reduce the loss:
- Compute the gradient vector. But this points 'uphill' and we want to go 'downhill'.
- And we want to make 'baby steps', so we use the learning rate, $\alpha$, which is between 0 and 1.
- So subtract $\alpha$ times the gradient vector from $\beta$:

$$\beta \leftarrow \beta - \alpha \nabla_{\beta} J(X, y, \beta)$$

or

$$\beta \leftarrow \beta - \frac{\alpha}{m} X^T (X\beta - y)$$

(BTW, this is vectorized. Naive loop implementations are wrong: they lose the simultaneous update of the $\beta_j$.)

Batch Gradient Descent

Pseudocode:

    initialize β randomly
    repeat until convergence:
        β ← β − (α/m) Xᵀ(Xβ − y)

Why is it called Batch Gradient Descent? The update involves a calculation over the entire training set X on every iteration. This can be slow for large training sets.
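As a quick sanity check of the vectorized gradient formula (this is not from the lecture: the tiny X, y and beta below are made up, and numpy is assumed to be imported as above), we can compare it with a finite-difference approximation of the partial derivatives:

    # Not from the lecture: compare the analytic gradient with finite differences
    def J_check(X, y, beta):
        return np.mean((X.dot(beta) - y) ** 2) / 2.0

    def gradient(X, y, beta):
        m = X.shape[0]
        return X.T.dot(X.dot(beta) - y) / m

    rng = np.random.RandomState(0)
    X_tiny = np.hstack([np.ones((5, 1)), rng.randn(5, 3)])  # dummy column of 1s + 3 features
    y_tiny = rng.randn(5)
    b = rng.randn(4)

    eps = 1e-6
    numeric = np.array([(J_check(X_tiny, y_tiny, b + eps * np.eye(4)[j]) -
                         J_check(X_tiny, y_tiny, b - eps * np.eye(4)[j])) / (2 * eps)
                        for j in range(4)])
    print(np.allclose(gradient(X_tiny, y_tiny, b), numeric))  # expect True

If the analytic and numeric gradients agree, the vectorized formula is doing what the sum over examples says it should.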
Batch Gradient Descent in numpy

For the hell of it, let's implement it ourselves. (We'll be naughty: we'll train on the whole dataset.)

In [4]: # Loss function for OLS regression (assumes X contains all 1s in its first column)
        def J(X, y, beta):
            return np.mean((X.dot(beta) - y) ** 2) / 2.0

In [49]: def batch_gradient_descent_for_ols_linear_regression(X, y, alpha, num_iterations):
             m, n = X.shape
             beta = np.random.randn(n)
             Jvals = np.zeros(num_iterations)
             for iter in range(num_iterations):
                 beta -= (1.0 * alpha / m) * X.T.dot(X.dot(beta) - y)
                 Jvals[iter] = J(X, y, beta)
             return beta, Jvals

In [50]: # Use pandas to read the CSV file
         df = pd.read_csv("datasets/dataset_corka.csv")

         # Get the feature-values and the target values
         X_without_dummy_unscaled = df[["flarea", "bdrms", "bthrms"]].values
         y = df["price"].values

         # Scale it
         scaler = StandardScaler()
         X_without_dummy = scaler.fit_transform(X_without_dummy_unscaled)

         # Add the extra column to X
         X = add_dummy_feature(X_without_dummy)

         # Run the Batch Gradient Descent
         beta, Jvals = batch_gradient_descent_for_ols_linear_regression(X, y, alpha=0.03, num_iterations=1000)

         # Display beta
         beta

Out[50]: array([ ..., ..., ..., ... ])

Bear in mind that the coefficients it finds are on the scaled data.

It's a good idea to plot the values of the loss function against the number of iterations. If its value ever increases, then either the code might be incorrect (I think it's OK!) or the value of α is too big and is causing divergence.
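Before plotting, a brief aside on the remark above that the coefficients are for the scaled features (this snippet is not from the lecture, but it uses the beta and scaler from the cell above): coefficients on the original feature scale can be recovered from the fitted StandardScaler.

    # Not from the lecture: map coefficients learned on standardized features
    # back to the original feature scale.
    # If z_j = (x_j - mean_j) / scale_j, then
    #   y = beta[0] + sum_j beta[j] * z_j
    #     = (beta[0] - sum_j beta[j] * mean_j / scale_j) + sum_j (beta[j] / scale_j) * x_j
    coefs_original = beta[1:] / scaler.scale_
    intercept_original = beta[0] - np.sum(beta[1:] * scaler.mean_ / scaler.scale_)
    print(intercept_original, coefs_original)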
In [51]: fig = plt.figure(figsize=(8,6))
         plt.title("$J$ during learning")
         plt.xlabel("number of iterations")
         plt.xlim(1, Jvals.size)
         plt.ylabel("$J$")
         plt.ylim(3500, 50000)
         xvals = np.linspace(1, Jvals.size, Jvals.size)
         plt.scatter(xvals, Jvals)
         plt.show()

The algorithm gives us the problem of choosing the number of iterations. An alternative is to use a very large number of iterations but exit when the gradient vector becomes tiny: when its norm becomes smaller than a tolerance, η. (A sketch of this stopping rule appears below, after the unscaled run.)

Try it without scaling:

In [52]: # Get the feature-values and the target values
         X_without_dummy = df[["flarea", "bdrms", "bthrms"]].values
         y = df["price"].values

         # Add the extra column to X
         X = add_dummy_feature(X_without_dummy)

         # Run the Batch Gradient Descent
         beta, Jvals = batch_gradient_descent_for_ols_linear_regression(X, y, alpha=0.03, num_iterations=4000)

         # Display beta
         beta

C:\Anaconda3\lib\site-packages\ipykernel\__main__.py:3: RuntimeWarning: overflow encountered in square
  app.launch_new_instance()
C:\Anaconda3\lib\site-packages\ipykernel\__main__.py:8: RuntimeWarning: invalid value encountered in subtract

Out[52]: array([ nan,  nan,  nan,  nan])
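Returning to the stopping-rule idea mentioned before the unscaled run: here is a minimal sketch of a variant that exits once the norm of the gradient vector falls below a tolerance. (Not from the lecture; the function name is made up, and it assumes the same conventions as the batch implementation above.)

    # Not from the lecture: Batch Gradient Descent with a tolerance-based exit.
    # Stops early once the norm of the gradient vector falls below `tolerance`.
    def batch_gradient_descent_with_tolerance(X, y, alpha, max_iterations, tolerance=1e-6):
        m, n = X.shape
        beta = np.random.randn(n)
        for iteration in range(max_iterations):
            gradient = X.T.dot(X.dot(beta) - y) / m
            if np.linalg.norm(gradient) < tolerance:
                break
            beta -= alpha * gradient
        return beta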
How can you get the run without scaling to work?

Some people suggest a variant of Batch Gradient Descent in which the value of α is decreased over time, i.e. its value in later iterations is smaller. Why do they suggest this? And why isn't it necessary? (But we'll revisit this idea in Stochastic Gradient Descent.)

Stochastic Gradient Descent

As we saw, Batch Gradient Descent can be slow on large training sets. Stochastic Gradient Descent (SGD): on each iteration, it picks just one training example, one example at random, and computes the gradients on just that:

$$\beta \leftarrow \beta - \alpha\, x^T (x\beta - y)$$

where $x$ is the randomly chosen example and $y$ is its target value.

- This gives a huge speed-up.
- It enables us to train on huge training sets, since only one example needs to be in memory in each iteration.
- But, because it is stochastic (the randomness), the loss will not necessarily decrease on each iteration: on average, the loss decreases, but in any one iteration the loss may go up or down.
- Eventually, it will get close to the minimum, but it will continue to go up and down a bit. So, once you stop it, the β will be close to the best, but not necessarily optimal.
- Ironically, if you have a local minimum (which, with OLS regression, we don't), SGD might even escape the local minimum, and might even get to the global minimum.

Simulated Annealing

As we discussed, SGD does not settle at the minimum. One solution is to gradually reduce the learning rate: updates start out 'large' so you make progress and can escape local minima; but, over time, updates get smaller, allowing SGD to settle at the global minimum.

The function that determines how to reduce the learning rate is called the learning schedule.
- Reduce it too quickly and you may get stuck in a local minimum, or even get stuck en route to the global minimum.
- Reduce it too slowly and you may bounce around a lot and, if stopped after too few iterations, may end up with a suboptimal solution.

SGD in scikit-learn

The fit method of scikit-learn's SGDRegressor class is doing what we have described:
- You must scale the features, but it inserts the extra column of 1s for you.
- You can supply a learning_rate and lots of other things (in the code below, we'll just use the defaults).

(In the code below, we'll be naughty: we'll train on the whole dataset.)
In [53]: # Use pandas to read the CSV file
         df = pd.read_csv("datasets/dataset_corka.csv")

         # Get the feature-values and the target values
         X_unscaled = df[["flarea", "bdrms", "bthrms"]].values
         y = df["price"].values

         # Scale it
         scaler = StandardScaler()
         X = scaler.fit_transform(X_unscaled)

         # Create the SGDRegressor and fit the model
         sgd = SGDRegressor()
         sgd.fit(X, y)

Out[53]: SGDRegressor(alpha=0.0001, average=False, epsilon=0.1, eta0=0.01,
                      fit_intercept=True, l1_ratio=0.15, learning_rate='invscaling',
                      loss='squared_loss', n_iter=5, penalty='l2', power_t=0.25,
                      random_state=None, shuffle=True, verbose=0, warm_start=False)

SGD in numpy

For the hell of it, let's implement a simple version ourselves. (Again, we'll be naughty: we'll train on the whole dataset.)

In [56]: def stochastic_gradient_descent_for_ols_linear_regression(X, y, alpha, num_epochs):
             m, n = X.shape
             beta = np.random.randn(n)
             Jvals = np.zeros(num_epochs * m)
             for epoch in range(num_epochs):
                 for i in range(m):
                     rand_idx = np.random.randint(m)
                     xi = X[rand_idx:rand_idx + 1]
                     yi = y[rand_idx:rand_idx + 1]
                     beta -= alpha * xi.T.dot(xi.dot(beta) - yi)
                     Jvals[epoch * m + i] = J(X, y, beta)
             return beta, Jvals
In [57]: # Use pandas to read the CSV file
         df = pd.read_csv("datasets/dataset_corka.csv")

         # Get the feature-values and the target values
         X_without_dummy_unscaled = df[["flarea", "bdrms", "bthrms"]].values
         y = df["price"].values

         # Scale it
         scaler = StandardScaler()
         X_without_dummy = scaler.fit_transform(X_without_dummy_unscaled)

         # Add the extra column to X
         X = add_dummy_feature(X_without_dummy)

         # Run the Stochastic Gradient Descent
         beta, Jvals = stochastic_gradient_descent_for_ols_linear_regression(X, y, alpha=0.03, num_epochs=50)

         # Display beta
         beta

Out[57]: array([ ..., ..., ..., ... ])

In [58]: fig = plt.figure(figsize=(8,6))
         plt.title("$J$ during learning")
         plt.xlabel("number of iterations")
         plt.xlim(1, Jvals.size)
         plt.ylabel("$J$")
         plt.ylim(3500, 50000)
         xvals = np.linspace(1, Jvals.size, Jvals.size)
         plt.scatter(xvals, Jvals)
         plt.show()

Quite a bumpy ride! So, let's try simulated annealing.
In [59]: def learning_schedule(t):
             return 5 / (t + 50)

         def stochastic_gradient_descent_for_ols_linear_regression(X, y, num_epochs):
             m, n = X.shape
             beta = np.random.randn(n)
             Jvals = np.zeros(num_epochs * m)
             for epoch in range(num_epochs):
                 for i in range(m):
                     rand_idx = np.random.randint(m)
                     xi = X[rand_idx:rand_idx + 1]
                     yi = y[rand_idx:rand_idx + 1]
                     alpha = learning_schedule(epoch * m + i)
                     beta -= alpha * xi.T.dot(xi.dot(beta) - yi)
                     Jvals[epoch * m + i] = J(X, y, beta)
             return beta, Jvals

In [60]: # Use pandas to read the CSV file
         df = pd.read_csv("datasets/dataset_corka.csv")

         # Get the feature-values and the target values
         X_without_dummy_unscaled = df[["flarea", "bdrms", "bthrms"]].values
         y = df["price"].values

         # Scale it
         scaler = StandardScaler()
         X_without_dummy = scaler.fit_transform(X_without_dummy_unscaled)

         # Add the extra column to X
         X = add_dummy_feature(X_without_dummy)

         # Run the Stochastic Gradient Descent
         beta, Jvals = stochastic_gradient_descent_for_ols_linear_regression(X, y, num_epochs=50)

         # Display beta
         beta

Out[60]: array([ ...e+02, ...e+02, ...e+01, ...e-01])
In [61]: fig = plt.figure(figsize=(8,6))
         plt.title("$J$ during learning")
         plt.xlabel("number of iterations")
         plt.xlim(1, Jvals.size)
         plt.ylabel("$J$")
         plt.ylim(3500, 50000)
         xvals = np.linspace(1, Jvals.size, Jvals.size)
         plt.scatter(xvals, Jvals)
         plt.show()

Mini-Batch Gradient Descent

Batch Gradient Descent computed gradients from the full training set; Stochastic Gradient Descent computed gradients from just one example. Mini-Batch Gradient Descent lies between the two: it computes gradients from a small randomly-selected subset of the training set, called a mini-batch.

Since it lies between the two:
- It may bounce less and get closer to the global minimum than SGD (although both of them can reach the global minimum with a good learning schedule).
- But it may be harder to escape local minima, if you have them (which, for OLS, we don't).
- And its time and memory costs lie between the two.
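For completeness, here is a minimal sketch of Mini-Batch Gradient Descent for OLS regression. (Not from the lecture; the function name is made up, and it follows the same conventions as the batch and stochastic implementations above, assuming J and numpy are defined as before.)

    # Not from the lecture: Mini-Batch Gradient Descent for OLS regression.
    def minibatch_gradient_descent_for_ols_linear_regression(X, y, alpha, num_epochs, batch_size=16):
        m, n = X.shape
        beta = np.random.randn(n)
        Jvals = []
        for epoch in range(num_epochs):
            shuffled = np.random.permutation(m)
            for start in range(0, m, batch_size):
                idx = shuffled[start:start + batch_size]
                Xb, yb = X[idx], y[idx]
                # Gradient computed on the mini-batch only
                beta -= (alpha / len(idx)) * Xb.T.dot(Xb.dot(beta) - yb)
                Jvals.append(J(X, y, beta))
        return beta, np.array(Jvals)

It could be run on the same scaled X and y as above, e.g. minibatch_gradient_descent_for_ols_linear_regression(X, y, alpha=0.03, num_epochs=50).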
The Normal Equation versus Gradient Descent

Efficiency/scaling-up:
- The Normal Equation is linear in m, so it can handle large training sets efficiently, if they fit into main memory. But it has to compute the inverse (or pseudo-inverse) of an n × n matrix, which takes time between quadratic and cubic in n, and so it is only feasible for smallish n (up to a few thousand).
- Gradient Descent: SGD scales really well to huge m. And all three Gradient Descent methods can handle huge n (even 100s of 1000s).

Finding the global minimum for OLS regression:
- Normal Equation: guaranteed to find the global minimum.
- Gradient Descent: all a bit dependent on the number of iterations, learning rate, learning schedule.

Feature scaling:
- Normal Equation: scaling is not needed. (In fact, I find that scikit-learn's LinearRegression class produces weird results if I do any scaling. I don't know why. So don't do it!)
- Gradient Descent: scaling is needed.

Finally, Gradient Descent is a general method, whereas the Normal Equation is only for OLS regression.

Logistic Regression

So what about classification using logistic regression? We have a different loss function (cross entropy). Happily, it is convex. But there is no equivalent to the Normal Equation, so we must use Gradient Descent.

Not that it matters, but here is the partial derivative of its loss function with respect to $\beta_j$ (binary classification), where $\sigma$ is the logistic (sigmoid) function:

$$\frac{\partial J}{\partial \beta_j} = \frac{1}{m} \sum_{i=1}^{m} (\sigma(x^{(i)}\beta) - y^{(i)})\, x_j^{(i)}$$

scikit-learn has the class LogisticRegression, but also SGDClassifier if you want more control.

After Christmas

Here endeth CS4618. What will we do in CS4619?
- We will study some more complex models (i.e. non-linear ones).
- We will study underfitting and overfitting, and solutions to these.
- This will lead into Neural Networks.
- From there, we will study so-called Deep Learning for regression and classification, including for images.
- We will generalize to problems such as sequence to vector, vector to sequence and sequence to sequence, such as machine translation, speech recognition, ...
- We will revisit Reinforcement Learning.
- We will consider knowledge representation and reasoning.

It'll be tough but brilliant.

In [ ]: