CS 294-1, Assignment 2: A Large-Scale Linear Regression Sentiment Model
Shaunak Chatterjee
Computer Science Division, University of California, Berkeley, CA
shaunakc@cs.berkeley.edu

Abstract

The primary objective of this assignment was to build a linear regression sentiment model based on amazon.com reviews. The main challenge was handling moderately large amounts of data on a single machine. The variations I tried include the exact solution (L2 loss with ridge regularization), stochastic gradient descent with different training schemes and initializations, lasso regularization, and both unigram and bigram features.

1 Introduction

Linear regression is a very popular approach to modeling the relationship between a response variable y and one or more explanatory variables x = {x_1, x_2, ..., x_p} using a linear model:

  ŷ = β̂_0 + Σ_{j=1}^{p} x_j β̂_j

β̂_0 is the intercept (bias) of the model. It can also be handled by adding a constant x_0 = 1 to every x, in which case the formula simplifies to:

  ŷ = Σ_{j=0}^{p} x_j β̂_j = xᵀβ̂

There are several variants of the linear regression model. The loss function we wish to minimize (by learning an appropriate β̂) can be chosen based on the application; popular choices are the L1 and L2 norms of the difference between y (the true value) and ŷ (the predicted value). For the L2 loss, there exists an exact solution:

  β̂ = (XᵀX)⁻¹ Xᵀy

In practice, this system can be singular, especially if the features are linearly dependent: there is then a whole linear space of solutions, and the matrix XᵀX is not invertible. We can avoid this problem (with very high probability) by imposing the additional constraint of minimizing the norm of β̂. If we minimize the L2 norm of β̂, we get ridge regression, whose solution is:

  β̂ = (XᵀX + λI)⁻¹ Xᵀy

We can instead minimize the L1 norm of β̂, which is called lasso regularization. This has no closed-form solution, but it can be solved by stochastic gradient methods.

The rest of this report is structured as follows. Section 2 describes the data pre-processing issues. Sections 3 and 4 provide a detailed analysis of classifier performance with unigrams and bigrams respectively, and we conclude in Section 5.

2 Dataset and its pre-processing

The task in this assignment was to build a linear regression sentiment model based on book reviews from amazon.com. The reviews (about 975,000 including duplicates) were collected by Mark Dredze et al. at Johns Hopkins (available from datasets/sentiment/). I used a binary file, tokens.bin, containing the XML tree representation of the reviews. I performed all my experiments for this assignment in MATLAB.

2.1 Parsing reviews from tokens.bin

The XML tree for each review contained all the relevant information about it. The <rating> field contained a numerical score (1, 2, 4, or 5); reviews with a rating of 3 were removed as they were deemed neutral (and hence uninformative!). The <review> token was used to identify where one review ended and the next began. The <reviewer>, <title>, <review_text>, <date> and <helpful> fields all contained information possibly relevant to the sentiment model. However, in this assignment I only used the <review_text> field. This choice was based on what I felt most closely resembles the real world: we do not generally have access to any information other than the raw text.

I started off by reading the binary stream entry by entry. This was painfully slow (it would have required more than a day to go through all the review XML trees). Instead, when I read the binary input stream in blocks of 100,000 entries, the process finished in less than half an hour! That was an important lesson learnt. The time taken was not very sensitive to the block size I used.
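The speedup comes from amortizing the per-call overhead by pulling many entries in each read. A minimal MATLAB sketch of the idea follows; the int32 entry type and the flat layout of tokens.bin are assumptions made for illustration, not the actual file format used in the report.

```matlab
% Illustrative block-wise reading sketch (assumed entry type and layout,
% not the report's actual parsing code).
fid = fopen('tokens.bin', 'r');
blockSize = 100000;                       % entries per fread call
blocks = {};
while ~feof(fid)
    block = fread(fid, blockSize, 'int32');
    if isempty(block)
        break;
    end
    blocks{end+1} = block;                % collect blocks, concatenate once at the end
end
fclose(fid);
tokens = vertcat(blocks{:});              % single concatenation avoids repeated copying
```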
2.2 Duplicate review removal

Another artifact of this dataset was a number of duplicate reviews. I used a hash of the first 500 words (or the entire review if it was shorter) to eliminate duplicates. MATLAB does not have a built-in HashSet, so I implemented one myself. My hash function was not very strong, so there were a few false duplicate detections. The number of unique reviews used is 515,516. For my experiments, I divided the dataset into 10 partitions and used a 9:1 train:test split; all review numbers (in the unique-review list) ending in the same digit form one partition.

3 Experiments with Unigrams

In this section, I describe the different classifiers I implemented with unigram tokens.

3.1 Ridge regression with L2 loss

The first classifier I implemented used the L2 loss function with ridge regularization. For this classifier, the solution can be obtained in closed form (as described in Section 1). Let us analyze each step of this algorithm:

- Computing XᵀX: X is a sparse feature matrix, so this matrix multiplication is quite fast; at the dimensions used here, this step took between 30 and 40 seconds.
- Computing Xᵀy: much faster than the previous step.
- Inverting XᵀX: this p-by-p matrix is dense, so inverting it takes a few minutes in my case. This step is the main computational bottleneck. The largest matrix I could invert was for p = 20000, before running out of memory with 12 GB of RAM.

The parameters of this classifier are the number of features used (p) and the regularization parameter λ. I used the p most frequent tokens as my features, for several values of p starting at 5000. As expected, the test RMSE decreased as the number of features increased. I also varied the regularization parameter λ (1, 10, 100, ...). Increasing the regularization parameter seemed to have a marginal positive impact on performance, which suggests that the dataset favors an intercept-only model (the λ → ∞ limit). The results are shown in Table 1. The most influential positive and negative words from this exact method are listed in Table 2 under the "Exact" column (these results are for p = 15000, λ = 100).

Table 1: Root Mean Squared Error (RMSE) on test data for exact ridge regression (L2 loss) with unigrams, for each choice of p and λ.
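To make the three steps above concrete, here is a minimal MATLAB sketch under the stated assumptions: X is an n-by-p sparse unigram count matrix and y the vector of ratings (variable names are illustrative, not the report's code). Using a backslash solve rather than an explicit inverse is a variation on the inversion step described above.

```matlab
% Minimal sketch of the closed-form ridge solution (illustrative, not the report's code).
% X: n-by-p sparse unigram counts, y: n-by-1 ratings; Xtest, ytest: held-out split.
lambda = 100;
p   = size(X, 2);
XtX = X' * X;                              % fast because X is sparse
Xty = X' * y;
beta = (XtX + lambda * speye(p)) \ Xty;    % solve rather than invert explicitly
rmseTest = sqrt(mean((Xtest * beta - ytest).^2));   % test RMSE as in Table 1
```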
              Positive                         Negative
    Exact           St. Gr.          Exact           St. Gr.
    refreshingly    best             disappointing   i
    funniest        life             unreadable      book
    patron          most             waste           but
    pleased         history          drivel          was
    invaluable      love             poorly          not
    excellent       human            useless         just
    bravo           knowledge        tripe           like
    awesome         years            laughable       dont
    punches         you              worst           no
    donovan         us               worthless       author

Table 2: Highest weighted positive and negative unigrams

3.2 Stochastic gradient descent

The next thing I implemented was the stochastic gradient method. My first objective was to include more features in the model and see how that affected the error. All experiments in this section were run with a much larger number of features p (again choosing the p most frequently occurring tokens). Initially, I stuck to the L2 loss and ridge regularization.

3.2.1 Armijo rule

The initial runs of stochastic gradient would either just hover around the initial RMSE (without reducing it) or simply explode. Starting with a large step size and then gradually reducing it over iterations also did not help (I tried quite a few step-size schedules). So I finally settled on the adaptive Armijo rule to decide the step size. The Armijo rule is essentially a line search over m ∈ {1, 2, ...} for a step α^m such that f(x − α^m ∇f(x)) < c·f(x), where c < 1 specifies the improvement required in each iteration and α is the base step size. I picked the best value of m ∈ {1, ..., 10} at every step. I also experimented with different values of α: for α larger than 0.5 the process would diverge, while for α less than 0.3 it would almost always converge.
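A sketch of one possible implementation of this scheme is given below, assuming Xtrain (n-by-p sparse) and ytrain are available. The block size, base step size, iteration count and the scaling of the regularizer are illustrative choices rather than the report's exact settings, and the block here is drawn at random; Section 3.2.2 below compares random and sequential block selection.

```matlab
% Illustrative mini-batch gradient descent with L2 loss, ridge regularization
% and an Armijo-style step search (a sketch, not the report's code).
[n, p]    = size(Xtrain);
beta      = zeros(p, 1);
lambda    = 100;  alpha = 0.3;  c = 0.99;  blockSize = 1000;  numIters = 500;
blockLoss = @(b, Xb, yb) mean((Xb*b - yb).^2) + (lambda/n) * (b'*b);
for iter = 1:numIters
    perm = randperm(n);  idx = perm(1:blockSize);            % one block of reviews
    Xb = Xtrain(idx, :);  yb = ytrain(idx);
    g  = 2 * (Xb' * (Xb*beta - yb)) / blockSize + 2 * (lambda/n) * beta;
    f0 = blockLoss(beta, Xb, yb);
    betaNew = beta;                                          % if no step qualifies, keep beta
    for m = 1:10                                             % Armijo-style search over alpha^m
        candidate = beta - alpha^m * g;
        if blockLoss(candidate, Xb, yb) < c * f0
            betaNew = candidate;
            break;
        end
    end
    beta = betaNew;
end
```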
3.2.2 Training block samples

The next issue was choosing a block size. In each iteration of stochastic gradient descent, we update β̂ based on a block of reviews. If the dataset were uniform, how we choose a block would be immaterial; in this dataset, however, the distribution was not uniform. I tried two block-selection schemes. First, a sequential scanning scheme, where each update was based on a sequential chunk of 1000 reviews from the training set. The alternative was random sampling: randomly selecting a set of reviews from the entire training set in every iteration.

The convergence rates for the two methods are shown in Figure 1. Random sampling clearly works better. This is also demonstrated by a better AUC score and higher lift scores (see Table 3). Unfortunately, the stochastic gradient descent method could not learn very effective classifiers (at least not as effective as the exact model with a much smaller number of features). This is reflected in the better AUC and lift scores of the exact method. A look at the most influential words also reveals that the gradient descent method focuses on very non-intuitive or vague words (Table 2).

Figure 1: RMSE convergence with iterations for the different training schemes.

Figure 2: Receiver operating characteristic (ROC) curves for the different classifiers. The exact model with a smaller number of features dominates.

3.2.3 Lift Scores

The lift score is essentially a measure of how much better a classifier is compared to a random classifier. We report the 1% lift scores. It is interesting that the lift scores for the positive class (i.e., positive reviews) are consistently better than those for the negative class (see Table 3). Our dataset contains many more positive reviews than negative ones. Firstly, this results in a more confident positive classifier. Secondly, the test set also has more positive reviews, which in turn results in a relatively high false positive rate for the negative classifier at any given true positive rate.

3.2.4 Better initialization

The performance of the classifiers learnt by stochastic gradient was not as good as that of the exact model learnt previously with fewer features. The convergence patterns also seemed to suggest that a better initialization of β̂ might help. An obvious initialization point was the β̂ learnt from the exact method. Although this covers only a small portion of the new, much larger weight vector, it could prove useful. The experiments supported this intuition: the convergence rates were much better, as were the AUC and lift scores (see Table 3, Figure 1, Figure 2).

3.3 Lasso regularization

The final thing I tried with unigrams was to change the regularization term from L2 (ridge) to L1 (lasso). This was convenient from an implementation standpoint, since it is a single-line change in the stochastic gradient descent code.
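The single-line nature of the change can be seen directly in the gradient. Reusing the variable names from the SGD sketch above (still illustrative, not the report's code), only the regularization term differs:

```matlab
% Ridge vs. lasso: only the penalty term of the block gradient changes.
gradData = 2 * (Xb' * (Xb*beta - yb)) / blockSize;   % data term, identical in both cases
gRidge   = gradData + 2 * (lambda/n) * beta;         % L2 (ridge) penalty gradient
gLasso   = gradData + (lambda/n) * sign(beta);       % L1 (lasso) subgradient
```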
The convergence of the ridge and lasso regularization methods was almost identical, as seen in Figure 3.
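Table 3 below reports AUC and 1% lift scores for all of the unigram models. As a reference for how the 1% lift for the positive class could be computed, here is a small sketch; the variable names and the threshold of rating at least 4 for "positive" are assumptions, not taken from the report.

```matlab
% Hypothetical 1% lift computation for the positive class (not the report's code).
% scores: predicted ratings on the test set; isPos: true rating >= 4 (assumed threshold).
[~, order] = sort(scores, 'descend');                 % rank test reviews by predicted score
k          = ceil(0.01 * numel(scores));              % top 1% of test reviews
lift1pct   = mean(isPos(order(1:k))) / mean(isPos);   % hit rate in top 1% vs. base rate
```

For the negative class, the same computation would apply with the ranking reversed (ascending scores) and the indicator replaced by the negative-class indicator.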
Figure 3: Gradient descent convergence for ridge and lasso regularization.

Table 3: AUC and 1% lift scores (positive and negative class) of the various unigram models: sequential scan, random scan, ridge with better initialization, lasso with better initialization, and exact matrix inversion.

4 Experiments with bigrams

The final thing I tried was to run the exact method with bigram tokens instead of unigrams. Since bigrams capture more context than their constituent unigrams do individually, this is a natural extension.

4.1 Constructing bigram tokens

As mentioned before, MATLAB does not have a HashSet implementation. In order to assign token numbers to the bigrams, I had to create a hash function for any possible bigram. A unique hash is ensured by the mapping:

  <token_1, token_2> → B · token_1 + token_2

where B is the vocabulary size. However, if we place this hash value into K bins with a mod-K operation, the results are disastrous: all bigrams ending with the same word end up in the same bin, resulting in an enormous number of collisions. Instead, we can choose the bin number by the following mapping:

  #bin(<token_1, token_2>) = mod(token_1 · token_2, K)

This ensures a much more even distribution of the bigrams (in terms of their number of occurrences). To reduce the number of possible bigrams, I only considered bigrams where both tokens were among the 100,000 most frequent unigram tokens. Once the hash is set up, bigrams are assigned token numbers in decreasing order of frequency (similar to the unigram numbering scheme).
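A small MATLAB sketch of the two mappings just described follows (illustrative, not the report's code); the bin count K is an arbitrary choice here, B = 100,000 matches the restricted unigram vocabulary above, and the +1 simply makes the bins 1-based for MATLAB indexing.

```matlab
% Sketch of the bigram binning from Section 4.1.
% t1, t2: unigram token numbers; B: vocabulary size; K: number of hash bins.
B = 100000;                                          % 100,000 most frequent unigrams
K = 2^20;                                            % arbitrary bin count for illustration
uniqueHash = @(t1, t2) B .* t1 + t2;                 % unique value per bigram
naiveBin   = @(t1, t2) mod(B .* t1 + t2, K) + 1;     % if K divides B, this collapses to mod(t2, K) + 1
evenBin    = @(t1, t2) mod(t1 .* t2, K) + 1;         % product mapping; much more even bin occupancy
```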
4.2 Results

For bigrams, I tried the exact method, selecting the most frequent bigrams as my features. Varying λ did not affect the results much (all reported results use λ = 100). The AUC score was smaller than that of the unigram model, which was initially a surprise. However, an inspection of the most influential positive and negative bigrams (see Table 4) indicates what went wrong: the top-scoring unigrams (which were also very intuitive) did not make it into the list of most frequent bigrams, and hence were not considered as features.

    Positive             Negative
    enlightening and     however in
    the funniest         job with
    together they        from page
    a moral              works on
    not afraid           i became

Table 4: Highest weighted positive and negative bigrams

4.3 Measuring flops

MATLAB deprecated the flops function when it incorporated LAPACK a few years ago. I looked online for other ways of counting flops and came across a library by Tom Minka ( minka/software/lightspeed/). The general flops method does not quite work as advertised. The operation-specific methods (I checked flops_inv) do seem to work, but it did not make sense to quote flops for single operations, and I could not figure out a meaningful way of combining the individual methods into a composite flops value (I would be curious to know if someone else did).

5 Conclusion

Unfortunately, I underestimated the time it would take to create the bigram tokens, and hence ran out of time to run a stochastic gradient optimization with a larger number of bigram tokens (which would have included the more expected bigrams). There is a small chance of an indexing issue in my bigram creation pipeline, but it has passed all the sanity checks I put it through.

The biggest takeaway from this assignment for me was learning to deal with moderately large amounts of data using a reasonable amount of computational power (a single machine). The implementation optimizations (hash tables, sparse vectors and matrices, and block updates of these sparse structures) are important lessons for the future.

Wishlist

I did not have access to the matfile command in my version of MATLAB. That command could have eased the initial data-handling phase of this assignment.

Acknowledgements

The author would like to thank Aastha Jain for generously lending her powerful workstation. The author also acknowledges Mobin Javed and Anupam Prakash for several interesting discussions.