STAT432 Mini-Midterm Exam I (green)
University of Illinois Urbana-Champaign
February 25 (Monday), 10:00-10:50a
SOLUTIONS
Question                  Points
1 (6.5 points)
2 (3.5 points)
3 (10 points)
extra-credit (1 point)
Total (21 points)

Question 1: Multiple Choice

Choose ALL correct statements for each problem.

(1) [1 pt] Which of the following is the instructor of this course?
(a) (b) (c)
Answer: (b).

(2) [1 pt] What did the instructor do when he once forgot to bring a marker to class?
(a) He borrowed one from the class.
(b) He used the document camera with a pen and papers.
(c) He never used markers on the board.
(d) He always brings markers to the class.
Answer: (b).

(3) [1.5 pts] Which of the following is (are) true regarding the bias-variance trade-off?
(a) A linear model does not involve the bias-variance trade-off if the true underlying model is indeed linear.
(b) In linear regression, a smaller number of covariates leads to larger variance.
(c) For kNN, a larger k means smaller variance.
(d) For Lasso, a larger λ means smaller variance.
Answer: (c), (d).

(4) [1.5 pts] Which of the following is an unsupervised model?
(a) kNN for classification
(b) k-means clustering
(c) Lasso
(d) AIC
(e) Ridge regression
Answer: (b).

(5) [1.5 pts] Use lines to connect the figures with the correct descriptions.
(a) lasso
(b) ridge regression
(c) elastic net
Answer: 1 (b); 2 (c); 3 (a).

Question 2: Proof

[3.5 pts] Prove that the eigenvalues of $X^T X$ are the squares of the singular values of $X$. (For the SVD $X = U D V^T$, the singular values are the diagonal entries of $D$.)

First, we have the SVD $X = U D V^T$ with orthogonal matrices $U$ and $V$. We also have the eigendecomposition $X^T X = V \tilde{D} V^T$, where $\tilde{D}$ is a diagonal matrix whose entries are the eigenvalues of $X^T X$. Substituting the SVD into $X^T X$ gives

$$X^T X = V D U^T U D V^T = V D^2 V^T$$

because of the orthogonality of $U$ (that is, $U^T U = I$). Comparing with the eigendecomposition, we have

$$\tilde{D} = D^2.$$

Therefore, we have completed the proof.

Question 3: Calculation

Suppose there are 5 observations with 2 covariates. They are currently assigned to cluster C, shown below. For this question, you don't have to show all the detailed calculations, as long as your answer is correct.

obs    x1    x2    C

(1) [2.5 pts] Plot the points.
(2) [2.5 pts] Based on the k-means clustering algorithm, what is the cluster assignment C after the next iteration? What are the corresponding cluster means?
(3) [2.5 pts] Will C and the cluster means be updated again? If so, give the new values; if not, give a brief explanation.
(4) [2.5 pts] Is this the best possible clustering result? Briefly explain your answer.

(Extra-credit 1 pt) Do you have any comments (0.5 pts) and suggestions (0.5 pts) about this course?
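As a numerical sanity check of the Question 2 identity (the eigenvalues of $X^T X$ equal the squared singular values of $X$), the following R sketch compares the two on a small random matrix. The matrix `X` and the seed are hypothetical choices made only for illustration; they are not part of the exam.

```r
# Hypothetical data: a small random matrix, used only for illustration
set.seed(432)
X <- matrix(rnorm(5 * 2), nrow = 5, ncol = 2)

# Singular values of X (the diagonal entries of D in X = U D V^T),
# returned by svd() in decreasing order
d <- svd(X)$d

# Eigenvalues of X^T X; with symmetric = TRUE they are also in decreasing order
lambda <- eigen(crossprod(X), symmetric = TRUE)$values

# The two vectors should agree up to floating-point error
agree <- isTRUE(all.equal(lambda, d^2))
print(cbind(eigenvalue = lambda, squared_singular_value = d^2))
print(agree)
```

A check like this does not replace the proof; it only confirms the identity on one example.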
Stat 432 Mini-Midterm I, 10:00-10:50a, Feb 25, 2019

Question 3 (understand k-means)

(1). [2.5 points] Plot the points.

[Figure: scatter plot of the points, x1 on the horizontal axis and x2 on the vertical axis]

(2). [2.5 points] We include those two updates in a function kmeans_1step defined below. We then run one iteration of those two steps and output the results.

    # pairwise distance function
    # cited from
    pdist <- function(a, b) {
      # squared norms of the rows of a and b
      an = apply(a, 1, function(rvec) crossprod(rvec, rvec))
      bn = apply(b, 1, function(rvec) crossprod(rvec, rvec))
      m = nrow(a)
      n = nrow(b)
      tmp = matrix(rep(an, n), nrow = m)
      tmp = tmp + matrix(rep(bn, m), nrow = m, byrow = TRUE)
      sqrt(tmp - 2 * tcrossprod(a, b))
    }

    kmeans_1step = function(C, x, sync = FALSE, plot = FALSE) {
      # given the cluster assignment, update the cluster means
      cntrs = sort(unique(C))
      K = length(cntrs)
      m = matrix(0, K, dim(x)[2])
      # drop = FALSE keeps a single-point cluster as a one-row matrix
      for (k in 1:K) m[k, ] = colMeans(x[C == cntrs[k], , drop = FALSE])
      # given the cluster means, update the cluster assignment
      pdist_xm = pdist(x, m)
      C = apply(pdist_xm, 1, which.min)
      if (sync) {
        # synchronize the results
        for (k in 1:K) m[k, ] = colMeans(x[C == cntrs[k], , drop = FALSE])
      }
      if (plot) {
        plot(x[, 1], x[, 2], col = C, pch = 19)
        points(m[, 1], m[, 2], col = cntrs, pch = 4)
        text(x[, 1], x[, 2], 1:length(C), pos = 3)
      }
      return(list(cluster = C, centers = m, dist2cntr = pdist_xm))
    }

    # initialization
    x <- cbind(x1, x2)
    C <- C0
    # now we run one iteration and output the result
    upd <- kmeans_1step(C, x, TRUE, TRUE)

[Figure: points colored by updated cluster with the cluster centers marked; axes x[, 1] and x[, 2]]

    # the centers of the clusters
    (m <- upd$centers)
    ##      [,1] [,2]
    ## [1,]
    ## [2,]
    # the cluster assignment
    (C <- upd$cluster)
    ## [1]

Focus on the ideas and steps.

(3). [2.5 points] Now we iterate the two-step updates in (2) until the assignment does not change. Then we output the final result.

    # iterate until the cluster assignment does not change
    num_iter = 0
    while (1) {
      C0 <- C
      upd <- kmeans_1step(C, x)
      C <- upd$cluster
      num_iter <- num_iter + 1
      if (identical(C0, C)) break
    }
    # output the final result
    C
    ## [1]
    plot(x1, x2, col = C, pch = 19)

[Figure: final cluster assignment; axes x1 and x2]

In this instance, the algorithm converges in 1 iteration. No, C and the cluster means will not update after (2) in this case, because they have already stabilized: a further iteration leaves both unchanged.

(4). [2.5 points] Now we generate another random initialization of C and repeat the steps.

    # record the results in (3)
    C1 <- C
    set.seed(1)
    # generate another initial value of the cluster assignments
    (C = sample(1:2, n, replace = TRUE))
    ## [1]
    # repeat the kmeans updates in (2)
    num_iter = 0
    while (1) {
      C0 <- C
      upd <- kmeans_1step(C, x)
      (C <- upd$cluster)
      num_iter <- num_iter + 1
      if (identical(C0, C)) break
    }
    # new cluster assignment
    C
    ## [1]
    # plot results on the same graph
    plot(x1, x2, col = C1, pch = 19)
    points(x1, x2, col = C, pch = 1, cex = 2)
    legend('bottomright', legend = c("init-asgn 1", "init-asgn 2"), pch = c(19, 1))

[Figure: cluster assignments from the two initializations, "init-asgn 1" and "init-asgn 2", on the same plot; axes x1 and x2]

In this instance, the algorithm converges in 2 iterations.

Note that the k-means algorithm depends on the initial cluster assignment. Therefore, you may not get the same results for different initializations, as illustrated here. However, if you change the random seed to 10, you get the same result as in (2). Again, there is no unique answer. Unless we enumerate all possible assignments, we never know which is the best. In other words, the cluster assignment obtained in (3) may not be the best.
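The closing remark about enumerating all possible assignments can be made concrete for a data set this small. The sketch below is only an illustration under stated assumptions: the matrix `x` holds hypothetical values (the exam's data are not reproduced here), `wss` is a helper introduced for this example, and the best assignment is judged by the total within-cluster sum of squares. It lists every split of 5 points into 2 non-empty clusters and compares the best one with the result of base R's `kmeans()`.

```r
# Hypothetical data: 5 points with 2 covariates (not the exam's values)
x <- cbind(x1 = c(1, 2, 4, 5, 6), x2 = c(1, 3, 2, 5, 4))
n <- nrow(x)

# Helper (introduced here): total within-cluster sum of squares
# for a given assignment vector C
wss <- function(C, x) {
  sum(sapply(unique(C), function(k) {
    xk <- x[C == k, , drop = FALSE]
    sum(sweep(xk, 2, colMeans(xk))^2)
  }))
}

# Enumerate all assignments into 2 clusters. Fixing point 1 in cluster 1
# removes label-swapped duplicates; dropping the all-ones row removes the
# case where the second cluster is empty.
grid <- as.matrix(expand.grid(rep(list(1:2), n - 1)))
assignments <- cbind(1, grid)
assignments <- assignments[rowSums(assignments == 2) > 0, , drop = FALSE]

all_wss <- apply(assignments, 1, function(C) wss(C, x))
best <- assignments[which.min(all_wss), ]

# Compare with kmeans() started from several random initializations
set.seed(1)
km <- kmeans(x, centers = 2, nstart = 10)

print(unname(best))
print(min(all_wss))
print(km$tot.withinss)
```

No run of k-means can beat `min(all_wss)`, so a larger `km$tot.withinss` would signal a local optimum. Exhaustive enumeration is only feasible for tiny data sets because the number of assignments grows exponentially in n; in practice, several random restarts (the `nstart` argument) are the usual compromise.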