Practice Questions for Midterm

10-605, Oct 14, 2015 (version 1)
Fall 2015 Sample Questions

Name:                         Andrew ID:
Time Limit: n/a

Grade Table (for teacher use only)

  Question    Points    Score
     1           6
     2           6
     3          15
     4           6
     5           4
     6           5
     7           4
     8           4
     9           8
  Total         58

Review questions from previous years

From http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2014-final.pdf: questions 1.1-1.2, 1.5-1.7; 3.1-3.2, 3.4; 4; 5.

From http://www.cs.cmu.edu/~wcohen/10-605/practice-questions/s2015-final.pdf: questions 1, 4, 5, 10, 14, 15, 16.

Parallel learning methods

1. (6 points) Recall that the iterative parameter mixing (IPM) algorithm for perceptrons works as follows. First, divide the data into s shards and initialize a weight vector w_0 to zero. Then, in each iteration t, run a perceptron in parallel on each shard for one pass, starting from the weight vector w_{t-1}, and average the shards' final weight vectors to create the next weight vector w_t. Assume that the average is unweighted, i.e., uniform mixing. Mark each statement as true or false.

   - The mistake bound for IPM for perceptrons shows that the number of iterations needed to converge does not depend on the number of shards.
   - If the original perceptron algorithm makes at most m mistakes while training on the data, and there are s shards, then IPM for perceptrons will provably make at most m mistakes during training.
   - If the original perceptron algorithm makes at most m mistakes while training on the data, and there are s shards, then IPM for perceptrons will provably make at most s*m mistakes during training.

2. (6 points) Recall that the AllReduce operation combines a reduce operation with a broadcast operation.

   (a) AllReduce is useful in iterative parameter mixing (IPM). In one sentence, what part of IPM would it be useful for?

   (b) AllReduce typically communicates information along a k-ary spanning tree of worker nodes. In one or two sentences, what are the advantages of this, rather than communicating from each worker to a single central node?
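To make the setup of questions 1 and 2 concrete, here is a minimal single-process sketch (not course code: the helper names train_one_pass, tree_allreduce, and ipm_perceptron are made up for illustration). The averaging step is written as a k-ary tree reduce followed by a broadcast, which is the communication pattern question 2 refers to.

```python
# Hedged sketch, assuming shards is a list of (X, y) pairs with labels in {+1, -1}.
import numpy as np

def train_one_pass(w, X, y):
    """One perceptron pass over a single shard, starting from w."""
    w = w.copy()
    for x_i, y_i in zip(X, y):
        if y_i * np.dot(w, x_i) <= 0:      # mistake: perceptron update
            w += y_i * x_i
    return w

def tree_allreduce(vectors, k=2):
    """Sum the shard vectors up a k-ary tree, then broadcast the total.
    Each level merges at most k children into one parent, so no single
    node has to receive a message from every worker at once."""
    level = list(vectors)
    while len(level) > 1:
        level = [sum(level[i:i + k]) for i in range(0, len(level), k)]
    total = level[0]
    return [total.copy() for _ in vectors]  # broadcast step

def ipm_perceptron(shards, n_iters, dim, k=2):
    w = np.zeros(dim)
    for _ in range(n_iters):
        # "in parallel": every shard runs one perceptron pass from the same w
        local_ws = [train_one_pass(w, X, y) for X, y in shards]
        # uniform mixing: unweighted average of the shard weight vectors,
        # computed here via the tree allreduce
        w = tree_allreduce(local_ws, k)[0] / len(shards)
    return w

# usage on toy data: two shards of a linearly separable 2-d problem
rng = np.random.default_rng(0)
def make_shard(n=50):
    X = rng.normal(size=(n, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
    return X, y
w = ipm_perceptron([make_shard(), make_shard()], n_iters=3, dim=2)
```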

3. (15 points) In each of the following scenarios, how would you perform the perceptron updates on the given training data?

   (a) Number of training instances = 10,000 and dimension of feature vector = 20.
   (b) Number of training instances = 10,000,000 and dimension of feature vector = 20.
   (c) Number of training instances = 1,000 and dimension of feature vector = 10,000.

Hashing and stochastic gradient

4. (6 points) Mark each statement as true or false.

   - Using stochastic gradient descent on logistic regression with a hashing trick is a way to learn classifiers that are not linearly separable, because feature hashing is a type of kernel.
   - The hashing trick reduces the memory required to store a classifier.
   - The hashing trick makes it faster to apply a classifier to an instance.

5. (4 points) Mark each statement as true or false.

   - DSGD for matrix factorization is an approximate version of matrix factorization using SGD.
   - We cannot use the DSGD algorithm for matrix factorization if the entries in the matrix are negative (e.g., movie ratings from -5 to 5).
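As background for question 4, here is a hedged sketch of the hashing trick combined with SGD for logistic regression. This is toy, single-process code: the constant D and the helper names are illustrative choices, and Python's built-in hash() (which is randomized across runs) stands in for the fixed hash function a real system would use. The sketch shows where the hashed indices enter the prediction and the update.

```python
import math

D = 2 ** 18                                  # number of hashed weight slots

def hashed_indices(features):
    """Map raw feature strings into indices in [0, D)."""
    return [hash(f) % D for f in features]

def predict(w, features):
    z = sum(w[j] for j in hashed_indices(features))
    return 1.0 / (1.0 + math.exp(-z))        # sigmoid

def sgd_step(w, features, label, lr=0.1):
    """One stochastic gradient step on a single (features, label) example."""
    p = predict(w, features)
    g = p - label                             # gradient of log loss w.r.t. z
    for j in hashed_indices(features):
        w[j] -= lr * g

# usage on toy data
w = [0.0] * D
for features, label in [(["buy", "now"], 1), (["meeting", "notes"], 0)]:
    sgd_step(w, features, label)
```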

6. (5 points) You have joined a company that works on finding similar images. You started out with a cosine-similarity-based approach between the image pixel vectors. During this experiment, you found that it takes a lot of time. Can you optimize the time taken?

Map-reduce

7. (4 points) In the default setting of Hadoop MapReduce jobs, which of the following are true of the input to a reducer?

   - The values associated with a key appear in sorted order, i.e., each value is strictly larger than the previous value.
   - Neither the keys nor the values are in any predictable order.
   - The keys given to a reducer are sorted, but the values associated with each key are in no predictable order.

8. (4 points) In a MapReduce job with M mappers and N reducers, which is a better guess as to how many pairs of machines will transfer data? (Pick one.)

   - M+N: data will be copied from the M mappers to the head node, then from the head node to the N reducers.
   - M*N: data will be transferred directly from each mapper to each reducer.

9. (8 points) Map-reduce implementation. Briefly describe how to use the MapReduce pattern to compute the left outer join of two tables A and B on column c. Recall that the result of a left outer join for tables A and B always contains all records of the left table (A), even if the join condition does not find any matching record in the right table (B). You can assume that c is a primary key, i.e., for any value of c, there is either no tuple in A such that tuple.c has that value, or only one tuple in A that has that value. You can also assume that the mapper's input includes all tuples in A and B, that in each call to the mapper the value holds a tuple, and that the function fromtablea(tuple) is true iff tuple is from table A.

Mapper:

Reducer:
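For orientation only (this is one possible shape of an answer, not the official solution), here is a minimal pure-Python simulation of the left-outer-join pattern. The driver run(), the dict record layout, and the helper from_table_a (standing in for the fromtablea(tuple) contract in the question) are all assumptions made for the sketch.

```python
from itertools import groupby
from operator import itemgetter

def from_table_a(t):
    return t["table"] == "A"

def mapper(_, t):
    # key every tuple by its join column c
    yield t["c"], t

def reducer(c, tuples):
    # all tuples sharing a value of c arrive together; c is a primary key
    # of A, so there is at most one A-tuple in the group
    a_tuples = [t for t in tuples if from_table_a(t)]
    b_tuples = [t for t in tuples if not from_table_a(t)]
    if not a_tuples:
        return                       # no left-side record: emit nothing
    a = a_tuples[0]
    if b_tuples:
        for b in b_tuples:
            yield c, (a, b)          # matched rows
    else:
        yield c, (a, None)           # keep the A record with NULLs for B

def run(records):
    """Tiny driver that mimics the shuffle: group map output by key."""
    mapped = [kv for r in records for kv in mapper(None, r)]
    mapped.sort(key=itemgetter(0))
    out = []
    for c, group in groupby(mapped, key=itemgetter(0)):
        out.extend(reducer(c, [t for _, t in group]))
    return out
```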