Artificial Neural Networks (Feedforward Nets)


1 Artificial Neural Networks (Feedforward Nets)
[Figure: a two-layer feedforward network with inputs x1, x2; hidden units y1, y2 with weights w11, w12, w21, w22 and bias weights w01, w02 on constant -1 inputs; and output y with weights w13, w23 and bias weight w03.]

2 Single Perceptron Unit
[Figure: a single unit with output y, inputs x0 = 1, x1, x2, x3, ..., xn, and weights w0, w1, w2, w3, ..., wn.]

3 Linear Classifier (Single Perceptron Unit)
h(x) = θ(w · x + b) = θ(w̄ · x̄)   (the bias folded into a weight on x0 = 1), where θ(z) = 1 if z ≥ 0 and 0 otherwise
[Figure: the separating line defined by w in the (x1, x2) plane, and the perceptron unit with inputs x0 = 1, x1, ..., xn and weights w0, w1, ..., wn.]

4 Beyond Linear Separability
[Figure: a data set with classes 1 and 0 that is not linearly separable.]

5 Beyond Linear Separability
[Figure: another data set that is not linearly separable.]

6 Beyond Linear Separability
[Figure: another data set that is not linearly separable.]

7 Multi-Layer Perceptron
[Figure: a network with an input layer (x1, x2, x3), two hidden layers, and an output layer.]
More powerful than a single layer. Lower layers transform the input problem into more tractable (linearly separable) problems for subsequent layers.

8 XOR Problem
Not linearly separable.
[Figure: the XOR data set, and a two-layer network with hidden units o1, o2 (weights w11, w12, w21, w22, bias weights w01, w02) feeding an output y (weights w13, w23, bias weight w03).]
w01 = 3/2, w11 = w12 = 1
w02 = 1/2, w21 = w22 = 1

9 XOR Problem
Not linearly separable.
[Figure: the XOR data set and the same two-layer network, now showing the hidden-unit outputs o1 and o2.]
w01 = 3/2, w11 = w12 = 1
w02 = 1/2, w21 = w22 = 1

10 XOR Problem
Not linearly separable.
[Figure: the XOR data set and the two-layer network.]
w01 = 3/2, w11 = w12 = 1
w02 = 1/2, w21 = w22 = 1
w03 = 1/2, w31 = -1, w32 = 1

x1  x2  |  o1  o2  |  y
 0   0  |   0   0  |  0
 0   1  |   0   1  |  1
 1   0  |   0   1  |  1
 1   1  |   1   1  |  0

In the (o1, o2) representation the problem is linearly separable.
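A quick way to check these hand-set weights is to run all four inputs through the network. The small Python sketch below does this, assuming the threshold convention θ(z) = 1 for z ≥ 0 from the earlier slides.

    def step(z):
        # Threshold unit: 1 if z >= 0, else 0
        return 1 if z >= 0 else 0

    def xor_net(x1, x2):
        # Hidden units; the bias weights multiply a constant -1 input, as in the figure
        o1 = step(1 * x1 + 1 * x2 - 3/2)   # w11 = w12 = 1, w01 = 3/2 (acts like AND)
        o2 = step(1 * x1 + 1 * x2 - 1/2)   # w21 = w22 = 1, w02 = 1/2 (acts like OR)
        # Output unit: weights -1 and 1 on (o1, o2), bias weight 1/2
        return step(-1 * o1 + 1 * o2 - 1/2)

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, "->", xor_net(x1, x2))   # prints 0, 1, 1, 0 (XOR)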

11 Multi-Layer Perceptron Learning
Any set of training points can be separated by a three-layer perceptron network. Almost any set of points is separable by a two-layer perceptron network. But no efficient learning rule is known.

12 Multi-Layer Perceptron Learning
Any set of training points can be separated by a three-layer perceptron network (two hidden layers and one output layer). Almost any set of points is separable by a two-layer perceptron network (one hidden layer and one output layer). But no efficient learning rule is known, and an exponential number of units may be needed.

13 Multi-Layer Perceptron Learning
Any set of training points can be separated by a three-layer perceptron network. Almost any set of points is separable by a two-layer perceptron network. But no efficient learning rule is known.
Could we use gradient ascent/descent? We would need smoothness: a small change in weights produces a small change in output. The threshold function is not smooth.

14 Multi-Layer Perceptron Learning
Any set of training points can be separated by a three-layer perceptron network. Almost any set of points is separable by a two-layer perceptron network. But no efficient learning rule is known.
Could we use gradient ascent/descent? We would need smoothness: a small change in weights produces a small change in output. The threshold function is not smooth. Use a smooth threshold function!

15 Sigmoid Unit
[Figure: a unit with inputs x1, ..., xn plus a constant bias input, weights w0, w1, ..., wn, and output y = s(z).]
z = Σ_i w_i x_i,   s(z) = 1 / (1 + e^(−z))
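As a minimal sketch of this unit (illustrative code, not from the slides):

    import math

    def sigmoid(z):
        # Smooth threshold: s(z) = 1 / (1 + e^(-z))
        return 1.0 / (1.0 + math.exp(-z))

    def sigmoid_unit(weights, inputs):
        # weights and inputs include the bias term (a weight on a constant input)
        z = sum(w * x for w, x in zip(weights, inputs))
        return sigmoid(z)

    print(sigmoid(0.0))                                      # 0.5
    print(sigmoid_unit([1.0, 1.0, -1.5], [0.0, 1.0, 1.0]))   # sigmoid(-0.5), about 0.38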

16 Sigmoid Unit
[Figure: a two-input sigmoid unit (weights w0, w1, w2 with a -1 bias input) and a plot of its output y over the (x1, x2) plane; the contour y = 1/2 acts as the decision boundary.]

17 Training
y(x, w): w is a vector of weights, x is a vector of inputs.
[Figure: the 2-2-1 sigmoid network with weighted sums z1, z2 at the hidden units and z3 at the output.]
y = s(w13 s(w11 x1 + w21 x2 − w01) + w23 s(w12 x1 + w22 x2 − w02) − w03)

18 Training
y(x, w): w is a vector of weights, x is a vector of inputs, and y^i is the desired output for training example i.
Error over the training set for a given weight vector:
E(w) = (1/2) Σ_i (y(x^i, w) − y^i)²
Our goal is to find a weight vector that minimizes this error.
[Figure: the same 2-2-1 sigmoid network.]
y = s(w13 s(w11 x1 + w21 x2 − w01) + w23 s(w12 x1 + w22 x2 − w02) − w03)

19 Gradient Descent
Error on the training set:   E(w) = (1/2) Σ_i (y(x^i, w) − y^i)²
Gradient of the error:       ∇_w E = Σ_i (y(x^i, w) − y^i) ∇_w y(x^i, w),   where ∇_w y = (∂y/∂w1, ..., ∂y/∂wn)
Gradient descent:            w ← w − η ∇_w E
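Here is a sketch of this loop applied to the 2-2-1 sigmoid network from the previous slides. For simplicity it estimates ∇_w E by finite differences rather than deriving the backpropagation formulas, and the weight ordering, learning rate, and XOR training set are illustrative choices; a run can still land in a shallow local minimum, in which case restarting with new random weights helps (step 5 of the recipe on the next slide).

    import math, random

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def net_output(w, x):
        # w = [w11, w21, w01, w12, w22, w02, w13, w23, w03] for the slide's 2-2-1 net
        y1 = sigmoid(w[0] * x[0] + w[1] * x[1] - w[2])
        y2 = sigmoid(w[3] * x[0] + w[4] * x[1] - w[5])
        return sigmoid(w[6] * y1 + w[7] * y2 - w[8])

    def error(w, data):
        # E(w) = 1/2 * sum_i (y(x^i, w) - y^i)^2
        return 0.5 * sum((net_output(w, x) - y) ** 2 for x, y in data)

    def gradient_descent(data, eta=0.5, steps=5000, eps=1e-5):
        w = [random.uniform(-0.5, 0.5) for _ in range(9)]    # small random initial weights
        for _ in range(steps):
            base = error(w, data)
            grad = []
            for j in range(len(w)):                          # finite-difference dE/dw_j
                w_plus = w[:]
                w_plus[j] += eps
                grad.append((error(w_plus, data) - base) / eps)
            w = [wj - eta * gj for wj, gj in zip(w, grad)]   # w <- w - eta * grad E
        return w

    xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    w = gradient_descent(xor_data)
    print([round(net_output(w, x), 2) for x, _ in xor_data])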

20 Training Neural Nets (without overfitting, hopefully)
Given: a data set, desired outputs, and a neural net with m weights. Find a setting of the weights that will give good predictive performance on new data, and estimate that expected performance.
1. Split the data set (randomly) into three subsets: a training set (used for picking weights), a validation set (used to stop training), and a test set (used to evaluate performance).
2. Pick random, small weights as initial values.
3. Perform iterative minimization of the error over the training set.
4. Stop when the error on the validation set reaches a minimum (to avoid overfitting).
5. Repeat training (from step 2) several times (to avoid local minima).
6. Use the best weights to compute the error on the test set; this is the estimate of performance on new data. Do not repeat training to improve this.
Cross-validation can be used if the data set is too small to divide into three subsets.
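A skeleton of steps 1-4, assuming the caller supplies three placeholder callables: init() returns initial random weights, step(w, train) performs one iteration of error minimization on the training set, and error(w, data) evaluates the error (these names are illustrative, not from the slides).

    import random

    def train_val_test_split(data, f_train=0.6, f_val=0.2, seed=0):
        # Step 1: randomly split the data into training / validation / test subsets
        data = data[:]
        random.Random(seed).shuffle(data)
        n_train = int(f_train * len(data))
        n_val = int(f_val * len(data))
        return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]

    def train_with_early_stopping(init, step, error, train, val, max_iters=1000):
        # Steps 2-4: start from random weights, minimize training error iteratively,
        # and keep the weights with the lowest validation error (early stopping).
        w = init()
        best_w, best_val = w, error(w, val)
        for _ in range(max_iters):
            w = step(w, train)
            val_err = error(w, val)
            if val_err < best_val:
                best_w, best_val = w, val_err
        return best_w

Step 5 amounts to calling train_with_early_stopping several times with different random initializations and keeping the run with the lowest validation error; the test set is touched only once, at the very end.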

21 Training vs. Test Error
[Figure: training error and test error plotted over the course of training.]

22 Training vs. Test Error
[Figure: training error and test error plotted over the course of training.]

23 Overfitting is not unique to neural nets
[Figure: overfitted decision boundaries from 1-Nearest Neighbors and from Decision Trees.]

24 Overfitting in SVM
[Figure: SVM decision boundaries for a radial kernel with σ = 0.1 and for a second σ value.]

25 On-line vs. Off-line
There are two approaches to performing the error minimization:
On-line training: present x^i and its desired output y^i (chosen randomly from the training set), change the weights to reduce the error on this instance, and repeat.
Off-line training: change the weights to reduce the total error on the training set (summed over all instances).
On-line training is an approximation to gradient descent, since the gradient based on one instance is noisy relative to the full gradient (based on all instances). This can be beneficial in pushing the system out of shallow local minima.

26 Feature Selection
In many machine learning applications, there are huge numbers of features:
text classification (# words)
gene arrays (5,000 - 50,000)
images (512 x 512 pixels)

27 Feature Selection
In many machine learning applications, there are huge numbers of features:
text classification (# words)
gene arrays (5,000 - 50,000)
images (512 x 512 pixels)
Too many features make algorithms run slowly and risk overfitting.

28 Feature Selection
In many machine learning applications, there are huge numbers of features:
text classification (# words)
gene arrays (5,000 - 50,000)
images (512 x 512 pixels)
Too many features make algorithms run slowly and risk overfitting.
Find a smaller feature space: a subset of the existing features, or new features constructed from the old ones.

29 Feature Ranking
For each feature, compute a measure of its relevance to the output; choose the k features with the highest rankings.
Correlation between feature j and the output measures how much x_j tends to deviate from its mean on the same examples on which y deviates from its mean:
R(j) = Σ_i (x_j^i − x̄_j)(y^i − ȳ) / sqrt( Σ_i (x_j^i − x̄_j)² · Σ_i (y^i − ȳ)² )
where x̄_j = (1/n) Σ_i x_j^i and ȳ = (1/n) Σ_i y^i.
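A small sketch of this ranking under the formula above; the tiny data set at the bottom is purely illustrative.

    import math

    def correlation(xs, ys):
        # R(j) for one feature column xs against the output column ys
        n = len(xs)
        x_bar = sum(xs) / n
        y_bar = sum(ys) / n
        cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        var_x = sum((x - x_bar) ** 2 for x in xs)
        var_y = sum((y - y_bar) ** 2 for y in ys)
        return cov / math.sqrt(var_x * var_y)

    def rank_features(X, y, k):
        # X: list of examples, each a list of feature values; keep the k features
        # whose |correlation| with the output is highest
        n_features = len(X[0])
        scores = [(abs(correlation([row[j] for row in X], y)), j) for j in range(n_features)]
        return [j for _, j in sorted(scores, reverse=True)[:k]]

    X = [[1, 0, 5], [0, 1, 3], [1, 1, 4], [0, 0, 2]]
    y = [1, 0, 1, 0]
    print(rank_features(X, y, 2))   # features 0 and 2 correlate most strongly with y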

30 Correlations in Heart Data
[Table: top 15 of 25 features ranked by correlation with the output; the ranking includes thal, cp, ca, oldpeak (0.42), thalach, exang (0.42), slope, sex (0.28), and age.]
[Figure: a small decision tree splitting on ca, thal, exang, chest-pain, age, and oldpeak.]

31 Correlations in MPG > 22 Data
[Table: features ranked by correlation with the output, including cyl, displacement, weight, horsepower, origin, model-year (0.44), and acceleration (0.35).]
[Figure: a small decision tree splitting on displacement, weight, and year.]

32 XOR Bites Back
As usual, functions with XOR in them will cause us trouble. Each feature will, individually, have a correlation of 0 (it occurs positively as much as negatively for positive outputs). To solve XOR, we need to look at groups of features together.

33 Subset Selection
Consider subsets of variables: it is too hard to consider all possible subsets.
Wrapper methods: use training-set or cross-validation error to measure the goodness of using different feature subsets with your classifier, and greedily construct a good subset by adding or subtracting features one by one.

34 Forward Selection
Given a particular classifier you want to use:
F = {}
For each f_j: train the classifier with inputs F + {f_j}.
Add to F the f_j that results in the lowest-error classifier.
Continue until F is the right size, or the error has quit decreasing.

35 Forward Selection
Given a particular classifier you want to use:
F = {}
For each f_j: train the classifier with inputs F + {f_j}.
Add to F the f_j that results in the lowest-error classifier.
Continue until F is the right size, or the error has quit decreasing.
Decision trees, by themselves, do something similar to this.

36 Forward Selection
Given a particular classifier you want to use:
F = {}
For each f_j: train the classifier with inputs F + {f_j}.
Add to F the f_j that results in the lowest-error classifier.
Continue until F is the right size, or the error has quit decreasing.
Decision trees, by themselves, do something similar to this.
Trouble with XOR.

37 Backward Elimination
Given a particular classifier you want to use:
F = all features
For each f_j: train the classifier with inputs F - {f_j}.
Remove from F the f_j that results in the lowest-error classifier.
Continue until F is the right size, or the error increases too much.
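A sketch of greedy forward selection; backward elimination is the mirror image, starting from all features and removing one at a time. Here train_and_error is a caller-supplied placeholder that trains your classifier on a given feature subset and returns its (cross-validation) error.

    def forward_selection(features, train_and_error, max_size):
        # features: candidate feature names or indices
        # train_and_error(subset) -> error of the classifier trained with that subset
        selected, best_err = [], float("inf")
        while len(selected) < max_size:
            candidates = [f for f in features if f not in selected]
            if not candidates:
                break
            # Try adding each remaining feature; keep the one giving the lowest error
            err, best_f = min((train_and_error(selected + [f]), f) for f in candidates)
            if err >= best_err:          # stop once the error has quit decreasing
                break
            selected.append(best_f)
            best_err = err
        return selected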

38 Forward Selection on Auto Data
[Figure: cross-validation accuracy vs. number of features added.]

39 Backward Elimination on Auto Data
[Figure: cross-validation accuracy vs. number of features eliminated.]

40 Forward Selection on Heart Data
[Figure: cross-validation accuracy vs. number of features added.]

41 Backward Elimination on Heart Data
[Figure: cross-validation accuracy vs. number of features eliminated.]

42 Recursive Feature Elimination
Train a linear SVM or neural network.
Remove the feature with the smallest weight.
Repeat.
More efficient than regular backward elimination: it requires only one training phase per feature.
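A sketch of the idea; as a simplification, an ordinary least-squares linear model stands in here for the linear SVM or neural network, and the synthetic data at the bottom is illustrative.

    import numpy as np

    def recursive_feature_elimination(X, y, n_keep):
        # X: (n_examples, n_features) array, y: targets
        remaining = list(range(X.shape[1]))
        while len(remaining) > n_keep:
            w, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)   # one training phase
            smallest = int(np.argmin(np.abs(w)))   # feature with the smallest weight magnitude
            del remaining[smallest]                # remove it and repeat
        return remaining

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = 3 * X[:, 0] - 2 * X[:, 3] + 0.1 * rng.normal(size=100)   # only features 0 and 3 matter
    print(recursive_feature_elimination(X, y, n_keep=2))          # expected: [0, 3]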

43 Clustering
Form clusters of inputs.
Map the clusters into outputs.
Given a new example, find its cluster, and generate the associated output.

44 Clustering
Form clusters of inputs.
Map the clusters into outputs.
Given a new example, find its cluster, and generate the associated output.

45 Clustering
Form clusters of inputs.
Map the clusters into outputs.
Given a new example, find its cluster, and generate the associated output.

46 Clustering Criteria
Small distances between points within a cluster; large distances between clusters.
We need a distance measure, as in nearest neighbor.

47 K-Means Clustering
Tries to minimize
Σ_{j=1}^{k} Σ_{i ∈ S_j} ||x_i − μ_j||²
where k is the number of clusters, S_j contains the elements of cluster j, μ_j is the mean of the elements in cluster j, and ||x_i − μ_j||² is the squared distance from a point to its cluster mean.
Only gets, greedily, to a local optimum.

48 K-Means Algorithm
Choose k.
Randomly choose k points C_j to be cluster centers.

49 K-Means Algorithm
Choose k.
Randomly choose k points C_j to be cluster centers.
Loop:
  Partition the data into k classes S_j according to which of the C_j they're closest to.
  For each S_j, compute the mean of its elements and let that be the new cluster center.

50 K-Means Algorithm
Choose k.
Randomly choose k points C_j to be cluster centers.
Loop:
  Partition the data into k classes S_j according to which of the C_j they're closest to.
  For each S_j, compute the mean of its elements and let that be the new cluster center.
Stop when the centers quit moving.
Guaranteed to terminate.

51 K-Means Algorithm
Choose k.
Randomly choose k points C_j to be cluster centers.
Loop:
  Partition the data into k classes S_j according to which of the C_j they're closest to.
  For each S_j, compute the mean of its elements and let that be the new cluster center.
Stop when the centers quit moving.
Guaranteed to terminate.
If a cluster becomes empty, re-initialize its center.
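A compact sketch of the loop on the preceding slides; the 2-D toy points at the bottom are illustrative.

    import random

    def kmeans(points, k, seed=0):
        rng = random.Random(seed)
        centers = rng.sample(points, k)            # randomly choose k points as centers
        while True:
            # Partition the data according to which center each point is closest to
            clusters = [[] for _ in range(k)]
            for p in points:
                j = min(range(k),
                        key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                clusters[j].append(p)
            # New center for each cluster: the mean of its elements
            new_centers = []
            for j in range(k):
                if clusters[j]:
                    n = len(clusters[j])
                    new_centers.append(tuple(sum(coord) / n for coord in zip(*clusters[j])))
                else:
                    new_centers.append(rng.choice(points))   # re-initialize an empty cluster
            if new_centers == centers:             # stop when the centers quit moving
                return centers, clusters
            centers = new_centers

    pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
    centers, clusters = kmeans(pts, 2)
    print(centers)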

52-61 K-Means Example
[Figures: ten slides stepping through successive iterations of k-means on an example data set.]

62 Principal Components Analysis
Given an n-dimensional real-valued space, data are often nearly restricted to a lower-dimensional subspace. PCA helps us find such a subspace, whose coordinates are linear functions of the originals.

63 Principal Components Analysis
Given an n-dimensional real-valued space, data are often nearly restricted to a lower-dimensional subspace. PCA helps us find such a subspace, whose coordinates are linear functions of the originals.
[Figure: example data lying near a lower-dimensional subspace.]

64 Principal Components Analysis
Given an n-dimensional real-valued space, data are often nearly restricted to a lower-dimensional subspace. PCA helps us find such a subspace, whose coordinates are linear functions of the originals.
[Figure: example of PCA axes fitted to data (source: botany/ordinate/pca.htm).]

65 Cartoon of algorithm
Normalize the data (subtract the mean, divide by the standard deviation).

66 Cartoon of algorithm
Normalize the data (subtract the mean, divide by the standard deviation).
Find the line along which the data has the most variability: that's the first principal component.

67 Cartoon of algorithm
Normalize the data (subtract the mean, divide by the standard deviation).
Find the line along which the data has the most variability: that's the first principal component.
Project the data into the (n-1)-dimensional space orthogonal to that line. Repeat.

68 Cartoon of algorithm
Normalize the data (subtract the mean, divide by the standard deviation).
Find the line along which the data has the most variability: that's the first principal component.
Project the data into the (n-1)-dimensional space orthogonal to that line. Repeat.
The result is a new orthogonal set of axes; the first k give a lower-dimensional space that represents the variability of the data as well as possible.

69 Cartoon of algorithm
Normalize the data (subtract the mean, divide by the standard deviation).
Find the line along which the data has the most variability: that's the first principal component.
Project the data into the (n-1)-dimensional space orthogonal to that line. Repeat.
The result is a new orthogonal set of axes; the first k give a lower-dimensional space that represents the variability of the data as well as possible.
Really: find the eigenvectors of the covariance matrix with the k largest eigenvalues.
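A minimal sketch of the "really" version, assuming the data is stored as a NumPy array with one example per row; the synthetic data at the bottom is illustrative.

    import numpy as np

    def pca(X, k):
        # Normalize: subtract the mean and divide by the standard deviation of each column
        Z = (X - X.mean(axis=0)) / X.std(axis=0)
        # Eigenvectors of the covariance matrix with the k largest eigenvalues
        cov = np.cov(Z, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)     # eigh gives eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
        components = eigvecs[:, order]             # each column is a principal component
        return Z @ components, components          # projected data and the new axes

    rng = np.random.default_rng(0)
    t = rng.normal(size=200)
    X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200), rng.normal(size=200)])
    projected, axes = pca(X, k=2)
    print(projected.shape, axes.shape)             # (200, 2) (3, 2)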

70 Linear Transformations Only
[Figure: data whose low-dimensional structure is nonlinear, which PCA's linear axes cannot capture.]
There are fancier methods that can find this structure.

71 Insensitive to Classification Task
[Figure: a data set where the direction of greatest variance is not the direction that separates the classes.]

72 Insensitive to Classification Task
There are fancier methods that can take class into account.

73 Validating a Classifier
            predicted y
              0    1
true y   0    A    B
         1    C    D

74 Validating a Classifier
(Same table.) B is a false positive: a type 1 error.

75 Validating a Classifier
(Same table.) B is a false positive (type 1 error); C is a false negative (type 2 error).

76 Validating a Classifier
Sensitivity: P(predict 1 | actual 1) = D / (C + D), the true positive rate (TP).

77 Validating a Classifier
Sensitivity: P(predict 1 | actual 1) = D / (C + D), the true positive rate (TP).
Specificity: P(predict 0 | actual 0) = A / (A + B).

78 Validating a Classifier
Sensitivity: P(predict 1 | actual 1) = D / (C + D), the true positive rate (TP).
Specificity: P(predict 0 | actual 0) = A / (A + B).
False-alarm rate: P(predict 1 | actual 0) = B / (A + B), the false positive rate (FP).
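The counts and rates above translate directly into code; a small sketch with made-up labels:

    def confusion_counts(true_labels, predicted_labels):
        # Returns (A, B, C, D) matching the table above:
        # A = true 0 / predicted 0, B = true 0 / predicted 1 (false positive),
        # C = true 1 / predicted 0 (false negative), D = true 1 / predicted 1
        a = b = c = d = 0
        for t, p in zip(true_labels, predicted_labels):
            if t == 0 and p == 0:
                a += 1
            elif t == 0 and p == 1:
                b += 1
            elif t == 1 and p == 0:
                c += 1
            else:
                d += 1
        return a, b, c, d

    a, b, c, d = confusion_counts([0, 0, 1, 1, 1, 0], [0, 1, 1, 0, 1, 0])
    sensitivity = d / (c + d)    # true positive rate, P(predict 1 | actual 1)
    specificity = a / (a + b)    # P(predict 0 | actual 0)
    false_alarm = b / (a + b)    # false positive rate, P(predict 1 | actual 0)
    print(sensitivity, specificity, false_alarm)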

79 Cost Sensitivity
Predict whether a patient has pseuditis based on blood tests.
The disease is often fatal if left untreated.
The treatment is cheap and side-effect free.

80 Cost Sensitivity
Predict whether a patient has pseuditis based on blood tests.
The disease is often fatal if left untreated.
The treatment is cheap and side-effect free.
Which classifier to use?
Classifier 1: TP = 0.9, FP = 0.4.

81 Cost Sensitivity
Predict whether a patient has pseuditis based on blood tests.
The disease is often fatal if left untreated.
The treatment is cheap and side-effect free.
Which classifier to use?
Classifier 1: TP = 0.9, FP = 0.4.
Classifier 2: TP = 0.7, FP = …

82 Build Costs into Classifier
Assess the costs of both types of error:
use a different splitting criterion for decision trees;
make the error function for neural nets asymmetric, with different costs for each kind of error;
use different values of C for SVMs depending on the kind of error.

83 Tunable Classifiers
Classifiers that have a threshold (naïve Bayes, neural nets, SVMs) can be adjusted, post-learning, by changing the threshold, to make different tradeoffs between type 1 and type 2 errors.

84 Tunable Classifiers
Classifiers that have a threshold (naïve Bayes, neural nets, SVMs) can be adjusted, post-learning, by changing the threshold, to make different tradeoffs between type 1 and type 2 errors.
C1, C2: costs of the two kinds of error
P: percentage of positive examples
x: tunable threshold
TP(x): true positive rate at threshold x
FP(x): false positive rate at threshold x
Expected Cost = C2 · P · (1 − TP(x)) + C1 · (1 − P) · FP(x)
Choose x to minimize the expected cost.
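A sketch of choosing the threshold by minimizing expected cost; the TP(x) and FP(x) curves here are made-up stand-ins for a real tunable classifier.

    def expected_cost(x, tp, fp, p, c1, c2):
        # Expected Cost = C2 * P * (1 - TP(x)) + C1 * (1 - P) * FP(x)
        return c2 * p * (1 - tp(x)) + c1 * (1 - p) * fp(x)

    def best_threshold(thresholds, tp, fp, p, c1, c2):
        # Sweep candidate thresholds; keep the one with the lowest expected cost
        return min(thresholds, key=lambda x: expected_cost(x, tp, fp, p, c1, c2))

    # Illustrative rate curves: raising the threshold lowers both TP and FP
    tp = lambda x: max(0.0, 1.0 - 0.5 * x)
    fp = lambda x: max(0.0, 0.6 - 0.6 * x)

    x = best_threshold([i / 10 for i in range(11)], tp, fp, p=0.3, c1=1.0, c2=10.0)
    print(x, expected_cost(x, tp, fp, 0.3, 1.0, 10.0))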

85 ROC Curves
Receiver operating characteristics.
[Figure: TP on the vertical axis, FP on the horizontal axis; the ideal classifier sits at TP = 1, FP = 0.]

86 ROC Curves
[Figure: same axes; "always output 1" sits at TP = 1, FP = 1, and "always output 0" at TP = 0, FP = 0.]

87 ROC Curves
[Figure: the ROC curve is a parametric function of the threshold x, swept between the two extremes.]

88 ROC Curves
[Figure: two ROC curves; the blue curve dominates the red one.]
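A sketch of tracing an ROC curve by sweeping the threshold over a classifier's real-valued scores (the scores and labels here are illustrative).

    def roc_points(scores, labels):
        # For each candidate threshold, predict 1 when score >= threshold and record
        # (false positive rate, true positive rate); this traces the ROC curve.
        pos = sum(labels)
        neg = len(labels) - pos
        points = [(0.0, 0.0)]                      # "always output 0" endpoint
        for thr in sorted(set(scores), reverse=True):
            preds = [1 if s >= thr else 0 for s in scores]
            tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
            fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
            points.append((fp / neg, tp / pos))    # ends at (1, 1), "always output 1"
        return points

    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
    labels = [1, 1, 0, 1, 0, 0]
    print(roc_points(scores, labels))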

89 Many more issues!
Missing data.
Many examples in one class, few in the other (fraud detection).
Expensive data (active learning).
