© SL&DM, Hastie & Tibshirani, March 26, 2002. Supervised Learning: Khan data (BL, EWS, NB, RMS)
Classification of microarray samples

Example: small round blue cell tumors (SRBCTs); Khan et al., Nature Medicine, 2001.

- Tumors are classified as BL (Burkitt lymphoma), EWS (Ewing), NB (neuroblastoma) and RMS (rhabdomyosarcoma).
- There are 63 training samples and 25 test samples, although five of the latter were not SRBCTs. Each sample is described by its gene expression values.
- Khan et al. report zero training and test errors using a complex neural network model. They decided that 96 genes were "important".
- Upon close examination, the network is linear. It is essentially extracting linear principal components and classifying in their subspace.
- But even principal components is unnecessarily complicated for this problem!
Khan data

[Figure: Khan data, with samples labeled BL, EWS, NB, RMS]

Class centroids

[Figure: class centroids. Panels: BL, EWS, NB, RMS and a test sample; axes: gene, average expression]
Nearest Shrunken Centroids

Idea: shrink each class centroid towards the overall centroid. First normalize by the within-class standard deviation for each gene.

Details

- Let $x_{ij}$ be the expression for genes $i = 1, 2, \ldots, p$ and samples $j = 1, 2, \ldots, n$.
- We have classes $1, 2, \ldots, K$, and let $C_k$ be the indices of the $n_k$ samples in class $k$.
- The $i$th component of the centroid for class $k$ is $\bar{x}_{ik} = \sum_{j \in C_k} x_{ij} / n_k$, the mean expression value in class $k$ for gene $i$; the $i$th component of the overall centroid is $\bar{x}_i = \sum_{j=1}^{n} x_{ij} / n$.
- Let $d_{ik} = (\bar{x}_{ik} - \bar{x}_i) / s_i$, where $s_i$ is the pooled within-class standard deviation for gene $i$:
  $$s_i^2 = \frac{1}{n - K} \sum_{k} \sum_{j \in C_k} (x_{ij} - \bar{x}_{ik})^2.$$
- Shrink each $d_{ik}$ towards zero, giving $d'_{ik}$ and new shrunken centroids or prototypes $\bar{x}'_{ik} = \bar{x}_i + s_i d'_{ik}$.
- The shrinkage is by soft-thresholding: $d'_{ik} = \mathrm{sign}(d_{ik}) (|d_{ik}| - \Delta)_+$, where $(t)_+ = \max(t, 0)$.
- Choose $\Delta$ by cross-validation.
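The centroid, standardization and soft-thresholding steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the samples-by-genes layout of `X` (the transpose of $x_{ij}$), and the toy data are all assumptions made for the example, and it assumes every $s_i > 0$.

```python
import math

def soft_threshold(d, delta):
    """Return sign(d) * max(|d| - delta, 0)."""
    return math.copysign(max(abs(d) - delta, 0.0), d)

def shrunken_centroids(X, y, delta):
    """X: list of n samples, each a list of p gene expressions (hypothetical layout).
    y: list of n class labels.
    Returns (overall, s, shrunken), where shrunken[k][i] is the shrunken
    centroid of class k for gene i."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    K = len(classes)

    # Overall centroid: mean of each gene over all samples.
    overall = [sum(row[i] for row in X) / n for i in range(p)]

    # Class centroids: mean of each gene within each class.
    centroids = {}
    for k in classes:
        rows = [row for row, label in zip(X, y) if label == k]
        centroids[k] = [sum(r[i] for r in rows) / len(rows) for i in range(p)]

    # Pooled within-class standard deviation per gene (divisor n - K).
    s = []
    for i in range(p):
        ss = sum((row[i] - centroids[label][i]) ** 2 for row, label in zip(X, y))
        s.append(math.sqrt(ss / (n - K)))

    # Standardize, soft-threshold, and map back to the expression scale.
    shrunken = {}
    for k in classes:
        shrunken[k] = []
        for i in range(p):
            d = (centroids[k][i] - overall[i]) / s[i]
            shrunken[k].append(overall[i] + s[i] * soft_threshold(d, delta))
    return overall, s, shrunken

# Toy data (made up): gene 0 separates the classes, gene 1 barely differs.
X = [[1.0, 5.0], [3.0, 5.2], [7.0, 4.8], [9.0, 5.0]]
y = ["A", "A", "B", "B"]
overall, s, shrunken = shrunken_centroids(X, y, delta=1.0)
print(shrunken)
```

With `delta=1.0` on this toy data, the gene-1 component of both class centroids collapses onto the overall centroid, so only gene 0 still distinguishes the classes; with `delta=0.0` the ordinary class centroids are recovered.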
K-Fold Cross-Validation

Primary method for estimating a tuning parameter (here the shrinkage threshold $\Delta$). Divide the data into $K$ roughly equal parts.

[Figure: the data split into five parts, one labeled Test and four labeled Train]

- For each $k = 1, 2, \ldots, K$, fit the model with parameter $\Delta$ to the other $K - 1$ parts, and compute its error in predicting the $k$th part. Average this error over the $K$ parts to give the estimate $CV(\Delta)$.
- Do this for many values of $\Delta$. Draw the curve $CV(\Delta)$ and choose the value of $\Delta$ that makes $CV(\Delta)$ smallest.

Typically we use $K = 5$ or $10$.
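As a rough sketch of the K-fold procedure just described, the following standard-library Python estimates the cross-validation error for each candidate parameter value and picks the smallest. The `fit` and `predict` callables, the toy one-dimensional classifier, and the data are hypothetical stand-ins chosen only to make the example runnable; any classifier with a single tuning parameter could be plugged in.

```python
import random

def k_fold_indices(n, K, seed=0):
    """Shuffle indices 0..n-1 and deal them into K roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::K] for f in range(K)]

def cross_validate(fit, predict, X, y, params, K=5, seed=0):
    """Return (best_param, cv), where cv[param] is the misclassification
    rate averaged over the K held-out folds."""
    folds = k_fold_indices(len(X), K, seed)
    cv = {}
    for param in params:
        errors = []
        for held_out in folds:
            held = set(held_out)
            train = [j for j in range(len(X)) if j not in held]
            # Fit on the other K - 1 parts, predict the held-out part.
            model = fit([X[j] for j in train], [y[j] for j in train], param)
            wrong = sum(predict(model, X[j]) != y[j] for j in held_out)
            errors.append(wrong / len(held_out))
        cv[param] = sum(errors) / K
    best = min(params, key=lambda prm: cv[prm])
    return best, cv

# Hypothetical toy classifier: a cut-point between the two class means,
# shifted by the tuning parameter (a large shift makes it useless).
def fit(X_train, y_train, shift):
    mean0 = sum(x for x, c in zip(X_train, y_train) if c == 0) / y_train.count(0)
    mean1 = sum(x for x, c in zip(X_train, y_train) if c == 1) / y_train.count(1)
    return (mean0 + mean1) / 2 + shift

def predict(cut, x):
    return 1 if x > cut else 0

X = [0.0, 0.5, 1.0, 1.5, 2.0, 10.0, 10.5, 11.0, 11.5, 12.0]
y = [0] * 5 + [1] * 5
best, cv = cross_validate(fit, predict, X, y, params=[0.0, 100.0], K=5)
print(best, cv)
```

The same held-out folds are reused for every parameter value, so the points on the resulting CV curve are directly comparable.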
Results

[Figure: training (tr), cross-validation (cv) and test (te) error curves; axes: amount of shrinkage Delta, number of genes, error]
Advantages

- Simple; includes the nearest centroid classifier as a special case.
- Thresholding denoises large effects and sets small ones to zero, thereby selecting genes.
- With more than two classes, the method can select different genes, and different numbers of genes, for each class.
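To make the gene-selection point concrete, here is a small sketch showing how soft-thresholding keeps a different set of genes for each class. The class names are borrowed from the slides, but the $d_{ik}$ values and the helper names are invented purely for illustration.

```python
def soft_threshold(d, delta):
    """Return sign(d) * max(|d| - delta, 0)."""
    magnitude = abs(d) - delta
    if magnitude <= 0:
        return 0.0
    return magnitude if d > 0 else -magnitude

def selected_genes(d, delta):
    """d[k] lists the standardized differences d_ik for class k.
    Returns, per class, the gene indices whose thresholded value is nonzero."""
    return {
        k: [i for i, value in enumerate(row) if soft_threshold(value, delta) != 0.0]
        for k, row in d.items()
    }

# Hypothetical standardized differences for four genes in two classes.
d = {
    "BL": [2.5, -0.3, 0.1, -1.8],
    "EWS": [-0.4, 0.2, 3.1, 0.6],
}
print(selected_genes(d, delta=0.0))  # no shrinkage: every gene is kept
print(selected_genes(d, delta=1.0))  # shrinkage: a different subset per class
```

Setting `delta` to zero leaves every nonzero difference in place, which corresponds to the plain nearest centroid classifier mentioned in the first bullet.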
The genes that matter

[Figure: the selected genes for each class: BL, EWS, NB, RMS]

Estimated Class Probabilities

[Figure: estimated class probabilities for BL, EWS, NB, RMS. Panels: Training Data, Test Data; axes: sample, probability; some test samples are marked O]
Class probabilities

- For a test sample $x^* = (x_1^*, x_2^*, \ldots, x_p^*)$, we define the discriminant score for class $k$:
  $$\delta_k(x^*) = \sum_{i=1}^{p} \frac{(x_i^* - \bar{x}'_{ik})^2}{s_i^2} - 2 \log \pi_k$$
- The classification rule is then
  $$C(x^*) = \ell \quad \text{if} \quad \delta_\ell(x^*) = \min_k \delta_k(x^*)$$
- Estimates of the class probabilities, by analogy to Gaussian linear discriminant analysis, are
  $$\hat{p}_k(x^*) = \frac{e^{-\frac{1}{2} \delta_k(x^*)}}{\sum_{\ell=1}^{K} e^{-\frac{1}{2} \delta_\ell(x^*)}}$$
- Still very simple. In statistical parlance, this is a restricted version of a naive Bayes classifier (also called idiot's Bayes!).
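The discriminant score, classification rule and probability estimate above translate directly into code. The sketch below is illustrative only: the function names, the shrunken centroids, the standard deviations, the priors $\pi_k$ and the test sample are all made-up inputs. Subtracting the smallest score before exponentiating is a numerical-stability choice added here; it leaves the probabilities unchanged.

```python
import math

def discriminant_scores(x_star, shrunken, s, priors):
    """delta_k(x*) = sum_i (x*_i - shrunken[k][i])^2 / s_i^2 - 2 log(pi_k)."""
    return {
        k: sum((x - c) ** 2 / si ** 2 for x, c, si in zip(x_star, shrunken[k], s))
        - 2.0 * math.log(priors[k])
        for k in shrunken
    }

def classify(scores):
    """Assign the class with the smallest discriminant score."""
    return min(scores, key=scores.get)

def class_probabilities(scores):
    """p_k = exp(-delta_k / 2) / sum_l exp(-delta_l / 2)."""
    smallest = min(scores.values())  # shift for numerical stability only
    weights = {k: math.exp(-0.5 * (v - smallest)) for k, v in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Hypothetical inputs: two classes, two genes, equal priors.
shrunken = {"A": [3.0, 5.0], "B": [7.0, 5.0]}
s = [1.0, 0.5]
priors = {"A": 0.5, "B": 0.5}
x_star = [4.0, 5.5]

scores = discriminant_scores(x_star, shrunken, s, priors)
print(classify(scores), class_probabilities(scores))
```

A gene whose shrunken centroid is identical in every class, as gene 1 is here, adds the same amount to every score and therefore has no effect on the classification or the probabilities.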
Adaptive threshold scaling

- Idea: define class-dependent scaling factors $\theta_k$ for each class:
  $$d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{m_k \theta_k s_i} \qquad (1)$$
- Use smaller factors for hard-to-classify classes, giving the same test error with fewer genes in total.
- Adaptive procedure: start with all $\theta_k = 1$, and then reduce $\theta_k$ by 10% for the class $k$ with the largest area under the training error curve.
- Repeat 20 times and choose the solution with the smallest area under the curve for all classes.
- This can dramatically reduce the total number of genes used, without increasing the error rate.
Lymphoma data

Scaling factors changed from (1, 1, 1) to (1.9, 1, 1.5).

[Figure: two panels of training (tr) and test (te) error curves; axes: amount of shrinkage, size, error]