Introduction to R and Statistical Data Analysis
- Darren Emery Miller
- 6 years ago
Transcription
1 Microarray Center Introduction to R and Statistical Data Analysis PART II Petr Nazarov petr.nazarov@crp-sante.lu
2 OUTLINE PART II
Descriptive statistics in R (8): sum, mean, median, sd, var, cor, etc.
Principal component analysis and clustering (9): PCA, k-means clustering, hierarchical clustering
Random numbers (10): random number generators, distributions
Statistical tests (11): t-test, Wilcoxon test, multiple test correction
ANOVA and linear regression (12): ANOVA, linear regression
Look for corresponding scripts at
3 8. DESCRIPTIVE STATISTICS IN R Center, Variation, Dependency
4 9. PCA AND CLUSTERING 9.1. Iris Data from R.A.Fisher The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by Sir Ronald Aylmer Fisher (1936) as an example of discriminant analysis. It is sometimes called Anderson's Iris data set because Edgar Anderson collected the data to quantify the geographic variation of Iris flowers in the Gaspé Peninsula. The dataset consists of 50 samples from each of three species of Iris flowers (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepal and the petal, in centimeters. Based on the combination of the four features, Fisher developed a linear discriminant model to distinguish the species from each other.
[Figure: Iris setosa, Iris versicolor, Iris virginica]
5 9. PCA AND CLUSTERING 9.1. Data Presentation
iris
str(iris)
## plot iris data
x11()
plot(iris[,-5])
plot(iris[,-5], col = iris[,5])
[Figure: scatter-plot matrix of Sepal.Length, Sepal.Width, Petal.Length, Petal.Width]
6 9. PCA AND CLUSTERING 9.2. Principal Component Analysis (PCA)
Principal component analysis (PCA) is a vector space transform used to reduce multidimensional data sets to lower dimensions for analysis. It selects the coordinates along which the variation of the data is largest.
For simplicity, let us consider a two-parameter situation, both in terms of the data and the resulting PCA. Instead of using the 2 natural parameters for classification, we can use the first component!
[Figure: scatter plot in natural coordinates (Variable 1 vs. Variable 2) and scatter plot in PC coordinates (First component vs. Second component)]
7 9. PCA AND CLUSTERING 9.2. Data Transformation for PCA
data = as.matrix(iris[,-5])
row.names(data) = as.character(iris[,5])
classes = as.integer(iris[,5])
## plot data in 3d
library(scatterplot3d)
x11()
scatterplot3d(iris[,1], iris[,2], iris[,3], pch=19, color=classes, main = "Iris", xlab = names(iris)[1], ylab = names(iris)[2], zlab = names(iris)[3])
legend(4, 7, levels(iris$Species), col=c(1,2,3), pch=19)
[Figure: 3D scatter plot "Iris" with axes Sepal.Length, Sepal.Width, Petal.Length; legend: setosa, versicolor, virginica]
8 9. PCA AND CLUSTERING 9.2. PCA
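The code for this slide was not captured in the transcription. The following is a minimal sketch, assuming that the `pc` object plotted on the k-means slide comes from `prcomp()` applied to the iris measurements; the use of `prcomp`, the default settings, and the plot details are assumptions rather than the author's confirmed code.

```r
## assumption: PCA via prcomp() on the four iris measurements
data = as.matrix(iris[,-5])
classes = as.integer(iris[,5])
pc = prcomp(data)     # centers the columns; no rescaling by default
summary(pc)           # proportion of variance explained by each component
## plot the first two principal components, colored by species
plot(pc$x[,1], pc$x[,2], col = classes, pch = 19, xlab = "PC1", ylab = "PC2")
```

The scores in `pc$x` hold one row per flower and one column per component, so `pc$x[,1]` and `pc$x[,2]` are the coordinates along the two directions of largest variation.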
9 9. PCA AND CLUSTERING 9.3. k-means Clustering
k-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.
1) k initial "means" (in this case k=3) are randomly selected from the data set (shown in color).
2) k clusters are created by associating every observation with the nearest mean.
3) The centroid of each of the k clusters becomes the new mean.
4) Steps 2 and 3 are repeated until convergence has been reached.
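The four steps above can be sketched from scratch as follows. This is an illustrative sketch only: the function name `simple_kmeans`, the iteration cap, and the handling of empty clusters are assumptions made here, and the built-in `kmeans()` remains the practical choice.

```r
## from-scratch sketch of steps 1-4 (hypothetical helper, not part of the slides)
simple_kmeans = function(x, k, max_iter = 100) {
  x = as.matrix(x)
  ## 1) pick k observations at random as the initial means
  centers = x[sample(nrow(x), k), , drop = FALSE]
  cluster = integer(nrow(x))
  for (iter in seq_len(max_iter)) {
    ## 2) assign every observation to the nearest mean (squared Euclidean distance)
    d = sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    new_cluster = max.col(-d, ties.method = "first")
    ## 4) stop once the assignment no longer changes
    if (all(new_cluster == cluster)) break
    cluster = new_cluster
    ## 3) the centroid of each non-empty cluster becomes the new mean
    for (j in seq_len(k)) {
      if (any(cluster == j)) centers[j, ] = colMeans(x[cluster == j, , drop = FALSE])
    }
  }
  list(cluster = cluster, centers = centers)
}

set.seed(1)
res = simple_kmeans(iris[,-5], k = 3)
table(res$cluster, iris$Species)   # compare clusters with the known species
```

Because the starting means are random, different seeds can give different partitions, which is why `kmeans()` offers the `nstart` argument to try several starts.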
10 9. PCA AND CLUSTERING 9.3. k-means Clustering
clusters = kmeans(x=data, centers=3, nstart=10)$cluster
x11()
plot(pc$x[,1], pc$x[,2], col = classes, pch=clusters)
legend(2, 1.4, levels(iris$Species), col=c(1,2,3), pch=19)
legend(-2.5, 1.4, c("c1","c2","c3"), col=4, pch=c(1,2,3))
[Figure: second vs. first principal component; colors: setosa, versicolor, virginica; symbols: c1, c2, c3]
11 9. PCA AND CLUSTERING 9.4. Hierarchical Clustering
Hierarchical clustering creates a hierarchy of clusters which may be represented in a tree structure called a dendrogram. The root of the tree consists of a single cluster containing all observations, and the leaves correspond to individual observations. Algorithms for hierarchical clustering are generally either agglomerative, in which one starts at the leaves and successively merges clusters together; or divisive, in which one starts at the root and recursively splits the clusters.
[Figure: elements joined into a dendrogram, agglomerative vs. divisive direction; distance: Euclidean]
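An agglomerative clustering with Euclidean distance can be sketched with base R as follows. The linkage method (complete linkage, the `hclust()` default) and the cut into 3 clusters are assumptions chosen for illustration; the slides do not state them.

```r
## agglomerative clustering of the iris measurements with Euclidean distance
data = as.matrix(iris[,-5])
d = dist(data, method = "euclidean")   # pairwise distances between observations
hc = hclust(d, method = "complete")    # assumption: complete linkage
plot(hc, labels = FALSE, main = "Iris dendrogram")
## cut the tree into 3 clusters and compare with the known species
groups = cutree(hc, k = 3)
table(groups, iris$Species)
```

Each of the 149 merge heights in `hc$height` corresponds to one step in which two clusters are joined, from the leaves up to the root.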
12 9. PCA AND CLUSTERING 9.4. Hierarchical Clustering
## use heatmap
heatmap(data)
## use heatmap with colors
color = character(length(classes))
color[classes == 1] = "black"
color[classes == 2] = "red"
color[classes == 3] = "green"
heatmap(data, RowSideColors=color)
[Figure: heatmap with row colors for Iris setosa, Iris versicolor, Iris virginica]
13 9. PCA AND CLUSTERING 9.5. Example: Task 8a Acute lymphoblastic leukemia (ALL) is a form of leukemia, or cancer of the white blood cells, characterized by excess lymphoblasts. all_data.xls contains the results of full-transcript profiling for ALL patients and healthy donors using Affymetrix microarrays. The data were downloaded from the ArrayExpress repository and normalized. The expression values in the table are in log2 scale.
14 10. RANDOM NUMBERS AND DISTRIBUTIONS See Source Code
15 11. STATISTICAL TESTS See Source Code
16 12. ANOVA and LINEAR REGRESSION Why ANOVA?
Means for more than 2 populations: We have measurements for 5 conditions. Are the means for these conditions equal? If we used pairwise comparisons, what would be the probability of getting at least one error (at a 5% error rate per comparison)?
Number of comparisons: $C_2^5 = \frac{5!}{2!\,3!} = 10$
Probability of an error: $1 - (0.95)^{10} \approx 0.4$
Validation of the effects: We assume that we have several factors affecting our data. Which factors are most significant? Which can be neglected?
ANOVA example from Partek
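The arithmetic on this slide can be checked directly in R. This is a small sketch; the variable names are chosen here for illustration, and the calculation assumes independent comparisons, each with a 5% error rate.

```r
n_conditions = 5
alpha = 0.05                                  # assumed per-comparison error rate
n_comparisons = choose(n_conditions, 2)       # 5! / (2! 3!)
p_any_error = 1 - (1 - alpha)^n_comparisons   # assumes independent comparisons
n_comparisons            # 10
round(p_any_error, 2)    # 0.4
```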
17 12. ANOVA and LINEAR REGRESSION Meaning of ANOVA
$H_0$: $\mu_1 = \mu_2 = \mu_3$
$H_a$: not all 3 means are equal
[Figure: depression level vs. measures, with group means m1, m2, m3]
18 12. ANOVA and LINEAR REGRESSION ANOVA in R: Fast and Simple salaries.txt
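The code for this slide and the file salaries.txt were not captured in the transcription. As a hedged sketch of a one-way ANOVA in R, the built-in `PlantGrowth` data set is used here as a stand-in; this substitution is an assumption, and the author's actual script may differ.

```r
## assumption: built-in PlantGrowth data stand in for salaries.txt
## weight = measurement, group = factor with 3 levels (ctrl, trt1, trt2)
str(PlantGrowth)
res = aov(weight ~ group, data = PlantGrowth)   # one-way ANOVA
summary(res)                                    # F statistic and p-value for the factor
TukeyHSD(res)                                   # post-hoc pairwise comparisons
```

`summary(res)` tests the single hypothesis that all group means are equal, which avoids the inflated error rate of running every pairwise test separately.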
19 12. ANOVA and LINEAR REGRESSION Regression Model and Regression Line
Regression model: The equation describing how y is related to x and an error term; in simple linear regression, the regression model is $y = \beta_0 + \beta_1 x + \varepsilon$
Regression equation: The equation that describes how the mean or expected value of the dependent variable is related to the independent variable; in simple linear regression, $E(y) = \beta_0 + \beta_1 x$
Model for a simple linear regression: $y(x) = \beta_1 x + \beta_0 + \varepsilon$
[Figure: number of cells vs. temperature]
20 12. ANOVA and LINEAR REGRESSION Comparison of ANOVA and Linear Regression
ANOVA: SST = SSTR + SSE [Figure: depression level vs. measures, with group means m1, m2, m3]
Linear regression: SST = SSR + SSE [Figure: number of cells vs. temperature]
21 12. ANOVA and LINEAR REGRESSION Linear Regression in R cells.txt y x
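The code for this slide and the file cells.txt were not captured in the transcription. The following is a minimal sketch of a simple linear regression in R, using the built-in `cars` data set as a stand-in (an assumption), and checking the decomposition SST = SSR + SSE numerically.

```r
## assumption: built-in cars data stand in for cells.txt (x = speed, y = dist)
x = cars$speed
y = cars$dist
mod = lm(y ~ x)            # fits y = b0 + b1*x + error
coef(mod)                  # estimated intercept b0 and slope b1
summary(mod)               # standard errors, p-values, R-squared
## sums of squares: SST = SSR + SSE
sst = sum((y - mean(y))^2)
ssr = sum((fitted(mod) - mean(y))^2)
sse = sum(residuals(mod)^2)
all.equal(sst, ssr + sse)
plot(x, y)
abline(mod, col = "red")   # regression line
```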
22 REGRESSION ANALYSIS Solution of Task 12 leukemia.txt Survival data1$survival data1$wbc
23 Thank you for your attention Questions?
More informationImage Analysis Lecture Segmentation. Idar Dyrdal
Image Analysis Lecture 9.1 - Segmentation Idar Dyrdal Segmentation Image segmentation is the process of partitioning a digital image into multiple parts The goal is to divide the image into meaningful
More informationCS 2750 Machine Learning. Lecture 19. Clustering. CS 2750 Machine Learning. Clustering. Groups together similar instances in the data sample
Lecture 9 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem: distribute data into k different groups
More informationBasic Concepts Weka Workbench and its terminology
Changelog: 14 Oct, 30 Oct Basic Concepts Weka Workbench and its terminology Lecture Part Outline Concepts, instances, attributes How to prepare the input: ARFF, attributes, missing values, getting to know
More informationOliver Dürr. Statistisches Data Mining (StDM) Woche 5. Institut für Datenanalyse und Prozessdesign Zürcher Hochschule für Angewandte Wissenschaften
Statistisches Data Mining (StDM) Woche 5 Oliver Dürr Institut für Datenanalyse und Prozessdesign Zürcher Hochschule für Angewandte Wissenschaften oliver.duerr@zhaw.ch Winterthur, 17 Oktober 2017 1 Multitasking
More informationUSE IBM IN-DATABASE ANALYTICS WITH R
USE IBM IN-DATABASE ANALYTICS WITH R M. WURST, C. BLAHA, A. ECKERT, IBM GERMANY RESEARCH AND DEVELOPMENT Introduction To process data, most native R functions require that the data first is extracted from
More informationPerformance Analysis of Data Mining Classification Techniques
Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal
More informationnetzen - a software tool for the analysis and visualization of network data about
Architect and main contributor: Dr. Carlos D. Correa Other contributors: Tarik Crnovrsanin and Yu-Hsuan Chan PI: Dr. Kwan-Liu Ma Visualization and Interface Design Innovation (ViDi) research group Computer
More informationA Tour of Sweave. Max Kuhn. March 14, Pfizer Global R&D Non Clinical Statistics Groton
A Tour of Sweave Max Kuhn Pfizer Global R&D Non Clinical Statistics Groton March 14, 2011 Creating Data Analysis Reports For most projects where we need a written record of our work, creating the report
More informationClustering k-mean clustering
Clustering k-mean clustering Genome 373 Genomic Informatics Elhanan Borenstein The clustering problem: partition genes into distinct sets with high homogeneity and high separation Clustering (unsupervised)
More information