Package 'supraHex', August 13, 2018. Type: Package
Type: Package
Package: supraHex
Title: supraHex: a supra-hexagonal map for analysing tabular omics data
Version:
Date:
Author: Hai Fang and Julian Gough
Maintainer: Hai Fang <hfang@well.ox.ac.uk>
Depends: R (>= 3.3), hexbin
Imports: ape, MASS, grDevices, graphics, stats, utils
Description: A supra-hexagonal map is a giant hexagon on a 2-dimensional grid, seamlessly consisting of smaller hexagons. It is designed to train, analyse and visualise high-dimensional omics input data. supraHex is able to carry out gene clustering/meta-clustering and sample correlation, plus intuitive visualisations to facilitate exploratory analysis. More importantly, it allows for overlaying additional data onto the trained map to explore relations between input and additional data. With supraHex, it is therefore also possible to carry out multilayer omics data comparisons. Newly added utilities are advanced heatmap visualisation and tree-based analysis of sample relationships. Uniquely to this package, users can very quickly make sense of any tabular omics data, both scientifically and artistically, especially in a sample-specific fashion, without loss of information on large numbers of genes.
URL:
Collate: sPipeline.r sHexGrid.r sTopology.r sInitial.r sTrainology.r sTrainSeq.r sTrainBatch.r sBMH.r sNeighDirect.r sNeighAny.r sHexDist.r sDistance.r sDmat.r sDmatMinima.r sDmatCluster.r sCompReorder.r sWriteData.r sMapOverlay.r visHexPattern.r visHexGrid.r visHexMapping.r visHexComp.r visColormap.r visColorbar.r visVp.r visHexMulComp.r visHexAnimate.r visCompReorder.r visDmatCluster.r visKernels.r visColoralpha.r visHeatmap.r visHeatmapAdv.r visTreeBootstrap.r visTreeBSclust.r visDmatHeatmap.r visHexBarplot.r sHexPolygon.r sHexGridVariant.r
License: GPL-2
biocViews: Software, Clustering, Visualization, GeneExpression
git_url:
git_branch: RELEASE_3_7
git_last_commit: 9f28a1d
git_last_commit_date:
Date/Publication:
R topics documented:

Fang, Golub, sBMH, sCompReorder, sDistance, sDmat, sDmatCluster, sDmatMinima, sHexDist, sHexGrid, sHexGridVariant, sHexPolygon, sInitial, sMapOverlay, sNeighAny, sNeighDirect, sPipeline, sTopology, sTrainBatch, sTrainology, sTrainSeq, sWriteData, visColoralpha, visColorbar, visColormap, visCompReorder, visDmatCluster, visDmatHeatmap, visHeatmap, visHeatmapAdv, visHexAnimate, visHexBarplot, visHexComp, visHexGrid, visHexMapping, visHexMulComp, visHexPattern, visKernels, visTreeBootstrap, visTreeBSclust, visVp, Index
Fang                    Human embryo gene expression dataset from Fang et al. (2010)

Description

Human embryo dataset contains gene expression levels (5441 genes and 18 embryo samples) from Fang et al. (2010).

Usage

data(Fang)

Format

Fang: a gene expression matrix of 5441 genes x 18 samples, involving six successive stages, each with three replicates.
Fang.sampleinfo: a matrix containing the information of the 18 samples for the expression matrix Fang. The three columns correspond to the sample information: "Name", "Stage" and "Replicate".
Fang.geneinfo: a matrix containing the information of the 5441 genes for the expression matrix Fang. The three columns correspond to the gene information: "AffyID", "EntrezGene" and "Symbol".

References

Fang et al. (2010). Transcriptome analysis of early organogenesis in human embryos. Developmental Cell, 19(1):

Golub                   Leukemia gene expression dataset from Golub et al. (1999)

Description

Leukemia dataset (learning set) contains gene expression levels (3051 genes and 38 patient samples) from Golub et al. (1999). This dataset has been pre-processed: capping into a floor of 100 and a ceiling of 16000; filtering by exclusion of genes with max/min <= 5 or max - min <= 500, where max and min refer respectively to the maximum and minimum intensities for a particular gene across mRNA samples; and 2-base logarithmic transformation.

Usage

data(Golub)

Format

Golub: a gene expression matrix of 3051 genes x 38 samples. These samples include 11 acute myeloid leukemia (AML) and 27 acute lymphoblastic leukemia (ALL), which can be further subtyped into 19 B-cell ALL and 8 T-cell ALL.
References

Golub et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, Vol. 286:

sBMH                    Function to identify the best-matching hexagons/rectangles for the input data

Description

sBMH is supposed to identify the best-matching hexagons/rectangles (BMH) for the input data.

Usage

sBMH(sMap, data, which_bmh=c("best", "worst", "all"))

Arguments

sMap: an object of class "sMap" or a codebook matrix
data: a data frame or matrix of input data
which_bmh: which BMH is requested. It can be a vector consisting of any integer values from [1, nhex]. Alternatively, it can also be one of the "best", "worst" and "all" choices. Here, "best" is equivalent to 1, "worst" to nhex, and "all" to seq(1, nhex)

Value

a list with following components:
bmh: the requested BMH matrix of dlen x length(which_bmh), where dlen is the total number of rows of the input data
qerr: the corresponding matrix of quantization errors (i.e., the distance between the input data and their BMH), with the same dimensions as "bmh" above
mqe: the mean quantization error for the "best" BMH
call: the call that produced this result

Note

"which_bmh" upon request can be a vector consisting of any integer values from [1, nhex]

See Also

sPipeline
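As noted above, "which_bmh" also accepts an explicit integer vector of ranks rather than one of the keyword choices. A minimal self-contained sketch of that vector form (assuming the supraHex package is installed; the map is trained with the default sPipeline setup rather than the step-by-step training shown elsewhere in this manual):

```r
# Hedged sketch: request the two closest-matching hexagons per input row.
# Assumes the supraHex (Bioconductor) package is installed.
library(supraHex)

# a small random input matrix, as in the manual's other examples
data <- matrix(rnorm(100*10, mean=0, sd=1), nrow=100, ncol=10)

# train with the default pipeline
sMap <- sPipeline(data=data)

# which_bmh as an integer vector of ranks: 1 = best match, 2 = second best
res <- sBMH(sMap=sMap, data=data, which_bmh=c(1, 2))

dim(res$bmh)   # a 100 x 2 matrix: one column per requested rank
```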
Examples

# 1) generate an iid normal random matrix of 100x10
data <- matrix(rnorm(100*10, mean=0, sd=1), nrow=100, ncol=10)
# 2) from this input matrix, determine nhex=5*sqrt(nrow(data))=50,
# but it returns nhex=61, via "sHexGrid(nhex=50)", to make sure a supra-hexagonal grid
sTopol <- sTopology(data=data, lattice="hexa", shape="suprahex")
# 3) initialise the codebook matrix using the "uniform" method
sI <- sInitial(data=data, sTopol=sTopol, init="uniform")
# 4) define trainology at the "rough" stage
sT_rough <- sTrainology(sMap=sI, data=data, stage="rough")
# 5) training at the "rough" stage
sM_rough <- sTrainBatch(sMap=sI, data=data, sTrain=sT_rough)
# 6) define trainology at the "finetune" stage
sT_finetune <- sTrainology(sMap=sI, data=data, stage="finetune")
# 7) training at the "finetune" stage
sM_finetune <- sTrainBatch(sMap=sM_rough, data=data, sTrain=sT_finetune)
# 8) find the best-matching hexagons/rectangles for the input data
response <- sBMH(sMap=sM_finetune, data=data, which_bmh="best")

sCompReorder            Function to reorder component planes

Description

sCompReorder is supposed to reorder component planes for the input map/data. It returns an object of class "sReorder". It is realised by using a new map grid (with a sheet shape consisting of a rectangular lattice) to train component plane vectors (either column-wise vectors of the codebook/data matrix or the covariance matrix thereof). As a result, similar component planes are placed closer to each other. It is highly recommended to use the trained map (i.e., the codebook matrix) as input if the data matrix is very large, to save computational costs.
Usage

sCompReorder(sMap, xdim=NULL, ydim=NULL, amplifier=NULL,
metric=c("none", "pearson", "spearman", "kendall", "euclidean", "manhattan", "cos", "mi"),
init=c("linear", "uniform", "sample"),
algorithm=c("sequential", "batch"),
alphaType=c("invert", "linear", "power"),
neighKernel=c("gaussian", "bubble", "cutgaussian", "ep", "gamma"))

Arguments

sMap: an object of class "sMap" or an input data frame/matrix
xdim: an integer specifying the x-dimension of the grid
ydim: an integer specifying the y-dimension of the grid
amplifier: an integer specifying the amplifier (3 by default) of the number of component planes. The product of the component number and the amplifier constitutes the number of rectangles in the sheet grid
metric: distance metric used to define the similarity between component planes. It can be "none", which means directly using column-wise vectors of the codebook/data matrix. Otherwise, the covariance matrix is first calculated from the codebook/data matrix. The distance metric used for calculating the covariance matrix between component planes can be: "pearson" for Pearson correlation, "spearman" for Spearman rank correlation, "kendall" for Kendall tau rank correlation, "euclidean" for Euclidean distance, "manhattan" for cityblock distance, "cos" for cosine similarity, and "mi" for mutual information. See sDistance for details
init: an initialisation method. It can be one of the "uniform", "sample" and "linear" initialisation methods
algorithm: the training algorithm. It can be one of the "sequential" and "batch" algorithms. By default, it uses the sequential algorithm. If the input data contain a large number of samples but not a great number of zero entries, it is reasonable to use the batch algorithm for its fast computation (probably also without compromising accuracy)
alphaType: the alpha type. It can be one of the "invert", "linear" and "power" alpha types
neighKernel: the training neighborhood kernel.
It can be one of the "gaussian", "bubble", "cutgaussian", "ep" and "gamma" kernels

Value

an object of class "sReorder", a list with following components:
nhex: the total number of rectangles in the grid
xdim: x-dimension of the grid
ydim: y-dimension of the grid
uOrder: the unique order/placement for each component plane that is reordered to the "sheet"-shape grid with rectangular lattice
coord: a matrix of nhex x 2, with each row corresponding to the coordinates of each "uOrder" rectangle in the 2D map grid
call: the call that produced this result

Note

All component planes are uniquely placed within a "sheet"-shape rectangular grid: each component plane mapped to the "sheet"-shape grid with rectangular lattice is determined iteratively, in order from the best matched to the next compromised one. If multiple components hit the same rectangular lattice position, the worse one is always sacrificed by moving to the next best one, until all components are placed somewhere exclusively on their own. The size of the "sheet"-shape rectangular grid depends on the input arguments. How the input parameters are used to determine nhex follows this priority order: "xdim & ydim" > "nhex" > "data". If both xdim and ydim are given, nhex = xdim * ydim.
If only data is input, nhex = 5*sqrt(dlen), where dlen is the number of rows of the input data. After nhex is determined, the xy-dimensions of the rectangular grid are then determined according to the square roots of the two biggest eigenvalues of the input data.

See Also

sTopology, sPipeline, sBMH, sDistance, visCompReorder

Examples

# 1) generate an iid normal random matrix of 100x10
data <- matrix(rnorm(100*10, mean=0, sd=1), nrow=100, ncol=10)
colnames(data) <- paste(rep('s',10), seq(1:10), sep="")
# 2) get trained using the default setup
sMap <- sPipeline(data=data)
# 3) reorder component planes in different ways
# 3a) directly using column-wise vectors of the codebook matrix
sReorder <- sCompReorder(sMap=sMap, amplifier=2, metric="none")
# 3b) according to the covariance matrix of Pearson correlation of the codebook matrix
sReorder <- sCompReorder(sMap=sMap, amplifier=2, metric="pearson")
# 3c) according to the covariance matrix of Pearson correlation of the input matrix
sReorder <- sCompReorder(sMap=data, amplifier=2, metric="pearson")

sDistance               Function to compute the pairwise distance for a given data matrix

Description

sDistance is supposed to compute and return the distance matrix between the rows of a data matrix using a specified distance metric.

Usage

sDistance(data, metric=c("pearson", "spearman", "kendall", "euclidean", "manhattan", "cos", "mi"))

Arguments

data: a data frame or matrix of input data
metric: distance metric used to calculate a symmetric distance matrix. See below for the available options

Value

dist: a symmetric distance matrix of nrow x nrow, where nrow is the number of rows of the input data matrix
Note

The supported distance metrics are:
"pearson": Pearson correlation. Note that two curves that have identical shape but different magnitude will still have a correlation of 1
"spearman": Spearman rank correlation. As a nonparametric version of the Pearson correlation, it calculates the correlation between the ranks of the data values in the two vectors (more robust against outliers)
"kendall": Kendall tau rank correlation. Compared to Spearman rank correlation, it goes a step further by using only the relative ordering to calculate the correlation. For all pairs of data points (xi, yi) and (xj, yj), it calls a pair of points either concordant (Nc in total) if (xi - xj)*(yi - yj) > 0, or discordant (Nd in total) if (xi - xj)*(yi - yj) < 0. Finally, it calculates the gamma coefficient (Nc - Nd)/(Nc + Nd) as a measure of association, which is highly resistant to tied data
"euclidean": Euclidean distance. Unlike the correlation-based distance measures, it takes the magnitude into account (input data should be suitably normalised)
"manhattan": Cityblock distance. The distance between two vectors is the sum of the absolute values of their differences along any coordinate dimension
"cos": Cosine similarity. As an uncentered version of the Pearson correlation, it is a measure of similarity between two vectors of an inner product space, i.e., measuring the cosine of the angle between them (using a dot product and magnitude)
"mi": Mutual information (MI). MI provides a general measure of dependencies between variables, in particular positive, negative and nonlinear correlations. The calculation of MI is implemented via an adaptive partitioning method for deriving equal-probability bins (i.e., each bin contains approximately the same number of data points). The number of bins is heuristically determined (the lower bound): 1 + log2(n), where n is the length of the vector. Because MI increases with entropy, it is normalised to allow comparison of different pairwise similarities: 2*MI/[H(x) + H(y)], where H(x) and H(y) stand for the entropy of the vectors x and y, respectively

See Also

sDmatCluster

Examples

# 1) generate an iid normal random matrix of 100x10
data <- matrix(rnorm(100*10, mean=0, sd=1), nrow=100, ncol=10)
# 2) calculate the distance matrix using different metrics
sMap <- sPipeline(data=data)
# 2a) using the "pearson" metric
dist <- sDistance(data=data, metric="pearson")
# 2b) using the "cos" metric
# dist <- sDistance(data=data, metric="cos")
# 2c) using the "spearman" metric
# dist <- sDistance(data=data, metric="spearman")
# 2d) using the "kendall" metric
# dist <- sDistance(data=data, metric="kendall")
# 2e) using the "euclidean" metric
# dist <- sDistance(data=data, metric="euclidean")
# 2f) using the "manhattan" metric
# dist <- sDistance(data=data, metric="manhattan")
# 2g) using the "mi" metric
# dist <- sDistance(data=data, metric="mi")

sDmat                   Function to calculate the distance matrix in high-dimensional input space according to neighborhood relationships in 2D output space

Description

sDmat is supposed to calculate the distance (measured in high-dimensional input space) to neighbors (defined in 2D output space) for each of the hexagons/rectangles.

Usage

sDmat(sMap, which_neigh=1, distMeasure=c("median", "mean", "min", "max"))

Arguments

sMap: an object of class "sMap"
which_neigh: which neighbors in 2D output space are used for the calculation. By default, it is set to "1" for direct neighbors; "2" means neighbors within a topological distance of no more than 2, and so on
distMeasure: distance measure used to calculate distances in high-dimensional input space

Value

dmat: a vector of length nhex. It stores the distance of a hexagon/rectangle from its output-space-defined neighbors in high-dimensional input space

Note

"which_neigh" is defined in 2D output space, but "distMeasure" is defined in high-dimensional input space.

See Also

sNeighAny

Examples

# 1) generate an iid normal random matrix of 100x10
data <- matrix(rnorm(100*10, mean=0, sd=1), nrow=100, ncol=10)
# 2) get trained using the default setup
sMap <- sPipeline(data=data)
# 3) calculate "median" distances in INPUT space to different neighbors in 2D OUTPUT space
# 3a) using direct neighbors in 2D OUTPUT space
dMat <- sDmat(sMap=sMap, which_neigh=1, distMeasure="median")
# 3b) using no more than 2-topological neighbors in 2D OUTPUT space
# dMat <- sDmat(sMap=sMap, which_neigh=2, distMeasure="median")
sDmatCluster            Function to partition a grid map into clusters

Description

sDmatCluster is supposed to obtain clusters from a grid map. It returns an object of class "sBase".

Usage

sDmatCluster(sMap, which_neigh=1, distMeasure=c("median", "mean", "min", "max"), clusterLinkage=c("average", "complete", "single", "bmh"), reindexSeed=c("hclust", "svd", "none"))

Arguments

sMap: an object of class "sMap"
which_neigh: which neighbors in 2D output space are used for the calculation. By default, it is set to "1" for direct neighbors; "2" means neighbors within a topological distance of no more than 2, and so on
distMeasure: distance measure used to calculate distances in high-dimensional input space. It can be one of the "median", "mean", "min" and "max" measures
clusterLinkage: cluster linkage used to derive clusters. It can be "bmh", which accumulates a cluster just based on best-matching hexagons/rectangles but cannot ensure that each cluster is contiguous. Instead, each cluster is contiguous when using a region-growing algorithm with one of the "average", "complete" and "single" linkages
reindexSeed: the way to index seeds. It can be "hclust" for reindexing seeds according to hierarchical clustering of patterns seen in the seeds, "svd" for reindexing seeds according to SVD of patterns seen in the seeds, or "none" for seeds simply ordered by hexagon index (i.e., always in increasing order as hexagons radiate outwards)

Value

an object of class "sBase", a list with following components:
seeds: the vector storing the cluster seeds, i.e., a list of local minima (in 2D output space) of the distance matrix (in input space). They are represented by the indexes of hexagons/rectangles
bases: the vector of length nhex storing the cluster memberships/bases, where nhex is the total number of hexagons/rectangles in the grid
call: the call that produced this result

Note

The first item in the returned "seeds" is the first cluster, whose members are those entries of the returned "bases" equal to 1. The same relationship holds for the second item, and so on.

See Also

sPipeline, sDmatMinima, sBMH, sNeighDirect, sDistance, visDmatCluster
Examples

# 1) generate an iid normal random matrix of 100x10
data <- matrix(rnorm(100*10, mean=0, sd=1), nrow=100, ncol=10)
# 2) get trained using the default setup
sMap <- sPipeline(data=data)
# 3) partition the grid map into clusters based on different criteria
# 3a) based on the "bmh" criterion
# sBase <- sDmatCluster(sMap=sMap, which_neigh=1, distMeasure="median", clusterLinkage="bmh")
# 3b) using the region-growing algorithm with "average" linkage
sBase <- sDmatCluster(sMap=sMap, which_neigh=1, distMeasure="median", clusterLinkage="average")
# 4) visualise clusters/bases partitioned from the sMap
visDmatCluster(sMap, sBase)

sDmatMinima             Function to identify local minima (in 2D output space) of the distance matrix (in high-dimensional input space)

Description

sDmatMinima is supposed to identify local minima of the distance matrix (resulting from sDmat). The criterion for being a local minimum is that the distance associated with a hexagon/rectangle is always smaller than that of its direct neighbors (i.e., its 1-neighborhood).

Usage

sDmatMinima(sMap, which_neigh=1, distMeasure=c("median", "mean", "min", "max"))

Arguments

sMap: an object of class "sMap"
which_neigh: which neighbors in 2D output space are used for the calculation. By default, it is set to "1" for direct neighbors; "2" means neighbors within a topological distance of no more than 2, and so on
distMeasure: distance measure used to calculate distances in high-dimensional input space. It can be one of the "median", "mean", "min" and "max" measures

Value

minima: a vector storing a list of local minima (represented by the indexes of hexagons/rectangles)

Note

Do not confuse "which_neigh" with the criterion for being a local minimum. Both deal with 2D output space. However, "which_neigh" is used to assist in the calculation of the distance matrix (so it can be a 1-neighborhood or more); the criterion for being a local minimum, in contrast, is only the 1-neighborhood in the strictest sense.
See Also

sDmat, sNeighAny

Examples

# 1) generate an iid normal random matrix of 100x10
data <- matrix(rnorm(100*10, mean=0, sd=1), nrow=100, ncol=10)
# 2) get trained using the default setup
sMap <- sPipeline(data=data)
# 3) identify local minima of the distance matrix based on "median" distances and direct neighbors
minima <- sDmatMinima(sMap=sMap, which_neigh=1, distMeasure="median")

sHexDist                Function to calculate distances between hexagons/rectangles in a 2D grid

Description

sHexDist is supposed to calculate Euclidean distances between each pair of hexagons/rectangles in the 2D grid of an input "sTopol" or "sMap" object. It returns a symmetric matrix containing the pairwise distances.

Usage

sHexDist(sObj)

Arguments

sObj: an object of class "sTopol", "sInit" or "sMap"

Value

dist: a symmetric matrix of nhex x nhex containing pairwise distances, where nhex is the total number of hexagons/rectangles in the grid

Note

The returned matrix has rows/columns ordered in the same order as the "coord" matrix of the input object.

See Also

sTopology, sInitial
Examples

# 1) generate an iid normal random matrix of 100x10
data <- matrix(rnorm(100*10, mean=0, sd=1), nrow=100, ncol=10)
# 2) from this input matrix, determine nhex=5*sqrt(nrow(data))=50,
# but it returns nhex=61, via "sHexGrid(nhex=50)", to make sure a supra-hexagonal grid
sTopol <- sTopology(data=data, lattice="hexa", shape="suprahex")
# 3) initialise the codebook matrix using the "uniform" method
sI <- sInitial(data=data, sTopol=sTopol, init="uniform")
# 4) calculate distances between hexagons/rectangles in a 2D grid based on different objects
# 4a) based on an object of class "sTopol"
dist <- sHexDist(sObj=sTopol)
# 4b) based on an object of class "sInit"
dist <- sHexDist(sObj=sI)

sHexGrid                Function to define a supra-hexagonal grid

Description

sHexGrid is supposed to define a supra-hexagonal map grid. A supra-hexagon is a giant hexagon, which seamlessly consists of smaller hexagons. Due to its symmetric nature, it can be uniquely determined by specifying the radius away from the grid centroid. This function takes as input the grid radius (or the number of hexagons in the grid, which will be adjusted to meet the definition of a supra-hexagon), and returns a list (see below) containing: the grid radius, the total number of hexagons in the grid, the 2D coordinates of the grid centroid, the step of each hexagon away from the grid centroid, and the 2D coordinates of all hexagons in the grid.

Usage

sHexGrid(r=NULL, nhex=NULL)

Arguments

r: an integer specifying the radius in a supra-hexagonal grid
nhex: the number of input hexagons in the grid

Value

an object of class "sHex", a list with following components:
r: the grid radius
nhex: the total number of hexagons in the grid. It may differ from the input value; in fact, it is always no less than the input one to ensure that a supra-hexagonal grid is exactly formed
centroid: the 2D coordinates of the grid centroid
stepCentroid: a vector of length nhex. It stores how many steps a hexagon is away from the grid centroid (1 for the centroid itself). Starting with the centroid, it orders outwards. Also, hexagons of the same step are ordered anticlockwise from the rightmost one
angleCentroid: a vector of length nhex. It stores the angle of a hexagon with respect to the grid centroid (0 for the centroid itself). Hexagons of the same step are ordered anticlockwise from the rightmost one
coord: a matrix of nhex x 2, with each row specifying the 2D coordinates of a hexagon in the grid. The order of rows is the same as "centroid" above
call: the call that produced this result

Note

The relationships among the returned values:
nhex = 1 + 6*r*(r-1)/2
centroid = coord[1, ]
stepCentroid[1] = 1
stepCentroid[2:nhex] = unlist(sapply(2:r, function(x) (c((1+6*x*(x-1)/2-6*(x-1)+1):(1+6*x*(x-1)/2)) >= 1)*x))

See Also

sPipeline

Examples

# The supra-hexagonal grid is exactly determined by specifying the radius.
sHex <- sHexGrid(r=2)

# The grid is determined according to the number of input hexagons (after adjustment).
# The returned res$nhex is always no less than the input one.
# This ensures that a supra-hexagonal grid is exactly formed.
sHex <- sHexGrid(nhex=12)

# The input nhex is ignored if r is also given.
sHex <- sHexGrid(r=3, nhex=100)

# By default, r=3 if no parameters are specified.
sHex <- sHexGrid()

sHexGridVariant         Function to define a variant of a supra-hexagonal grid

Description

sHexGridVariant is supposed to define a variant of a supra-hexagonal map grid. In essence, it is a subset of the supra-hexagon.

Usage

sHexGridVariant(r=NULL, nhex=NULL, shape=c("suprahex", "triangle", "diamond", "hourglass", "trefoil", "ladder", "butterfly", "ring", "bridge"))
Arguments

r: an integer specifying the radius in a supra-hexagonal grid
nhex: the number of input hexagons in the grid
shape: the grid shape, either "suprahex" for the supra-hexagon itself, or one of its variants (including "triangle" for the triangle-shaped variant, "diamond" for the diamond-shaped variant, "hourglass" for the hourglass-shaped variant, "trefoil" for the trefoil-shaped variant, "ladder" for the ladder-shaped variant, "butterfly" for the butterfly-shaped variant, "ring" for the ring-shaped variant, and "bridge" for the bridge-shaped variant)

Value

an object of class "sHex", a list with following components:
r: the grid radius
nhex: the total number of hexagons in the grid. It may differ from the input value; in fact, it is always no less than the input one to ensure that a supra-hexagonal grid is exactly formed
centroid: the 2D coordinates of the grid centroid
stepCentroid: a vector of length nhex. It stores how many steps a hexagon is away from the grid centroid (1 for the centroid itself). Starting with the centroid, it orders outwards. Also, hexagons of the same step are ordered anticlockwise from the rightmost one
angleCentroid: a vector of length nhex. It stores the angle of a hexagon with respect to the grid centroid (0 for the centroid itself). Hexagons of the same step are ordered anticlockwise from the rightmost one
coord: a matrix of nhex x 2, with each row specifying the 2D coordinates of a hexagon in the grid. The order of rows is the same as "centroid" above
call: the call that produced this result

See Also

sHexGrid

Examples

# For the "suprahex" shape itself
sHex <- sHexGridVariant(r=6, shape="suprahex")

## Not run: 
library(ggplot2)
# geom_polygon(color="black", fill=NA)

# For the "suprahex" shape itself
sHex <- sHexGridVariant(r=6, shape="suprahex")
df_polygon <- sHexPolygon(sHex)
df_coord <- data.frame(sHex$coord, index=1:nrow(sHex$coord))
gp_suprahex <- ggplot(data=df_polygon, aes(x,y,group=index)) +
geom_polygon(aes(fill=factor(stepCentroid%%2))) +
coord_fixed(ratio=1) + theme_void() + theme(legend.position="none") +
geom_text(data=df_coord, aes(x,y,label=index), color="white", size=3) +
labs(title="suprahex (r=6; xdim=ydim=11)") +
theme(plot.title=element_text(hjust=0.5,size=8))

# For the "triangle" shape
sHex <- sHexGridVariant(r=6, shape="triangle")
df_polygon <- sHexPolygon(sHex)
df_coord <- data.frame(sHex$coord, index=1:nrow(sHex$coord))
gp_triangle <- ggplot(data=df_polygon, aes(x,y,group=index)) +
geom_polygon(aes(fill=factor(stepCentroid%%2))) +
coord_fixed(ratio=1) + theme_void() + theme(legend.position="none") +
geom_text(data=df_coord, aes(x,y,label=index), color="white", size=3) +
labs(title="triangle (r=6; xdim=ydim=6)") +
theme(plot.title=element_text(hjust=0.5,size=8))

# For the "diamond" shape
sHex <- sHexGridVariant(r=6, shape="diamond")
df_polygon <- sHexPolygon(sHex)
df_coord <- data.frame(sHex$coord, index=1:nrow(sHex$coord))
gp_diamond <- ggplot(data=df_polygon, aes(x,y,group=index)) +
geom_polygon(aes(fill=factor(stepCentroid%%2))) +
coord_fixed(ratio=1) + theme_void() + theme(legend.position="none") +
geom_text(data=df_coord, aes(x,y,label=index), color="white", size=3) +
labs(title="diamond (r=6; xdim=6, ydim=11)") +
theme(plot.title=element_text(hjust=0.5,size=8))

# For the "hourglass" shape
sHex <- sHexGridVariant(r=6, shape="hourglass")
df_polygon <- sHexPolygon(sHex)
df_coord <- data.frame(sHex$coord, index=1:nrow(sHex$coord))
gp_hourglass <- ggplot(data=df_polygon, aes(x,y,group=index)) +
geom_polygon(aes(fill=factor(stepCentroid%%2))) +
coord_fixed(ratio=1) + theme_void() + theme(legend.position="none") +
geom_text(data=df_coord, aes(x,y,label=index), color="white", size=3) +
labs(title="hourglass (r=6; xdim=6, ydim=11)") +
theme(plot.title=element_text(hjust=0.5,size=8))

# For the "trefoil" shape
sHex <- sHexGridVariant(r=6, shape="trefoil")
df_polygon <- sHexPolygon(sHex)
df_coord <- data.frame(sHex$coord, index=1:nrow(sHex$coord))
gp_trefoil <- ggplot(data=df_polygon, aes(x,y,group=index)) +
geom_polygon(aes(fill=factor(stepCentroid%%2))) +
coord_fixed(ratio=1) + theme_void() + theme(legend.position="none") +
geom_text(data=df_coord, aes(x,y,label=index), color="white", size=3) +
labs(title="trefoil (r=6; xdim=ydim=11)") +
theme(plot.title=element_text(hjust=0.5,size=8))

# For the "ladder" shape
sHex <- sHexGridVariant(r=6, shape="ladder")
df_polygon <- sHexPolygon(sHex)
df_coord <- data.frame(sHex$coord, index=1:nrow(sHex$coord))
gp_ladder <- ggplot(data=df_polygon, aes(x,y,group=index)) +
geom_polygon(aes(fill=factor(stepCentroid%%2))) +
coord_fixed(ratio=1) + theme_void() + theme(legend.position="none") +
geom_text(data=df_coord, aes(x,y,label=index), color="white", size=3) +
labs(title="ladder (r=6; xdim=11, ydim=6)") +
theme(plot.title=element_text(hjust=0.5,size=8))

# For the "butterfly" shape
sHex <- sHexGridVariant(r=6, shape="butterfly")
df_polygon <- sHexPolygon(sHex)
df_coord <- data.frame(sHex$coord, index=1:nrow(sHex$coord))
gp_butterfly <- ggplot(data=df_polygon, aes(x,y,group=index)) +
geom_polygon(aes(fill=factor(stepCentroid%%2))) +
coord_fixed(ratio=1) + theme_void() + theme(legend.position="none") +
geom_text(data=df_coord, aes(x,y,label=index), color="white", size=3) +
labs(title="butterfly (r=6; xdim=ydim=11)") +
theme(plot.title=element_text(hjust=0.5,size=8))

# For the "ring" shape
sHex <- sHexGridVariant(r=6, shape="ring")
df_polygon <- sHexPolygon(sHex)
df_coord <- data.frame(sHex$coord, index=1:nrow(sHex$coord))
gp_ring <- ggplot(data=df_polygon, aes(x,y,group=index)) +
geom_polygon(aes(fill=factor(stepCentroid%%2))) +
coord_fixed(ratio=1) + theme_void() + theme(legend.position="none") +
geom_text(data=df_coord, aes(x,y,label=index), color="white", size=3) +
labs(title="ring (r=6; xdim=ydim=11)") +
theme(plot.title=element_text(hjust=0.5,size=8))

# For the "bridge" shape
sHex <- sHexGridVariant(r=6, shape="bridge")
df_polygon <- sHexPolygon(sHex)
df_coord <- data.frame(sHex$coord, index=1:nrow(sHex$coord))
gp_bridge <- ggplot(data=df_polygon, aes(x,y,group=index)) +
geom_polygon(aes(fill=factor(stepCentroid%%2))) +
coord_fixed(ratio=1) + theme_void() + theme(legend.position="none") +
geom_text(data=df_coord, aes(x,y,label=index), color="white", size=3) +
labs(title="bridge (r=6; xdim=11, ydim=6)") +
theme(plot.title=element_text(hjust=0.5,size=8))

# combined visuals
library(gridExtra)
grid.arrange(grobs=list(gp_suprahex, gp_ring, gp_diamond, gp_trefoil,
gp_butterfly, gp_hourglass, gp_ladder, gp_bridge, gp_triangle),
layout_matrix=rbind(c(1,1,2,2,3), c(1,1,2,2,3), c(4,4,5,5,6), c(4,4,5,5,6), c(7,7,8,8,9)),
nrow=5, ncol=5)
## End(Not run)

sHexPolygon             Function to extract polygon location per hexagon within a
supra-hexagonal grid

Description

sHexPolygon is supposed to extract the polygon location per hexagon within a supra-hexagonal grid.

Usage

sHexPolygon(sObj, area.size=1)
Arguments

sObj: an object of class "sMap", "sInit", "sTopol" or "sHex"
area.size: an integer or a vector specifying the area size of each hexagon

Value

a data frame of five columns ("x", "y", "index", "stepCentroid", "angleCentroid") storing the polygon location per hexagon.

See Also

sHexGridVariant, sPipeline

Examples

sObj <- sTopology(xdim=4, ydim=4, lattice="hexa", shape="suprahex")
df_polygon <- sHexPolygon(sObj, area.size=1)

sInitial                Function to initialise a sInit object given a topology and input data

Description

sInitial is supposed to initialise an object of class "sInit" given a topology and input data. As a matter of fact, it initialises the codebook matrix (in high-dimensional input space). The returned object inherits the topology information (i.e., an "sTopol" object from sTopology), along with the initialised codebook matrix and the method used.

Usage

sInitial(data, sTopol, init=c("linear", "uniform", "sample"))

Arguments

data: a data frame or matrix of input data
sTopol: an object of class "sTopol" (see sTopology)
init: an initialisation method. It can be one of the "uniform", "sample" and "linear" initialisation methods
Value

an object of class "sInit", a list with the following components:
nHex: the total number of hexagons/rectangles in the grid
xdim: x-dimension of the grid
ydim: y-dimension of the grid
r: the hypothetical radius of the grid
lattice: the grid lattice
shape: the grid shape
coord: a matrix of nHex x 2, with each row corresponding to the coordinates of a hexagon/rectangle in the 2D map grid
init: an initialisation method
codebook: a codebook matrix of nHex x ncol(data), with each row corresponding to a prototype vector in input high-dimensional space
call: the call that produced this result

Note

The initialisation methods include:
"uniform": the codebook matrix is uniformly initialised via randomly taking any values within the interval [min, max] of each column of input data
"sample": the codebook matrix is initialised via randomly sampling/selecting input data
"linear": the codebook matrix is linearly initialised along the first two greatest eigenvectors of input data

See Also

sTopology

Examples

# 1) generate an iid normal random matrix of 100x10
data <- matrix(rnorm(100*10,mean=0,sd=1), nrow=100, ncol=10)
# 2) from this input matrix, determine nHex=5*sqrt(nrow(data))=50,
# but it returns nHex=61, via "sHexGrid(nHex=50)", to make sure a supra-hexagonal grid
sTopol <- sTopology(data=data, lattice="hexa", shape="suprahex")
# 3) initialise the codebook matrix using different methods
# 3a) using the "uniform" method
sI_uniform <- sInitial(data=data, sTopol=sTopol, init="uniform")
# 3b) using the "sample" method
# sI_sample <- sInitial(data=data, sTopol=sTopol, init="sample")
# 3c) using the "linear" method
# sI_linear <- sInitial(data=data, sTopol=sTopol, init="linear")
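The "uniform" and "sample" strategies described above can be sketched in plain R (a simplification for illustration, not the package's actual internals; variable names are made up here):

```r
# Minimal sketch of two initialisation strategies for a codebook
# of nHex prototype vectors (illustrative only, not sInitial's code)
data <- matrix(rnorm(100*10), nrow=100, ncol=10)
nHex <- 19

# "uniform": random values within [min, max] of each input column
cb_uniform <- apply(data, 2, function(x) runif(nHex, min(x), max(x)))

# "sample": randomly select nHex rows of the input data
cb_sample <- data[sample(nrow(data), nHex, replace=TRUE), ]
```

Either way, the result is an nHex x ncol(data) matrix with one prototype vector per hexagon.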
sMapOverlay      Function to overlay additional data onto the trained map for viewing the distribution of that additional data

Description

sMapOverlay is supposed to overlay additional data onto the trained map for viewing the distribution of that additional data. It returns an object of class "sMap". It is realised by first estimating the hit histogram weighted by the neighborhood kernel, and then calculating the distribution of the additional data over the map (similarly weighted by the neighborhood kernel). The final overlaid distribution of additional data is normalised by the hit histogram.

Usage

sMapOverlay(sMap, data, additional)

Arguments

sMap        an object of class "sMap"
data        a data frame or matrix of input data
additional  a numeric vector or numeric matrix used to overlay onto the trained map. It must have its length (if a vector) or its number of rows (if a matrix) equal to the number of rows in the input data

Value

an object of class "sMap", a list with the following components:
nHex: the total number of hexagons/rectangles in the grid
xdim: x-dimension of the grid
ydim: y-dimension of the grid
r: the hypothetical radius of the grid
lattice: the grid lattice
shape: the grid shape
coord: a matrix of nHex x 2, with rows corresponding to the coordinates of all hexagons/rectangles in the 2D map grid
init: an initialisation method
neighKernel: the training neighborhood kernel
codebook: a codebook matrix of nHex x ncol(additional), with rows corresponding to overlaid vectors
hits: a vector of nHex, with each element being the number of input data vectors hitting that hexagon/rectangle
mqe: the mean quantization error for the "best" BMH
call: the call that produced this result

Note

Weighting by the neighborhood kernel avoids rigid overlaying that focuses only on the best-matching map nodes, as there may exist several closest best-matching nodes for an input data vector.
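The kernel-weighted normalisation described above can be sketched with a hypothetical helper (not the package's code): given a kernel weight matrix K of nHex x nHex and the best-matching hexagon of each input row, the overlaid value at a hexagon is the kernel-weighted average of the additional data mapped there.

```r
# Hypothetical sketch of kernel-weighted overlaying (not sMapOverlay's code)
# K:          nHex x nHex matrix of kernel weights between hexagons
# bmh:        integer vector, best-matching hexagon of each input row
# additional: numeric vector/matrix aligned with the input rows
overlay_sketch <- function(K, bmh, additional) {
  W <- K[, bmh, drop=FALSE]             # nHex x nInput kernel weights
  hits <- rowSums(W)                    # kernel-weighted hit histogram
  (W %*% as.matrix(additional)) / hits  # normalised by the hit histogram
}
```

With an identity kernel this reduces to a plain per-hexagon average of the additional data hitting each node.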
See Also

sPipeline, sBMH, sHexDist, visHexMulComp

Examples

# 1) generate an iid normal random matrix of 100x10
data <- matrix(rnorm(100*10,mean=0,sd=1), nrow=100, ncol=10)
colnames(data) <- paste(rep('s',10), seq(1:10), sep="")
# 2) get trained using the default setup
sMap <- sPipeline(data=data)
# 3) overlay additional data onto the trained map
# here using the first two columns of the input "data" as "additional"
# codebook in "sOverlay" is the same as the first two columns of codebook in "sMap"
sOverlay <- sMapOverlay(sMap=sMap, data=data, additional=data[,1:2])
# 4) view the distribution of that additional data
visHexMulComp(sOverlay)

sNeighAny      Function to calculate any neighbors for each hexagon/rectangle in a grid

Description

sNeighAny is supposed to calculate any neighbors for each hexagon/rectangle in a regular 2D grid. It returns a matrix with rows for the self, and columns for its any neighbors.

Usage

sNeighAny(sObj)

Arguments

sObj    an object of class "sTopol" or "sInit" or "sMap"

Value

aNeigh: a matrix of nHex x nHex, containing distance info in terms of any neighbors, where nHex is the total number of hexagons/rectangles in the grid

Note

The return matrix has rows for the self, and columns for its neighbors. The non-zeros mean the distance away from its neighbors, and the zeros the self-self. Its rows/columns are ordered in the same order as the "coord" matrix of the input object.

See Also

sNeighDirect
Examples

# 1) generate an iid normal random matrix of 100x10
data <- matrix(rnorm(100*10,mean=0,sd=1), nrow=100, ncol=10)
# 2) from this input matrix, determine nHex=5*sqrt(nrow(data))=50,
# but it returns nHex=61, via "sHexGrid(nHex=50)", to make sure a supra-hexagonal grid
sTopol <- sTopology(data=data, lattice="hexa", shape="suprahex")
# 3) initialise the codebook matrix using the "uniform" method
sI <- sInitial(data=data, sTopol=sTopol, init="uniform")
# 4) calculate any neighbors based on different objects
# 4a) based on an object of class "sTopol"
aNeigh <- sNeighAny(sObj=sTopol)
# 4b) based on an object of class "sInit"
# aNeigh <- sNeighAny(sObj=sI)

sNeighDirect      Function to calculate direct neighbors for each hexagon/rectangle in a grid

Description

sNeighDirect is supposed to calculate direct neighbors for each hexagon/rectangle in a regular 2D grid. It returns a matrix with rows for the self, and columns for its direct neighbors.

Usage

sNeighDirect(sObj)

Arguments

sObj    an object of class "sTopol" or "sInit" or "sMap"

Value

dNeigh: a matrix of nHex x nHex, containing presence/absence info in terms of direct neighbors, where nHex is the total number of hexagons/rectangles in the grid

Note

The return matrix has rows for the self, and columns for its direct neighbors. The "1" means the presence of a direct neighbor, "0" its absence. Its rows/columns are ordered in the same order as the "coord" matrix of the input object.

See Also

sHexDist
Examples

# 1) generate an iid normal random matrix of 100x10
data <- matrix(rnorm(100*10,mean=0,sd=1), nrow=100, ncol=10)
# 2) from this input matrix, determine nHex=5*sqrt(nrow(data))=50,
# but it returns nHex=61, via "sHexGrid(nHex=50)", to make sure a supra-hexagonal grid
sTopol <- sTopology(data=data, lattice="hexa", shape="suprahex")
# 3) initialise the codebook matrix using the "uniform" method
sI <- sInitial(data=data, sTopol=sTopol, init="uniform")
# 4) calculate direct neighbors based on different objects
# 4a) based on an object of class "sTopol"
dNeigh <- sNeighDirect(sObj=sTopol)
# 4b) based on an object of class "sInit"
# dNeigh <- sNeighDirect(sObj=sI)

sPipeline      Function to set up the pipeline for completing ab initio training given the input data

Description

sPipeline is supposed to finish ab initio training for the input data. It returns an object of class "sMap".

Usage

sPipeline(data = NULL, xdim = NULL, ydim = NULL, nHex = NULL,
lattice = c("hexa", "rect"), shape = c("suprahex", "sheet",
"triangle", "diamond", "hourglass", "trefoil", "ladder", "butterfly",
"ring", "bridge"), scale = 5, init = c("linear", "uniform",
"sample"), algorithm = c("batch", "sequential"), alphaType =
c("invert", "linear", "power"), neighKernel = c("gaussian", "bubble",
"cutgaussian", "ep", "gamma"), finetuneSustain = FALSE, verbose =
TRUE)

Arguments

data     a data frame or matrix of input data
xdim     an integer specifying the x-dimension of the grid
ydim     an integer specifying the y-dimension of the grid
nHex     the number of hexagons/rectangles in the grid
lattice  the grid lattice, either "hexa" for a hexagon or "rect" for a rectangle
shape    the grid shape, either "suprahex" for a supra-hexagonal grid or "sheet" for a hexagonal/rectangle sheet. Also supported are supraHex's variants (including "triangle" for the triangle-shaped variant, "diamond" for the diamond-shaped variant, "hourglass" for the hourglass-shaped variant, "trefoil" for the trefoil-shaped variant, "ladder" for the ladder-shaped variant, "butterfly" for the butterfly-shaped variant, "ring" for the ring-shaped variant, and "bridge" for the bridge-shaped variant)
scale            the scaling factor. Only used when automatically estimating the grid dimension from the input data matrix. By default, it is 5 (big map). Other suggested values: 1 for a small map, and 3 for a median map
init             an initialisation method. It can be one of "uniform", "sample" and "linear" initialisation methods
algorithm        the training algorithm. It can be one of "sequential" and "batch". By default, it uses the batch algorithm, purely because of its fast computation (probably also without compromising accuracy). However, it is highly recommended not to use the batch algorithm if the input data contain lots of zeros, because the matrix multiplication used in the batch algorithm can be problematic in this context. If much computational resource is at hand, it is always safe to use the sequential algorithm
alphaType        the alpha type. It can be one of "invert", "linear" and "power" alpha types
neighKernel      the training neighborhood kernel. It can be one of "gaussian", "bubble", "cutgaussian", "ep" and "gamma" kernels
finetuneSustain  logical to indicate whether to sustain the "finetune" training. If TRUE, it will repeat the "finetune" stage until the mean quantization error gets worse. By default, it sets to FALSE
verbose          logical to indicate whether the messages will be displayed on the screen. By default, it sets to TRUE for display

Value

an object of class "sMap", a list with the following components:
nHex: the total number of hexagons/rectangles in the grid
xdim: x-dimension of the grid
ydim: y-dimension of the grid
r: the hypothetical radius of the grid
lattice: the grid lattice
shape: the grid shape
coord: a matrix of nHex x 2, with rows corresponding to the coordinates of all hexagons/rectangles in the 2D map grid
polygon: a data frame of three columns ('x', 'y', 'id') storing polygon location per hexagon in the 2D map grid
init: an initialisation method
neighKernel: the training neighborhood kernel
codebook: a codebook matrix of nHex x ncol(data), with rows corresponding to prototype vectors in input high-dimensional space
hits: a vector of nHex, with each element being the number of input data vectors hitting that hexagon/rectangle
mqe: the mean quantization error for the "best" BMH
call: the call that produced this result
Note

The pipeline sequentially consists of:
i) sTopology used to define the topology of a grid (with the "suprahex" shape by default) according to the input data;
ii) sInitial used to initialise the codebook matrix given the pre-defined topology and the input data (by default using the "linear" initialisation method);
iii) sTrainology and sTrainSeq used to get the grid map trained at both "rough" and "finetune" stages. If instructed, the "finetune" training is sustained until the mean quantization error gets worse;
iv) sBMH used to identify the best-matching hexagons/rectangles (BMH) for the input data; these response data are appended to the resulting object of class "sMap".

References

Hai Fang and Julian Gough. (2014) supraHex: an R/Bioconductor package for tabular omics data analysis using a supra-hexagonal map. Biochemical and Biophysical Research Communications, 443(1).

See Also

sTopology, sInitial, sTrainology, sTrainSeq, sTrainBatch, sBMH, visHexMulComp

Examples

# 1) generate an iid normal random matrix of 100x10
data <- matrix(rnorm(100*10,mean=0,sd=1), nrow=100, ncol=10)
colnames(data) <- paste(rep('s',10), seq(1:10), sep="")
# 2) get trained using the default setup but with different neighborhood kernels
# 2a) with the "gaussian" kernel
sMap <- sPipeline(data=data, neighKernel="gaussian")
# 2b) with the "bubble" kernel
# sMap <- sPipeline(data=data, neighKernel="bubble")
# 2c) with the "cutgaussian" kernel
# sMap <- sPipeline(data=data, neighKernel="cutgaussian")
# 2d) with the "ep" kernel
# sMap <- sPipeline(data=data, neighKernel="ep")
# 2e) with the "gamma" kernel
# sMap <- sPipeline(data=data, neighKernel="gamma")
# 3) visualise multiple component planes of a supra-hexagonal grid
visHexMulComp(sMap, colormap="jet", ncolors=20, zlim=c(-1,1),
gp=grid::gpar(cex=0.8))
# 4) get trained using the default setup but with the "trefoil" shape
sMap <- sPipeline(data=data, shape="trefoil",
algorithm=c("batch","sequential")[2])
visHexMulComp(sMap, colormap="jet", ncolors=20, zlim=c(-1,1),
gp=grid::gpar(cex=0.8))
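The four stages listed in sPipeline's Note can be sketched as explicit calls. This is an outline of the sequence, not a drop-in replacement for sPipeline (which also handles dimension estimation, kernel bookkeeping and the sustained finetuning):

```r
# Outline of the sPipeline stages as explicit calls
library(supraHex)
data <- matrix(rnorm(100*10,mean=0,sd=1), nrow=100, ncol=10)

sTopol <- sTopology(data=data, lattice="hexa",
                    shape="suprahex")                     # i) topology
sI <- sInitial(data=data, sTopol=sTopol, init="linear")   # ii) initialisation
sM <- sTrainology(sMap=sI, data=data, stage="rough")      # iii) rough training
sM <- sTrainology(sMap=sM, data=data, stage="finetune")   #      finetune training
response <- sBMH(sMap=sM, data=data, which_bmh="best")    # iv) best-matching hexagons
```

The "response" (BMH indexes and quantization errors) is what sPipeline appends to its returned "sMap" object.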
sTopology      Function to define the topology of a map grid

Description

sTopology is supposed to define the topology of a 2D map grid. The topological shape can be either a supra-hexagonal grid or a hexagonal/rectangle sheet. It returns an object of class "sTopol", containing: the total number of hexagons/rectangles in the grid, the grid xy-dimensions, the grid lattice, the grid shape, and the 2D coordinates of all hexagons/rectangles in the grid. The 2D coordinates can be directly used to measure distances between any pair of lattice hexagons/rectangles.

Usage

sTopology(data = NULL, xdim = NULL, ydim = NULL, nHex = NULL,
lattice = c("hexa", "rect"), shape = c("suprahex", "sheet",
"triangle", "diamond", "hourglass", "trefoil", "ladder", "butterfly",
"ring", "bridge"), scale = 5)

Arguments

data     a data frame or matrix of input data
xdim     an integer specifying the x-dimension of the grid
ydim     an integer specifying the y-dimension of the grid
nHex     the number of hexagons/rectangles in the grid
lattice  the grid lattice, either "hexa" for a hexagon or "rect" for a rectangle
shape    the grid shape, either "suprahex" for a supra-hexagonal grid or "sheet" for a hexagonal/rectangle sheet. Also supported are supraHex's variants (including "triangle" for the triangle-shaped variant, "diamond" for the diamond-shaped variant, "hourglass" for the hourglass-shaped variant, "trefoil" for the trefoil-shaped variant, "ladder" for the ladder-shaped variant, "butterfly" for the butterfly-shaped variant, "ring" for the ring-shaped variant, and "bridge" for the bridge-shaped variant)
scale    the scaling factor. Only used when automatically estimating the grid dimension from the input data matrix. By default, it is 5 (big map). Other suggested values: 1 for a small map, and 3 for a median map

Value

an object of class "sTopol", a list with the following components:
nHex: the total number of hexagons/rectangles in the grid. It is not always the same as the input nHex (if any); see "Note" below for the explanation
xdim: x-dimension of the grid
ydim: y-dimension of the grid
r: the hypothetical radius of the grid
lattice: the grid lattice
shape: the grid shape
coord: a matrix of nHex x 2, with each row corresponding to the coordinates of a hexagon/rectangle in the 2D map grid
call: the call that produced this result

Note

The output nHex depends on the input arguments and the grid shape. The input parameters are used to determine nHex in the following order of priority: "xdim & ydim" > "nHex" > "data".
If both xdim and ydim are given, then nHex = xdim*ydim for the "sheet" shape, and r = (min(xdim,ydim)+1)/2 for the "suprahex" shape.
If only data is input, then nHex = scale*sqrt(dlen), where dlen is the number of rows of the input data, and scale can be 5 (big map), 3 (median map) or 1 (normal map).
With nHex in hand, it depends on the grid shape:
For the "sheet" shape, the xy-dimensions of the sheet grid are determined according to the square root of the two biggest eigenvalues of the input data
For the "suprahex" shape, see sHexGrid for calculating the grid radius r. The xdim (and ydim) is related to r via xdim = 2*r - 1

See Also

sHexGrid, visHexMapping

Examples

# For the "suprahex" shape
sTopol <- sTopology(xdim=3, ydim=3, lattice="hexa", shape="suprahex")
# Error: "The suprahex shape grid only allows for hexagonal lattice"
# sTopol <- sTopology(xdim=3, ydim=3, lattice="rect", shape="suprahex")
# For the "sheet" shape with hexagonal lattice
sTopol <- sTopology(xdim=3, ydim=3, lattice="hexa", shape="sheet")
# For the "sheet" shape with rectangle lattice
sTopol <- sTopology(xdim=3, ydim=3, lattice="rect", shape="sheet")
# By default, nHex=19 (i.e., r=3; xdim=ydim=5) for the "suprahex" shape
sTopol <- sTopology(shape="suprahex")
# By default, xdim=ydim=5 (i.e., nHex=25) for the "sheet" shape
sTopol <- sTopology(shape="sheet")
# Determine the topology of a supra-hexagonal grid based on input data
# 1) generate an iid normal random matrix of 100x10
data <- matrix(rnorm(100*10,mean=0,sd=1), nrow=100, ncol=10)
# 2) from this input matrix, determine nHex=5*sqrt(nrow(data))=50,
# but it returns nHex=61, via "sHexGrid(nHex=50)", to make sure a supra-hexagonal grid
sTopol <- sTopology(data=data, lattice="hexa", shape="suprahex")
# sTopol <- sTopology(data=data, lattice="hexa", shape="trefoil")
# do visualisation
visHexMapping(sTopol, mappingType="indexes")
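The nHex values quoted in the examples follow from the supra-hexagonal geometry: a grid of radius r consists of one centre hexagon plus 6*k hexagons on the k-th surrounding ring, giving nHex = 1 + 3*r*(r-1), with xdim = ydim = 2*r - 1. A quick check against the numbers above (the helper name is illustrative, not a package function):

```r
# nHex for a supra-hexagonal grid of radius r: the centre hexagon plus
# 6*k hexagons on each of the (r-1) surrounding rings
nhex_from_r <- function(r) 1 + 3*r*(r-1)

nhex_from_r(3)  # 19, the default grid (xdim=ydim=5)
nhex_from_r(5)  # 61, the grid chosen when nHex=50 is requested
```

This is why a requested nHex of 50 yields 61: r=4 gives only 37 hexagons, so the smallest radius covering 50 is r=5.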
INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Murhaf Fares & Stephan Oepen Language Technology Group (LTG) September 27, 2017 Today 2 Recap Evaluation of classifiers Unsupervised
More informationMultivariate Analysis
Multivariate Analysis Cluster Analysis Prof. Dr. Anselmo E de Oliveira anselmo.quimica.ufg.br anselmo.disciplinas@gmail.com Unsupervised Learning Cluster Analysis Natural grouping Patterns in the data
More informationPackage EMDomics. November 11, 2018
Package EMDomics November 11, 2018 Title Earth Mover's Distance for Differential Analysis of Genomics Data The EMDomics algorithm is used to perform a supervised multi-class analysis to measure the magnitude
More informationClustering Jacques van Helden
Statistical Analysis of Microarray Data Clustering Jacques van Helden Jacques.van.Helden@ulb.ac.be Contents Data sets Distance and similarity metrics K-means clustering Hierarchical clustering Evaluation
More informationPackage lhs. R topics documented: January 4, 2018
Package lhs January 4, 2018 Version 0.16 Date 2017-12-23 Title Latin Hypercube Samples Author [aut, cre] Maintainer Depends R (>= 3.3.0) Suggests RUnit Provides a number of methods
More informationRoad map. Basic concepts
Clustering Basic concepts Road map K-means algorithm Representation of clusters Hierarchical clustering Distance functions Data standardization Handling mixed attributes Which clustering algorithm to use?
More informationPerformance Evaluation of Clustering Methods in Microarray Data
American Journal of Bioinformatics Research 2016, 6(1): 19-25 DOI: 10.5923/j.bioinformatics.20160601.03 Performance Evaluation of Clustering Methods in Microarray Data Md. Siraj-Ud-Doulah *, Md. Bipul
More informationPackage seg. February 15, 2013
Package seg February 15, 2013 Version 0.2-4 Date 2013-01-21 Title A set of tools for residential segregation research Author Seong-Yun Hong, David O Sullivan Maintainer Seong-Yun Hong
More informationStatistics 202: Data Mining. c Jonathan Taylor. Week 8 Based in part on slides from textbook, slides of Susan Holmes. December 2, / 1
Week 8 Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Part I Clustering 2 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group
More informationCluster Analysis. Angela Montanari and Laura Anderlucci
Cluster Analysis Angela Montanari and Laura Anderlucci 1 Introduction Clustering a set of n objects into k groups is usually moved by the aim of identifying internally homogenous groups according to a
More informationNotes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/10/2017)
1 Notes Reminder: HW2 Due Today by 11:59PM TA s note: Please provide a detailed ReadMe.txt file on how to run the program on the STDLINUX. If you installed/upgraded any package on STDLINUX, you should
More informationThe C Clustering Library
The C Clustering Library The University of Tokyo, Institute of Medical Science, Human Genome Center Michiel de Hoon, Seiya Imoto, Satoru Miyano 29 April 2018 The C Clustering Library for cdna microarray
More information9/17/2009. Wenyan Li (Emily Li) Sep. 15, Introduction to Clustering Analysis
Introduction ti to K-means Algorithm Wenan Li (Emil Li) Sep. 5, 9 Outline Introduction to Clustering Analsis K-means Algorithm Description Eample of K-means Algorithm Other Issues of K-means Algorithm
More informationCS 2750 Machine Learning. Lecture 19. Clustering. CS 2750 Machine Learning. Clustering. Groups together similar instances in the data sample
Lecture 9 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem: distribute data into k different groups
More informationThe Curse of Dimensionality
The Curse of Dimensionality ACAS 2002 p1/66 Curse of Dimensionality The basic idea of the curse of dimensionality is that high dimensional data is difficult to work with for several reasons: Adding more
More informationClustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York
Clustering Robert M. Haralick Computer Science, Graduate Center City University of New York Outline K-means 1 K-means 2 3 4 5 Clustering K-means The purpose of clustering is to determine the similarity
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,
More informationWhat to come. There will be a few more topics we will cover on supervised learning
Summary so far Supervised learning learn to predict Continuous target regression; Categorical target classification Linear Regression Classification Discriminative models Perceptron (linear) Logistic regression
More informationEECS730: Introduction to Bioinformatics
EECS730: Introduction to Bioinformatics Lecture 15: Microarray clustering http://compbio.pbworks.com/f/wood2.gif Some slides were adapted from Dr. Shaojie Zhang (University of Central Florida) Microarray
More informationContents. ! Data sets. ! Distance and similarity metrics. ! K-means clustering. ! Hierarchical clustering. ! Evaluation of clustering results
Statistical Analysis of Microarray Data Contents Data sets Distance and similarity metrics K-means clustering Hierarchical clustering Evaluation of clustering results Clustering Jacques van Helden Jacques.van.Helden@ulb.ac.be
More informationFathom Dynamic Data TM Version 2 Specifications
Data Sources Fathom Dynamic Data TM Version 2 Specifications Use data from one of the many sample documents that come with Fathom. Enter your own data by typing into a case table. Paste data from other
More informationExploratory Data Analysis using Self-Organizing Maps. Madhumanti Ray
Exploratory Data Analysis using Self-Organizing Maps Madhumanti Ray Content Introduction Data Analysis methods Self-Organizing Maps Conclusion Visualization of high-dimensional data items Exploratory data
More informationBehavioral Data Mining. Lecture 18 Clustering
Behavioral Data Mining Lecture 18 Clustering Outline Why? Cluster quality K-means Spectral clustering Generative Models Rationale Given a set {X i } for i = 1,,n, a clustering is a partition of the X i
More informationUnsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi
Unsupervised Learning Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Content Motivation Introduction Applications Types of clustering Clustering criterion functions Distance functions Normalization Which
More informationLesson 3. Prof. Enza Messina
Lesson 3 Prof. Enza Messina Clustering techniques are generally classified into these classes: PARTITIONING ALGORITHMS Directly divides data points into some prespecified number of clusters without a hierarchical
More informationClassification of Hyperspectral Breast Images for Cancer Detection. Sander Parawira December 4, 2009
1 Introduction Classification of Hyperspectral Breast Images for Cancer Detection Sander Parawira December 4, 2009 parawira@stanford.edu In 2009 approximately one out of eight women has breast cancer.
More information10701 Machine Learning. Clustering
171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among
More informationBBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler
BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Classification Classification systems: Supervised learning Make a rational prediction given evidence There are several methods for
More informationIntroduction to Pattern Recognition Part II. Selim Aksoy Bilkent University Department of Computer Engineering
Introduction to Pattern Recognition Part II Selim Aksoy Bilkent University Department of Computer Engineering saksoy@cs.bilkent.edu.tr RETINA Pattern Recognition Tutorial, Summer 2005 Overview Statistical
More informationData Preprocessing. Javier Béjar. URL - Spring 2018 CS - MAI 1/78 BY: $\
Data Preprocessing Javier Béjar BY: $\ URL - Spring 2018 C CS - MAI 1/78 Introduction Data representation Unstructured datasets: Examples described by a flat set of attributes: attribute-value matrix Structured
More informationData Clustering. Danushka Bollegala
Data Clustering Danushka Bollegala Outline Why cluster data? Clustering as unsupervised learning Clustering algorithms k-means, k-medoids agglomerative clustering Brown s clustering Spectral clustering
More informationNotes. Reminder: HW2 Due Today by 11:59PM. Review session on Thursday. Midterm next Tuesday (10/09/2018)
1 Notes Reminder: HW2 Due Today by 11:59PM TA s note: Please provide a detailed ReadMe.txt file on how to run the program on the STDLINUX. If you installed/upgraded any package on STDLINUX, you should
More informationDimension reduction : PCA and Clustering
Dimension reduction : PCA and Clustering By Hanne Jarmer Slides by Christopher Workman Center for Biological Sequence Analysis DTU The DNA Array Analysis Pipeline Array design Probe design Question Experimental
More informationMotion estimation for video compression
Motion estimation for video compression Blockmatching Search strategies for block matching Block comparison speedups Hierarchical blockmatching Sub-pixel accuracy Motion estimation no. 1 Block-matching
More informationClustering. Mihaela van der Schaar. January 27, Department of Engineering Science University of Oxford
Department of Engineering Science University of Oxford January 27, 2017 Many datasets consist of multiple heterogeneous subsets. Cluster analysis: Given an unlabelled data, want algorithms that automatically
More informationHierarchical Clustering 4/5/17
Hierarchical Clustering 4/5/17 Hypothesis Space Continuous inputs Output is a binary tree with data points as leaves. Useful for explaining the training data. Not useful for making new predictions. Direction
More informationExploratory Analysis: Clustering
Exploratory Analysis: Clustering (some material taken or adapted from slides by Hinrich Schutze) Heejun Kim June 26, 2018 Clustering objective Grouping documents or instances into subsets or clusters Documents
More informationThis research aims to present a new way of visualizing multi-dimensional data using generalized scatterplots by sensitivity coefficients to highlight
This research aims to present a new way of visualizing multi-dimensional data using generalized scatterplots by sensitivity coefficients to highlight local variation of one variable with respect to another.
More information[7.3, EA], [9.1, CMB]
K-means Clustering Ke Chen Reading: [7.3, EA], [9.1, CMB] Outline Introduction K-means Algorithm Example How K-means partitions? K-means Demo Relevant Issues Application: Cell Neulei Detection Summary
More informationCSE 6242 A / CS 4803 DVA. Feb 12, Dimension Reduction. Guest Lecturer: Jaegul Choo
CSE 6242 A / CS 4803 DVA Feb 12, 2013 Dimension Reduction Guest Lecturer: Jaegul Choo CSE 6242 A / CS 4803 DVA Feb 12, 2013 Dimension Reduction Guest Lecturer: Jaegul Choo Data is Too Big To Do Something..
More information