Detecting Novel Associations in Large Data Sets
- Michael Johns
1 Detecting Novel Associations in Large Data Sets. J. Hjelmborg, Department of Biostatistics, 5 February 2013. Biostat (Biostatistics), 22 slides.
2 Overview: 1 Introduction, 2 Heuristics, 3 Examples, 4 Estimation, 5 Project proposal, 6 Conclusion
3 Review of paper(s): David N. Reshef et al., "Detecting Novel Associations in Large Data Sets", Science 334, 1518 (2011). Terry Speed, "A Correlation for the 21st Century", Science 334, 1502 (2011).
4 Summary: The maximal information coefficient (MIC) is a measure of two-variable dependence designed specifically for rapid exploration of many-dimensional data sets. MIC is part of a larger family of maximal information-based nonparametric exploration (MINE) statistics, which can be used not only to identify important relationships in data sets but also to characterize them.
5 Measuring dependence: Given a many-dimensional data set, search for associations between pairs of variables X and Y and rank them. Generality: Any interesting association should be captured by the statistic, not just functional dependencies. Equitability: Relationships of different types with the same amount of noise should receive similar scores. In particular, functional dependencies with similar $R^2$ values should receive similar scores.
6 Classic Measure of Uncertainty: Given a discrete random variable X on states $\{1, \ldots, M\}$ with probabilities $\{p_1, \ldots, p_M\}$, the quantity $H(p_1, \ldots, p_M) = -\sum_{k=1}^{M} p_k \log(p_k)$ measures the uncertainty of X. This is the entropy, the only function satisfying the axioms of uncertainty; see Amber (1986) or a graduate-level textbook in physics or computer science. With logarithms to base 2, it is also the minimum average number of "yes or no" questions required to determine the result of one observation of X. The information conveyed about X by Y is measured by $I(X \mid Y) = H(X) - H(X \mid Y)$.
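The two quantities on this slide can be checked numerically. The following is a minimal sketch, assuming base-2 logarithms and a joint distribution given as a small table; the function names `entropy` and `mutual_information` are illustrative and not taken from the slides or the paper.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum_k p_k * log2(p_k), skipping zero terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X | Y) = H(X) - H(X | Y) for a table joint[i][j] = P(X = i, Y = j)."""
    px = [sum(row) for row in joint]          # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]    # marginal distribution of Y
    h_xy = entropy([p for row in joint for p in row])
    h_x_given_y = h_xy - entropy(py)          # chain rule: H(X | Y) = H(X, Y) - H(Y)
    return entropy(px) - h_x_given_y

# A fair coin carries one bit of uncertainty.
fair_coin = entropy([0.5, 0.5])
# Independent variables convey no information about each other.
independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
# If Y determines X, Y conveys all of H(X).
identical = mutual_information([[0.5, 0.0], [0.0, 0.5]])
```

The independent table gives zero and the deterministic table gives the full entropy of X, which are the two extremes that MIC later rescales to the interval from 0 to 1.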
7 Definition: Let $D \subset \mathbb{R}^2$ be a finite set of ordered pairs. Partitioning the elements of D into bins induces an x-by-y grid $G_{xy}$ covering D. For a grid G, let $D|_G$ be the distribution induced by the points in D on the cells of G. Definition of the Maximal Information Coefficient: Define $I^*(D, x, y) = \max_G I(D|_G)$, where the maximum is over all grids G with x columns and y rows. Define the characteristic matrix $M(D)$ with entries $M(D)_{x,y} = \frac{I^*(D, x, y)}{\log \min\{x, y\}}$. Define $\mathrm{MIC}(D) = \max_{xy < B(n)} \{M(D)_{x,y}\}$, where n is the number of points in D.
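To make the definition concrete, here is a simplified sketch in Python. It is not the algorithm from the paper: instead of maximizing over all x-by-y grids, it evaluates only one equal-frequency grid per resolution, so each entry is a lower bound on the true characteristic-matrix entry. The names `equal_frequency_bins`, `grid_mutual_information` and `approx_mic` are hypothetical, and the default exponent 0.6 follows the choice of B(n) reported for the paper.

```python
import math
from collections import Counter

def equal_frequency_bins(values, k):
    """Assign each value to one of k bins of (nearly) equal size, by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

def grid_mutual_information(xb, yb):
    """Mutual information (bits) of the distribution the points induce on grid cells."""
    n = len(xb)
    cx, cy, cxy = Counter(xb), Counter(yb), Counter(zip(xb, yb))
    return sum(c / n * math.log2(c * n / (cx[a] * cy[b]))
               for (a, b), c in cxy.items())

def approx_mic(xs, ys, exponent=0.6):
    """Largest normalized score over resolutions with x * y < B(n) = n ** exponent.

    Assumption: only equal-frequency grids are tried, so this underestimates MIC.
    """
    b = len(xs) ** exponent
    best = 0.0
    x = 2
    while x * 2 < b:                      # at least two rows are needed
        y = 2
        while x * y < b:
            info = grid_mutual_information(equal_frequency_bins(xs, x),
                                           equal_frequency_bins(ys, y))
            best = max(best, info / math.log2(min(x, y)))  # entry M(D)_{x,y}
            y += 1
        x += 1
    return best

# A noiseless linear relationship should score (close to) 1.
xs = [i / 99 for i in range(100)]
mic_line = approx_mic(xs, [2 * x + 1 for x in xs])
```

Dividing by $\log \min\{x, y\}$ is what keeps every entry between 0 and 1, since the mutual information on an x-by-y grid cannot exceed that value.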
9 (A) For each pair (x, y), find the x-by-y grid with the highest induced mutual information. (B) The characteristic matrix stores, for each resolution, the best grid at that resolution and its normalized score. (C) MIC corresponds to the highest point on the surface of normalized scores.
10 Properties. Paper: The following statements are formalized and proved. (Theorem 1) The MIC of data sampled from a distribution (X, Y), where X and Y are continuous random variables, converges to 0 as sample size grows if and only if X and Y are statistically independent. (Theorem 3) The MIC of a noiseless functional relationship converges to 1 as sample size grows, provided the function governing the relationship is nowhere-constant. (Theorem 4) More generally, the MIC of data sampled from a finite union of images of nowhere-flat, nowhere-vertical differentiable curves will approach 1 as sample size grows. (Theorem 5) For any nowhere-constant function, a set of points drawn from the curve defined by the function and then vertically perturbed will receive an MIC that is lower bounded in terms of the amount of perturbation, given a large enough sample size. Moreover, this lower bound can be stated in terms of $R^2$.
12 Example 1: Characteristic matrices
18 Estimation of MIC: The space of grids that must be searched to compute each entry of the characteristic matrix grows exponentially with the number of data points. For efficiency, a heuristic dynamic programming algorithm is used to approximate MIC in practice. In paper: $B(n) = n^{0.6}$, which is found to work well in practice. In paper: The false discovery rate (FDR) is controlled for all analyses using the Benjamini and Hochberg procedure.
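The Benjamini and Hochberg step-up procedure mentioned on this slide can be sketched as follows. This is a generic textbook version, not code from the paper; the function name `benjamini_hochberg`, the level `alpha = 0.05` and the example p-values are all illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected

# Hypothetical p-values, e.g. one per variable pair being screened.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
decisions = benjamini_hochberg(p_values)
```

With eight tests, the thresholds are k * 0.05 / 8 for ranks k = 1 to 8; only the two smallest p-values fall below their thresholds, so only those two pairs would be reported.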
20 Project proposal: epigenetic twin data; GWAS (gwas18)
22 Conclusion: The goal is to identify interesting relationships between pairs of variables in large data sets. MIC captures a wide range of associations, both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination ($R^2$) of the data relative to the regression function. Paper: MIC and MINE are applied to data sets in global health, gene expression, major-league baseball, and the human gut microbiota, identifying known and novel relationships.