Visually-Motivated Characterizations of Point Sets Embedded in High-Dimensional Geometric Spaces: Current Projects
|
|
- Rosa Johnson
- 5 years ago
- Views:
Transcription
1 Visually-Motivated Characterizations of Point Sets Embedded in High-Dimensional Geometric Spaces: Current Projects Leland Wilkinson SYSTAT University of Illinois at Chicago (Computer Science) Northwestern University (Statistics) Robert Grossman University of Illinois at Chicago (Computer Science) Adilson Motter Northwestern University (Physics) 1
2 Subprojects Scagnostics Classification Venn/Euler Diagrams Treemaps 2
3 Scagnostics Wilkinson, Anand, and Grossman (2006) characterize a scatterplot (2D point set) with nine measures. We base our measures on three geometric graphs. Convex Hull Alpha Shape Minimum Spanning Tree 3
4 Each geometric graph is a subset of the Delaunay Triangulation 4
5 Convex Hull 5
6 Alpha Shape 6
7 Minimum Spanning Tree 7
8 Shape Convex: area of alpha shape divided by area of convex hull Skinny: ratio of perimeter to area of the alpha shape Stringy: ratio of 2-degree vertices in MST to number of vertices > 1-degree 8
9 Trend Monotonic: squared Spearman correlation coefficient 9
10 Density Skewed: ratio of (Q90 - Q50) / (Q90 - Q10), where quantiles are on MST edge lengths Clumpy: 1 minus the ratio of the longest edge in the largest runt (blue) to the length of runt-cutting edge (red) longest edge in runt largest runt Outlying: proportion of total MST length due to edges adjacent to outliers 10
11 Density Sparse: 90th percentile of distribution of edge lengths in MST Striated: proportion of all vertices in the MST that are degree-2 and have a cosine between adjacent edges less than
12 Software (Wilkinson and Anand) 12
13 Fishing Expeditions in Visual Analytics We have used the empirical distribution of Scagnostics (Wilkinson and Wills, JCGS, 2008), the False Discovery Rate (FDR) statistic, and automated statistical modeling to develop algorithms for reducing false discoveries when people use visual-analytic software (Wilkinson and Wills, IVS, 2009). 40 Mean Elevation Temp(F) Jan Feb Mar Apr May Jun Jul Aug Sep Oct Day Temperature Day These data are fake. They are a random walk produced by a random number generator. These data are real. They are temperature measurements of a cow over 80-days. The data are periodic. 13
14 Distribution of Scagnostics Monotonic Stringy Skinny Convex Striated Sparse Clumpy Skewed Outlying Measure 14
15 False Discovery Rate (FDR) Benjamini and Hochberg, JRSS, 1995 Sort a list of p values and define p0 = 0 P = [p 1,..., p m ] Compute BH index Use adjusted p value (indexed by rbh) as cutoff Based on uniform distribution of p values, so tests may be heterogeneous 15
16 Scagnostics on Graphs Adilson Motter is developing scagnostics for (V, E) graphs. This effort is similar to what we did for scatterplots. One wants a relatively small set of measures that are relatively independent and that nicely characterize a wide universe of real directed and undirected graphs. The goal is to classify real graphs, recognize anomalies, and locate exemplars in subgraphs of large graphs. 16
17 Classification Based on our analysis of how people visually classify 2D point sets, we have developed a high-dimensional, linear complexity, supervised classifier called Linf. This classifier, using the L-infinity metric, rivals or outperforms leading classifiers on 10 difficult datasets. 17
18 Visual Classification Anand, Wilkinson, and Tuan (ICDM, 2009) developed a visual classifier to analyze how people distinguish target point sets from background point sets. We designed the software to behave like a video game. 18
19 Description Regions Weighted L-infinity norm x = sup(w 1 x 1, w 2 x 2,... w n x n ) Hypercube Description Region (HDR) The set of points less than a fixed distance from a single point using the L-infinity norm. Composite Hypercube Description Region (CHDR) Union of HDRs. 19
20 The Linf Algorithm 1.Transform t(x) = sgn(x)sqrt(abs(x)) 2.Project 3.Bin Pick next target class (one against all). Compute 25 random, integer-weighted, 2D projections. Pick best 5 projections on a class-separation measure. b = 2log 2 (n) Compute purity of target class instances in each bin. 4.Cover j j j j i 5.Repeat Store best cover in scoring list. Repeat steps 2-4 until no training instances left. 20
21 Results waveform S R L B T B R S L T vowel R T SB L B T R S L Dataset spect shuttle satellite orange10 optdigits L R TL SB T R L B S R LB T T SRL B R S B T S R T S B B B B T S L B S T R L R S T R T R S L L L Naive Bayes Support Vector Machine Classification Tree Random Forest Linf madelon T L B S R B R T L S Standardized Error cancer L S T R B BR T S L adult SR T LB B T R L S B : Naive Bayes L : Linf R : Random Forest S : Support Vector Machine T : Classification Tree Error (%) Time (sec) 21
22 Venn/Euler Diagrams How do we fit a Venn/Euler diagram to empirical data involving more than 3 sets? These diagrams are widely used in the bioinformatics community. We have developed an R function called venneuler() and have posted it on CRAN for general use. Euler diagram for 11 animals based on gene lists from the Agilent DNA oligo microarray database. The analysis was based on 247,412 gene symbols and 11 animal names. The stress for this solution is.06, with corresponding correlation of 0.97 between circle and intersection areas and counts of genes. The computation was completed in under 30 seconds on a 2.5 GHz MacBook Pro running the Java 1.5 Virtual Machine in 2GB of allocated memory. Mouse Rat Frog Pig Sheep Rabbit Horse Human Chicken Cow Dog 22
23 The venneuler() Algorithm 1.Make list of m intersections among n sets (m = 2 n ) 2.Make list of disk intersections (size disks proportional to Xi ). 3.Make list of disjoint counts and list of disjoint areas. 4.Estimate model. 5.Move disks and repeat 2-5 to minimize error, using gradient: 23
24 Calculating Areas
25 Treemaps We are investigating the ordering of treemaps. 25
26 Seriating a Tree Canada Italy France Sweden WGermany Belgium Denmark UK Austria Switzerland Czechoslov Poland Cuba Chile Argentina Uruguay Panama Colombia Peru Brazil Ecuador ElSalvador Guatemala Iraq Libya Ethiopia Haiti Somalia Afghanistan Guinea Permuted Data Matrix URBAN LITERACY LIFEEXPF LIFEEXPM GDP_CAP HEALTH EDUC MIL DEATH_RT BABYMORT BIRTH_RT B_TO_D Wilkinson and Friendly, TAS,
27 Fisher-Anderson Iris Data Petal Length + Petal Width Sepal Length + Sepal Width Setosa Versicolor Virginica Species X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X19 X20 X21 X22 X23 X24 X25 X26 X27 X28 X29 X30 X31 X32 X33 X34 X35 X36 X37 X38 X39 X41 X42 X43 X44 X45 X46 X47 X48 X49 X50 X51 X52 X53 X54 X55 X56 X57 X58 X59 X60 X61 X62 X63 X64 X65 X66 X67 X68 X69 X70 X71 X72 X73 X74 X75 X76 X77 X78 X79 X80 X81 X82 X83 X84 X85 X86 X87 X88 X89 X90 X91 X92 X93 X94 X95 X96 X97 X98 X99 X100 X101 X103 X104 X105 X106 X107 X109 X110 X111 X112 X113 X114 X115 X116 X118 X119 X120 X122 X123 X124 X125 X126 X127 X128 X130 X132 X134 X135 X136 X137 X139 X140 X141 X142 X145 X146 X147 X148 X149 X150 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X19 X20 X21 X22 X23 X24 X25 X26 X27 X28 X29 X30 X31 X32 X33 X34 X35 X36 X37 X38 X39 X41 X42 X43 X44 X45 X46 X47 X48 X49 X50 X51 X52 X53 X54 X55 X56 X57 X58 X59 X60 X61 X62 X63 X64 X65 X66 X67 X68 X69 X70 X71 X72 X73 X74 X75 X76 X77 X78 X79 X80 X81 X82 X83 X84 X85 X86 X87 X88 X89 X90 X91 X92 X93 X94 X95 X96 X97 X98 X99 X100 X100 X101 X103 X104 X105 X106 X107 X109 X110 X111 X112 X113 X114 X115 X116 X118 X119 X120 X122 X123 X124 X125 X126 X127 X128 X130 X132 X134 X135 X136 X137 X139 X140 X141 X142 X145 X146 X147 X148 X149 X150
FODAVA Partners Leland Wilkinson (SYSTAT & UIC) Robert Grossman (UIC) Adilson Motter (Northwestern) Anushka Anand, Troy Hernandez (UIC)
FODAVA Partners Leland Wilkinson (SYSTAT & UIC) Robert Grossman (UIC) Adilson Motter (Northwestern) Anushka Anand, Troy Hernandez (UIC) Visually-Motivated Characterizations of Point Sets Embedded in High-Dimensional
More informationFODAVA Partners Leland Wilkinson (SYSTAT, Advise Analytics & UIC) Anushka Anand, Tuan Nhon Dang (UIC)
FODAVA Partners Leland Wilkinson (SYSTAT, Advise Analytics & UIC) Anushka Anand, Tuan Nhon Dang (UIC) Anomaly Discovery through Visual Characterizations of Point Sets Embedded in High- Dimensional Geometric
More informationEvgeny Maksakov Advantages and disadvantages: Advantages and disadvantages: Advantages and disadvantages: Advantages and disadvantages:
Today Problems with visualizing high dimensional data Problem Overview Direct Visualization Approaches High dimensionality Visual cluttering Clarity of representation Visualization is time consuming Dimensional
More informationTimeseer: Detecting interesting distributions in multiple time series data
Timeseer: Detecting interesting distributions in multiple time series data Tuan Nhon Dang University of Illinois at Chicago tdang@cs.uic.edu Leland Wilkinson University of Illinois at Chicago leland.wilkinson@systat.com
More informationLecture 6: Statistical Graphics
Lecture 6: Statistical Graphics Information Visualization CPSC 533C, Fall 2009 Tamara Munzner UBC Computer Science Mon, 28 September 2009 1 / 34 Readings Covered Multi-Scale Banking to 45 Degrees. Jeffrey
More informationVisualizing high-dimensional data:
Visualizing high-dimensional data: Applying graph theory to data visualization Wayne Oldford based on joint work with Catherine Hurley (Maynooth, Ireland) Adrian Waddell (Waterloo, Canada) Challenge p
More informationData Mining: Exploring Data. Lecture Notes for Chapter 3
Data Mining: Exploring Data Lecture Notes for Chapter 3 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Look for accompanying R code on the course web site. Topics Exploratory Data Analysis
More informationLecture 3: Data Principles
Lecture 3: Data Principles Information Visualization CPSC 533C, Fall 2011 Tamara Munzner UBC Computer Science Mon, 19 September 2011 1 / 33 Papers Covered Chapter 2: Data Principles Polaris: A System for
More informationData Mining: Exploring Data. Lecture Notes for Chapter 3
Data Mining: Exploring Data Lecture Notes for Chapter 3 1 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include
More informationData Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining
Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.
More informationMATH5745 Multivariate Methods Lecture 13
MATH5745 Multivariate Methods Lecture 13 April 24, 2018 MATH5745 Multivariate Methods Lecture 13 April 24, 2018 1 / 33 Cluster analysis. Example: Fisher iris data Fisher (1936) 1 iris data consists of
More informationData Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining
Data Mining: Exploring Data Lecture Notes for Data Exploration Chapter Introduction to Data Mining by Tan, Steinbach, Karpatne, Kumar 02/03/2018 Introduction to Data Mining 1 What is data exploration?
More informationScagnostics Distributions
Scagnostics Distributions Leland WILKINSON and Graham WILLS Scagnostics is a Tukey neologism for the term scatterplot diagnostics. Scagnostics are characterizations of the 2D distributions of orthogonal
More informationCS 584 Data Mining. Classification 1
CS 584 Data Mining Classification 1 Classification: Definition Given a collection of records (training set ) Each record contains a set of attributes, one of the attributes is the class. Find a model for
More informationBL5229: Data Analysis with Matlab Lab: Learning: Clustering
BL5229: Data Analysis with Matlab Lab: Learning: Clustering The following hands-on exercises were designed to teach you step by step how to perform and understand various clustering algorithm. We will
More informationCluster Analysis and Visualization. Workshop on Statistics and Machine Learning 2004/2/6
Cluster Analysis and Visualization Workshop on Statistics and Machine Learning 2004/2/6 Outlines Introduction Stages in Clustering Clustering Analysis and Visualization One/two-dimensional Data Histogram,
More informationk-nearest Neighbors + Model Selection
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University k-nearest Neighbors + Model Selection Matt Gormley Lecture 5 Jan. 30, 2019 1 Reminders
More informationDATA MINING INTRODUCTION TO CLASSIFICATION USING LINEAR CLASSIFIERS
DATA MINING INTRODUCTION TO CLASSIFICATION USING LINEAR CLASSIFIERS 1 Classification: Definition Given a collection of records (training set ) Each record contains a set of attributes and a class attribute
More informationThe Curse of Dimensionality
The Curse of Dimensionality ACAS 2002 p1/66 Curse of Dimensionality The basic idea of the curse of dimensionality is that high dimensional data is difficult to work with for several reasons: Adding more
More informationModel Selection Introduction to Machine Learning. Matt Gormley Lecture 4 January 29, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Model Selection Matt Gormley Lecture 4 January 29, 2018 1 Q&A Q: How do we deal
More informationHsiaochun Hsu Date: 12/12/15. Support Vector Machine With Data Reduction
Support Vector Machine With Data Reduction 1 Table of Contents Summary... 3 1. Introduction of Support Vector Machines... 3 1.1 Brief Introduction of Support Vector Machines... 3 1.2 SVM Simple Experiment...
More informationUniversity of Florida CISE department Gator Engineering. Visualization
Visualization Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida What is visualization? Visualization is the process of converting data (information) in to
More informationAn L Norm Visual Classifier
An L Norm Visual Classifier Anushka Anand University of Illinois at Chicago Chicago, IL aanand2@lac.uic.edu Leland Wilkinson SYSTAT Inc. Chicago, IL leland.wilkinson@systat.com Dang Nhon Tuan University
More informationCISC 4631 Data Mining
CISC 4631 Data Mining Lecture 03: Introduction to classification Linear classifier Theses slides are based on the slides by Tan, Steinbach and Kumar (textbook authors) Eamonn Koegh (UC Riverside) 1 Classification:
More informationGRAPHICAL ALGORITHMS. UNIT _II Lecture-12 Slides No. 3-7 Lecture Slides No Lecture Slides No
GRAPHICAL ALGORITHMS UNIT _II Lecture-12 Slides No. 3-7 Lecture-13-16 Slides No. 8-26 Lecture-17-19 Slides No. 27-42 Topics Covered Graphs & Trees ( Some Basic Terminologies) Spanning Trees (BFS & DFS)
More informationInput: Concepts, Instances, Attributes
Input: Concepts, Instances, Attributes 1 Terminology Components of the input: Concepts: kinds of things that can be learned aim: intelligible and operational concept description Instances: the individual,
More informationIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence COMP307 Machine Learning 2: 3-K Techniques Yi Mei yi.mei@ecs.vuw.ac.nz 1 Outline K-Nearest Neighbour method Classification (Supervised learning) Basic NN (1-NN)
More information刘淇 School of Computer Science and Technology USTC
Data Exploration 刘淇 School of Computer Science and Technology USTC http://staff.ustc.edu.cn/~qiliuql/dm2013.html t t / l/dm2013 l What is data exploration? A preliminary exploration of the data to better
More informationData Mining: Exploring Data
Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar But we start with a brief discussion of the Friedman article and the relationship between Data
More informationAnalytical Techniques for Anomaly Detection Through Features, Signal-Noise Separation and Partial-Value Association
Proceedings of Machine Learning Research 77:20 32, 2017 KDD 2017: Workshop on Anomaly Detection in Finance Analytical Techniques for Anomaly Detection Through Features, Signal-Noise Separation and Partial-Value
More informationOrange3 Educational Add-on Documentation
Orange3 Educational Add-on Documentation Release 0.1 Biolab Jun 01, 2018 Contents 1 Widgets 3 2 Indices and tables 27 i ii Widgets in Educational Add-on demonstrate several key data mining and machine
More informationCPSC 340: Machine Learning and Data Mining. Outlier Detection Fall 2016
CPSC 340: Machine Learning and Data Mining Outlier Detection Fall 2016 Admin Assignment 1 solutions will be posted after class. Assignment 2 is out: Due next Friday, but start early! Calculus and linear
More informationAn Introduction to Cluster Analysis. Zhaoxia Yu Department of Statistics Vice Chair of Undergraduate Affairs
An Introduction to Cluster Analysis Zhaoxia Yu Department of Statistics Vice Chair of Undergraduate Affairs zhaoxia@ics.uci.edu 1 What can you say about the figure? signal C 0.0 0.5 1.0 1500 subjects Two
More informationKTH ROYAL INSTITUTE OF TECHNOLOGY. Lecture 14 Machine Learning. K-means, knn
KTH ROYAL INSTITUTE OF TECHNOLOGY Lecture 14 Machine Learning. K-means, knn Contents K-means clustering K-Nearest Neighbour Power Systems Analysis An automated learning approach Understanding states in
More informationKnowledge Discovery and Data Mining I
Ludwig-Maximilians-Universität München Lehrstuhl für Datenbanksysteme und Data Mining Prof. Dr. Thomas Seidl Knowledge Discovery and Data Mining I Winter Semester 8/9 Agenda. Introduction. Basics. Data
More informationA Systematic Overview of Data Mining Algorithms. Sargur Srihari University at Buffalo The State University of New York
A Systematic Overview of Data Mining Algorithms Sargur Srihari University at Buffalo The State University of New York 1 Topics Data Mining Algorithm Definition Example of CART Classification Iris, Wine
More informationCourse on Microarray Gene Expression Analysis
Course on Microarray Gene Expression Analysis ::: Normalization methods and data preprocessing Madrid, April 27th, 2011. Gonzalo Gómez ggomez@cnio.es Bioinformatics Unit CNIO ::: Introduction. The probe-level
More informationCPSC 340: Machine Learning and Data Mining. Outlier Detection Fall 2018
CPSC 340: Machine Learning and Data Mining Outlier Detection Fall 2018 Admin Assignment 2 is due Friday. Assignment 1 grades available? Midterm rooms are now booked. October 18 th at 6:30pm (BUCH A102
More informationHands on Datamining & Machine Learning with Weka
Step1: Click the Experimenter button to launch the Weka Experimenter. The Weka Experimenter allows you to design your own experiments of running algorithms on datasets, run the experiments and analyze
More informationOut of Bundle Vodafone
Out of Bundle Vodafone Out of Bundle Charges - Vodafone The following charges apply once you have used your allowance, or if you call/text/browse outside of your allowance: Standard UK call charges Cost
More informationStatistical Methods in AI
Statistical Methods in AI Distance Based and Linear Classifiers Shrenik Lad, 200901097 INTRODUCTION : The aim of the project was to understand different types of classification algorithms by implementing
More informationCHIRP: A new classifier based on Composite Hypercubes on Iterated Random Projections
CHIRP: A new classifier based on Composite Hypercubes on Iterated Random Projections Leland Wilkinson Anushka Anand Department of Computer Department of Computer Science Science University of Illinois
More informationA review of data complexity measures and their applicability to pattern classification problems. J. M. Sotoca, J. S. Sánchez, R. A.
A review of data complexity measures and their applicability to pattern classification problems J. M. Sotoca, J. S. Sánchez, R. A. Mollineda Dept. Llenguatges i Sistemes Informàtics Universitat Jaume I
More informationBasic Concepts Weka Workbench and its terminology
Changelog: 14 Oct, 30 Oct Basic Concepts Weka Workbench and its terminology Lecture Part Outline Concepts, instances, attributes How to prepare the input: ARFF, attributes, missing values, getting to know
More informationMachine Learning: Algorithms and Applications Mockup Examination
Machine Learning: Algorithms and Applications Mockup Examination 14 May 2012 FIRST NAME STUDENT NUMBER LAST NAME SIGNATURE Instructions for students Write First Name, Last Name, Student Number and Signature
More informationContents. Preface to the Second Edition
Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................
More informationStats 170A: Project in Data Science Exploratory Data Analysis: Clustering Algorithms
Stats 170A: Project in Data Science Exploratory Data Analysis: Clustering Algorithms Padhraic Smyth Department of Computer Science Bren School of Information and Computer Sciences University of California,
More informationFeatures: representation, normalization, selection. Chapter e-9
Features: representation, normalization, selection Chapter e-9 1 Features Distinguish between instances (e.g. an image that you need to classify), and the features you create for an instance. Features
More informationData Mining - Data. Dr. Jean-Michel RICHER Dr. Jean-Michel RICHER Data Mining - Data 1 / 47
Data Mining - Data Dr. Jean-Michel RICHER 2018 jean-michel.richer@univ-angers.fr Dr. Jean-Michel RICHER Data Mining - Data 1 / 47 Outline 1. Introduction 2. Data preprocessing 3. CPA with R 4. Exercise
More informationVideo Cassette Player
3-861-061-12 (1) Video Cassette Player Operating Instructions Before operating the unit, please read this manual thoroughly, and retain it for future reference. GV-F700 1997 by Sony Corporation Operations
More informationData Exploration and Preparation Data Mining and Text Mining (UIC Politecnico di Milano)
Data Exploration and Preparation Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining, : Concepts and Techniques", The Morgan Kaufmann
More informationChapter 1, TUFTE STYLE GRIDDING FOR READABILITY. Chapter 5, SLICE (CROSS-SECTIONAL VIEWS)
Chapter, TUFTE STYLE GRIDDING FOR READABILITY Chapter 5, SLICE (CROSS-SECTIONAL VIEWS) Number of responses 8 7 6 5 4 3 2 9 8 7 6 5 4 3 2 Distribution of ethnicities in each income group of SF bay area
More informationPractical Machine Learning Agenda
Practical Machine Learning Agenda Starting From Log Management Moving To Machine Learning PunchPlatform team Thales Challenges Thanks 1 Starting From Log Management 2 Starting From Log Management Data
More informationBoxplot
Boxplot By: Meaghan Petix, Samia Porto & Franco Porto A boxplot is a convenient way of graphically depicting groups of numerical data through their five number summaries: the smallest observation (sample
More informationLearner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display
CURRICULUM MAP TEMPLATE Priority Standards = Approximately 70% Supporting Standards = Approximately 20% Additional Standards = Approximately 10% HONORS PROBABILITY AND STATISTICS Essential Questions &
More informationTransforming Scagnostics to Reveal Hidden Features
1624 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 20, NO. 12, DECEMBER 2014 Transforming Scagnostics to Reveal Hidden Features Tuan Nhon Dang, Leland Wilkinson Fig. 1. Scatterplots derived
More informationPackage Modeler. May 18, 2018
Version 3.4.3 Date 2018-05-17 Package Modeler May 18, 2018 Title Classes and Methods for Training and Using Binary Prediction Models Author Kevin R. Coombes Maintainer Kevin R. Coombes
More informationData analysis case study using R for readily available data set using any one machine learning Algorithm
Assignment-4 Data analysis case study using R for readily available data set using any one machine learning Algorithm Broadly, there are 3 types of Machine Learning Algorithms.. 1. Supervised Learning
More informationSome Open Problems in Graph Theory and Computational Geometry
Some Open Problems in Graph Theory and Computational Geometry David Eppstein Univ. of California, Irvine Dept. of Information and Computer Science ICS 269, January 25, 2002 Two Models of Algorithms Research
More informationCourse: Geometry PAP Prosper ISD Course Map Grade Level: Estimated Time Frame 6-7 Block Days. Unit Title
Unit Title Unit 1: Geometric Structure Estimated Time Frame 6-7 Block 1 st 9 weeks Description of What Students will Focus on on the terms and statements that are the basis for geometry. able to use terms
More informationRegression on SAT Scores of 374 High Schools and K-means on Clustering Schools
Regression on SAT Scores of 374 High Schools and K-means on Clustering Schools Abstract In this project, we study 374 public high schools in New York City. The project seeks to use regression techniques
More informationLink Prediction for Social Network
Link Prediction for Social Network Ning Lin Computer Science and Engineering University of California, San Diego Email: nil016@eng.ucsd.edu Abstract Friendship recommendation has become an important issue
More informationMedical Image Analysis
Medical Image Analysis Instructor: Moo K. Chung mchung@stat.wisc.edu Lecture 10. Multiple Comparisons March 06, 2007 This lecture will show you how to construct P-value maps fmri Multiple Comparisons 4-Dimensional
More informationPackage EMDomics. November 11, 2018
Package EMDomics November 11, 2018 Title Earth Mover's Distance for Differential Analysis of Genomics Data The EMDomics algorithm is used to perform a supervised multi-class analysis to measure the magnitude
More informationInvestigating Country Differences in Mobile App User Behaviour and Challenges for Software Engineering. Soo Ling Lim
Investigating Country Differences in Mobile App User Behaviour and Challenges for Software Engineering Soo Ling Lim Analysis of app store data reveals what users do in the app store. We want to know why
More informationExperimental Design + k- Nearest Neighbors
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Experimental Design + k- Nearest Neighbors KNN Readings: Mitchell 8.2 HTF 13.3
More informationExploring Data data exploration Exploratory Data Analysis
3 Exploring Data The previous chapter addressed high-level data issues that are important in the knowledge discovery process This chapter provides an introduction to data exploration, which is a preliminary
More informationPreface to the Second Edition. Preface to the First Edition. 1 Introduction 1
Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches
More informationAccess Code and Phone Number
Algeria Dial International collect/reverse charge number: 1-212-559-5842 Argentina For phones using Telecom: Dial 0-800-555-4288; wait for prompt, then dial 866- For phones using Telefonica: Dial 0-800-222-1288;
More informationSVM Classification in -Arrays
SVM Classification in -Arrays SVM classification and validation of cancer tissue samples using microarray expression data Furey et al, 2000 Special Topics in Bioinformatics, SS10 A. Regl, 7055213 What
More informationMathematics Scope & Sequence Grade 8 Revised: June 2015
Mathematics Scope & Sequence 2015-16 Grade 8 Revised: June 2015 Readiness Standard(s) First Six Weeks (29 ) 8.2D Order a set of real numbers arising from mathematical and real-world contexts Convert between
More informationCurriculum Connections (Fractions): K-8 found at under Planning Supports
Curriculum Connections (Fractions): K-8 found at http://www.edugains.ca/newsite/digitalpapers/fractions/resources.html under Planning Supports Kindergarten Grade 1 Grade 2 Grade 3 Grade 4 Grade 5 Grade
More informationAnalysis and Latent Semantic Indexing
18 Principal Component Analysis and Latent Semantic Indexing Understand the basics of principal component analysis and latent semantic index- Lab Objective: ing. Principal Component Analysis Understanding
More informationImproving the Efficiency of Fast Using Semantic Similarity Algorithm
International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year
More informationDashboard. Jan 13, Jan 8, 2012 Comparing to: Site. 12,742 Visits % Bounce Rate. 00:05:26 Avg. Time on Site.
Dashboard 3 3 15 15 Jan 17 Feb 18 Mar 22 Apr 23 May 25 Jun 26 Jul 28 Aug 29 Sep 3 Nov 1 Dec 3 Ja Site Usage 12,742 4.3% Bounce Rate 39,496 Pageviews :5:26 Avg. Time on Site 3.1 Pages/Visit 61.73% % New
More informationBrookhaven School District Pacing Guide Fourth Grade Math
Brookhaven School District Pacing Guide Fourth Grade Math 11 days Aug. 7- Aug.21 1 st NINE WEEKS Chapter 1- Place value, reading, writing and comparing whole numbers, rounding, addition and subtraction
More informationWe have solid distributors in over 120 countries China China China Healthcare Equipment Medical Equipment Healthcare Equipment
Mali Ireland United Kingdom Holland Belgium Ukraine Belgium Germany Czech Republic Austria Slovakia France Hungary Moldova Switzerland Slovenia Albania Romania Spain Bulgaria Portugal Italy Montenegro
More informationCHIRP: A new classifier based on Composite Hypercubes on Iterated Random Projections
CHIRP: A new classifier based on Composite Hypercubes on Iterated Random Projections Leland Wilkinson Department of Computer Science University of Illinois at Chicago Chicago, IL 60606, USA leland.wilkinson@systat.com
More informationMinimum-Spanning-Tree problem. Minimum Spanning Trees (Forests) Minimum-Spanning-Tree problem
Minimum Spanning Trees (Forests) Given an undirected graph G=(V,E) with each edge e having a weight w(e) : Find a subgraph T of G of minimum total weight s.t. every pair of vertices connected in G are
More informationk Nearest Neighbors Super simple idea! Instance-based learning as opposed to model-based (no pre-processing)
k Nearest Neighbors k Nearest Neighbors To classify an observation: Look at the labels of some number, say k, of neighboring observations. The observation is then classified based on its nearest neighbors
More informationClustering. CS294 Practical Machine Learning Junming Yin 10/09/06
Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,
More informationSummary. Machine Learning: Introduction. Marcin Sydow
Outline of this Lecture Data Motivation for Data Mining and Learning Idea of Learning Decision Table: Cases and Attributes Supervised and Unsupervised Learning Classication and Regression Examples Data:
More informationINF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering
INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Murhaf Fares & Stephan Oepen Language Technology Group (LTG) September 27, 2017 Today 2 Recap Evaluation of classifiers Unsupervised
More informationMiddle School Math Course 3 Correlation of the ALEKS course Middle School Math 3 to the Illinois Assessment Framework for Grade 8
Middle School Math Course 3 Correlation of the ALEKS course Middle School Math 3 to the Illinois Assessment Framework for Grade 8 State Goal 6: Number Sense 6.8.01: 6.8.02: 6.8.03: 6.8.04: 6.8.05: = ALEKS
More informationCIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing. University of Florida, CISE Department Prof.
CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing University of Florida, CISE Department Prof. Daisy Zhe Wang Data Visualization Value of Visualization Data And Image Models
More informationCisco Aironet In-Building Wireless Solutions International Power Compliance Chart
Cisco Aironet In-Building Wireless Solutions International Power Compliance Chart ADDITIONAL INFORMATION It is important to Cisco Systems that its resellers comply with and recognize all applicable regulations
More informationSupplementary Material
Supplementary Material Figure 1S: Scree plot of the 400 dimensional data. The Figure shows the 20 largest eigenvalues of the (normalized) correlation matrix sorted in decreasing order; the insert shows
More informationMATHEMATICS Grade 7 Advanced Standard: Number, Number Sense and Operations
Standard: Number, Number Sense and Operations Number and Number Systems A. Use scientific notation to express large numbers and numbers less than one. 1. Use scientific notation to express large numbers
More informationData Mining of Range-Based Classification Rules for Data Characterization
Data Mining of Range-Based Classification Rules for Data Characterization Achilleas Tziatzios A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer
More informationPredictive Analysis: Evaluation and Experimentation. Heejun Kim
Predictive Analysis: Evaluation and Experimentation Heejun Kim June 19, 2018 Evaluation and Experimentation Evaluation Metrics Cross-Validation Significance Tests Evaluation Predictive analysis: training
More informationEPL451: Data Mining on the Web Lab 10
EPL451: Data Mining on the Web Lab 10 Παύλος Αντωνίου Γραφείο: B109, ΘΕΕ01 University of Cyprus Department of Computer Science Dimensionality Reduction Map points in high-dimensional (high-feature) space
More informationPITSCO Math Individualized Prescriptive Lessons (IPLs)
Orientation Integers 10-10 Orientation I 20-10 Speaking Math Define common math vocabulary. Explore the four basic operations and their solutions. Form equations and expressions. 20-20 Place Value Define
More information10701 Machine Learning. Clustering
171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among
More informationA Keypoint Descriptor Inspired by Retinal Computation
A Keypoint Descriptor Inspired by Retinal Computation Bongsoo Suh, Sungjoon Choi, Han Lee Stanford University {bssuh,sungjoonchoi,hanlee}@stanford.edu Abstract. The main goal of our project is to implement
More informationIntroduction to R and Statistical Data Analysis
Microarray Center Introduction to R and Statistical Data Analysis PART II Petr Nazarov petr.nazarov@crp-sante.lu 22-11-2010 OUTLINE PART II Descriptive statistics in R (8) sum, mean, median, sd, var, cor,
More informationTopic Code Indicators Srategies
Angle Relationships, Parallel Lines Angle Relationships, Interior/Exterior angles 8G2 8M8 Recognize the angles formed and the relationship between the angles when two lines intersect and when parallel
More informationVisual Computing. Lecture 2 Visualization, Data, and Process
Visual Computing Lecture 2 Visualization, Data, and Process Pipeline 1 High Level Visualization Process 1. 2. 3. 4. 5. Data Modeling Data Selection Data to Visual Mappings Scene Parameter Settings (View
More informationIntegrated Math I. IM1.1.3 Understand and use the distributive, associative, and commutative properties.
Standard 1: Number Sense and Computation Students simplify and compare expressions. They use rational exponents and simplify square roots. IM1.1.1 Compare real number expressions. IM1.1.2 Simplify square
More informationCenter for the Management of Information Technology March 9, 2012
Big Data @ comscore Turning Billions of Rows into Market Insight Michael Brown March 9 th, 2012 comscore is a Global Leader in Measuring the Digital World NASDAQ Clients SCOR Employees 1000+ Headquarters
More informationComputational Geometry. Algorithm Design (10) Computational Geometry. Convex Hull. Areas in Computational Geometry
Computational Geometry Algorithm Design (10) Computational Geometry Graduate School of Engineering Takashi Chikayama Algorithms formulated as geometry problems Broad application areas Computer Graphics,
More information