VIDAEXPERT: DATA ANALYSIS


Here is the Statistics button. After creating a dataset you can analyze it in several ways. First, you can calculate statistics: open the Statistics dialog, choose the Common tabsheet and click Calculate. Min, Max: the minimal and maximal field values after normalization. The numbers in parentheses with the # symbol are the numbers of the records where the minimum and maximum values were reached; the same holds for Mean. The number just below the word Mean is the number of the object closest to the average point. Stand.dev.: standard deviations after normalization. The numbers in parentheses indicate how many objects lie in the interval (m-σ, m+σ), where m is the average and σ the standard deviation. The number just below the words Stand.dev. is the overall standard deviation. Real min, Real max, Real mean: the minimum, maximum and mean values before normalization. You can print out all the numbers in Excel.
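The quantities on the Common tabsheet can be sketched in a few lines of NumPy. This is only an illustration of the definitions above, not VidaExpert's code; the data values are made up.

```python
import numpy as np

# Hypothetical toy column standing in for one normalized VidaExpert field.
values = np.array([0.1, 0.4, 0.35, 0.9, 0.5, 0.45])

vmin, vmax = values.min(), values.max()
argmin, argmax = int(values.argmin()), int(values.argmax())  # record numbers shown in parentheses (#)
mean = values.mean()
closest_to_mean = int(np.abs(values - mean).argmin())        # object closest to the average point
std = values.std(ddof=0)
# count of objects in the interval (m - sigma, m + sigma)
within_one_sigma = int(((values > mean - std) & (values < mean + std)).sum())
```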

Euclidean distances matrix (after normalization). Choose the Distances tabsheet and click Calculate. (Not very useful if you have thousands of objects.)
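The matrix on the Distances tabsheet is the ordinary pairwise Euclidean distance matrix. A minimal sketch on a made-up three-object dataset:

```python
import numpy as np

# Toy normalized dataset: 3 objects, 2 fields (hypothetical values).
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [0.0, 4.0]])

# Pairwise Euclidean distance matrix, the same quantity the tabsheet shows.
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))
```

For thousands of objects this matrix grows quadratically, which is why the slide calls it not very useful at that scale.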

Correlations matrix. Red marks values greater than 0.8, blue values greater than 0.6, and green values greater than 0.4.
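The matrix itself is the standard correlation-coefficient matrix; the colors just flag magnitude thresholds. A sketch with two made-up, strongly correlated fields:

```python
import numpy as np

# Hypothetical field values (not from the Iris dataset).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.0])

C = np.corrcoef(np.vstack([x, y]))  # 2x2 correlation matrix
strong = np.abs(C) > 0.8            # the cells VidaExpert would color red
```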

Principal components analysis. The coordinates of the principal vectors are shown; red indicates the maximum value, green marks several large values. The first line gives the eigenvalue of each principal vector, the second line its overall contribution to the summary dispersion. From this you can determine the effective linear dimension of the dataset: in this example the Iris dataset is 2-dimensional with accuracy 0.05 and 3-dimensional with accuracy 0.01. Tip: print the table in Excel to plot different diagrams.
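The "effective linear dimension with accuracy a" reading can be sketched as classical PCA: count how many leading eigenvalues of the covariance matrix are needed before the remaining share of the dispersion falls below a. This is an interpretation of the slide, not VidaExpert's exact computation.

```python
import numpy as np

def effective_dimension(X, accuracy):
    """Number of principal components whose cumulative share of the
    total dispersion reaches 1 - accuracy."""
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
    explained = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(explained, 1.0 - accuracy) + 1)
```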

A histogram is drawn for every color of points in the dataset. Choose the field in the left listbox. If you want to see the combined histogram, click Unified.

More sophisticated individual-object analysis. Choose an object in the left listbox. You can change the Mark of the object in the Field combobox. The right diagram shows the distribution of the other objects by their distance from the selected one. In the middle you can evaluate the probabilities of the field values for the object. The Iris dataset has 150 records. For the 119th object (selected in the picture) the probability of the N2 field is 4/150, and the probability of N2 within the same class is 1/150. This means there are only 4 objects (including the 119th) in the dataset whose N2 value falls in the same interval as that of the 119th object, and only the 119th object has this value of N2 within its class (green).
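Those 4/150 and 1/150 figures are empirical frequencies: the share of records whose field value lands in the same interval as the selected object's value, overall and within its class. The binning below is an assumption for illustration (VidaExpert's interval choice is not documented on the slide), and the data is a toy example.

```python
import numpy as np

def field_probability(values, labels, index, n_bins=30):
    """Share of records in the same value interval as object `index`:
    (over the whole dataset, within the object's class)."""
    bins = np.linspace(values.min(), values.max(), n_bins + 1)
    which = np.clip(np.digitize(values, bins), 1, n_bins)  # bin id per record
    same_bin = which == which[index]
    p_all = same_bin.sum() / len(values)
    p_class = (same_bin & (labels == labels[index])).sum() / len(values)
    return p_all, p_class
```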

Data analysis dialog. On the Clustering tabsheet select the method of clustering and the number of clusters (not required for all methods), then click Analyze. You will see the results on the Map panel. Cancel cancels the clustering. Remember colors assigns to every point its current color (the result of the clustering). Numbers in table puts the cluster numbers into the datatable. Distances in table puts the distances from the centroids into the datatable. Here you see the colors of the clusters, the number of objects in every cluster and the interclass deviation value (compactness).
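To make the centroid-distance column concrete, here is a minimal k-means sketch, one of the standard methods such a Clustering tabsheet offers. This is an illustration only, not VidaExpert's implementation, and it assumes no cluster becomes empty (true for well-separated data).

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    centroids = X[:k].astype(float).copy()  # simple deterministic initialization
    for _ in range(n_iter):
        # assign every object to its nearest centroid
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its cluster
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids
```

The `labels` array is what Numbers in table would write out, and the row-wise distances in `d` are what Distances in table would write out.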

Click this to switch between the Map and Table panels. You can see the records colored according to their cluster number. If you clicked the Numbers in table button, you will also see the cluster numbers. Tip: right-click on the table to print it to an Excel worksheet or a CSV file for further analysis.

Hierarchical clustering constructs a minimal spanning tree. The method has two modes. In the Hierarchical, specify number of clusters mode you specify the number of clusters directly. In the Hierarchical mode you set the Sensitivity parameter on the Sensitivity trackbar: Sensitivity is the maximal length an edge may have, and longer edges are cut.
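The spanning-tree idea can be sketched directly: build a minimal spanning tree (Prim's algorithm below), cut every edge longer than the sensitivity, and read the remaining connected components as clusters. This follows my reading of the Sensitivity description above; VidaExpert's exact procedure may differ.

```python
import numpy as np

def mst_clusters(X, sensitivity):
    """Cluster labels from a minimal spanning tree with edges > sensitivity cut."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    # Prim's algorithm: grow the tree one cheapest edge at a time
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        best = min((D[i, j], i, j) for i in in_tree for j in range(n) if j not in in_tree)
        edges.append(best)
        in_tree.add(best[2])
    # cut long edges, then label connected components via union-find
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a
    for w, i, j in edges:
        if w <= sensitivity:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]
```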

You can change the sizes of the points according to some criterion, for example Distance from the closest node of the constructed net or Value of a field. Choose the criterion, specify the minimal and maximal point size and click Analyze.

Linear discriminant analysis. Select the color of the class to be separated from the others, mark the coordinates you want to use in the decision function and click Analyze. You will see the result; the other classes are shown in black. Big points indicate the error level of the classification. In addition you see the coefficient values. For example, in this situation the decision function is f = -0.69 + 0.0032*N1 - 0.14*N2 + 0.42*N3 + 0.57*N4 (you should use the normalized values of N1, N2, N3, N4).
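Applying that decision function is a plain dot product plus intercept. The coefficients below are the ones quoted above; the sign convention (f > 0 means the selected class) is an assumption, since the slide does not state it.

```python
import numpy as np

# Coefficients for the normalized fields N1..N4, as quoted in the text.
coef = np.array([0.0032, -0.14, 0.42, 0.57])
intercept = -0.69

def decision(x):
    """f(x) for a vector of normalized field values [N1, N2, N3, N4]."""
    return intercept + coef @ x
```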

Linear regression analysis. Select the field to be calculated, mark the coordinates you want to use in the regression function and click Analyze. Big points indicate the error level of the calculation. For example, in this situation the function is N1 = 5.84 + 0.28*N2 + 1.24*N3 - 0.42*N4 (use the normalized values of N2, N3, N4; the resulting value is not normalized). Try changing the Quality value to test the accuracy level.
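A model of that shape is an ordinary least-squares fit of one field on the others. A sketch on toy data (not the Iris fields); exact coefficient recovery below works because the toy target is generated from the model without noise.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit y ~ intercept + X @ b; returns [intercept, b1, b2, ...]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef
```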

You can visualize the values of the two linear functions, the one from linear discriminant analysis and the one from linear regression analysis, using map coloring. For example, the picture shows a 0-1 valued linear decision function.

Select points by clicking the Select button and dragging the mouse over the map. You can change the display properties of the selected points in the Selection dialog.

Click this to switch between Map and Table panels. Here you can see the selected points as records in the datatable.

Annotation dialog. You can attach labels with a short description to selected points. In the Annotation dialog choose which points you want to label and the content of the labels, then click Apply.