Orange3-Prototypes Documentation. Biolab, University of Ljubljana


Biolab, University of Ljubljana

Dec 17, 2018


Contents

1 Widgets
2 Indices and tables


CHAPTER 1

Widgets

1.1 Contingency Table

Construct a contingency table from given data.

Inputs
   Data: input dataset

Outputs
   Contingency Table: data table with frequency counts

Contingency Table counts how often the values of two discrete variables occur together. One variable defines the rows of the table and the other defines the columns.

1. Attribute values placed in rows.
2. Attribute values placed in columns.
3. Click Apply to commit the changes. To communicate changes automatically, tick Apply Automatically.
4. Access widget help and produce a report.
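The counting that such a table involves can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the widget's own code: the function name contingency_table and the five sample records are made up for this sketch and are not taken from the titanic data.

```python
from collections import Counter


def contingency_table(records, row_attr, col_attr):
    """Count how often each (row value, column value) pair occurs."""
    counts = Counter((r[row_attr], r[col_attr]) for r in records)
    row_values = sorted({row for row, _ in counts})
    col_values = sorted({col for _, col in counts})
    # A Counter returns 0 for pairs that never occur, so empty cells are filled in.
    return {
        row: {col: counts[(row, col)] for col in col_values}
        for row in row_values
    }


# Made-up records for illustration only (hypothetical, not the real data set).
records = [
    {"status": "first", "age": "adult"},
    {"status": "second", "age": "child"},
    {"status": "second", "age": "adult"},
    {"status": "second", "age": "child"},
    {"status": "crew", "age": "adult"},
]

table = contingency_table(records, "status", "age")
for status, row in table.items():
    print(status, row)
```

Each row of the printed table corresponds to one value of the row attribute, and each entry in it is the number of records that also have the given column value.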

1.1.1 Example

A contingency table can be computed only for discrete variables, so we will use the titanic data set as an example. Load the data in the File widget and pass it to Contingency Table. Say we want to know how many second class passengers on the Titanic were children. Let us select status for rows and age for columns. We can observe the computed table in a Data Table widget. The answer to our question is the count in the cell where the row for second class meets the column for children.

1.2 EnKlik Anketa

Import data from an EnKlikAnketa (1ka.si) public URL.

Inputs
   None

Outputs
   Data: survey results

The EnKlik Anketa widget retrieves survey results obtained from the EnKlikAnketa service. You need to create a public link to retrieve the results. Go to the survey you wish to retrieve, select the Data (Podatki) tab and create a public link (javna povezava) in the top right corner. Then insert the link into the Public link URL field. The link should look something like this: podatki/123456/78a9b1cd/.

1. A public link to the survey results. To observe the results live, set the reload rate (5 s to 5 min).
2. Attribute list. You can change the attribute type and role, just like in the File widget.


3. Survey meta information.
4. Tick the box on the left to commit the changes automatically. Alternatively, click Commit.
5. Access widget help.

1.2.1 Example

The EnKlik Anketa widget is great for observing results from online surveys. We have created a sample survey and imported it into the widget. We have 41 responses to 8 questions, 7 of which were recognized as features and 1 as a meta attribute.

The widget sets questions from the survey as feature names. This, however, might be slightly impractical for analytical purposes, as we can see in the Data Table. We will shorten the names with the Edit Domain widget. Edit Domain enables us to change attribute names and even rename attribute values for discrete attributes.

Now our attribute names are much easier to work with, as we can see in Data Table (1).

1.3 Neighbors

Compute nearest neighbors in data according to reference.

1.3.1 Signals

Inputs:

   Data: An input data set.
   Reference: A reference data instance for neighbor computation.

Outputs:
   Neighbors: A data table of nearest neighbors according to reference.

1.3.2 Description

The Neighbors widget computes nearest neighbors for a given reference and a given distance measure.

1. Information on the input data.
2. Distance measure for computing neighbors. Supported measures are: Euclidean, Manhattan, Mahalanobis, Cosine, Jaccard, Spearman, absolute Spearman, Pearson, absolute Pearson. If Exclude references is ticked, reference data won't be included in the output.
3. Number of neighbors on the output.
4. Click Apply to commit the changes. To communicate changes automatically, tick Apply Automatically.
5. Access widget help.

1.3.3 Examples

In the first example, we used iris data and passed it to Neighbors and to Data Table. In Data Table, we selected an instance of iris that will serve as our reference, meaning we wish to retrieve the 10 closest examples to the selected data instance. We connect Data Table to Neighbors as well.

We can observe the results of neighbor computation in Data Table (1), where we can see the 10 closest instances to our selected iris flower.

Another example requires the installation of the Image Analytics add-on. We loaded 15 paintings from famous painters with the Import Images widget and passed them to Image Embedding, where we selected the Painters embedder.

Then the procedure is the same as above. We passed the embedded images to Image Viewer and selected a painting by Monet to serve as our reference image. We passed the image to Neighbors, where we set the distance measure to cosine, ticked off Exclude reference and set the number of neighbors to 2. This allows us to find the actual closest neighbor to the reference painting and observe them side by side in Image Viewer (1).

1.4 Parallel Coordinates

Parallel coordinates display of multi-dimensional data.

Inputs
   Data: input dataset
   Features: list of attributes

Outputs
   Selected Data: instances selected from the plot
   Annotated Data: data with an additional column showing whether a point is selected
   Features: list of attributes

The Parallel Coordinates widget shows high-dimensional data in a plot. The widget displays the first 9 attributes and colors the lines by class, if a class is present. The widget also enables plot optimization and subset selection.

1. Color lines (instances) by an attribute. Colored by class by default.
2. Select the dimensions you wish to display. Click Optimize Selected Dimensions to optimize the plot.
3. Click Apply to commit the changes. To communicate changes automatically, tick Apply Automatically.
4. Access help, save image and produce a report.

To select a subset from the plot, hover over the dimension until you see the cursor change to a + and drag the selection across the dimension. Selected data instances will be on the output of the widget. You can select ranges on several dimensions; only those data instances that match all the criteria will be on the output. To remove the selection, click on the dimension outside of the selected range.

1.4.1 Example

Parallel Coordinates can display multi-dimensional data, hence we will use the heart-disease data set. We load it with the File widget and send it to Parallel Coordinates. We optimized the projection and selected patients who have left vent hypertrophy and a cholesterol level between 200 and 300. Finally, we sent the selected patients to Data Table for observation.

1.5 Feature Statistics

Show basic statistics for data features.

Inputs
   Data: input dataset

Outputs
   None

The Feature Statistics widget displays basic information on each feature: its type, distribution, center, standard deviation, minimum and maximum value, and the proportion of missing values.

1. Information on the input data.
2. Color histograms in the Distribution column by a feature.
3. Click Send to commit the changes. To communicate changes automatically, tick Send Automatically.
4. Access widget help and create a report.
5. Feature statistics table:
   - feature type (numeric, categorical, text or time)


   - feature name
   - distribution in histogram (continuous variables binned in 10 bins). Class variable is used as color by default.
   - center of the feature (mean for numeric, mode for categorical; text and time report nan)
   - dispersion of the feature (standard deviation for numeric, entropy for categorical)
   - minimum value
   - maximum value
   - percentage of missing values

1.5.1 Example

We will use the iris data in the File widget and pass it to Feature Statistics. We see that the iris data has 4 numeric features and 1 categorical class variable. Distributions in the widget are colored by class, where iris-setosa is blue, iris-versicolor red and iris-virginica green. We can observe other basic statistics and see whether there are any missing values in our data. Try changing the data set to housing, banking-crises and zoo to observe different feature types.
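The center and dispersion columns listed above can be illustrated with a short Python sketch. It rests on assumptions that the text does not confirm: missing values are represented as None, the standard deviation is the population version, entropy is measured in bits, and minimum and maximum are reported for numeric features only. The function name summarize and the sample values are hypothetical.

```python
import math
import statistics
from collections import Counter


def summarize(values, numeric):
    """Return center, dispersion, min, max and missing percentage for one feature.

    Missing values are represented as None (an assumption of this sketch).
    """
    present = [v for v in values if v is not None]
    missing_pct = 100.0 * (len(values) - len(present)) / len(values)
    if numeric:
        center = statistics.fmean(present)
        # Population standard deviation; the widget may use the sample version.
        dispersion = statistics.pstdev(present)
        minimum, maximum = min(present), max(present)
    else:
        counts = Counter(present)
        center = counts.most_common(1)[0][0]  # mode
        # Shannon entropy in bits; the logarithm base is an assumption.
        dispersion = -sum(
            (c / len(present)) * math.log2(c / len(present))
            for c in counts.values()
        )
        minimum = maximum = None  # assumed not to apply to categorical features
    return {
        "center": center,
        "dispersion": dispersion,
        "min": minimum,
        "max": maximum,
        "missing_pct": missing_pct,
    }


# Made-up values for illustration only.
numeric_stats = summarize([1.0, 2.0, 3.0, None], numeric=True)
categorical_stats = summarize(["setosa", "setosa", "virginica"], numeric=False)
print(numeric_stats)
print(categorical_stats)
```

A feature whose values are all identical has zero dispersion under both measures, which is a quick way to spot constant columns.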

CHAPTER 2

Indices and tables

genindex
modindex
search

Orange3 Educational Add-on Documentation

Orange3 Educational Add-on Documentation Orange3 Educational Add-on Documentation Release 0.1 Biolab Jun 01, 2018 Contents 1 Widgets 3 2 Indices and tables 27 i ii Widgets in Educational Add-on demonstrate several key data mining and machine

More information

Introduction to Artificial Intelligence

Introduction to Artificial Intelligence Introduction to Artificial Intelligence COMP307 Machine Learning 2: 3-K Techniques Yi Mei yi.mei@ecs.vuw.ac.nz 1 Outline K-Nearest Neighbour method Classification (Supervised learning) Basic NN (1-NN)

More information

VIDAEXPERT: DATA ANALYSIS Here is the Statistics button.

VIDAEXPERT: DATA ANALYSIS Here is the Statistics button. Here is the Statistics button. After creating dataset you can analyze it in different ways. First, you can calculate statistics. Open Statistics dialog, Common tabsheet, click Calculate. Min, Max: minimal

More information

Hands on Datamining & Machine Learning with Weka

Hands on Datamining & Machine Learning with Weka Step1: Click the Experimenter button to launch the Weka Experimenter. The Weka Experimenter allows you to design your own experiments of running algorithms on datasets, run the experiments and analyze

More information

Quick Start Guide Jacob Stolk PhD Simone Stolk MPH November 2018

Quick Start Guide Jacob Stolk PhD Simone Stolk MPH November 2018 Quick Start Guide Jacob Stolk PhD Simone Stolk MPH November 2018 Contents Introduction... 1 Start DIONE... 2 Load Data... 3 Missing Values... 5 Explore Data... 6 One Variable... 6 Two Variables... 7 All

More information

Orange3 Data Fusion Documentation. Biolab

Orange3 Data Fusion Documentation. Biolab Biolab Mar 07, 2018 Widgets 1 IMDb Actors 1 2 Chaining 5 3 Completion Scoring 9 4 Fusion Graph 13 5 Latent Factors 17 6 Matrix Sampler 21 7 Mean Fuser 25 8 Movie Genres 29 9 Movie Ratings 33 10 Table

More information

Data Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners

Data Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners Data Mining 3.5 (Instance-Based Learners) Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction k-nearest-neighbor Classifiers References Introduction Introduction Lazy vs. eager learning Eager

More information

Nearest Neighbor Classification

Nearest Neighbor Classification Nearest Neighbor Classification Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 11, 2017 1 / 48 Outline 1 Administration 2 First learning algorithm: Nearest

More information

Subject. Creating a diagram. Dataset. Importing the data file. Descriptive statistics with TANAGRA.

Subject. Creating a diagram. Dataset. Importing the data file. Descriptive statistics with TANAGRA. Subject Descriptive statistics with TANAGRA. The aim of descriptive statistics is to describe the main features of a collection of data in quantitative terms 1. The visualization of the whole data table

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Features and Patterns The Curse of Size and

More information

DATA MINING INTRODUCTION TO CLASSIFICATION USING LINEAR CLASSIFIERS

DATA MINING INTRODUCTION TO CLASSIFICATION USING LINEAR CLASSIFIERS DATA MINING INTRODUCTION TO CLASSIFICATION USING LINEAR CLASSIFIERS 1 Classification: Definition Given a collection of records (training set ) Each record contains a set of attributes and a class attribute

More information

GETTING STARTED. A Step-by-Step Guide to Using MarketSight

GETTING STARTED. A Step-by-Step Guide to Using MarketSight GETTING STARTED A Step-by-Step Guide to Using MarketSight Analyze any dataset Run crosstabs Test statistical significance Create charts and dashboards Share results online Introduction MarketSight is a

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.

More information

Data Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining Data Mining: Exploring Data Lecture Notes for Data Exploration Chapter Introduction to Data Mining by Tan, Steinbach, Karpatne, Kumar 02/03/2018 Introduction to Data Mining 1 What is data exploration?

More information

CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 2014) CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 9, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 9, 2014 1 / 47

More information

Performance Evaluation of Various Classification Algorithms

Performance Evaluation of Various Classification Algorithms Performance Evaluation of Various Classification Algorithms Shafali Deora Amritsar College of Engineering & Technology, Punjab Technical University -----------------------------------------------------------***----------------------------------------------------------

More information

University of Florida CISE department Gator Engineering. Visualization

University of Florida CISE department Gator Engineering. Visualization Visualization Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida What is visualization? Visualization is the process of converting data (information) in to

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3

Data Mining: Exploring Data. Lecture Notes for Chapter 3 Data Mining: Exploring Data Lecture Notes for Chapter 3 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Look for accompanying R code on the course web site. Topics Exploratory Data Analysis

More information

Overview of Clustering

Overview of Clustering based on Loïc Cerfs slides (UFMG) April 2017 UCBL LIRIS DM2L Example of applicative problem Student profiles Given the marks received by students for different courses, how to group the students so that

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3

Data Mining: Exploring Data. Lecture Notes for Chapter 3 Data Mining: Exploring Data Lecture Notes for Chapter 3 1 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include

More information

Probability and Statistics. Copyright Cengage Learning. All rights reserved.

Probability and Statistics. Copyright Cengage Learning. All rights reserved. Probability and Statistics Copyright Cengage Learning. All rights reserved. 14.6 Descriptive Statistics (Graphical) Copyright Cengage Learning. All rights reserved. Objectives Data in Categories Histograms

More information

The Explorer. chapter Getting started

The Explorer. chapter Getting started chapter 10 The Explorer Weka s main graphical user interface, the Explorer, gives access to all its facilities using menu selection and form filling. It is illustrated in Figure 10.1. There are six different

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2013 http://ce.sharif.edu/courses/91-92/2/ce725-1/ Agenda Features and Patterns The Curse of Size and

More information

CS 8520: Artificial Intelligence. Weka Lab. Paula Matuszek Fall, CSC 8520 Fall Paula Matuszek

CS 8520: Artificial Intelligence. Weka Lab. Paula Matuszek Fall, CSC 8520 Fall Paula Matuszek CS 8520: Artificial Intelligence Weka Lab Paula Matuszek Fall, 2015!1 Weka is Waikato Environment for Knowledge Analysis Machine Learning Software Suite from the University of Waikato Been under development

More information

Input: Concepts, Instances, Attributes

Input: Concepts, Instances, Attributes Input: Concepts, Instances, Attributes 1 Terminology Components of the input: Concepts: kinds of things that can be learned aim: intelligible and operational concept description Instances: the individual,

More information

KTH ROYAL INSTITUTE OF TECHNOLOGY. Lecture 14 Machine Learning. K-means, knn

KTH ROYAL INSTITUTE OF TECHNOLOGY. Lecture 14 Machine Learning. K-means, knn KTH ROYAL INSTITUTE OF TECHNOLOGY Lecture 14 Machine Learning. K-means, knn Contents K-means clustering K-Nearest Neighbour Power Systems Analysis An automated learning approach Understanding states in

More information

Data analysis case study using R for readily available data set using any one machine learning Algorithm

Data analysis case study using R for readily available data set using any one machine learning Algorithm Assignment-4 Data analysis case study using R for readily available data set using any one machine learning Algorithm Broadly, there are 3 types of Machine Learning Algorithms.. 1. Supervised Learning

More information

Error-Bar Charts from Summary Data

Error-Bar Charts from Summary Data Chapter 156 Error-Bar Charts from Summary Data Introduction Error-Bar Charts graphically display tables of means (or medians) and variability. Following are examples of the types of charts produced by

More information

Fitting Classification and Regression Trees Using Statgraphics and R. Presented by Dr. Neil W. Polhemus

Fitting Classification and Regression Trees Using Statgraphics and R. Presented by Dr. Neil W. Polhemus Fitting Classification and Regression Trees Using Statgraphics and R Presented by Dr. Neil W. Polhemus Classification and Regression Trees Machine learning methods used to construct predictive models from

More information

Exploring Data. This guide describes the facilities in SPM to gain initial insights about a dataset by viewing and generating descriptive statistics.

Exploring Data. This guide describes the facilities in SPM to gain initial insights about a dataset by viewing and generating descriptive statistics. This guide describes the facilities in SPM to gain initial insights about a dataset by viewing and generating descriptive statistics. 2018 by Minitab Inc. All rights reserved. Minitab, SPM, SPM Salford

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2012 http://ce.sharif.edu/courses/90-91/2/ce725-1/ Agenda Features and Patterns The Curse of Size and

More information

Finding Clusters 1 / 60

Finding Clusters 1 / 60 Finding Clusters Types of Clustering Approaches: Linkage Based, e.g. Hierarchical Clustering Clustering by Partitioning, e.g. k-means Density Based Clustering, e.g. DBScan Grid Based Clustering 1 / 60

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

Excel 2010 with XLSTAT

Excel 2010 with XLSTAT Excel 2010 with XLSTAT J E N N I F E R LE W I S PR I E S T L E Y, PH.D. Introduction to Excel 2010 with XLSTAT The layout for Excel 2010 is slightly different from the layout for Excel 2007. However, with

More information

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data. Code No: M0502/R05 Set No. 1 1. (a) Explain data mining as a step in the process of knowledge discovery. (b) Differentiate operational database systems and data warehousing. [8+8] 2. (a) Briefly discuss

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

Canadian National Longitudinal Survey of Children and Youth (NLSCY)

Canadian National Longitudinal Survey of Children and Youth (NLSCY) Canadian National Longitudinal Survey of Children and Youth (NLSCY) Fathom workshop activity For more information about the survey, see: http://www.statcan.ca/ Daily/English/990706/ d990706a.htm Notice

More information

Research Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel

Research Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel Research Methods for Business and Management Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel A Simple Example- Gym Purpose of Questionnaire- to determine the participants involvement

More information

e-submission of Coursework

e-submission of Coursework e-submission of Coursework Providing Feedback via Turnitin GradeMark 1. GradeMark Feedback Options In Turnitin GradeMark feedback can be provided using a combination of general/individual comments, comments

More information

An Introduction to WEKA Explorer. In part from: Yizhou Sun 2008

An Introduction to WEKA Explorer. In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer In part from: Yizhou Sun 2008 What is WEKA? Waikato Environment for Knowledge Analysis It s a data mining/machine learning tool developed by Department of Computer Science,,

More information

Unsupervised learning

Unsupervised learning Unsupervised learning Enrique Muñoz Ballester Dipartimento di Informatica via Bramante 65, 26013 Crema (CR), Italy enrique.munoz@unimi.it Enrique Muñoz Ballester 2017 1 Download slides data and scripts:

More information

Microsoft Excel Basics Ben Johnson

Microsoft Excel Basics Ben Johnson Microsoft Excel Basics Ben Johnson Topic...page # Basics...1 Workbook and worksheets...1 Sizing columns and rows...2 Auto Fill...2 Sort...2 Formatting Cells...3 Formulas...3 Percentage Button...4 Sum function...4

More information

Introduction to Excel Workshop

Introduction to Excel Workshop Introduction to Excel Workshop Empirical Reasoning Center June 6, 2016 1 Important Terminology 1. Rows are identified by numbers. 2. Columns are identified by letters. 3. Cells are identified by the row-column

More information

SAS Visual Analytics 8.2: Working with Report Content

SAS Visual Analytics 8.2: Working with Report Content SAS Visual Analytics 8.2: Working with Report Content About Objects After selecting your data source and data items, add one or more objects to display the results. SAS Visual Analytics provides objects

More information

KANRI DISTANCE CALCULATOR. User Guide v2.4.9

KANRI DISTANCE CALCULATOR. User Guide v2.4.9 KANRI DISTANCE CALCULATOR User Guide v2.4.9 KANRI DISTANCE CALCULATORTM FLOW Participants Input File Correlation Distance Type? Generate Target Profile General Target Define Target Profile Calculate Off-Target

More information

Orange3-Textable Documentation

Orange3-Textable Documentation Orange3-Textable Documentation Release 3.0a1 LangTech Sarl Dec 19, 2017 Contents 1 Getting Started 3 1.1 Orange Textable............................................. 3 1.2 Description................................................

More information

USING OF THE K NEAREST NEIGHBOURS ALGORITHM (k-nns) IN THE DATA CLASSIFICATION

USING OF THE K NEAREST NEIGHBOURS ALGORITHM (k-nns) IN THE DATA CLASSIFICATION USING OF THE K NEAREST NEIGHBOURS ALGORITHM (k-nns) IN THE DATA CLASSIFICATION Gîlcă Natalia, Roșia de Amaradia Technological High School, Gorj, ROMANIA Gîlcă Gheorghe, Constantin Brîncuși University from

More information

Modify Panel. Flatten Tab

Modify Panel. Flatten Tab AFM Image Processing Most images will need some post acquisition processing. A typical procedure is to: i) modify the image by flattening, using a planefit, and possibly also a mask, ii) analyzing the

More information

Getting to Know Your Data

Getting to Know Your Data Chapter 2 Getting to Know Your Data 2.1 Exercises 1. Give three additional commonly used statistical measures (i.e., not illustrated in this chapter) for the characterization of data dispersion, and discuss

More information

Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions

Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions Computational Statistics The basics of maximum likelihood estimation, Bayesian estimation, object recognitions Thomas Giraud Simon Chabot October 12, 2013 Contents 1 Discriminant analysis 3 1.1 Main idea................................

More information

Figure 3.20: Visualize the Titanic Dataset

Figure 3.20: Visualize the Titanic Dataset 80 Chapter 3. Data Mining with Azure Machine Learning Studio Figure 3.20: Visualize the Titanic Dataset 3. After verifying the output, we will cast categorical values to the corresponding columns. To begin,

More information

Data Mining: Exploring Data

Data Mining: Exploring Data Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar But we start with a brief discussion of the Friedman article and the relationship between Data

More information

molegro data modeller

molegro data modeller molegro data modeller user manual MDM 2010.2.5 for Windows, Linux, and Mac OS X copyright molegro 2007-2010 page 2/173 Molegro ApS Copyright 2007-2010 Molegro ApS. All rights reserved. Molegro Data Modeller

More information

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA This lab will assist you in learning how to summarize and display categorical and quantitative data in StatCrunch. In particular, you will learn how to

More information

Lab #3. Viewing Data in SAS. Tables in SAS. 171:161: Introduction to Biostatistics Breheny

Lab #3. Viewing Data in SAS. Tables in SAS. 171:161: Introduction to Biostatistics Breheny 171:161: Introduction to Biostatistics Breheny Lab #3 The focus of this lab will be on using SAS and R to provide you with summary statistics of different variables with a data set. We will look at both

More information

Data Preprocessing. Javier Béjar. URL - Spring 2018 CS - MAI 1/78 BY: $\

Data Preprocessing. Javier Béjar. URL - Spring 2018 CS - MAI 1/78 BY: $\ Data Preprocessing Javier Béjar BY: $\ URL - Spring 2018 C CS - MAI 1/78 Introduction Data representation Unstructured datasets: Examples described by a flat set of attributes: attribute-value matrix Structured

More information

CS 584 Data Mining. Classification 1

CS 584 Data Mining. Classification 1 CS 584 Data Mining Classification 1 Classification: Definition Given a collection of records (training set ) Each record contains a set of attributes, one of the attributes is the class. Find a model for

More information

Orange Visual Programming Documentation

Orange Visual Programming Documentation Orange Visual Programming Documentation Release 3 Orange Data Mining Sep 03, 2018 Contents 1 Getting Started 1 2 Widgets 13 i ii CHAPTER 1 Getting Started Here we need to copy the getting started guide.

More information

Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

More information

Computer Vision. Exercise Session 10 Image Categorization

Computer Vision. Exercise Session 10 Image Categorization Computer Vision Exercise Session 10 Image Categorization Object Categorization Task Description Given a small number of training images of a category, recognize a-priori unknown instances of that category

More information

k Nearest Neighbors Super simple idea! Instance-based learning as opposed to model-based (no pre-processing)

k Nearest Neighbors Super simple idea! Instance-based learning as opposed to model-based (no pre-processing) k Nearest Neighbors k Nearest Neighbors To classify an observation: Look at the labels of some number, say k, of neighboring observations. The observation is then classified based on its nearest neighbors

More information

Practical Data Mining COMP-321B. Tutorial 1: Introduction to the WEKA Explorer

Practical Data Mining COMP-321B. Tutorial 1: Introduction to the WEKA Explorer Practical Data Mining COMP-321B Tutorial 1: Introduction to the WEKA Explorer Gabi Schmidberger Mark Hall Richard Kirkby July 12, 2006 c 2006 University of Waikato 1 Setting up your Environment Before

More information

EPL451: Data Mining on the Web Lab 5

EPL451: Data Mining on the Web Lab 5 EPL451: Data Mining on the Web Lab 5 Παύλος Αντωνίου Γραφείο: B109, ΘΕΕ01 University of Cyprus Department of Computer Science Predictive modeling techniques IBM reported in June 2012 that 90% of data available

More information

Data Preprocessing. Javier Béjar AMLT /2017 CS - MAI. (CS - MAI) Data Preprocessing AMLT / / 71 BY: $\

Data Preprocessing. Javier Béjar AMLT /2017 CS - MAI. (CS - MAI) Data Preprocessing AMLT / / 71 BY: $\ Data Preprocessing S - MAI AMLT - 2016/2017 (S - MAI) Data Preprocessing AMLT - 2016/2017 1 / 71 Outline 1 Introduction Data Representation 2 Data Preprocessing Outliers Missing Values Normalization Discretization

More information

Data Exploration and Preparation Data Mining and Text Mining (UIC Politecnico di Milano)
