Infographics and Visualisation (or: Beyond the Pie Chart) LSS: ITNPBD4, 1 November 2016

2 Overview
- (Short: we covered most of this in the tutorial)
- Why infographics and visualisation?
- What's the problem we're trying to solve?
- What makes for good infographics and visualisations?
- Where are we now in this area?
- Interactive visualisations
ITNPD4: Applications of Big Data

3 The problem
- Data analysis may tell you something about the structure of a problem
- Or it may predict how to optimise something
  - Profit, energy usage, etc.
- BUT: in general you will have to convince someone else
  - And they may not be convinced by the numbers on their own
  - They expect some sort of graphic that they can show to the Board/CEO to convince them
  - A visualisation, perhaps an infographic
- The other side of this is that people may be presenting their data with a particular axe to grind

4 Visualisation and infographics
- Visualisation is the generic name for displaying data
  - May be a single image
  - Or a movie, for example
- "Visualizations help people see things that were not obvious to them before" (SAS website)
- There is also sonification, where data is sounded out: this works because our ears are very good at picking up patterns
  - E.g. Geiger counter, reversing systems in modern cars
- Infographics may be single images
  - Providing a visualisation of a specific set of data
  - But they may also be interactive

5 Infographics
- An infographic is a picture that displays information in an accessible and/or informative way
- Can be quite simple or quite complex

6 Not a new idea (Minard, 1869)!
- The standard text in this area is E. R. Tufte, The Visual Display of Quantitative Information

7 Infographic shows the troops and troop movements on the Eastern Front in World War 2.

8 Visualisation of low-dimensional datasets
- Low-dimensional datasets are often visualised as simple X/Y graphs, but even here there are issues
- For both X and Y axes:
  - Offset (is the origin at 0?)
  - Scale: linear or logarithmic?
  - Continuous or broken axes?
- Graph lines:
  - One or more than one?
  - Line style: continuous, dashed, dotted
  - Line colour
  - Symbols and/or lines?

10 Using different line styles and colours

11 Visualising 3D data

12 Visualising high-dimensional datasets
- This is harder, and can be where infographics come in
- Cannot do this directly
  - Can plot two or three dimensions directly, but not more
  - Clever infographics can plot more dimensions, for example using geographical location, lines of varying thickness and colour, multiple symbols
- How can we show the structure of such datasets?
  - When we can't think of one-off, target-domain clever tricks
- Discuss earlier infographics
  - Clearly depends on what we are trying to show!
  - Geography as timeline, for example
- See also

13 What can we do in general?
- Let's say that we don't have any inspiration for designing a good infographic (!)
  - Infographics often depend on specific factors
  - E.g. dates, geographic distribution
- Can we find 2 or 3 (or even a few more) dimensions that in some sense summarise (what we want to emphasise about) the dataset?
- Ways forward: projecting and clustering

14 Choosing dimensions and projecting data
- What if the data is randomly spread throughout all the dimensions and has no structure?
  - Give up: there's nothing to be learned from it (if it really is random)
- Datasets that have something to tell us have some form of structure
  - Maybe the data lie (largely) on a lower-dimensional subset of the high-dimensional space
  - As opposed to being spread randomly and evenly throughout the original space

15 Example
- Say that we have 3-dimensional data, sampled over time
  - Each point is (x, y, z, t): really 4-dimensional data
  - and 0 <= x^2 + y^2 + z^2 <= 1, 0 <= t <= 10 (the points (x, y, z) are inside a sphere of radius 1, centred at the origin)
- Let's also say that at each time t, sqrt(x^2 + y^2 + z^2) = t/10
  - So that the points at time t are on the surface of a sphere of radius t/10
- Clearly, if we simply look at all the (x, y, z) points (ignoring t), they are spread throughout the sphere
  - But not in an unstructured way
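A minimal sketch of this example in Python may make the hidden structure concrete. The function name `sample_point`, the sample size, the random seed, and the choice of a uniformly random direction are all assumptions made for illustration; the slide only fixes the radius at t/10.

```python
import math
import random

def sample_point(t, rng):
    """Return (x, y, z, t) with (x, y, z) on the sphere of radius t/10."""
    # Normalising a 3-D Gaussian vector gives a uniformly random direction.
    while True:
        gx, gy, gz = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(gx * gx + gy * gy + gz * gz)
        if norm > 0.0:
            break
    r = t / 10.0
    return (r * gx / norm, r * gy / norm, r * gz / norm, t)

rng = random.Random(0)  # fixed seed: an arbitrary choice for repeatability
points = [sample_point(rng.uniform(0.0, 10.0), rng) for _ in range(1000)]

# Ignoring t, the radii are spread across the range 0..1 ...
radii = [math.sqrt(x * x + y * y + z * z) for x, y, z, _ in points]
# ... but once t is included, every point satisfies radius == t / 10.
max_error = max(abs(r - t / 10.0) for r, (_, _, _, t) in zip(radii, points))
```

Looking at `radii` alone suggests an unremarkable cloud of points; it is only the relation checked by `max_error` that exposes the structure.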

16 Discovering structure in data
- There are many techniques for discovering (uncovering) structure
- Principal component analysis (PCA)
  - Linearly projecting a high-dimensional dataset on to a smaller number of dimensions
  - In such a way that as much as possible of the variance in the data is contained in this smaller number of dimensions
  - And the dimensions are orthogonal to each other
- Well-understood and commonly used technique for data dimension reduction
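As a rough illustration of the projection idea, the sketch below estimates only the first principal component, using power iteration on the sample covariance matrix. This is one simple way to obtain it, not necessarily what a statistics package does internally. The function name, the iteration count, and the small dataset are made up for the example, and the method assumes the largest eigenvalue of the covariance matrix is unique.

```python
import math

def first_principal_component(data, iterations=200):
    """Estimate the direction of greatest variance by power iteration."""
    n, d = len(data), len(data[0])
    # Centre the data: PCA works on deviations from the mean.
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Repeated multiplication by the covariance matrix converges to its
    # leading eigenvector, which is the first principal direction.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance of the data along the direction found.
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return means, v, variance

# Made-up 3-D points lying close to the line x = y = z.
data = [[t + e, t - e, t + 0.5 * e]
        for t, e in [(0, 0.1), (1, -0.1), (2, 0.05),
                     (3, -0.05), (4, 0.1), (5, -0.1)]]
means, direction, variance = first_principal_component(data)

# Project each point on to the first component: one number per point.
scores = [sum((row[j] - means[j]) * direction[j] for j in range(3))
          for row in data]
```

Here the three original coordinates collapse to a single score per point while keeping almost all of the variance, because the made-up data were built to lie near a line.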

18 Independent components analysis
- Independent components analysis (ICA)
  - "a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements, or signals." Hyvärinen (U Helsinki)
- Essentially looking for dimensions that co-vary
  - Finding ways of summarising points in the N-dimensional space using fewer than N values
- Data is assumed to be a linear mixture of underlying latent variables
  - These are assumed non-Gaussian and mutually independent: the independent components
- Related to PCA, but can find structure when PCA fails to do so
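The linear-mixture assumption can be written compactly. The symbols below are notation chosen here for illustration and do not come from the slides: x is the vector of observed measurements, s the vector of latent independent components, A the unknown mixing matrix, and W the unmixing matrix that ICA estimates.

```latex
\mathbf{x} = A\,\mathbf{s}, \qquad \mathbf{y} = W\,\mathbf{x} \approx \mathbf{s}
```

ICA chooses W so that the components of y are as statistically independent as possible. The sources can only be recovered up to their order and scale, since reordering or rescaling the components of s can be absorbed into A.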

19 Example: input

20 ICA output

22 Clustering data
- Often, rather than projecting data on to other axes, it is better to look at how the data points are grouped
- The aim is to classify a large number of data vectors into a small number of manageable groups
- Does the data fall into clusters?
  - How unevenly distributed is the data?
- Does it cluster in
  - The original high-dimensional space?
  - A lower-dimensional projected space?

23 How does clustering work?
- Techniques
  - Partition or hierarchical

24 Examples

25 Partition-based clustering
- Based on distance between vectors
- But which distance?
  - Euclidean?
  - City-block?
  - Weighted versions?
  - Chebyshev distance?
- Forming clusters: a simple method
  - Start with each vector as a single-element cluster
  - Identify the two closest vectors and combine them into the same cluster
  - Keep doing this until the distance between the two closest vectors not in the same cluster is large
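The distance measures and the simple merging method on this slide can be sketched as follows. Starting from single-element clusters and repeatedly merging the closest pair is usually described as agglomerative (single-linkage) clustering. The function names, the sample points, and the stopping threshold of 3.0 are assumptions for illustration, since the slide does not say how large "large" is.

```python
def euclidean(a, b):
    """Straight-line distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def city_block(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))

def merge_clusters(vectors, distance, threshold):
    """Merge the closest pair of clusters until the gap exceeds threshold."""
    clusters = [[v] for v in vectors]  # each vector starts alone
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Gap between clusters = distance between their two
                # closest members (vectors not in the same cluster).
                gap = min(distance(a, b)
                          for a in clusters[i] for b in clusters[j])
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        gap, i, j = best
        if gap > threshold:
            break  # the closest remaining pair is "far apart": stop
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

# Made-up 2-D points forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = merge_clusters(points, euclidean, threshold=3.0)
```

Passing `city_block` or `chebyshev` instead of `euclidean` changes which pairs count as closest, which is one reason the resulting clusters depend on the measure chosen.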

26 Criticisms of clustering
- Clustering is descriptive, and not unique
  - Actual clusters may depend on the techniques used, as well as on the data
- Clustering techniques will always find clusters
  - Even when there aren't any!
  - (This implies that some measure of the quality of a clustering should be used)
- Clustering techniques depend strongly on the measures used
  - There should ideally be some conceptual support for the measures used to calculate distances between vectors
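The slides do not name a particular quality measure. One commonly used option is the silhouette value, which compares how close each point is to its own cluster with how close it is to the nearest other cluster. The sketch below is an illustration under that assumption; the function names and the two example groupings are made up, and it assumes at least two clusters with no cluster consisting solely of identical points.

```python
def euclidean(a, b):
    """Straight-line distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def mean_silhouette(clusters, distance=euclidean):
    """Average silhouette value over all points (needs >= 2 clusters)."""
    values = []
    for ci, cluster in enumerate(clusters):
        for pi, p in enumerate(cluster):
            if len(cluster) == 1:
                values.append(0.0)  # usual convention for lone points
                continue
            # a: mean distance to the other members of p's own cluster.
            a = sum(distance(p, q)
                    for k, q in enumerate(cluster) if k != pi)
            a /= len(cluster) - 1
            # b: smallest mean distance to the members of another cluster.
            b = min(sum(distance(p, q) for q in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            values.append((b - a) / max(a, b))
    return sum(values) / len(values)

# The same six made-up points grouped sensibly and then arbitrarily.
good = [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
poor = [[(0, 0), (10, 10), (1, 0)], [(0, 1), (10, 11), (11, 10)]]
good_score = mean_silhouette(good)
poor_score = mean_silhouette(poor)
```

Values near 1 suggest compact, well-separated clusters, while values near 0 or below suggest the grouping is not supported by the data; here `good_score` comes out well above `poor_score`.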

27 Examples
- Google News indexes
  - Uses text to create topic clusters
  - Title, article listings
  - Used to discover multiple reports of the same story
- Video clusters on YouTube
  - Uses keywords, popularity, viewer engagement, user browsing history

28 Infographics tools
- At its simplest, Excel has many facilities for creating infographics and visualisations
  - But it's limited, and proprietary (though one can import comma-separated values)
- Matlab?
  - Not free!
  - Good graphing tools
- Flot: jQuery and JavaScript based
- Google Chart API: free, JavaScript based, browser output
- D3: JavaScript based, very powerful

29 Using visualisation and infographics
- As noted earlier, infographics and visualisation
  - Are about communicating ideas about data, discoveries from data mining, etc. to others
- But visualisation has another important usage as well
  - Exploratory (initial) data analysis
  - How can you decide which tools to apply to data, and how to apply them, if you haven't an initial idea of what might be useful?


More information

Statistical graphics in analysis Multivariable data in PCP & scatter plot matrix. Paula Ahonen-Rainio Maa Visual Analysis in GIS

Statistical graphics in analysis Multivariable data in PCP & scatter plot matrix. Paula Ahonen-Rainio Maa Visual Analysis in GIS Statistical graphics in analysis Multivariable data in PCP & scatter plot matrix Paula Ahonen-Rainio Maa-123.3530 Visual Analysis in GIS 11.11.2015 Topics today YOUR REPORTS OF A-2 Thematic maps with charts

More information

CSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection

CSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection CSE 255 Lecture 6 Data Mining and Predictive Analytics Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:

More information

understanding media metrics WEB METRICS Basics for Journalists FIRST IN A SERIES

understanding media metrics WEB METRICS Basics for Journalists FIRST IN A SERIES understanding media metrics WEB METRICS Basics for Journalists FIRST IN A SERIES Contents p 1 p 3 p 3 Introduction Basic Questions about Your Website Getting Started: Overall, how is our website doing?

More information

Slide 1 Hello, I m Jason Borgen, Program Coordinator for the TICAL project and a Google Certified Teacher. This Quick Take will show you a variety of ways to search Google to maximize your research and

More information

Data Clustering. Algorithmic Thinking Luay Nakhleh Department of Computer Science Rice University

Data Clustering. Algorithmic Thinking Luay Nakhleh Department of Computer Science Rice University Data Clustering Algorithmic Thinking Luay Nakhleh Department of Computer Science Rice University Data clustering is the task of partitioning a set of objects into groups such that the similarity of objects

More information

Clustering and Dimensionality Reduction. Stony Brook University CSE545, Fall 2017

Clustering and Dimensionality Reduction. Stony Brook University CSE545, Fall 2017 Clustering and Dimensionality Reduction Stony Brook University CSE545, Fall 2017 Goal: Generalize to new data Model New Data? Original Data Does the model accurately reflect new data? Supervised vs. Unsupervised

More information

Dimension Reduction CS534

Dimension Reduction CS534 Dimension Reduction CS534 Why dimension reduction? High dimensionality large number of features E.g., documents represented by thousands of words, millions of bigrams Images represented by thousands of

More information

CSC411/2515 Tutorial: K-NN and Decision Tree

CSC411/2515 Tutorial: K-NN and Decision Tree CSC411/2515 Tutorial: K-NN and Decision Tree Mengye Ren csc{411,2515}ta@cs.toronto.edu September 25, 2016 Cross-validation K-nearest-neighbours Decision Trees Review: Motivation for Validation Framework:

More information

Creating Page Layouts 25 min

Creating Page Layouts 25 min 1 of 10 09/11/2011 19:08 Home > Design Tips > Creating Page Layouts Creating Page Layouts 25 min Effective document design depends on a clear visual structure that conveys and complements the main message.

More information

MITOCW ocw f99-lec07_300k

MITOCW ocw f99-lec07_300k MITOCW ocw-18.06-f99-lec07_300k OK, here's linear algebra lecture seven. I've been talking about vector spaces and specially the null space of a matrix and the column space of a matrix. What's in those

More information

ONS Beta website. 7 December 2015

ONS Beta website. 7 December 2015 ONS Beta website Terminology survey results 7 December 2015 Background During usability sessions, both moderated and online, it has become clear that users do not understand the majority of terminology

More information

SOME TYPES AND USES OF DATA MODELS

SOME TYPES AND USES OF DATA MODELS 3 SOME TYPES AND USES OF DATA MODELS CHAPTER OUTLINE 3.1 Different Types of Data Models 23 3.1.1 Physical Data Model 24 3.1.2 Logical Data Model 24 3.1.3 Conceptual Data Model 25 3.1.4 Canonical Data Model

More information

Motion Interpretation and Synthesis by ICA

Motion Interpretation and Synthesis by ICA Motion Interpretation and Synthesis by ICA Renqiang Min Department of Computer Science, University of Toronto, 1 King s College Road, Toronto, ON M5S3G4, Canada Abstract. It is known that high-dimensional

More information

Images help us relate to content, help us become involved. They help us to see ourselves in the science, rather than standing on the outskirts.

Images help us relate to content, help us become involved. They help us to see ourselves in the science, rather than standing on the outskirts. 1 2 3 Images help us relate to content, help us become involved. They help us to see ourselves in the science, rather than standing on the outskirts. 4 Why imagery? Because we are overloaded with information.

More information

CSE 586 Final Programming Project Spring 2011 Due date: Tuesday, May 3

CSE 586 Final Programming Project Spring 2011 Due date: Tuesday, May 3 CSE 586 Final Programming Project Spring 2011 Due date: Tuesday, May 3 What I have in mind for our last programming project is to do something with either graphical models or random sampling. A few ideas

More information

Facebook Page Insights

Facebook Page Insights Facebook Product Guide for Facebook Page owners Businesses will be better in a connected world. That s why we connect 800M people and their friends to the things they care about, using social technologies

More information

An Unsupervised Technique for Statistical Data Analysis Using Data Mining

An Unsupervised Technique for Statistical Data Analysis Using Data Mining International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 5, Number 1 (2013), pp. 11-20 International Research Publication House http://www.irphouse.com An Unsupervised Technique

More information

Browsing the World Wide Web with Firefox

Browsing the World Wide Web with Firefox Browsing the World Wide Web with Firefox B 660 / 1 Try this Popular and Featurepacked Free Alternative to Internet Explorer Internet Explorer 7 arrived with a bang a few months ago, but it hasn t brought

More information

Making Science Graphs and Interpreting Data

Making Science Graphs and Interpreting Data Making Science Graphs and Interpreting Data Eye Opener: 5 mins What do you see? What do you think? Look up terms you don t know What do Graphs Tell You? A graph is a way of expressing a relationship between

More information

Visual Encoding Design

Visual Encoding Design CSE 442 - Data Visualization Visual Encoding Design Jeffrey Heer University of Washington Review: Expressiveness & Effectiveness / APT Choosing Visual Encodings Assume k visual encodings and n data attributes.

More information

Distribution-free Predictive Approaches

Distribution-free Predictive Approaches Distribution-free Predictive Approaches The methods discussed in the previous sections are essentially model-based. Model-free approaches such as tree-based classification also exist and are popular for

More information

Nearest Neighbor Predictors

Nearest Neighbor Predictors Nearest Neighbor Predictors September 2, 2018 Perhaps the simplest machine learning prediction method, from a conceptual point of view, and perhaps also the most unusual, is the nearest-neighbor method,

More information

Setting up Blogger. We have focused on Blogger as it is easy to use and ideal for someone starting blogging.

Setting up Blogger. We have focused on Blogger as it is easy to use and ideal for someone starting blogging. Setting up Blogger The three most popular platforms for blogging are WordPress, Tumblr and Blogger. In Module 1 the primary features of each platform were outlined. We have focused on Blogger as it is

More information

CS Information Visualization Sep. 2, 2015 John Stasko

CS Information Visualization Sep. 2, 2015 John Stasko Multivariate Visual Representations 2 CS 7450 - Information Visualization Sep. 2, 2015 John Stasko Recap We examined a number of techniques for projecting >2 variables (modest number of dimensions) down

More information

An Introduction to PDF Estimation and Clustering

An Introduction to PDF Estimation and Clustering Sigmedia, Electronic Engineering Dept., Trinity College, Dublin. 1 An Introduction to PDF Estimation and Clustering David Corrigan corrigad@tcd.ie Electrical and Electronic Engineering Dept., University

More information

COSC 311: ALGORITHMS HW1: SORTING

COSC 311: ALGORITHMS HW1: SORTING COSC 311: ALGORITHMS HW1: SORTIG Solutions 1) Theoretical predictions. Solution: On randomly ordered data, we expect the following ordering: Heapsort = Mergesort = Quicksort (deterministic or randomized)

More information

SEO: SEARCH ENGINE OPTIMISATION

SEO: SEARCH ENGINE OPTIMISATION SEO: SEARCH ENGINE OPTIMISATION SEO IN 11 BASIC STEPS EXPLAINED What is all the commotion about this SEO, why is it important? I have had a professional content writer produce my content to make sure that

More information

Pre-Requisites: CS2510. NU Core Designations: AD

Pre-Requisites: CS2510. NU Core Designations: AD DS4100: Data Collection, Integration and Analysis Teaches how to collect data from multiple sources and integrate them into consistent data sets. Explains how to use semi-automated and automated classification

More information