CS 725/825 Information Visualization, Fall 2013
Data Foundations
Dr. Michele C. Weigle

Topic Objectives
- Distinguish between ordinal and nominal values and list subcategories of each.
- Give an example of structured data and describe its properties.
- Explain the importance of data preprocessing before producing a visualization.
- Contrast interactive subsetting with query-based subsetting.
- Explain the purpose of dimension reduction.

Outline
Reading Assignment (due today): Ch 2 - Data Foundations
- Data
- Types of Data
- Structure within and between Records
- Data Cleaning
- Data Preprocessing
- Data Sources
Homework Assignment: due next Tuesday

The first step is data
- m variables (or attributes)
- n records (or values)
(WGK, Fig. …)

Variables: Independent vs. Dependent

Variables: Domain and Range
- f(domain) = range
- domain: independent variables
- range: dependent variables
- Example: f(weight) = City MPG, where weight is the independent variable and City MPG is the dependent variable

Types of Data

Ordinal vs. Nominal Data
- Ordinal
  - Binary: 0, 1
  - Discrete: integers, e.g., 1, 2, 3, 4, 5
  - Continuous: real numbers, e.g., [0, 5] or 0.1, 0.25, …, 0.456
- Nominal
  - Categorical: ODU-CS faculty, e.g., Nelson, Weigle, Olariu, Maly, Nadeem, …
  - Ranked: undergrad classes, e.g., Freshman, Sophomore, Junior, Senior
  - Arbitrary: last names, e.g., Helser, Bokinsky, Clark, Tarr, Kum, Van Leeuwen, Weigle, …
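The taxonomy above can be made concrete with a small Python sketch that guesses an attribute's subcategory from its raw values. The function `guess_data_type` and its rules are assumptions for illustration, not a procedure from the slides or from WGK. Numbers are treated as ordinal and strings as nominal. Ranked data can only be recognized when the caller supplies the order, and categorical and arbitrary values cannot be told apart from the values alone, so in practice the type usually comes from metadata about the data set.

```python
from numbers import Real
from typing import Optional, Sequence


def guess_data_type(values: Sequence[object],
                    order: Optional[Sequence[str]] = None) -> tuple[str, str]:
    """Heuristically classify an attribute as (ordinal|nominal, subcategory).

    Hypothetical helper for illustration only; real data usually needs metadata.
    """
    # Numeric values are ordinal. bool is a subclass of int, so True/False count as 1/0.
    if all(isinstance(v, Real) for v in values):
        if set(values) <= {0, 1}:
            return ("ordinal", "binary")
        if all(float(v).is_integer() for v in values):
            return ("ordinal", "discrete")
        return ("ordinal", "continuous")
    # Non-numeric values are nominal; a caller-supplied order marks them as ranked.
    if order is not None and set(values) <= set(order):
        return ("nominal", "ranked")
    # Categorical vs. arbitrary cannot be decided from the values alone.
    return ("nominal", "categorical or arbitrary")
```

For example, `guess_data_type([1, 2, 3, 4, 5])` returns `("ordinal", "discrete")`, while class years are reported as ranked only when passed with `order=["Freshman", "Sophomore", "Junior", "Senior"]`.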

Scale
- Ordering relation, e.g., 1, 2, 3, 4, 5 or Freshman, Sophomore, Junior, Senior
- Distance metric, e.g., the distance between 5 and 1 is 4
- Existence of absolute zero, e.g., the absolute minimum weight is 0
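The three scale properties decide which operations make sense. In the Python sketch below, ranked values get an ordering only through an explicit list (their positions can be compared but not meaningfully subtracted), numeric values get a distance metric, and ratios are reserved for a scale with an absolute zero. The helper names and the vehicle weights are hypothetical.

```python
# Ranked nominal values: the ordering relation comes from an explicit list,
# because the labels themselves carry no numeric order.
CLASS_YEARS = ["Freshman", "Sophomore", "Junior", "Senior"]


def rank_of(year: str) -> int:
    """Position of a class year in the ordering relation (0 = Freshman)."""
    return CLASS_YEARS.index(year)


def distance(a: float, b: float) -> float:
    """Distance metric for numeric values: |a - b|."""
    return abs(a - b)


def ratio(a: float, b: float) -> float:
    """A ratio is meaningful only on a scale with an absolute zero, such as weight."""
    return a / b


print(rank_of("Junior") > rank_of("Freshman"))  # ordering exists
print(distance(5, 1))                            # distance between 5 and 1
print(ratio(3000, 1500))                         # a 3000 lb vehicle weighs twice a 1500 lb one
```

Celsius temperature is a counterexample for the last property: it has a distance metric but no absolute zero, so 20 °C is not twice as warm as 10 °C.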

Structure within and between Records

Structured Data
- Data has structure
  - syntax
  - semantics
- Time is special
- Data has topology

Example of Structured Data
- Vehicle Name: nominal, arbitrary
- Next 8 variables: ordinal, binary
- Price, cost: ordinal, discrete, absolute zero
- etc.
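One way to record the structure of a data set like the vehicle example is an explicit schema that states each variable's type and scale. The Python sketch below does this with a dataclass and then checks records against it. The column names (`vehicle_name`, `sports_car`, `retail_price`, `dealer_cost`) and the `validate` helper are hypothetical, and only one of the eight binary variables is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    kind: str                    # "ordinal" or "nominal"
    subtype: str                 # e.g. "binary", "discrete", "arbitrary"
    absolute_zero: bool = False


# Hypothetical schema modeled on the vehicle example; names are illustrative.
VEHICLE_SCHEMA = [
    Variable("vehicle_name", "nominal", "arbitrary"),
    Variable("sports_car", "ordinal", "binary"),
    Variable("retail_price", "ordinal", "discrete", absolute_zero=True),
    Variable("dealer_cost", "ordinal", "discrete", absolute_zero=True),
]


def validate(record: dict, schema: list[Variable]) -> list[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for var in schema:
        if var.name not in record:
            problems.append(f"{var.name}: missing")
            continue
        value = record[var.name]
        if var.subtype == "binary" and value not in (0, 1):
            problems.append(f"{var.name}: expected 0 or 1, got {value!r}")
        elif var.subtype == "discrete" and not isinstance(value, int):
            problems.append(f"{var.name}: expected an integer, got {value!r}")
        elif var.absolute_zero and value < 0:
            problems.append(f"{var.name}: {value!r} is below the absolute zero")
        elif var.kind == "nominal" and not isinstance(value, str):
            problems.append(f"{var.name}: expected a label, got {value!r}")
    return problems


# Hypothetical records: the second has an invalid flag, a negative price,
# and no dealer cost.
print(validate({"vehicle_name": "Example Sedan", "sports_car": 0,
                "retail_price": 30000, "dealer_cost": 27000}, VEHICLE_SCHEMA))
print(validate({"vehicle_name": "Example Sedan", "sports_car": 2,
                "retail_price": -5}, VEHICLE_SCHEMA))
```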

Data Formats
- Delimited text
  - tab-delimited
  - comma-delimited (CSV)
- Extensible Markup Language (XML)
  - looks a bit like HTML
  - user-defined tags to identify data
- JavaScript Object Notation (JSON)
  - collection of name/value pairs
  - smaller than XML
  - easier to parse

Example of Data Formats (Yau, Visualize This, Ch 2)
The same three observations in each format:

CSV:
    …,26
    …,34
    …,27

JSON:
    {"observations": [
      {"date":"…", "max_temp":26},
      {"date":"…", "max_temp":34},
      {"date":"…", "max_temp":27}
    ]}

XML:
    <weather_data>
      <observation>
        <date>…</date>
        <max_temp>26</max_temp>
      </observation>
      <observation>
        <date>…</date>
        <max_temp>34</max_temp>
      </observation>
      <observation>
        <date>…</date>
        <max_temp>27</max_temp>
      </observation>
    </weather_data>
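To show that the three formats carry the same information, the Python sketch below parses each one with the standard library (`csv`, `json`, `xml.etree.ElementTree`) into the same list of `(date, max_temp)` pairs. The dates and the CSV header `date,max_temp` are made-up placeholders; only the temperatures come from the example above.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Made-up dates; only the temperatures (26, 34, 27) come from the example.
CSV_TEXT = """date,max_temp
2010-01-01,26
2010-01-02,34
2010-01-03,27
"""

JSON_TEXT = """{"observations": [
  {"date": "2010-01-01", "max_temp": 26},
  {"date": "2010-01-02", "max_temp": 34},
  {"date": "2010-01-03", "max_temp": 27}
]}"""

XML_TEXT = """<weather_data>
  <observation><date>2010-01-01</date><max_temp>26</max_temp></observation>
  <observation><date>2010-01-02</date><max_temp>34</max_temp></observation>
  <observation><date>2010-01-03</date><max_temp>27</max_temp></observation>
</weather_data>"""


def from_csv(text):
    # Every CSV field is a string, so the numeric column must be converted.
    return [(row["date"], int(row["max_temp"]))
            for row in csv.DictReader(io.StringIO(text))]


def from_json(text):
    # JSON keeps numbers as numbers, so no conversion is needed.
    return [(obs["date"], obs["max_temp"])
            for obs in json.loads(text)["observations"]]


def from_xml(text):
    # XML element text is always a string, as with CSV.
    root = ET.fromstring(text)
    return [(obs.findtext("date"), int(obs.findtext("max_temp")))
            for obs in root.findall("observation")]


print(from_csv(CSV_TEXT) == from_json(JSON_TEXT) == from_xml(XML_TEXT))
```

The comments point at one practical difference: JSON carries numeric types, while CSV and XML parsers hand back strings that the program must convert.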

How to convert between data formats?
- Write a program to convert from one format to another
  - awk (my favorite, but I'm old school), Python, Perl, PHP, …
- Other tools
  - search Google for "csv to json", "csv to xml", "xml to json"
  - Mr. Data Converter
    - developed by a graphics editor at The New York Times
    - input: CSV or tab-delimited data
    - output: HTML table, JSON, MySQL, Python, PHP, Ruby, XML
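The first option, writing a program, can be sketched in Python (one of the languages listed) using only the standard library. The function name `csv_to_json`, the automatic int/float conversion, and wrapping the records under a key named `observations` (to mirror the JSON example) are assumptions of this sketch.

```python
import csv
import io
import json


def csv_to_json(csv_text: str, key: str = "observations") -> str:
    """Convert CSV text with a header row into a JSON object holding a list of records.

    Numeric-looking fields become int or float; everything else stays a string.
    """
    def convert(field: str):
        for cast in (int, float):
            try:
                return cast(field)
            except ValueError:
                pass
        return field

    rows = csv.DictReader(io.StringIO(csv_text))
    records = [{name: convert(value) for name, value in row.items()} for row in rows]
    return json.dumps({key: records}, indent=2)


print(csv_to_json("date,max_temp\n2010-01-01,26\n2010-01-02,34\n"))
```

Automatic type conversion is a design choice with a cost: a ZIP code such as `02134` would become the integer `2134`, so a real converter may need a per-column list of fields to leave as text.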

Data Cleaning

Data in the Real World
- Data can be missing, have typos, be inconsistent, be spread over multiple tables, etc.
- Two big issues:
  - format
  - accuracy

World Disasters - Inconsistent

World Disasters - Missing

What to do with dirty data?
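A low-tech first answer is a short script that finds the two problems shown in the World Disasters slides: missing values and inconsistent spellings. The records and the helper names `find_missing` and `find_inconsistent` in the Python sketch below are hypothetical, and the spelling check only catches differences in case and surrounding whitespace.

```python
from collections import defaultdict


def find_missing(rows, fields):
    """Return (row index, field) pairs whose value is absent, None, or blank."""
    return [(i, field) for i, row in enumerate(rows) for field in fields
            if row.get(field) is None or str(row.get(field)).strip() == ""]


def find_inconsistent(rows, field):
    """Map each normalized value to its spellings when there is more than one."""
    spellings = defaultdict(set)
    for row in rows:
        value = row.get(field)
        if isinstance(value, str):
            # Normalize only case and surrounding whitespace.
            spellings[value.strip().lower()].add(value)
    return {key: sorted(found) for key, found in spellings.items() if len(found) > 1}


# Hypothetical records with one missing count, one blank year,
# and one country spelled two ways.
ROWS = [
    {"country": "Country A", "year": 2010, "deaths": 10},
    {"country": "country a ", "year": 2010, "deaths": None},
    {"country": "Country B", "year": "", "deaths": 5},
]
print(find_missing(ROWS, ["country", "year", "deaths"]))  # → [(1, 'deaths'), (2, 'year')]
print(find_inconsistent(ROWS, "country"))                 # → {'country a': ['Country A', 'country a ']}
```

Variants such as "USA" versus "United States" need a lookup table or a tool like Open Refine, which can cluster similar values.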

Data Cleaning Tools
Quick tools
- Data Science Toolkit: lots of quick conversion tools
- Mr. People: formats lists of names
- Mr. Data Converter: data_converter/
Full apps
- Data Wrangler: app/ (video: …)
- Open Refine (was Google Refine)
  - video: watch?v=b70j_h_zawm
  - more info: using-google-refine-for-datacleaning

What about accuracy?
- Nathan Yau (Visualize This, Data Points) was an intern at The New York Times.
- One day, his entire goal was to verify 3 numbers in a dataset.
- You must have accurate data before you can trust the visualization.
(Yau, Visualize This, Ch 1)

Yes, this is actually a problem.

Graphing the raw data

Let's look at the Excel file

Now, let's look at the PDF

But wait, there's more funny stuff (PDF vs. Excel)

Bottom Line
- If you see something weird in your graph that you can't explain, go back and double-check your data.
- Even if you didn't make an error, maybe someone else did.
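Part of the double-checking can be automated when the same numbers exist in two places, such as the Excel file and the PDF. The Python sketch below compares two transcriptions of a table keyed by row label. The function `cross_check`, the labels, and the values (including the extra digit in one entry) are hypothetical.

```python
def cross_check(source_a: dict, source_b: dict, tolerance: float = 0.0) -> list:
    """Compare two transcriptions of the same table, keyed by row label.

    Returns (label, value_a, value_b) for every row that is missing from one
    source or whose values differ by more than `tolerance`.
    """
    mismatches = []
    for label in sorted(set(source_a) | set(source_b)):
        a, b = source_a.get(label), source_b.get(label)
        if a is None or b is None or abs(a - b) > tolerance:
            mismatches.append((label, a, b))
    return mismatches


# Hypothetical values: one typo (an extra digit) and one row present only in the PDF.
excel = {"2008": 120, "2009": 135, "2010": 1350}
pdf = {"2008": 120, "2009": 135, "2010": 135, "2011": 140}
print(cross_check(excel, pdf))  # → [('2010', 1350, 135), ('2011', None, 140)]
```

A mismatch still needs a person to decide which source is right; the script only says where to look.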

Data Preprocessing

Raw vs. Processed Data

What statistics could we compute?
- Mean
- Standard deviation
- Distribution
(Figure: Mean = 440, Median = …)

Remember Anscombe's Quartet?
For each of the four data sets:
- N = 11
- Mean of the x values = 9.0
- Mean of the y values = 7.5
- Equation of the least-squares regression line: y = 3 + 0.5x
- Sum of squares of x about its mean = 110.0
- Regression sum of squares (variance accounted for by x) = 27.5
- Residual sum of squares (about the regression line) = 13.75
- Correlation coefficient = 0.82
- Coefficient of determination = 0.67
(F.J. Anscombe, "Graphs in Statistical Analysis", American Statistician, Feb 1973; graphs from …)
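The statistics listed above can be reproduced in a few lines of Python. The data values below are Anscombe's published quartet; `summarize` is a hypothetical helper that fits the least-squares line and computes the remaining statistics, so all four data sets can be checked against the list.

```python
import math

# Anscombe's quartet (F.J. Anscombe, 1973). Data sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Least-squares fit y = a + b*x plus the summary statistics listed above."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)       # sum of squares of x about its mean
    syy = sum((yi - my) ** 2 for yi in y)       # total sum of squares of y
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                               # slope
    a = my - b * mx                             # intercept
    regression_ss = b * sxy                     # variance accounted for by x
    residual_ss = syy - regression_ss           # about the regression line
    r = sxy / math.sqrt(sxx * syy)              # correlation coefficient
    return {"n": n, "mean_x": mx, "mean_y": my, "intercept": a, "slope": b,
            "sxx": sxx, "regression_ss": regression_ss,
            "residual_ss": residual_ss, "r": r, "r_squared": r * r}


for name, (x, y) in QUARTET.items():
    s = summarize(x, y)
    print(f"{name:>3}: y = {s['intercept']:.2f} + {s['slope']:.3f}x, r = {s['r']:.3f}")
```

All four fits come out at roughly y = 3.00 + 0.500x with r of about 0.816, which is why these numbers alone cannot tell the data sets apart.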

The four data sets are not the same

Other Analysis Techniques
- Outlier detection
- Cluster analysis
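As one example of outlier detection, the Python sketch below applies Tukey's fences, flagging values more than 1.5 × IQR beyond the quartiles. This rule is a common choice swapped in for illustration; the slides do not specify a method. `iqr_outliers` is a hypothetical helper built on `statistics.quantiles` (Python 3.8+), and the data are the y values of Anscombe's data set III.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# y values of Anscombe's data set III: one point lies far from the others.
Y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
print(iqr_outliers(Y3))  # → [12.74]
```

Quartile conventions differ between tools, so the fences (and occasionally the flagged points) can shift slightly with another quantile method.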

Other Analysis Techniques
- Correlation analysis

Visual Analytics Mantra
"Analyze First, Show the Important, Zoom, Filter and Analyze Further, Details-on-Demand" - Daniel Keim
(Keim et al., "Visual Analytics: Combining Automated Discovery with Interactive Visualizations")

Normalization
- Example: transform the range of values so that all values fall between 0.0 and 1.0
- Why? To compare seemingly unrelated variables
- How? $d_{\text{normal}} = \dfrac{d_{\text{orig}} - d_{\min}}{d_{\max} - d_{\min}}$

Subsetting
- A set of constraints to retrieve only the data that meets the desired conditions
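Both ideas are short in code. The Python sketch below implements the min-max formula for normalization, plus a subsetting function that keeps the records satisfying every constraint, with constraints written as predicate functions. The helper names, the handling of a constant column (mapped to 0.0 to avoid dividing by zero), and the vehicle records are assumptions for illustration.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1]: d_normal = (d_orig - d_min) / (d_max - d_min)."""
    d_min, d_max = min(values), max(values)
    if d_max == d_min:
        # Assumption: a constant column maps to 0.0 (the formula would divide by zero).
        return [0.0 for _ in values]
    return [(d - d_min) / (d_max - d_min) for d in values]


def subset(records, *constraints):
    """Keep only the records that satisfy every constraint (a predicate function)."""
    return [r for r in records if all(check(r) for check in constraints)]


# Hypothetical vehicle records.
cars = [
    {"name": "A", "weight": 2500, "city_mpg": 30},
    {"name": "B", "weight": 4200, "city_mpg": 18},
    {"name": "C", "weight": 3100, "city_mpg": 25},
]
# After normalization, weight and MPG share the same 0..1 range and can be compared.
print(min_max_normalize([c["weight"] for c in cars]))
print(min_max_normalize([c["city_mpg"] for c in cars]))
print(subset(cars, lambda c: c["city_mpg"] >= 20, lambda c: c["weight"] < 3000))
```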
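
Query-Based Subsetting
"Find all employees who have not made a sale in the past year"

    SELECT * FROM EMP AS e
    WHERE NOT EXISTS (SELECT * FROM SALE AS s
                      WHERE s.eid = e.eid
                        AND s.sale_date >= CURRENT_DATE - INTERVAL '1' YEAR)

The date condition assumes SALE has a sale_date column; without it, the query returns employees who have never made a sale.

Interactive Subsetting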
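The query-based example can be tried without a database server using Python's built-in `sqlite3` module. SQLite has no `INTERVAL` syntax, so the sketch uses its `date(?, '-1 year')` function instead. The table contents, the `sale_date` column, the helper name, and the fixed reference date (used so the result is reproducible) are hypothetical. Dates are stored as ISO-8601 text, which compares correctly as strings.

```python
import sqlite3


def employees_without_recent_sales(conn, today):
    """Employees with no sale in the year before `today` (an ISO date string)."""
    query = """
        SELECT e.eid, e.name
        FROM EMP AS e
        WHERE NOT EXISTS (
            SELECT 1 FROM SALE AS s
            WHERE s.eid = e.eid
              AND s.sale_date >= date(?, '-1 year'))
        ORDER BY e.eid
    """
    return conn.execute(query, (today,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (eid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE SALE (sid INTEGER PRIMARY KEY,
                       eid INTEGER REFERENCES EMP(eid),
                       sale_date TEXT);
    INSERT INTO EMP VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO SALE (eid, sale_date) VALUES (1, '2013-05-10'), (2, '2011-03-02');
""")
print(employees_without_recent_sales(conn, "2013-09-01"))  # → [(2, 'Bob'), (3, 'Carol')]
```

Bob's only sale is older than a year and Carol has none, so both are returned; interactive subsetting reaches the same kind of result by brushing or filtering a view instead of writing the constraint as a query.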

Dimension Reduction
- What about when there are lots of dimensions (attributes)?
  - allow the user to select the more important dimensions
  - or reduce them automatically through computational methods:
    - principal component analysis (PCA)
    - multidimensional scaling (MDS)
    - Kohonen self-organizing maps (SOMs)

PCA Example (WGK, Fig. …)
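PCA finds the direction along which the data vary most and projects the records onto it. The dependency-free Python sketch below computes only the first principal component, using power iteration on the covariance matrix. This is a minimal method swapped in for illustration (numerical libraries typically compute PCA with a singular value decomposition), and `first_principal_component` is a hypothetical name.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (unit direction, 1-D projections) for the first principal component.

    Power iteration on the sample covariance matrix; a minimal sketch, not a
    replacement for an SVD-based PCA routine.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Arbitrary, deliberately asymmetric start vector; power iteration only fails
    # if it happens to be orthogonal to the top eigenvector.
    v = [1.0 / (j + 1) for j in range(d)]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # all rows identical: no variance to capture
            break
        v = [x / norm for x in w]
    projections = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projections


direction, coords = first_principal_component([(0, 0), (1, 2), (2, 4), (3, 6)])
print(direction, coords)
```

For points on the line y = 2x, the component comes out as (1, 2)/√5 up to sign, and the 1-D projections keep all of the variance. For further components, remove the found direction and repeat, or use an SVD routine.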

How to visualize nominal values?

Label nominal values?

Aggregation and Summarization
- What to do with too much data?
- Two components to aggregation:
  - method of grouping points
  - method of displaying the groups
- Key to visualizing aggregated data: provide sufficient information for the user to decide whether they want to drill down on the data (explore the contents of one or more clusters)

Original Data (WGK, Fig 2.5a)

Aggregated Data (WGK, Fig 2.5b)
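The two components of aggregation can be sketched separately: a grouping method, and per-group values for a display to draw. The Python sketch below groups 2-D points into square grid cells (an assumed grouping method; WGK's figure may group points differently, for example by clustering) and keeps each group's count and centroid for display, plus its member points for drill-down. `aggregate_grid` is a hypothetical helper.

```python
import math
from collections import defaultdict


def aggregate_grid(points, cell_size):
    """Group 2-D points into square grid cells and summarize each group.

    Returns {cell: {"count", "centroid", "members"}}: count and centroid are
    what a display could draw per group, and members supports drill-down.
    """
    groups = defaultdict(list)
    for x, y in points:
        # math.floor (not int) keeps negative coordinates in the right cell.
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        groups[cell].append((x, y))
    summary = {}
    for cell, members in groups.items():
        n = len(members)
        summary[cell] = {
            "count": n,
            "centroid": (sum(p[0] for p in members) / n,
                         sum(p[1] for p in members) / n),
            "members": members,
        }
    return summary


summary = aggregate_grid([(0.1, 0.2), (0.3, 0.4), (1.5, 1.5), (-0.5, 0.5)], cell_size=1.0)
for cell, info in sorted(summary.items()):
    print(cell, info["count"], info["centroid"])
```

The cell size controls the trade-off the slide describes: larger cells hide more detail, so the display has to give users enough information to decide which groups are worth opening.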

Data Sources
- Giant list of data sources
- Some notable ones:
  - Data.gov
  - Google Public Data Explorer
  - Census Bureau
    - Census data visualization gallery
  - Federal Reserve
    - FRASER
    - FRED

Data for Maps
- TIGER: detailed data on roads, railroads, rivers, ZIP codes (from the Census Bureau)
- OpenStreetMap: free wiki world map
- Geocommons: contains data and the ability to make maps
(Yau, Visualize This, Ch 2)