Scalable Data Analysis (CIS )
1 Scalable Data Analysis (CIS ) Introduction Dr. David Koop
2 NYC Taxi Data [Analyzing 1.1 Billion NYC Taxi and Uber Trips, with a Vengeance, T. W. Schneider]
3 What are your questions about this data?
4 NYC Taxi Data [Analyzing 1.1 Billion NYC Taxi and Uber Trips, with a Vengeance, T. W. Schneider]
5 NYC Taxi Data: Day analysis [Ferreira et al., 2013]
6 NYC Taxi Data: Region analysis [Ferreira et al., 2013]. Figure: taxi activity in different neighborhoods over the first week of May; over the weekend, there is an increased number of dropoffs in Downtown. Fig. 10: comparing movement across NYC transportation hubs. Top: trips starting at the two major airports in NYC, JFK and La Guardia. Bottom: the query is refined to compare trips starting at the airports with trips starting at the train stations, Penn Station and Grand Central. (D. Koop, CIS , Fall 2017)
7 Marine Traffic Data (map: Leaflet; map data: OpenStreetMap contributors)
8 Marine Traffic Data (map: Leaflet; map data: OpenStreetMap contributors)
9 Baseball Data [Deitrich et al., 2014]
10 Baseball Data [Deitrich et al., 2014]
11 Baseball Data [Deitrich et al., 2014]
12 Mobile Data Growth [Cisco Visual Networking Index Mobile, 2017]
13 Mobile Video Keeps Growing. Note: figures in parentheses refer to 2016 and 2021 traffic share. [Cisco Visual Networking Index Mobile, 2017]
14 Data Science Venn Diagram [D. Conway, The Data Science Venn Diagram, 2013]
15 Questions are important! Having data is great, but most of the time it just sits waiting for someone to analyze it. The reason data analysis is not completely automated is that there are so many potential questions, so humans need to stay involved in the loop. Interaction and visualization can be important, especially early in data analysis.
16 Scalability. Big Data: What is big? For whom is it big? (Variety, velocity, volume, ...) Lots of data that was once big is not an issue now. Understanding the scalability of techniques is important: there will always be larger datasets, so we want to understand how methods scale, their performance bounds, and their storage constraints.
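To make "how methods scale" concrete, here is a minimal sketch that counts comparisons (a deterministic stand-in for wall-clock timing) made by a linear scan versus binary search as a dataset grows. The data are synthetic sorted integers, and the function names are hypothetical, chosen only for illustration.

```python
def linear_search_cost(data, target):
    """Count comparisons made by a left-to-right scan until target is found."""
    comparisons = 0
    for value in data:
        comparisons += 1
        if value == target:
            break
    return comparisons


def binary_search_cost(data, target):
    """Count probes made by binary search on sorted data."""
    lo, hi = 0, len(data) - 1
    probes = 0
    while lo <= hi:
        probes += 1
        mid = (lo + hi) // 2
        if data[mid] == target:
            break
        if data[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


def scaling_table(sizes):
    """For each size n, search sorted synthetic data for its last element."""
    rows = []
    for n in sizes:
        data = list(range(n))  # synthetic, already sorted
        target = n - 1
        rows.append((n, linear_search_cost(data, target),
                     binary_search_cost(data, target)))
    return rows


for n, linear, binary in scaling_table([1_000, 10_000, 100_000]):
    print(f"n={n:>7}  linear={linear:>7}  binary={binary}")
```

The scan's cost grows in proportion to n, while binary search grows roughly with log2(n), at the price of keeping the data sorted, which is itself a preprocessing and storage consideration.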
17 Real-time Analysis. We want to have results now. How? Faster machines, clusters, and progressive techniques.
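One way to read "progressive techniques" is computing an answer chunk by chunk and reporting a partial result after each chunk, rather than waiting for the full pass. The sketch below does this for a mean; it assumes the data arrive in random order (otherwise early estimates can be badly biased), and the synthetic "distances" are generated, not real taxi data.

```python
import random


def progressive_mean(values, chunk_size):
    """Yield (items_seen, running_mean) after each chunk is processed.

    A partial answer is available after every chunk; the final value
    equals the exact mean over all values.
    """
    total = 0.0
    count = 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        total += sum(chunk)
        count += len(chunk)
        yield count, total / count


random.seed(0)
# Synthetic stand-in for trip distances (not real data), already in random order.
distances = [random.expovariate(1 / 3.0) for _ in range(100_000)]

for seen, estimate in progressive_mean(distances, chunk_size=20_000):
    print(f"after {seen:>6} items: mean ~ {estimate:.3f}")
```

Because each yield is cheap, an interactive tool could redraw its display after every chunk and let the user stop early once the estimate looks stable.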
18 About Me. Research interests: visualization, computational provenance, geospatial analysis. Research projects: VisTrails, Dataflow Notebooks, Meta-versioning, Marine Traffic Data. See my web page for more information.
19 About You. Previous topics course (CIS 602)? Research papers? Data science? Python? Database experience? Analytics experience? Cloud computing experience? Anything you want to see covered?
20 About this course. The course web page is authoritative: the schedule, readings, and assignments will be posted online. Check the web site before emailing me. This is a topics course: a current research area the professor works in, and a chance to be on the cutting edge of research. It requires student participation: reading responses and project presentations.
21 About this course. A balance of techniques and research ideas. Some background (Python), followed by topic areas and readings. Assignments at the beginning of the course, a project at the end. Two tests. Topic areas: exploratory data analysis and visualization; data acquisition; data storage and access; cloud computing and scalable computation; applications and specific data considerations.
22 Project. Do scalable data analysis of a large dataset: questions, analysis, visualizations, and cloud/cluster computing. Another option: a research-related topic. Waypoints: proposal, progress report, final presentation.
23 About this course. Course registration: make sure you have registered in COIN for the course, and email me if you are not registered but are interested in taking the course. Review of course policies: plagiarism and academic honesty. If you have any concerns or questions, please email me as soon as possible. If you are not sure whether this course is a good fit, please email me or talk to me.
24 Data. What is this data? Semantics: the real-world meaning of the data. Type: its structural or mathematical interpretation. Both often require metadata. Sometimes we can infer some of this information, and the line between data and metadata isn't always clear.
25 Data
26 Data Types. Items: an item is an individual discrete entity, e.g., a row in a table or a node in a network. Attributes: an attribute is some specific property that can be measured, observed, or logged; a.k.a. variable or (data) dimension.
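In code, a table of items and attributes is often held as a list of records. The minimal sketch below uses a hypothetical taxi-trip table with made-up values: each dict is one item (row), and each key is an attribute (column).

```python
# Hypothetical taxi-trip table (made-up values): each dict is one item (row),
# and each key is an attribute (column).
trips = [
    {"trip_id": 1, "pickup_zone": "Midtown", "passengers": 1, "distance_mi": 2.1},
    {"trip_id": 2, "pickup_zone": "Downtown", "passengers": 3, "distance_mi": 5.4},
    {"trip_id": 3, "pickup_zone": "JFK", "passengers": 2, "distance_mi": 17.8},
]

attributes = list(trips[0].keys())  # the columns (dicts keep insertion order)
print(len(trips), "items;", len(attributes), "attributes:", attributes)

# One attribute across all items is one column.
print([t["distance_mi"] for t in trips])  # -> [2.1, 5.4, 17.8]
```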
27 Items & Attributes (figure labels: attribute, field, item)
28 Data Types. Nodes: a synonym for item, but in the context of networks (graphs). Links: a link is a relation between two items, e.g., friends in a social network or links in a computer network.
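A common way to store nodes and links is an adjacency list: each node maps to the nodes it is linked to. This minimal sketch assumes undirected links, such as friendships, between hypothetical people.

```python
from collections import defaultdict

# Hypothetical undirected friendship links between items (nodes).
links = [("ana", "ben"), ("ben", "cai"), ("ana", "cai"), ("cai", "dev")]


def adjacency_list(edges):
    """Map each node to the sorted list of nodes it is linked to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)  # undirected: record the link in both directions
        neighbors[b].add(a)
    return {node: sorted(adj) for node, adj in neighbors.items()}


graph = adjacency_list(links)
print(graph["cai"])  # -> ['ana', 'ben', 'dev']
```

For directed links (e.g., one computer sending to another), drop the second `add` so each link is recorded only from its source.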
29 Items & Links (figure labels: item, links) [Bostock, 2011]
30 Dataset Types. Tables: items (rows) and attributes (columns), with a cell containing a value; multidimensional tables have a value in each cell. Networks: nodes (items) and links; trees. Fields (continuous): a grid of positions, with attribute values in each cell. Geometry (spatial): positions. [Munzner (ill. Maguire), 2014]
31 Attribute Types. Categorical vs. ordered; ordered attributes are ordinal or quantitative. Ordering direction: sequential, diverging, or cyclic. [Munzner (ill. Maguire), 2014]
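The ordering direction changes how "distance" between values should be computed. The sketch below contrasts the three directions; the function names are hypothetical, and hour of day with a period of 24 is just one example of a cyclic attribute.

```python
def sequential_difference(a, b):
    """Sequential values run from low to high, so distance is |a - b|."""
    return abs(a - b)


def diverging_offset(value, midpoint):
    """Diverging values: signed offset from a meaningful midpoint (e.g. 0)."""
    return value - midpoint


def cyclic_distance(a, b, period):
    """Cyclic values (e.g. hour of day, period=24): distance wraps around."""
    d = abs(a - b) % period
    return min(d, period - d)


print(sequential_difference(200, 900))                   # -> 700
print(diverging_offset(-5, 0), diverging_offset(12, 0))  # -> -5 12
print(cyclic_distance(23, 1, 24))                        # -> 2 (not 22)
```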
32 Categorical, Ordinal, and Quantitative: 1 = Quantitative, 2 = Nominal, 3 = Ordinal (figure legend: quantitative, ordinal, categorical)
33 Categorical, Ordinal, and Quantitative: 1 = Quantitative, 2 = Nominal, 3 = Ordinal (figure legend: quantitative, ordinal, categorical)
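The practical difference between these types is which operations are meaningful. This sketch uses made-up columns: categorical (nominal) values support only equality and grouping, ordinal values also support ordering, and quantitative values also support arithmetic.

```python
from collections import Counter

# Hypothetical columns (made-up values) illustrating the three attribute types.
fruit = ["apple", "pear", "apple", "fig"]       # categorical (nominal)
size_levels = ["small", "medium", "large"]      # the ordinal scale, in order
sizes = ["medium", "small", "large", "medium"]  # ordinal values
weights_g = [182.0, 178.5, 150.2, 40.0]         # quantitative

# Categorical: only equality/grouping is meaningful, e.g. counting.
print(Counter(fruit).most_common(1))  # -> [('apple', 2)]

# Ordinal: ordering is meaningful (sort by rank), but differences are not.
rank = {level: i for i, level in enumerate(size_levels)}
print(sorted(sizes, key=rank.get))  # -> ['small', 'medium', 'medium', 'large']

# Quantitative: arithmetic such as differences and means is meaningful.
print(sum(weights_g) / len(weights_g))
```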
34 Semantics. The type of data does not tell us what the data means or how it should be interpreted. Tables have keys and values; fields have independent and dependent variables. (Figure panels: flat tables, multidimensional tables, fields.) [Munzner (ill. Maguire), 2014]
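The key/value and independent/dependent distinction can be sketched as follows. The sensor readings and the field function are hypothetical, made up for illustration: in the table, a key identifies an item; in the field, a position (the independent variables) determines a value (the dependent variable), and sampling it on a grid produces cells.

```python
import math

# Table semantics: a key identifies each item; the other attributes are values.
# Hypothetical sensor readings keyed by sensor id (made-up values).
readings = {"S1": 3.2, "S2": 4.8, "S3": 2.9}
print(readings["S2"])  # look up a value by its key -> 4.8


# Field semantics: independent variables (a position) determine a dependent value.
def field_value(x, y):
    """Hypothetical continuous field, defined at every (x, y)."""
    return 20.0 + 5.0 * math.sin(x) * math.cos(y)


# Sampling the field on a regular grid yields cells that each hold a value.
grid = [[field_value(0.5 * i, 0.5 * j) for i in range(4)] for j in range(3)]
print(len(grid), "rows x", len(grid[0]), "columns")  # -> 3 rows x 4 columns
```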
35 Analysis Actions (What? Why? How? framework). Why? Actions: Analyze (Consume: Discover, Present, Enjoy; Produce: Annotate, Record, Derive), Search (Lookup, Locate, Browse, Explore), and Query (Identify, Compare, Summarize). Targets: All data (trends, outliers, features); Attributes (one: distribution, extremes; many: dependency, correlation, similarity); Network data (topology, paths); Spatial data (shape). [Munzner (ill. Maguire), 2014]
36 Analysis: Consume & Produce. Consume: exploration (discover), explanation (present), enjoyment (enjoy). Produce: annotation, recording, derivation; producing leads to new directions and ideas. [Munzner (ill. Maguire), 2014]
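"Derive" means producing new data from existing data. As a minimal sketch, assuming hypothetical trip records with made-up fare and distance values, the code below derives a fare-per-mile attribute without modifying the originals, and marks items where the derivation is undefined.

```python
# Hypothetical trip records (made-up values).
trips = [
    {"fare_usd": 12.5, "distance_mi": 2.5},
    {"fare_usd": 52.0, "distance_mi": 17.0},
    {"fare_usd": 8.0, "distance_mi": 0.0},  # zero distance: ratio undefined
]


def derive_fare_per_mile(records):
    """Return copies of the records with a derived 'fare_per_mile' attribute.

    The input records are left unchanged; None marks items where the
    derivation is undefined (a distance of zero).
    """
    derived = []
    for r in records:
        new = dict(r)
        d = r["distance_mi"]
        new["fare_per_mile"] = r["fare_usd"] / d if d > 0 else None
        derived.append(new)
    return derived


for row in derive_fare_per_mile(trips):
    print(row)
```

Returning copies rather than mutating the input keeps the original data intact, which makes it easier to record how each derived attribute was produced.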
37 Analysis: Search and Query. Search is based on what a user knows about the target and its location: lookup (target known, location known), locate (target known, location unknown), browse (target unknown, location known), and explore (target unknown, location unknown). Query depends on how much of the data matters: one target (identify), some, often two (compare), or all (summarize). [Munzner (ill. Maguire), 2014]
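As a rough analogy only (the framework describes user tasks, not code), the four search cases can be mapped onto operations over a small dataset. The hourly trip counts below are made up for illustration.

```python
# Hypothetical trip counts for each hour of the day (made-up values).
trips_by_hour = dict(enumerate([
    120, 80, 60, 40, 35, 50, 140, 300, 420, 380, 310, 330,
    360, 350, 340, 370, 400, 450, 480, 430, 390, 330, 260, 190,
]))

# Lookup: target and location both known, so access the value directly.
print(trips_by_hour[8])  # -> 420

# Locate: target known (the count 480), location unknown, so search for it.
print([h for h, c in trips_by_hour.items() if c == 480])  # -> [18]

# Browse: location known (hours 6-9), target unknown, so inspect that region.
print({h: trips_by_hour[h] for h in range(6, 10)})

# Explore: neither known, so scan everything for something notable, e.g. extremes.
busiest = max(trips_by_hour, key=trips_by_hour.get)
quietest = min(trips_by_hour, key=trips_by_hour.get)
print(busiest, quietest)  # -> 18 4
```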
38 Targets. All data: trends, outliers, features. Attributes: one (distribution, extremes) or many (dependency, correlation, similarity). Network data: topology, paths. Spatial data: shape. [Munzner (ill. Maguire), 2014]
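The attribute targets translate into simple computations. This minimal sketch, on made-up values for six hypothetical trips, summarizes one attribute's distribution and extremes and measures correlation between two attributes with a hand-written Pearson coefficient (written out so it does not depend on a newer Python version).

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical attributes for six trips (made-up values).
distance = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fare = [6.0, 9.0, 12.0, 15.0, 18.0, 21.0]
passengers = [1, 1, 2, 1, 3, 1]

# One attribute: its distribution and its extremes.
print(Counter(passengers))           # how often each value occurs
print(min(distance), max(distance))  # -> 1.0 6.0

# Many attributes: correlation between two of them.
print(round(pearson(distance, fare), 6))  # -> 1.0 (fare is exactly linear here)
```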
39 More Reading (listed on the course schedule): Challenges and Opportunities with Big Data, D. Agrawal et al.; Toward Scalable Systems for Big Data Analysis: A Technology Tutorial, H. Hu et al.; Big Data computing and clouds: Trends and future directions, M. D. Assuncao et al.
40 Next Class. Introduction to/review of Python. Download the Anaconda distribution. I am planning to use Python 3 (3.6).