Information Visualization and Visual Analytics roles, challenges, and examples Giuseppe Santucci

1 Information Visualization and Visual Analytics roles, challenges, and examples Giuseppe Santucci

2 VisDis and the Database & User Interface. The VisDis and the Database/Interface group background is about: Visual Information Access; Data quality; Data integration; Adaptive Interfaces; User Centered Design; Usability and Accessibility; Infovis evaluation; Visual quality metrics; Visual Analytics; Data sampling; Density map optimization

3 Outline Information Visualization Main issues Data overloading Visual Analytics Automatic data analysis Three examples Projects and books

4 Information visualization! 1. Infovis is perfect for exploration, when we don't know exactly what to look at: it supports vague goals. 2. Infovis is perfect to explain complex data and to support decisions. Other approaches to data analysis: Statistics: strong verification, but does not support exploration and vague goals. Data mining: actionable and reliable, but black box, not interactive, question-response style. Visual analytics (formerly Visual Data Mining) is trying to join the two worlds.

5 Canonical steps in infovis, STEP 1: DATA -> Internal Representation. Encoding of values: univariate data, bivariate data, trivariate data, multidimensional data. Encoding of relations: temporal data, maps & diagrams, graphs/trees, data streams. [Figure: example data table with columns Sport, Mathematics, Physics, Chemistry, Art, Geography, Literature, History]

6 Canonical steps in infovis, STEP 2: Internal Representation -> Presentation. Space limitations: scrolling, overview + details, distortion, suppression, zoom & pan, semantic zoom. Time limitation. Perceptual issues. Cognitive issues.

7 SO WE ARE DONE! (?)

8 Outline Information Visualization Data overloading Visual Analytics Automatic data analysis Three examples Projects and books and conferences

9 Data size and complexity! 100 million FedEx transactions per day; 150 million VISA credit card transactions per day; 300 million long distance AT&T calls per day; 50 billion e-mails per day; 600 billion IP packets per day; 1 trillion (10^12) web pages (according to Google), corresponding to about 3 petabytes of data; Google processes 20 petabytes of data per day. Data streams (sensor networks, IP traffic, etc.). Units: kilobyte, megabyte, gigabyte, terabyte, petabyte

10 Rescuing information. In many situations people need to exploit hidden information resting in large, unexplored data sets: decision-makers, analysts, engineers, emergency response teams... Several techniques are devoted to this aim: automatic analysis techniques (e.g., data mining) and manual analysis techniques (e.g., information visualization). Petabyte datasets require a joint effort:

11 Visual Analytics

12 VA is highly interdisciplinary: Data Mining; Data Management; Scientific & Information Visualisation; Spatio-Temporal Data; Human Perception + Cognition; Infrastructure; Evaluation. Each component presents challenging issues

13 Visualization. Scientific Visualization & Information Visualization: interactivity & scalability issues. Challenges: design of new scalable structures that support: visual abstractions (e.g., clustering, sampling, etc.); rapid update of visual displays for billion-record databases (10 frames per second)

14 Data Management. Answering a query against a large data set is now possible. Among the other challenges: Integration of heterogeneous data such as numeric data, graphs, text, audio and video signals, semi-structured data. Data streams: in many applications data are continuously produced (sensor data, stock market data, news data, etc.). Data provenance: understanding where data come from. Data reduction: visualizing billions of records is not possible. We need to reduce and abstract the data to support interaction at different detail levels (see, e.g., Google Earth)...

15 Data mining. Methods to automatically extract insights: Supervised learning from examples: using training samples to learn models for the classification (or prediction) of previously unseen data samples. Cluster analysis, which aims to extract structure from unknown data, grouping data instances into classes based on mutual similarity, and to identify outliers. Association rule mining (analysis of co-occurrence of data items) and dimensionality reduction. Challenges come from: semi-structured and complex data (web data, documents); interaction with visualizations

16 Spatio-Temporal Data. Data about time and space are widespread: geographic measurements; GPS position data; remote sensing applications (e.g., satellite data). Finding spatial relationships and patterns among these data is of special interest. The analysis of data with references both in space and in time is a challenging research topic: scale: clusters and other phenomena may only occur at particular scales, which may not be the scale at which data are recorded; uncertainty: spatio-temporal data are often incomplete, interpolated, collected at different times, etc.

17 Perception and cognition. A critical element is the human being. Visual analysis tasks require the careful design of apt human-computer interfaces. Challenges: need to integrate Psychology, Sociology, Neurosciences, and Design issues; user-centred analysis and modelling; multimodal interaction techniques for visualization and exploration of large information spaces; availability of improved display resources; novel interaction algorithms; perceptual, cognitive and graphical principles which in combination lead to improved visual communication of data and analysis results. [Figure: interaction cycle with stages Form Intention, Form Action Plan, Execute Action, Perception, Interpretation, Evaluation]

18 Evaluation and Infrastructure. How to assess (evaluate) the effectiveness of visual analytics environments is a topic of lively debate. The same holds for infrastructures: agreed solutions are still under investigation. Both topics are still in the phase of workshop results... D3!

19 Back to the Automatic Data Analysis. We can classify the automatic activities in three main groups: 1. Deriving new values from the dataset for ad-hoc visualization (this is the least standard and the most creative part of the process). 2. Data reduction / data mining: clustering / classification / sampling / pixel-oriented visualization; dimension reduction. 3. Visualization improvement: data distribution; perceptual issues; cognitive issues.

20 Example for group 1 Deriving new values from the dataset for ad-hoc visualization (you are going to visualize DERIVED data)

21 A Visual Analytics example (Group 1): deriving new values from the dataset for ad-hoc visualization. How to visually compare J. London and M. Twain books? [D. A. Keim and D. Oelke. Literature Fingerprinting: A New Method for Visual Literary Analysis. IEEE Symp. on Visual Analytics Science and Technology (VAST '07)] 1. Split the book into several text blocks (e.g., pages, paragraphs, sentences). 2. Measure, for each text block, a relevant feature (e.g., average sentence length, word usage, etc.). 3. Associate the relevant feature with a visual attribute (e.g., color). 4. Visualize it.
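
The four steps on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation from the cited paper: the function names, the fixed number of sentences per block, the simple punctuation-based sentence splitter, and the linear grey-level mapping are all assumptions made for the example.

```python
import re

def split_into_blocks(text, sentences_per_block=3):
    """Step 1: split the text into blocks of a fixed number of sentences (assumed blocking)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [sentences[i:i + sentences_per_block]
            for i in range(0, len(sentences), sentences_per_block)]

def average_sentence_length(block):
    """Step 2: feature of one block, here the mean number of words per sentence."""
    return sum(len(sentence.split()) for sentence in block) / len(block)

def to_grey(value, lo, hi):
    """Step 3: map a feature value linearly onto a grey level in 0..255."""
    if hi == lo:
        return 0
    return round(255 * (value - lo) / (hi - lo))

def fingerprint(text, sentences_per_block=3):
    """Steps 1-3 combined: one grey level per text block, ready to be drawn as pixels."""
    blocks = split_into_blocks(text, sentences_per_block)
    if not blocks:
        return []
    features = [average_sentence_length(block) for block in blocks]
    lo, hi = min(features), max(features)
    return [to_grey(feature, lo, hi) for feature in features]
```

Step 4 (drawing one colored cell per block) is left to whatever plotting tool is at hand; the list returned by `fingerprint` holds the cell colors in reading order.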

22 J.London vs M.Twain average sentence lengths

23 User interaction (a non uniform book?)

24 Details of a book

25 What about the Bible?

26 Example 2 Data reduction / data mining

27 Visual Analytics of Anomaly Detection in Large Data Streams (paper from Daniel Keim's group). You have to monitor a network composed of 8 systems with 16 servers each. Each server provides basic information: CPU % occupation, DISK % occupation, MEM % occupation... That corresponds to 128 temporal data streams (overplotting!!). [Plot: CPU % over time]

28 Pixel-oriented visualization: 28 days (5 min windows), about 8k observations. Each observation takes a pixel. The color codes the CPU %
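
The arithmetic behind "about 8k observations" and the one-pixel-per-observation idea can be sketched as follows. The row layout (one day per row) and the linear grey mapping are assumptions for illustration; the slide does not say how the pixels are arranged or which color scale is used.

```python
MINUTES_PER_DAY = 24 * 60

def observations_per_day(window_minutes=5):
    """Number of fixed-size time windows in one day."""
    return MINUTES_PER_DAY // window_minutes

def to_pixel_grid(values, row_length):
    """Arrange a 1-D series row by row; each value becomes one pixel."""
    return [values[i:i + row_length] for i in range(0, len(values), row_length)]

def cpu_to_grey(cpu_percent):
    """Map a CPU load in [0, 100] onto a grey level in [0, 255] (assumed color coding)."""
    clamped = max(0.0, min(100.0, cpu_percent))
    return round(255 * clamped / 100)

# 28 days of 5-minute windows: 28 * 288 = 8064 observations, i.e. "about 8k".
total_observations = 28 * observations_per_day(5)
```

With one day per row, a single server's CPU stream fits in a 288 x 28 pixel block, which is what makes it possible to show many servers side by side without overplotting.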

29 The whole system Color is preattentive!

30 Automated analysis: computing high CPU % clusters, which selects hot time intervals
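
As a rough illustration of how hot time intervals could be selected automatically, the sketch below finds runs of consecutive observations whose CPU % stays at or above a threshold. This is a plain thresholding stand-in, not the clustering used in the cited work; the threshold, the minimum run length, and the function name are assumptions.

```python
def hot_intervals(series, threshold=90.0, min_length=3):
    """Return (start, end) index pairs, end exclusive, of runs with values >= threshold."""
    intervals = []
    start = None
    for i, value in enumerate(series):
        if value >= threshold:
            if start is None:
                start = i  # a new hot run begins here
        elif start is not None:
            if i - start >= min_length:
                intervals.append((start, i))
            start = None
    # Close a run that is still open at the end of the series.
    if start is not None and len(series) - start >= min_length:
        intervals.append((start, len(series)))
    return intervals
```

For example, `hot_intervals([10, 95, 96, 97, 20, 99, 15, 91, 92, 93])` keeps the two runs of three hot observations and drops the isolated spike at index 5.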

31 Automated analysis... Detecting persistent anomalies

32 Looking for correlations

33 Example 3 Visualization improvement

34 A Visual Analytics example (Group 3: Visualization improvement). Data distribution and perceptual issues. Density maps: when 4 data items are plotted on the same pixel, that pixel has density d=4. We can map the density values to a 256-level grey or color scale. [Figure: 8x8 pixel grid showing an empty pixel and a pixel with d=4]
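
A density map of this kind can be computed by counting how many data items land on each pixel. The sketch below assumes data coordinates are scaled linearly onto a `width` x `height` pixel grid and that all points lie inside the given ranges; the function name and parameters are hypothetical.

```python
from collections import Counter

def density_map(points, width, height, x_range, y_range):
    """Count how many data items fall on each pixel of a width x height plot."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in points:
        # Scale to pixel coordinates; the maximum value is clamped onto the last pixel.
        px = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        py = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        counts[(px, py)] += 1
    return counts  # pixels missing from the counter are empty (d = 0)
```

Pixels that appear in the returned counter are the active pixels; the counter values are the densities d that the following slides map to color codes.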

35 The case study (Infovis contest 2005): about 60,000 USA companies plotted on an 800x450 (360,000 pixels) scatter plot. 126 distinct density values ranging over [1..1,633]. 7,042 active pixels (i.e., hosting at least one company): 2,526 pixels (36%) host exactly one company (d=1); 1,182 pixels (17%) host two companies (d=2); ...; 1 pixel ( %) hosts 1,633 companies (d=1633)

36 What is the problem? The choice of the right mapping is crucial, because the density frequency distribution is very skewed. [Figure: number of pixels vs. density (126 distinct values, up to 1633); bars labelled 36%, 17%, ..., 0.001%]

37 The mapping: 126 different data densities = {1, 2, ..., 1,633} -> 256 color codes = {0, 1, 2, ..., 255}. Available solutions: linear mapping; non-linear mappings

38 Linear mapping: $\mathrm{ColorCode}(d) = \mathrm{Round}\left(255 \cdot \frac{d - d_{\min}}{d_{\max} - d_{\min}}\right)$. Straightforward solution, but useless in this situation: most pixels share very low color codes; few color codes are used (46 out of 256); color collisions: different low density values are represented by the same color code, e.g. densities in [1..10] are mapped on codes {1,2}. [Figures: transfer function; color code frequency distribution]
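
The collision problem of the linear mapping is easy to reproduce. The sketch below rescales a density linearly between the minimum and maximum density onto 256 codes; the function name is hypothetical, and the exact codes obtained depend on rounding and on whether code 0 is reserved for empty pixels, which the slide does not spell out.

```python
def linear_color_code(d, d_min, d_max, levels=256):
    """Linearly rescale a density in [d_min, d_max] onto color codes 0..levels-1."""
    return round((levels - 1) * (d - d_min) / (d_max - d_min))

# With the case-study range [1..1633], the ten lowest densities collapse
# onto only two distinct codes, so they cannot be told apart on screen.
low_density_codes = {linear_color_code(d, 1, 1633) for d in range(1, 11)}
```

Since most active pixels in the case study have very low densities, almost the whole picture ends up drawn with a couple of nearly identical dark codes.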

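39 Density function mapping: $\mathrm{ColorCode}(d_j) = \mathrm{Round}\left(255 \cdot \sum_{i=1}^{j} \frac{DN_{AP}(d_i)}{N_{AP}}\right)$, Hermann et al. [HMM00], with $DN_{AP}(d_i)$ read as the number of active pixels with density $d_i$ and $N_{AP}$ as the total number of active pixels. Quite similar to histogram equalization. Better than linear mapping, but: few color codes are used (39 out of 256); the lowest color code is unnecessarily high (codes ranging only on [ ]); different high density values are represented by the same color code: densities in [48..1,633] -> [250,255]. [Figures: transfer function (TF); color code frequency distribution]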
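
A cumulative-frequency mapping in the spirit of histogram equalization can be sketched as below: each density gets a code proportional to the share of active pixels whose density is less than or equal to it. The input format (a dictionary from density to pixel count), the function name, and the toy histogram are assumptions for illustration, not data from the case study.

```python
def density_function_codes(pixels_per_density, levels=256):
    """Map each density to a code proportional to the cumulative share of active pixels.

    pixels_per_density: {density: number of active pixels with that density}.
    """
    total = sum(pixels_per_density.values())
    codes = {}
    cumulative = 0
    for d in sorted(pixels_per_density):
        cumulative += pixels_per_density[d]
        codes[d] = round((levels - 1) * cumulative / total)
    return codes

# Toy skewed histogram (hypothetical numbers): 36% of the active pixels have d = 1.
toy_histogram = {1: 36, 2: 17, 3: 30, 50: 16, 1633: 1}
toy_codes = density_function_codes(toy_histogram)
```

On the toy histogram the lowest density already receives code 92, which illustrates why the slide calls the lowest color code unnecessarily high: the bottom third of the scale is never used.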

40 Our proposal. We take into account that: densities and color codes are discrete and finite; color codes that are too close are hard to distinguish (for human beings). [E. Bertini, A. Di Girolamo, G. Santucci - See what you know: analyzing data distribution to improve density map visualization. Eurovis 2007 conference]

41 Uniform scale mapping. We use a reduced color scale, e.g. with 15 codes ($N_L = 15$). This implies that different density values will necessarily be represented by the same color code: to reduce the degradation, the mapping is performed through an algorithm that tries to assign to each code the same number of pixels, $N_{AP}/N_L$. [Figure: target color code frequency distribution, with codes $c_1, c_2, c_3, \ldots, c_{N_L}$ each hosting $N_{AP}/N_L$ pixels]

42 $N_{DV} > N_L$ (more distinct density values than color codes): uniform scale mapping. $\mathrm{ColorCode}(d_j) = \mathrm{DistributePixels}$. Because densities are discrete, the algorithm cannot guarantee exactly $N_{AP}/N_L$ pixels per code; through a peak analysis it minimizes the variance. Full color scale usage [0..255]; all the color codes are used; maximum color code separation. [Figure: color code frequency distribution]
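
The idea of giving each of the reduced color codes roughly the same number of pixels can be approximated with a greedy equal-frequency binning. The sketch below is a stand-in for the DistributePixels algorithm, not a reproduction of it: the peak analysis of the original is replaced by a simple rule that recomputes the per-code target after each bin is closed, and the function name and input format are assumptions. It requires at least two levels.

```python
def uniform_scale_codes(pixels_per_density, n_levels=15, max_code=255):
    """Greedily group sorted densities into n_levels bins of similar pixel count.

    pixels_per_density: {density: number of active pixels with that density}.
    Returns {density: color code}, with codes spread evenly over 0..max_code.
    """
    densities = sorted(pixels_per_density)
    remaining_pixels = sum(pixels_per_density.values())  # pixels not yet in a closed bin
    codes = {}
    level = 0
    in_bin = 0
    for idx, d in enumerate(densities):
        codes[d] = round(max_code * level / (n_levels - 1))
        in_bin += pixels_per_density[d]
        levels_left = n_levels - level           # includes the current level
        values_left = len(densities) - idx - 1   # densities still to be assigned
        target = remaining_pixels / levels_left  # fair share for the current bin
        # Close the bin once it holds its fair share, or when each remaining
        # density needs its own level for the whole scale to be used.
        if level < n_levels - 1 and (in_bin >= target or values_left <= levels_left - 1):
            remaining_pixels -= in_bin
            in_bin = 0
            level += 1
    return codes
```

Recomputing the target after every closed bin keeps one very large peak (such as the pixels with d=1) from starving the remaining codes, so the codes stay well separated across the whole [0..255] range.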

43 Visual comparison Linear mapping Density function mapping Uniform scale mapping

44 Visual comparison

47 The parcel dataset Postal parcels plotted by weight (x) and volume (y)

48 Grey scale. Linear: CSU=0.53, CsAR=1, CS=2.83. Density function: CSU=0.18, CsAR=0.62, CS=5.23. Uniform color scale: CSU=1, CsAR=1, CS=8.79

49 Conclusions. Visual Analytics is a new (exciting) emerging research field. Information visualization is a core component of VA. Automated data analysis can be classified in three main groups: deriving new values (more creative); data reduction (sometimes creative); image improvement (very technical). VA is highly interdisciplinary and requires a collaborative approach. It is more a METHODOLOGY / VISION than a technique. However, the collection of available results / proposals is quickly growing

50 The new (European) book on VA. Illuminating the Path: The Research and Development Agenda for Visual Analytics (2005), focusing on USA homeland security. Mastering the Information Age: Solving Problems with Visual Analytics (2010). One of the major outcomes of Vismaster. Available for free at:

51 5 books you HAVE to read (greedy order): Robert Spence - Information Visualization: Design for Interaction (2nd Edition) - Addison-Wesley (ACM Press) - BASIC ISSUES; Chaomei Chen - Information Visualization - Second Edition - Springer - AN UPDATED OVERVIEW; Mastering the Information Age: Solving Problems with Visual Analytics (2010) - VISMASTER BOOK; Colin Ware - Information Visualization, Third Edition: Perception for Design (Interactive Technologies) - Morgan Kaufmann - PERCEPTUAL ISSUES; Card, Mackinlay, Shneiderman - Readings in Information Visualization - HISTORICAL

52 Visual Analytics projects

53 The Vismaster CA project

54 The Promise NoE project

55 PanopteSec Network Cyber Security 3 years European IP project!

57 PanopteSec: Call for Master Thesis. Design, implement, and test a Visual Analytics environment for network security (D3 framework). It includes the Information Visualization homework


More information

Big Data Challenges in Large IP Networks

Big Data Challenges in Large IP Networks Big Data Challenges in Large IP Networks Feature Extraction & Predictive Alarms for network management Wednesday 28 th Feb 2018 Dave Yearling British Telecommunications plc 2017 What we will cover Making

More information

MULTIVARIATE ANALYSIS OF STEALTH QUANTITATES (MASQ)

MULTIVARIATE ANALYSIS OF STEALTH QUANTITATES (MASQ) MULTIVARIATE ANALYSIS OF STEALTH QUANTITATES (MASQ) Application of Machine Learning to Testing in Finance, Cyber, and Software Innovation center, Washington, D.C. THE SCIENCE OF TEST WORKSHOP 2017 AGENDA

More information

PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore

PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore Data Warehousing Data Mining (17MCA442) 1. GENERAL INFORMATION: PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore 560 100 Department of MCA COURSE INFORMATION SHEET Academic

More information

Introduction to Hadoop and MapReduce

Introduction to Hadoop and MapReduce Introduction to Hadoop and MapReduce Antonino Virgillito THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Large-scale Computation Traditional solutions for computing large

More information

Data Mining and. in Dynamic Networks

Data Mining and. in Dynamic Networks Data Mining and Knowledge Discovery in Dynamic Networks Panos M. Pardalos Center for Applied Optimization Dept. of Industrial & Systems Engineering Affiliated Faculty of: Computer & Information Science

More information

ECLT 5810 Data Preprocessing. Prof. Wai Lam

ECLT 5810 Data Preprocessing. Prof. Wai Lam ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate

More information

PARALLEL AND DISTRIBUTED PLATFORM FOR PLUG-AND-PLAY AGENT-BASED SIMULATIONS. Wentong CAI

PARALLEL AND DISTRIBUTED PLATFORM FOR PLUG-AND-PLAY AGENT-BASED SIMULATIONS. Wentong CAI PARALLEL AND DISTRIBUTED PLATFORM FOR PLUG-AND-PLAY AGENT-BASED SIMULATIONS Wentong CAI Parallel & Distributed Computing Centre School of Computer Engineering Nanyang Technological University Singapore

More information

An Introduction to Content Based Image Retrieval

An Introduction to Content Based Image Retrieval CHAPTER -1 An Introduction to Content Based Image Retrieval 1.1 Introduction With the advancement in internet and multimedia technologies, a huge amount of multimedia data in the form of audio, video and

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda 1 Observe novel applicability of DL techniques in Big Data Analytics. Applications of DL techniques for common Big Data Analytics problems. Semantic indexing

More information

MetaData for Database Mining

MetaData for Database Mining MetaData for Database Mining John Cleary, Geoffrey Holmes, Sally Jo Cunningham, and Ian H. Witten Department of Computer Science University of Waikato Hamilton, New Zealand. Abstract: At present, a machine

More information

Framework for Visual Analytics of Measurement Data

Framework for Visual Analytics of Measurement Data Framework for Visual Analytics of Measurement Data Paula Järvinen, Pekka Siltanen, Kari Rainio VTT, PL 1000, 02044 VTT Espoo, Finland {paula.jarvinen, pekka.siltanen, kari.rainio}@vtt.fi Abstract-Visual

More information

Chapter 4 Data Mining A Short Introduction

Chapter 4 Data Mining A Short Introduction Chapter 4 Data Mining A Short Introduction Data Mining - 1 1 Today's Question 1. Data Mining Overview 2. Association Rule Mining 3. Clustering 4. Classification Data Mining - 2 2 1. Data Mining Overview

More information

With turing you can: Identify, locate and mitigate the effects of botnets or other malware abusing your infrastructure

With turing you can: Identify, locate and mitigate the effects of botnets or other malware abusing your infrastructure Decoding DNS data If you have a large DNS infrastructure, understanding what is happening with your real-time and historic traffic is difficult, if not impossible. Until now, the available network management

More information

Advanced Visualization

Advanced Visualization 320581 Advanced Visualization Prof. Lars Linsen Fall 2011 0 Introduction 0.1 Syllabus and Organization Course Website Link in CampusNet: http://www.faculty.jacobsuniversity.de/llinsen/teaching/320581.htm

More information

1. Inroduction to Data Mininig

1. Inroduction to Data Mininig 1. Inroduction to Data Mininig 1.1 Introduction Universe of Data Information Technology has grown in various directions in the recent years. One natural evolutionary path has been the development of the

More information

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Data preprocessing Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 15 Table of contents 1 Introduction 2 Data preprocessing

More information

Texture Image Segmentation using FCM

Texture Image Segmentation using FCM Proceedings of 2012 4th International Conference on Machine Learning and Computing IPCSIT vol. 25 (2012) (2012) IACSIT Press, Singapore Texture Image Segmentation using FCM Kanchan S. Deshmukh + M.G.M

More information

Fast Approximations for Analyzing Ten Trillion Cells. Filip Buruiana Reimar Hofmann

Fast Approximations for Analyzing Ten Trillion Cells. Filip Buruiana Reimar Hofmann Fast Approximations for Analyzing Ten Trillion Cells Filip Buruiana (filipb@google.com) Reimar Hofmann (reimar.hofmann@hs-karlsruhe.de) Outline of the Talk Interactive analysis at AdSpam @ Google Trade

More information

Interaction. CS Information Visualization. Chris Plaue Some Content from John Stasko s CS7450 Spring 2006

Interaction. CS Information Visualization. Chris Plaue Some Content from John Stasko s CS7450 Spring 2006 Interaction CS 7450 - Information Visualization Chris Plaue Some Content from John Stasko s CS7450 Spring 2006 Hello. What is this?! Hand back HW! InfoVis Music Video! Interaction Lecture remindme.mov

More information

Course Curriculum for Master Degree in Network Engineering and Security

Course Curriculum for Master Degree in Network Engineering and Security Course Curriculum for Master Degree in Network Engineering and Security The Master Degree in Network Engineering and Security is awarded by the Faculty of Graduate Studies at Jordan University of Science

More information

Massive Data Analysis

Massive Data Analysis Professor, Department of Electrical and Computer Engineering Tennessee Technological University February 25, 2015 Big Data This talk is based on the report [1]. The growth of big data is changing that

More information

Data Mining. Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University

Data Mining. Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University Data Mining Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University Why Mine Data? Commercial Viewpoint Lots of data is being collected and warehoused Web data, e-commerce

More information

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Improving the Efficiency of Fast Using Semantic Similarity Algorithm International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Data Centric Systems and Networking Emergence of Big Data Shift of Communication Paradigm From end-to-end to data

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

Statistical Learning and Data Mining CS 363D/ SSC 358

Statistical Learning and Data Mining CS 363D/ SSC 358 Statistical Learning and Data Mining CS 363D/ SSC 358! Lecture: Introduction Pradeep Ravikumar pradeepr@cs.utexas.edu What is this course about (in 1 minute) Big Data Data Mining, Statistical Learning

More information

Exploring the Structure of Data at Scale. Rudy Agovic, PhD CEO & Chief Data Scientist at Reliancy January 16, 2019

Exploring the Structure of Data at Scale. Rudy Agovic, PhD CEO & Chief Data Scientist at Reliancy January 16, 2019 Exploring the Structure of Data at Scale Rudy Agovic, PhD CEO & Chief Data Scientist at Reliancy January 16, 2019 Outline Why exploration of large datasets matters Challenges in working with large data

More information

DS595/CS525: Urban Network Analysis --Urban Mobility Prof. Yanhua Li

DS595/CS525: Urban Network Analysis --Urban Mobility Prof. Yanhua Li Welcome to DS595/CS525: Urban Network Analysis --Urban Mobility Prof. Yanhua Li Time: 6:00pm 8:50pm Wednesday Location: Fuller 320 Spring 2017 2 Team assignment Finalized. (Great!) Guest Speaker 2/22 A

More information

Name of the lecturer Doç. Dr. Selma Ayşe ÖZEL

Name of the lecturer Doç. Dr. Selma Ayşe ÖZEL Y.L. CENG-541 Information Retrieval Systems MASTER Doç. Dr. Selma Ayşe ÖZEL Information retrieval strategies: vector space model, probabilistic retrieval, language models, inference networks, extended

More information

Data Mining. Jeff M. Phillips. January 7, 2019 CS 5140 / CS 6140

Data Mining. Jeff M. Phillips. January 7, 2019 CS 5140 / CS 6140 Data Mining CS 5140 / CS 6140 Jeff M. Phillips January 7, 2019 What is Data Mining? What is Data Mining? Finding structure in data? Machine learning on large data? Unsupervised learning? Large scale computational

More information

WELCOME! Lecture 3 Thommy Perlinger

WELCOME! Lecture 3 Thommy Perlinger Quantitative Methods II WELCOME! Lecture 3 Thommy Perlinger Program Lecture 3 Cleaning and transforming data Graphical examination of the data Missing Values Graphical examination of the data It is important

More information

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining. About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts

More information

DETECTION OF ANOMALIES FROM DATASET USING DISTRIBUTED METHODS

DETECTION OF ANOMALIES FROM DATASET USING DISTRIBUTED METHODS DETECTION OF ANOMALIES FROM DATASET USING DISTRIBUTED METHODS S. E. Pawar and Agwan Priyanka R. Dept. of I.T., University of Pune, Sangamner, Maharashtra, India M.E. I.T., Dept. of I.T., University of

More information

UNCLASSIFIED. R-1 ITEM NOMENCLATURE PE D8Z: Data to Decisions Advanced Technology FY 2012 OCO

UNCLASSIFIED. R-1 ITEM NOMENCLATURE PE D8Z: Data to Decisions Advanced Technology FY 2012 OCO Exhibit R-2, RDT&E Budget Item Justification: PB 2012 Office of Secretary Of Defense DATE: February 2011 BA 3: Advanced Development (ATD) COST ($ in Millions) FY 2010 FY 2011 Base OCO Total FY 2013 FY

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2013 http://ce.sharif.edu/courses/91-92/2/ce725-1/ Agenda Features and Patterns The Curse of Size and

More information

Based on Big Data: Hype or Hallelujah? by Elena Baralis

Based on Big Data: Hype or Hallelujah? by Elena Baralis Based on Big Data: Hype or Hallelujah? by Elena Baralis http://dbdmg.polito.it/wordpress/wp-content/uploads/2010/12/bigdata_2015_2x.pdf 1 3 February 2010 Google detected flu outbreak two weeks ahead of

More information

A Statistical Approach to Culture Colors Distribution in Video Sensors Angela D Angelo, Jean-Luc Dugelay

A Statistical Approach to Culture Colors Distribution in Video Sensors Angela D Angelo, Jean-Luc Dugelay A Statistical Approach to Culture Colors Distribution in Video Sensors Angela D Angelo, Jean-Luc Dugelay VPQM 2010, Scottsdale, Arizona, U.S.A, January 13-15 Outline Introduction Proposed approach Colors

More information

dan.fay@microsoft.com Scientific Data Intensive Computing Workshop 2004 Visualizing and Experiencing E 3 Data + Information: Provide a unique experience to reduce time to insight and knowledge through

More information