IMPACT MODELS AND DATA MATTEO DE FELICE

1 IMPACT MODELS AND DATA MATTEO DE FELICE

6 What is an impact model?
Modelling the influence of something on something else.
Usually two different domains (e.g. economics and industrial production, climate and energy sector, ...).

7 What is an impact model?
Why?
How to build an impact model?
Two approaches: modelling the underlying processes, or modelling observed inputs/outputs.

8 ESSENTIALLY, ALL MODELS ARE WRONG, BUT SOME ARE USEFUL. George E. P. Box (English statistician)

9 Boxes
White box: organised knowledge
Gray box: some knowledge and some data
Black box: zero knowledge, many observations and measurements

16 Right tools
Wolpert's No Free Lunch theorem: no single model is best across all possible problems.
Exploring the trade-off between time-to-market and efficiency/performance.
[Figure: performance (vertical axis) against time-to-market (horizontal axis, starting at 0), with regions labelled "Genie's lamp", "Random guess (dice?)" and "Worthless effort".]

18 Modeling software
Spreadsheet software (e.g. Microsoft Excel)
Programming languages (C/C++, Python)
Statistical computing environments (e.g. MATLAB, R)
[Figure: performance against time-to-market for MATLAB, R, Excel, low-level code and high-level code, with cost and flexibility as further criteria.]

22 Summary
To make decisions we need estimates and predictions.
To generate predictions we need models.
To build models we need data and information.

23 DRIP & Big Data
DRIP (Data Rich, Information Poor) era (Big Data).
We need tools to deal with high-dimensional and heterogeneous data.
Data: "factual information (as measurements or statistics) used as a basis for reasoning, discussion, or calculation" (Merriam-Webster)
Information: "knowledge obtained from investigation, study, or instruction" (Merriam-Webster)

24 Data vs Information
Data: precise, formally defined, accurate
Information: useful, context-related, meaningful

25 "It is often said that we suffer from information overload, whereas we actually suffer from data overload. The problem is that we have access to large amounts of data containing relatively small amounts of useful information."
James V. Stone, Independent Component Analysis: A Tutorial Introduction

26 Impact Models
How does the weather affect agriculture in a specific region? (weather/climate → agricultural system)
How does the weather affect solar power production? (weather/climate → energy sector)

27 Impact Models
GCM/RGCM → Climate Data → Compare with observations → Skill & Error Measures

28 Impact Models
GCM/RGCM → Climate Data → Impact Models → Impact Information → Compare with observations → Skill & Error Measures
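
The "compare with observations" step can be sketched in R as below. This is a minimal illustration, not the measures used in the talk: the function name `skill_measures`, the choice of bias, MAE, RMSE and Pearson correlation, and the numbers are all assumptions made for the example.

```r
# Hypothetical helper: a few common skill and error measures for a
# modelled series (sim) against observations (obs).
skill_measures <- function(obs, sim) {
  stopifnot(length(obs) == length(sim))
  err <- sim - obs
  c(
    bias = mean(err),          # mean error
    mae  = mean(abs(err)),     # mean absolute error
    rmse = sqrt(mean(err^2)),  # root mean squared error
    cor  = cor(obs, sim)       # Pearson correlation
  )
}

obs <- c(2.0, 3.5, 1.0, 4.0, 2.5)  # made-up observations
sim <- c(2.5, 3.0, 1.5, 4.5, 2.0)  # made-up model output
scores <- skill_measures(obs, sim)
print(round(scores, 3))
```

The same function can be applied both to the climate data and to the impact information, so that errors introduced by the impact model can be separated from errors already present in its input.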

32 Drought Early Warning
From precipitation/evapotranspiration (water balance) to crop production, to the (estimated) number of people affected by drought.
Objectives: the model must be reliable, timely and tailored.

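33 WRSI: Crop Water Requirement Satisfaction Index
$$\mathrm{CWRSI} = q_{\mathrm{belg}}\,\mathrm{WRSI}_{\mathrm{belg}} + q_{\mathrm{meher}}\,\mathrm{WRSI}_{\mathrm{meher}}$$
$$\mathrm{RWRSI} = \frac{\sum P \cdot \mathrm{CWRSI}}{\sum P}$$
$$N = N_0 + K\left[\log(W_m - W_0) - \log(\mathrm{RWRSI} - W_0)\right]$$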

34 Impact Model
Calibrated on observed data using empirical rules and statistical methods.
Monte Carlo methods to deal with the uncertainty.
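
A rough R sketch of what Monte Carlo uncertainty propagation might look like for the relation for N on the WRSI slide. Everything numeric here is a hypothetical placeholder (N0, K, W0, Wm, and the distribution of the regional index), and the formula is the assumed reading N = N0 + K [log(Wm - W0) - log(RWRSI - W0)]; it is not the calibrated model from the talk.

```r
set.seed(42)  # reproducible draws

# All values below are hypothetical placeholders, not calibrated parameters.
N0 <- 1e6  # baseline number of people affected
K  <- 2e6  # scaling constant
W0 <- 20   # lower reference value of the index
Wm <- 95   # upper reference value of the index

# Assumed reading of the slide:
# N = N0 + K * [log(Wm - W0) - log(RWRSI - W0)]
affected <- function(rwrsi) N0 + K * (log(Wm - W0) - log(rwrsi - W0))

# Uncertain input: a regional index estimate with a made-up spread,
# clipped so that it stays inside (W0, Wm].
n_draws <- 10000
rwrsi <- pmin(pmax(rnorm(n_draws, mean = 70, sd = 8), W0 + 1), Wm)

# Push every draw through the impact relation and summarise the spread.
n_sim <- affected(rwrsi)
print(quantile(n_sim, probs = c(0.05, 0.5, 0.95)))
```

Reporting quantiles rather than a single number is the point of the exercise: the decision maker sees a range for the number of people affected, not just a central estimate.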

35 Solar Power

39 Approaches
Modelling PV technology (isc, ki, kv, ...)
Black-box model observing input (weather) and output (production)
Accuracy
Feasibility
Uncertainty?
What if 100 PV plants? panels?

41 Data Mining Approach
Extraction of implicit and potentially useful information from data.
Automated search (software).
Main difficulties:
  Defining interestingness
  Spurious and accidental coincidences (exceptions)
  Missing data and noise

42 Modelling Methods
Methods: experience (rules of thumb and good practices), statistics, machine learning methods.
Why? Forecasting, analysis of past events, control, anomaly detection.

43 Tables
Rudimentary and simple: look up the appropriate inputs.

Table 1.2 Weather Data
Outlook   Temperature  Humidity  Windy  Play
Sunny     hot          high      false  no
Sunny     hot          high      true   no
Overcast  hot          high      false  yes
Rainy     mild         high      false  yes
Rainy     cool         normal    false  yes

Problem with large (real-world) datasets.
What about continuous variables?
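
A table used as a model amounts to an exact-match lookup. The R sketch below encodes the five rows shown on the slide; the helper `lookup_play` and the lower-case column names are hypothetical, introduced only for this example.

```r
# The five rows from the slide's weather table.
weather <- data.frame(
  outlook     = c("sunny", "sunny", "overcast", "rainy", "rainy"),
  temperature = c("hot", "hot", "hot", "mild", "cool"),
  humidity    = c("high", "high", "high", "high", "normal"),
  windy       = c(FALSE, TRUE, FALSE, FALSE, FALSE),
  play        = c("no", "no", "yes", "yes", "yes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the stored outcome for an exact match,
# or NA when this combination of inputs was never observed.
lookup_play <- function(tbl, outlook, temperature, humidity, windy) {
  hit <- tbl$outlook == outlook & tbl$temperature == temperature &
    tbl$humidity == humidity & tbl$windy == windy
  if (any(hit)) tbl$play[hit][1] else NA_character_
}

seen   <- lookup_play(weather, "overcast", "hot", "high", FALSE)
unseen <- lookup_play(weather, "rainy", "hot", "high", TRUE)
print(seen)
print(unseen)
```

The second call returns NA because that combination is not in the table, which is the limitation the slide points at: a lookup cannot generalise to unseen or continuous inputs.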

44 Linear Models
$$y = a_1\,\mathrm{SSR} + a_2\,\mathrm{CC} + a_3$$
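
A model of this form can be fitted in R with `lm`. The sketch below uses synthetic data, since the talk's observations are not available; the column names `ssr` and `cc`, the "true" coefficients and the noise level are assumptions made for the example.

```r
set.seed(1)

# Synthetic data: 'ssr' and 'cc' stand in for the two predictors on the slide.
n  <- 200
df <- data.frame(ssr = runif(n), cc = runif(n))
df$y <- 0.8 * df$ssr - 0.3 * df$cc + 0.1 + rnorm(n, sd = 0.05)

# Ordinary least squares fit of y = a1 * ssr + a2 * cc + a3.
fit <- lm(y ~ ssr + cc, data = df)
print(coef(fit))  # "(Intercept)" estimates a3; "ssr" and "cc" estimate a1 and a2
```

With this little noise the estimates land close to the values used to generate the data, which is a useful sanity check before fitting real observations.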

45 Linear Models y = SSR CC Correlation of 0.84 Can we trust this model?
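
One common way to approach the question of trust is to check the correlation on data the model has not seen. The R sketch below does a simple k-fold cross-validation on synthetic data; it is an illustration of the idea, not the validation performed in the talk, and the column names, coefficients and k = 5 are assumptions.

```r
set.seed(2)

# Synthetic stand-in data; 'ssr', 'cc' and 'y' are hypothetical column names.
n  <- 200
df <- data.frame(ssr = runif(n), cc = runif(n))
df$y <- 0.8 * df$ssr - 0.3 * df$cc + 0.1 + rnorm(n, sd = 0.1)

k      <- 5
fold   <- sample(rep(seq_len(k), length.out = n))  # random fold label per row
cv_cor <- numeric(k)

for (i in seq_len(k)) {
  train <- df[fold != i, ]
  test  <- df[fold == i, ]
  fit   <- lm(y ~ ssr + cc, data = train)
  pred  <- predict(fit, newdata = test)
  cv_cor[i] <- cor(pred, test$y)  # correlation on held-out rows only
}

print(round(cv_cor, 2))
print(mean(cv_cor))
```

If the held-out correlations are much lower or much more variable than the in-sample value, the in-sample figure is probably too optimistic.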

46 Regression Trees
[Figure: regression tree with yes/no splits on SSR and CV, starting from SSR < 0.5 at the root.]
library(rpart)
rtree <- rpart(pv_prod ~ SSR + CV, data = my_obs)

47 Regression Tree #2
[Figure: output plotted against day.]

48 Linear vs Tree
lm(formula = output ~ ssr + cv, data = df)
rpart(formula = output ~ ssr + cv, data = df)
[Model output for the SSR and CV terms omitted.]
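
The two calls can be compared on held-out data. The sketch below uses synthetic data built with a step-like response on purpose, so that the tree has an advantage; with a truly linear response the ranking would likely reverse. The data, the 300/100 split and the `rmse` helper are assumptions for the example.

```r
library(rpart)  # recommended package shipped with standard R distributions

set.seed(3)

# Synthetic data with a step in 'ssr'; 'cv' carries no signal here.
n  <- 400
df <- data.frame(ssr = runif(n), cv = runif(n))
df$output <- ifelse(df$ssr < 0.5, 0.2, 0.8) + rnorm(n, sd = 0.05)

# Hold out 100 rows for evaluation.
idx   <- sample(n, 300)
train <- df[idx, ]
test  <- df[-idx, ]

lin  <- lm(output ~ ssr + cv, data = train)
tree <- rpart(output ~ ssr + cv, data = train)

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
rmse_lin  <- rmse(test$output, predict(lin,  newdata = test))
rmse_tree <- rmse(test$output, predict(tree, newdata = test))
print(c(linear = rmse_lin, tree = rmse_tree))
```

Neither model is better in general, in line with the No Free Lunch slide; the comparison only says which one suits this particular data set.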

49 Linear vs Nonlinear

61 Best Tools
1. Data Retrieval: ecomsudg, rnoaa
2. Data Processing: downscaler, plyr/dplyr
3. Model Building: e1071, rpart
4. Data Visualization: shiny, ggplot2

64 Meanwhile in the real world
.grib, .nc, metadata
You expect this ... but you get this!

68 Dealing with the complicatedness
Be patient
Be methodical (frommessydatatonetcdf.r)
Be reproducible

69 Data Visualization
[Diagram: Scientist, Programmer and Data Analyst; Scientific Papers and Data Visualisation; Stakeholders.]

70 RStudio (Server) Open


More information

Chapter 4: Algorithms CS 795

Chapter 4: Algorithms CS 795 Chapter 4: Algorithms CS 795 Inferring Rudimentary Rules 1R Single rule one level decision tree Pick each attribute and form a single level tree without overfitting and with minimal branches Pick that

More information

Python Programming: An Introduction to Computer Science

Python Programming: An Introduction to Computer Science Python Programming: An Introduction to Computer Science Chapter 7 Decision Structures Python Programming, 2/e 1 Simple Decisions So far, we ve viewed programs as sequences of instructions that are followed

More information

Optical Testing... What s involved, and how it s done. Hugh Barton, OptiConsulting

Optical Testing... What s involved, and how it s done. Hugh Barton, OptiConsulting Optical Testing... What s involved, and how it s done. Slide 1 Why do we Test? Demonstrate compliance (or otherwise) with specifications CE Marking Manage risk Develop confidence during the design / development

More information

Time Series Data Analysis on Agriculture Food Production

Time Series Data Analysis on Agriculture Food Production , pp.520-525 http://dx.doi.org/10.14257/astl.2017.147.73 Time Series Data Analysis on Agriculture Food Production A.V.S. Pavan Kumar 1 and R. Bhramaramba 2 1 Research Scholar, Department of Computer Science

More information

BITS F464: MACHINE LEARNING

BITS F464: MACHINE LEARNING BITS F464: MACHINE LEARNING Lecture-16: Decision Tree (contd.) + Random Forest Dr. Kamlesh Tiwari Assistant Professor Department of Computer Science and Information Systems Engineering, BITS Pilani, Rajasthan-333031

More information

Principles for a National Space Industry Policy

Principles for a National Space Industry Policy Principles for a National Space Industry Policy Commonwealth of Australia 2011 DIISR 11/144 This work is copyright. Apart from any use as permitted under the Copyright Act 1968, no part may be reproduced

More information

R: A Gentle Introduction. Vega Bharadwaj George Mason University Data Services

R: A Gentle Introduction. Vega Bharadwaj George Mason University Data Services R: A Gentle Introduction Vega Bharadwaj George Mason University Data Services Part I: Why R? What do YOU know about R and why do you want to learn it? Reasons to use R Free and open-source User-created

More information

CPSC 340: Machine Learning and Data Mining. Non-Parametric Models Fall 2016

CPSC 340: Machine Learning and Data Mining. Non-Parametric Models Fall 2016 CPSC 340: Machine Learning and Data Mining Non-Parametric Models Fall 2016 Admin Course add/drop deadline tomorrow. Assignment 1 is due Friday. Setup your CS undergrad account ASAP to use Handin: https://www.cs.ubc.ca/getacct

More information

Introduction to R Programming

Introduction to R Programming Course Overview Over the past few years, R has been steadily gaining popularity with business analysts, statisticians and data scientists as a tool of choice for conducting statistical analysis of data

More information

Bootstrapping Method for 14 June 2016 R. Russell Rhinehart. Bootstrapping

Bootstrapping Method for  14 June 2016 R. Russell Rhinehart. Bootstrapping Bootstrapping Method for www.r3eda.com 14 June 2016 R. Russell Rhinehart Bootstrapping This is extracted from the book, Nonlinear Regression Modeling for Engineering Applications: Modeling, Model Validation,

More information

World Climate Conference-3

World Climate Conference-3 World Climate Conference-3 Better climate information for a better future Geneva, Switzerland 31 August 4 4 September 2009 The road to WCC-3 1 st WCC (1979) 2 nd WCC (1990) Climate variability/change impacts

More information

Data Science Course Content

Data Science Course Content CHAPTER 1: INTRODUCTION TO DATA SCIENCE Data Science Course Content What is the need for Data Scientists Data Science Foundation Business Intelligence Data Analysis Data Mining Machine Learning Difference

More information

The statistical downscaling portal. An end-to-end tool for regional impact studies

The statistical downscaling portal. An end-to-end tool for regional impact studies http://www.meteo.unican.es The statistical downscaling portal. An end-to-end tool for regional impact studies José M. Gutiérrez gutierjm@unican.es Santander MetGroup IFCA (CSIC Univ. Cantabria) Daniel

More information

Specialist ICT Learning

Specialist ICT Learning Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.

More information

CONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM

CONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM 1 CONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM John R. Koza Computer Science Department Stanford University Stanford, California 94305 USA E-MAIL: Koza@Sunburn.Stanford.Edu

More information

Data Analyst Nanodegree Syllabus

Data Analyst Nanodegree Syllabus Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working

More information

Part 12: Advanced Topics in Collaborative Filtering. Francesco Ricci

Part 12: Advanced Topics in Collaborative Filtering. Francesco Ricci Part 12: Advanced Topics in Collaborative Filtering Francesco Ricci Content Generating recommendations in CF using frequency of ratings Role of neighborhood size Comparison of CF with association rules

More information

Data Curation Profile Plant Genetics / Corn Breeding

Data Curation Profile Plant Genetics / Corn Breeding Profile Author Author s Institution Contact Researcher(s) Interviewed Researcher s Institution Katherine Chiang Cornell University Library ksc3@cornell.edu Withheld Cornell University Date of Creation

More information

Parametric & Hone User Guide

Parametric & Hone User Guide Parametric & Hone User Guide IES Virtual Environment Copyright 2017 Integrated Environmental Solutions Limited. All rights reserved. No part of the manual is to be copied or reproduced in any Contents

More information

CITS2401 Computer Analysis & Visualisation

CITS2401 Computer Analysis & Visualisation FACULTY OF ENGINEERING, COMPUTING AND MATHEMATICS CITS2401 Computer Analysis & Visualisation SCHOOL OF COMPUTER SCIENCE AND SOFTWARE ENGINEERING Topic 13 Revision Notes CAV review Topics Covered Sample

More information

What's New in MATLAB for Engineering Data Analytics?

What's New in MATLAB for Engineering Data Analytics? What's New in MATLAB for Engineering Data Analytics? Will Wilson Application Engineer MathWorks, Inc. 2017 The MathWorks, Inc. 1 Agenda Data Types Tall Arrays for Big Data Machine Learning (for Everyone)

More information

IPS with isensor sees, identifies and blocks more malicious traffic than other IPS solutions

IPS with isensor sees, identifies and blocks more malicious traffic than other IPS solutions IPS Effectiveness IPS with isensor sees, identifies and blocks more malicious traffic than other IPS solutions An Intrusion Prevention System (IPS) is a critical layer of defense that helps you protect

More information

Decision Trees In Weka,Data Formats

Decision Trees In Weka,Data Formats CS 4510/9010 Applied Machine Learning 1 Decision Trees In Weka,Data Formats Paula Matuszek Fall, 2016 J48: Decision Tree in Weka 2 NAME: weka.classifiers.trees.j48 SYNOPSIS Class for generating a pruned

More information

DATA MINING AND WAREHOUSING

DATA MINING AND WAREHOUSING DATA MINING AND WAREHOUSING Qno Question Answer 1 Define data warehouse? Data warehouse is a subject oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making

More information

Verification and Validation. Assuring that a software system meets a user s needs. Verification vs Validation. The V & V Process

Verification and Validation. Assuring that a software system meets a user s needs. Verification vs Validation. The V & V Process Verification and Validation Assuring that a software system meets a user s needs Ian Sommerville 1995/2000 (Modified by Spiros Mancoridis 1999) Software Engineering, 6th edition. Chapters 19,20 Slide 1

More information

Data analysis using Microsoft Excel

Data analysis using Microsoft Excel Introduction to Statistics Statistics may be defined as the science of collection, organization presentation analysis and interpretation of numerical data from the logical analysis. 1.Collection of Data

More information

Classification: Decision Trees

Classification: Decision Trees Metodologie per Sistemi Intelligenti Classification: Decision Trees Prof. Pier Luca Lanzi Laurea in Ingegneria Informatica Politecnico di Milano Polo regionale di Como Lecture outline What is a decision

More information

Implementation of Classification Rules using Oracle PL/SQL

Implementation of Classification Rules using Oracle PL/SQL 1 Implementation of Classification Rules using Oracle PL/SQL David Taniar 1 Gillian D cruz 1 J. Wenny Rahayu 2 1 School of Business Systems, Monash University, Australia Email: David.Taniar@infotech.monash.edu.au

More information

GMS 9.1 Tutorial MODFLOW Stochastic Modeling, PEST Null Space Monte Carlo II Use results from PEST NSMC to evaluate the probability of a prediction

GMS 9.1 Tutorial MODFLOW Stochastic Modeling, PEST Null Space Monte Carlo II Use results from PEST NSMC to evaluate the probability of a prediction v. 9.1 GMS 9.1 Tutorial MODFLOW Stochastic Modeling, PEST Null Space Monte Carlo II Use results from PEST NSMC to evaluate the probability of a prediction Objectives Learn how to use the results from a

More information

Data Mining Input: Concepts, Instances, and Attributes

Data Mining Input: Concepts, Instances, and Attributes Data Mining Input: Concepts, Instances, and Attributes Chapter 2 of Data Mining Terminology Components of the input: Concepts: kinds of things that can be learned Goal: intelligible and operational concept

More information

CHAPTER 6. Computer Model

CHAPTER 6. Computer Model CHAPTER 6 Computer Model 6.1 Introduction In the previous chapters, the underlying principles that a designer of photovoltaic systems needs to understand before beginning the design process have been addressed.

More information