Data Mining Introduction
- Augusta Dennis
Transcription
1 Data Mining Introduction
- The University of Iowa Power Plant is conducting a study on its Boiler 11
- The plant wants to use this data to predict efficiency and classification accuracy
2 Problem Definition
- Data has been collected measuring 14 different variables within the boiler
- The original data set was mined, but the rules determined from it consider only the static data
- The project will determine how classification accuracy (CA) changes when the rules consider the trends of the data instead of only the value at a particular time

Project Goals
- Determine whether boiler efficiency can be predicted more accurately if the dynamic nature of the process is considered
- Create new features based on trends and current values
- Find which discretization level gives the best classification accuracy
- Learn how to use data mining software
3 Data Collection
- Raw data from the Iowa City power plant
- 14 different variables, e.g. Boiler Master, Air Fuel Ratio, etc.
- About 10,000 observations for each variable
- Data was collected at one-minute intervals on randomly selected days
- Trends were created by looking at 20-minute intervals

Slope Calculation
- Calculate the slope of the raw data over time
  - Shows how the data was changing from point to point
  - Used 20-minute intervals
- Standardize the slope of the data
  - Eased comparison
  - Improved the transformation process
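The slope and standardization steps can be sketched in Python. The slides do not say how the slope was computed, so this sketch assumes a least-squares slope over a sliding 20-sample window (one sample per minute) and a z-score standardization; the function names are illustrative, not from the original project.

```python
from statistics import mean, pstdev

def ls_slope(ys):
    """Least-squares slope of ys against time steps 0, 1, 2, ... (assumed one minute apart)."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = mean(ys)
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def rolling_slopes(values, window=20):
    """Slope of each sliding window of `window` samples (assumption: windows overlap)."""
    return [ls_slope(values[i - window + 1 : i + 1])
            for i in range(window - 1, len(values))]

def standardize(xs):
    """Z-score: subtract the mean and divide by the population standard deviation."""
    mu = mean(xs)
    sigma = pstdev(xs)
    if sigma == 0:
        return [0.0 for _ in xs]  # a constant series has no variation to scale
    return [(x - mu) / sigma for x in xs]

# Example: a steadily rising signal has the same slope in every window.
signal = [2.0 * t + 5 for t in range(40)]
slopes = rolling_slopes(signal)
z_slopes = standardize(slopes)
```

Standardizing puts the slopes of variables with very different units on one scale, which is what makes a shared set of trend cut points possible.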
4 Slope Discretization
- Discretization of the standardized data
- Determined how much the standardized slope data was changing:
  - Decreasing rapidly
  - Decreasing
  - No change
  - Increasing
  - Increasing rapidly

Feature Discretization
- Discretization count
- The values were assigned based on mean and standard deviation
- Ex: SA Fan Flow, Mean = 55.67, Std Dev = 6.98
  IF(B2>65,65,IF(B2>60,60,IF(B2>57,57,IF(B2>55,55,IF(B2>52,52,IF(B2>50,50,IF(B2>45,45,IF(B2>40,40,35))))))))
- Combined feature: the discretized actual value followed by the discretized standardized value, e.g. 57_3
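The nested IF formula for SA Fan Flow can be read as a threshold lookup. A minimal Python sketch, assuming the same cut points as the formula; the names `THRESHOLDS`, `FLOOR_BIN` and `discretize_value` are illustrative and not from the original project.

```python
# Cut points copied from the spreadsheet formula, highest first.
THRESHOLDS = (65, 60, 57, 55, 52, 50, 45, 40)
FLOOR_BIN = 35  # value assigned when no threshold is exceeded

def discretize_value(x):
    """Return the first threshold that x strictly exceeds, else the floor bin."""
    for t in THRESHOLDS:
        if x > t:
            return t
    return FLOOR_BIN

print(discretize_value(57.3))  # 57
print(discretize_value(65))    # 60, because the comparison is strict
print(discretize_value(38.0))  # 35
```

The comparison is strict, as in the formula, so a reading exactly equal to a cut point falls into the next lower bin.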
5 Feature Discretization
- Combined values (discretized actual value, underscore, discretized standardized slope):
  65_-3 65_3 65_-2 65_2 65_-1 65_1 65_-0.5 65_0.5 65_0
  60_-3 60_3 60_-2 60_2 60_-1 60_1 60_-0.5 60_0.5 60_0
  57_-3 57_3 57_-2 57_2 57_-1 57_1 57_-0.5 57_0.5 57_0

High Discretization
- Decreasing rapidly: -2S
- Decreasing: -1S
- No change: 0
- Increasing: 1S
- Increasing rapidly: 2S

Low Discretization
- Same process as before, except ONLY 3 values are used
- Decreasing rapidly: -1S
- No change: 0
- Increasing: 1S
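The trend levels and the combined label can be sketched as follows. The slides give only the level names and the markers -2S to 2S, so the exact cut points used here (at 1 and 2 standard deviations) are an assumption, as are the function names.

```python
def discretize_slope(z, high=True):
    """Map a standardized slope z to an integer trend code.

    Assumed cut points: +/-1 and +/-2 standard deviations for high
    discretization (5 levels), +/-1 for low discretization (3 levels).
    """
    if high:
        if z <= -2:
            return -2  # decreasing rapidly
        if z <= -1:
            return -1  # decreasing
        if z < 1:
            return 0   # no change
        if z < 2:
            return 1   # increasing
        return 2       # increasing rapidly
    if z <= -1:
        return -1
    if z < 1:
        return 0
    return 1

def combined_feature(value_bin, slope_code):
    """Join the discretized value and the trend code, e.g. 57 and 2 -> '57_2'."""
    return f"{value_bin}_{slope_code}"

print(combined_feature(57, discretize_slope(2.4)))              # 57_2
print(combined_feature(57, discretize_slope(2.4, high=False)))  # 57_1
```

With fewer trend levels, fewer distinct combined labels exist for each variable.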
6 Experiment Setup
- Moving Average
  - The first set of data used was the moving average of the raw data
  - This was a bad idea because a moving average takes out all the noise and extreme points
  - It was impossible to get a classification accuracy
  - Therefore we used the raw data

Solution Approach
- Data Mining
  - Interested in using data to predict efficiency based on rule sets
  - Supervised learning, because our training set had labels indicating the classes of observations
  - From the rule sets we extract, we should be able to predict efficiency levels for newly measured data
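A small sketch of why a moving average hides extreme points. The window length and the sample values are made up for illustration; the slides do not state the window that was used.

```python
def moving_average(values, window):
    """Trailing moving average; emits one value per full window."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the sample that left the window
        if i >= window - 1:
            out.append(running / window)
    return out

raw = [50, 50, 90, 50, 50]         # one extreme reading among steady values
smoothed = moving_average(raw, 5)  # the spike is spread across the window
print(max(raw), max(smoothed))     # 90 58.0
```

The spike of 90 shrinks to 58 after averaging, so a trend feature built on the smoothed series would barely register it.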
7 Solution Approach
- Prepared the data as discussed in model formulation
- Ran the data through data mining software to determine rules
  - Weka: free data mining software
- In each data set, the variable being tested was replaced by its trend
  - Example: Air Fuel Ratio Trend replaced Air Fuel Ratio

Computational Study
- The software uses machine learning algorithms to solve data mining problems
- Used the j48.J48 and j48.PART algorithms to analyze the data sets
- Based on the C4.5 data mining algorithm learned in class
8 Classifiers Used: J48.J48
- Decision tree algorithm
- Builds decision trees
- 10-fold cross-validation to determine classification accuracy
- Unfortunately, too many branches would need to be created for this data, so the software could not handle it

Classifiers Used: J48.PART
- C4.5 algorithm
- Creates rule sets
- 10-fold cross-validation to determine classification accuracy
- te/lecture/datamining.pdf
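The 10-fold procedure used to estimate classification accuracy can be sketched in plain Python. This is not Weka's implementation: it uses a simple shuffled split rather than a stratified one, and a majority-class baseline stands in for the rule learner. All names are illustrative.

```python
import random
from collections import Counter

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, pool the correct counts."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)

# Stand-in classifier: always predict the most common training label.
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]

def predict_majority(model, x):
    return model

X = [[i] for i in range(100)]
y = ["high"] * 90 + ["low"] * 10
print(cross_val_accuracy(X, y, fit_majority, predict_majority))  # 0.9
```

Every observation is tested exactly once, by a model that never saw it during training, which is why the pooled accuracy is a fairer estimate than accuracy on the training data.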
9 Results
Table (most cell values did not survive transcription):
- Columns: data set, discretization, % increase CA, # rules, % increase in rules from raw data, % increase in rules from low to high variable discretization
- raw data, none: 881 rules
- ave mid temp trend, low to high: 38.5%
- air fuel ratio trend, low to high: 37.4%
- biomass feedrate trend, low to high: 21.3%

Results
- Approximately 1000 rules and about 10,000 data points
- Each rule on average describes only 10 points
  - Raw data: each rule describes on average 11.4 data points (10000/880)
  - Low discretized data: each rule describes on average 10 data points (10000/1000)
  - High discretized data: each rule describes on average 7.1 data points (10000/1400)
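The coverage figures above are the number of data points divided by the number of rules. A short check in Python, using the rounded rule counts from the slide; the function name is illustrative.

```python
def points_per_rule(n_points, n_rules):
    """Average number of observations described by one rule."""
    return n_points / n_rules

N_POINTS = 10_000
for label, n_rules in [("raw", 880), ("low", 1000), ("high", 1400)]:
    print(label, round(points_per_rule(N_POINTS, n_rules), 1))
# raw 11.4
# low 10.0
# high 7.1
```

Fewer points per rule means each rule generalizes over less of the data.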
10 Conclusions
- Combining trends into the data set can increase the classification accuracy
- Low discretization lowers the number of rules formed, but adding trends increases the number of rules formed
- Low discretization is better than high discretization for two reasons:
  - Higher classification accuracy (in most cases)
  - Fewer rules formed

Any Questions?