Non-trivial extraction of implicit, previously unknown and potentially useful information from data
CS 795/895 Applied Visual Analytics, Spring 2013
Data Mining
Dr. Michele C. Weigle

What is Data Mining?
Many definitions:
- Non-trivial extraction of implicit, previously unknown, and potentially useful information from data
- Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

CS 795/895 - Spring - Weigle. Slides: Tan, Steinbach, Kumar, Introduction to Data Mining
What is (not) Data Mining?
What is not Data Mining?
- Look up a phone number in a phone directory
- Query a Web search engine for information about "Amazon"
What is Data Mining?
- Certain names are more prevalent in certain US locations (O'Brien, O'Rourke, O'Reilly in the Boston area)
- Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)

Data Mining Tasks
Prediction Methods
- Use some variables to predict unknown or future values of other variables
- Ex: classification, regression, deviation detection
Description Methods
- Find human-interpretable patterns that describe the data
- Ex: clustering, association rule discovery, sequential pattern discovery
From [Fayyad et al.], Advances in Knowledge Discovery and Data Mining
Data Mining with WEKA
Following slides are based on IBM developerWorks articles by Michael Abernethy
- Part 1: Introduction and regression
- Part 2: Classification and clustering
- Part 3: Nearest neighbor and server-side library
Explains the basics and shows examples using WEKA
- should be sufficient for our purposes
- for more details, take a Data Mining course or see Introduction to Data Mining by Tan, Steinbach, and Kumar
CS 795/895 - Spring - Weigle. Abernethy, "Data Mining with

What is Data Mining?
Transformation of large amounts of data into meaningful patterns and rules
- directed: trying to predict a particular data point
- undirected: trying to create groups of data, or find patterns in existing data
Ultimate goal is to create a model
- major step is determining what technique to use
Comparison of Techniques
Data: BMW dealership information about each person who purchased a BMW, looked at a BMW, and browsed the BMW showroom
- Regression: "How much should we charge for the new BMW M5?"
- Classification: "How likely is person X to buy the newest BMW M5?"
- Clustering: "What age groups like the silver BMW M5?"
- Nearest neighbor: "When people purchase the BMW M5, what other options do they tend to buy at the same time?"

What is WEKA?
- Waikato Environment for Knowledge Analysis
- First implemented in 1997
- GPL (so it's free)
- Written in Java
- Very powerful data mining software
WEKA Examples
- Install and start WEKA
- article uses version ...; newest version is ...
- All examples use the "Explorer" application
- Data files are available for download at the end of each IBM article

Data Mining with WEKA
- Part 1: Introduction and regression
- Part 2: Classification and clustering
- Part 3: Nearest neighbor and server-side library
Regression
- Easiest technique, but also the least powerful
- Takes a number of independent variables that produce a result, a dependent variable
- Regression model is used to predict the result of an unknown dependent variable, given the values of the independent variables

Regression Example - Pricing a House
- Independent variables: square footage, size of the lot, granite in the kitchen, bathrooms upgraded, etc.
- Dependent variable: house price
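As a rough illustration of the regression idea above, the following sketch fits a least-squares line for a single independent variable (square footage) against price. It is a simplification of the multi-variable house model in the slides, and the function names and the data values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predict the dependent variable for a new value of the independent variable."""
    return slope * x + intercept


# Hypothetical data (not from the slides): price grows linearly with size.
square_feet = [1000, 1500, 2000, 2500]
prices = [100000, 150000, 200000, 250000]

slope, intercept = fit_simple_linear_regression(square_feet, prices)
estimate = predict(slope, intercept, 1800)
print(slope, intercept, estimate)
```

With several independent variables the same least-squares idea applies, but the fit requires solving a small linear system rather than the two closed-form expressions used here.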
Example: Loading Data into WEKA
WEKA's preferred format is Attribute-Relation File Format (ARFF)
- define each column and data type
- regression is limited to NUMERIC or DATE
- supply each row of data in comma-delimited form

Regression Example - Pricing a House
Table columns: House size (square feet) | Lot size | Bedrooms | Granite | Upgraded bathroom | Selling price
Selling prices as transcribed: $205, $224, $197, $189, $195, $325, $230,
ARFF attributes: housesize, lotsize, bedrooms, granite, bathroom, sellingprice
ARFF data rows as transcribed:
3529,9191,6,0,0,
,10061,5,1,1,
,10150,5,0,1,
,14156,4,1,0,
,9600,4,0,1,
,19994,6,1,1,
,9365,5,0,1,
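To make the ARFF layout described above concrete, here is a small sketch that builds the text of an ARFF file with all-numeric columns. The attribute names follow the slide; the helper name `to_arff`, the relation name `house`, and the two data rows are hypothetical placeholders, not the values from the article's data file.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text: a header declaring each NUMERIC column, then comma-delimited rows."""
    lines = [f"@RELATION {relation}", ""]
    for name in attributes:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    lines.append("")
    lines.append("@DATA")
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


attributes = ["housesize", "lotsize", "bedrooms", "granite", "bathroom", "sellingprice"]

# Hypothetical rows for illustration only.
rows = [
    [2000, 8000, 3, 0, 1, 150000],
    [3000, 12000, 5, 1, 1, 250000],
]

arff_text = to_arff("house", attributes, rows)
print(arff_text)
```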
Example - House: Loading Data into WEKA
- Preprocess tab
- Open file houses.arff
- Explore the data by choosing attributes and/or Visualize All

Example - House: Create the Model
- Classify tab
- Choose button
- Expand the functions branch
- Select LinearRegression (note: "SimpleLinearRegression" only looks at one variable)
Test options
- Use training set: use the data set we supplied
- Supplied test set: different set of data
- Cross-validation: use subsets of supplied data and average them out for final model
- Percentage split: use percentage of supplied data
Choose (Num) sellingprice as dependent variable
Start
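The cross-validation test option above can be pictured as splitting the row indices into folds and holding out one fold at a time. The sketch below shows only that splitting step; the function name `k_fold_indices` is hypothetical, and it does not shuffle the rows or claim to reproduce how WEKA forms its folds.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

A model would be built on each `train` part and scored on the matching `test` part, and the scores averaged.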
Example - House: Interpreting the Model
sellingprice = (... * 3198) + (... * 9669) + (... * 5) + (... * 1)
sellingprice = 219,...

Example: Visualize Tab
Example - House: Observations
sellingprice = (... * housesize) + (... * lotsize) + (... * bedrooms) + (... * bathroom)
- Granite doesn't matter: it isn't used in the model
- Bathrooms do matter
- Bigger houses reduce the value, but house size isn't an independent variable
- Not a perfect model

Regression Example - Cars
- Classic dataset of vehicles produced, often used for parallel coordinates examples
- 398 rows of data
- Independent variables: cylinders, displacement, horsepower, weight, acceleration, model year, origin, car make
- Dependent variable: miles per gallon (MPG), aka class
Regression: More Information
Keywords to search for:
- least squares
- homoscedasticity
- White tests
- Lilliefors tests
- R-squared
- p-values

Data Mining with WEKA
- Part 1: Introduction and regression
- Part 2: Classification and clustering
- Part 3: Nearest neighbor and server-side library
Classification
- Creates a step-by-step guide for how to determine the output of a new data instance
- aka classification trees or decision trees
- Creates a tree where each node represents a spot where a decision must be made based on the input
- want the tree to be as simple as possible, with as few nodes and leaves as possible
- The model can be used for any unknown data instance

Classification: Training Set
- Data set with known output values, used to build the model
- Take an entire training set and divide it into two parts:
  - 60-80% in the training set, used to create the model
  - remaining in the test set, used to test the accuracy of the model
- overfitting: if you give too much data to the model, the model will be created perfectly, but just for that set of data
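The training/test division described above can be sketched as a shuffle followed by a cut. The 70% fraction is one choice inside the 60-80% range from the slide; the function name `train_test_split` and the fixed `seed` are assumptions made so the example is repeatable.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows, then cut it into a training part and a test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical data set: 100 row identifiers.
rows = list(range(100))
train, test = train_test_split(rows, train_fraction=0.7)
print(len(train), len(test))
```

The model is built only from `train`; `test` is kept aside to check how well it does on rows it has not seen.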
Classification: Confusion Matrix
- false positive: data instance where the model predicts it should be positive, but the actual value is negative
- false negative: data instance where the model predicts it should be negative, but the actual value is positive
- Impacts of false positives and false negatives are not always the same
- Ex: spam - a false positive (real mail marked as spam) is more damaging than a false negative (spam marked as not spam)

Classification Example - BMW
- Use data set from BMW dealership
- Goal: try to push a two-year extended warranty to its past customers
- Attributes:
  - income bracket
  - year/month first BMW bought
  - year/month most recent BMW bought
  - whether they responded to extended warranty in the past
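The false positive and false negative definitions above can be checked by counting the four cells of a two-class confusion matrix. This is a minimal sketch; the function name `confusion_counts` and the label lists are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


# Hypothetical labels: 1 = positive, 0 = negative.
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)
```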
Example - BMW: Accuracy
- Precision: fraction of retrieved instances that are relevant
- Recall: fraction of relevant instances that are retrieved
- F-Measure: combines precision and recall (harmonic mean of precision and recall)
  2 * (precision * recall) / (precision + recall)
[Figure: relevant vs. not relevant instances, with errors shown in red]
Source: Wikipedia, "Precision and recall"

Example - BMW: Validation
- Run the test set through the model: bmw-test.arff
- Correctly Classified Instances: training set ...%, test set ...%
- Pretty close (though still not great)
- hmmm, maybe classification isn't the best method for this data
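The precision, recall, and F-measure formulas above translate directly into code. In this sketch the counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the slides.

```python
def precision_recall_f(tp, fp, fn):
    """Compute precision, recall, and their harmonic mean (F-measure) from counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
precision, recall, f_measure = precision_recall_f(tp=8, fp=2, fn=8)
print(precision, recall, f_measure)
```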
Clustering
- Make groups of data to determine patterns from the data
- Advantage: useful when the data set is defined and a general pattern needs to be determined
- Every attribute in the data set will be used to analyze the data
- Disadvantage: need to know in advance how many groups to create

Clustering: Basic Math
1. Every attribute in the data set is normalized
2. Given the number of desired clusters, randomly select that number of samples from the data set to serve as initial test cluster centers
3. Compute the distance from each data sample to each cluster center
4. Assign each data row to a cluster, based on minimum distance
5. Compute the centroid: the average of each column of data, using only the members of each cluster
6. Calculate the distance from each data sample to the centroids. If clusters and cluster members don't change, done!
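The clustering steps above can be sketched in a few lines. This is an illustrative version only: min-max scaling is one assumed way to normalize, squared Euclidean distance is an assumed distance measure, and the function names, the `seed`, and the sample points are hypothetical.

```python
import random


def normalize(rows):
    """Min-max scale each column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)] for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(rows, k, seed=0, max_iterations=100):
    """Return (cluster index per row, cluster centers in normalized space)."""
    data = normalize(rows)
    # Randomly pick k samples as the initial cluster centers.
    centers = random.Random(seed).sample(data, k)
    assignment = None
    for _ in range(max_iterations):
        # Assign each row to the nearest center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(row, centers[c]))
            for row in data
        ]
        if new_assignment == assignment:
            break  # cluster members did not change, so we are done
        assignment = new_assignment
        # Recompute each centroid as the column-wise average of its members.
        for c in range(k):
            members = [row for row, a in zip(data, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


# Hypothetical points forming two well-separated groups.
points = [[1, 1], [1.2, 0.9], [0.8, 1.1], [9, 9], [9.1, 8.8], [8.9, 9.2]]
assignment, centers = k_means(points, k=2)
print(assignment)
```

The number of clusters `k` has to be supplied up front, which is the disadvantage noted in the slides.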
Clustering Example - BMW
- Use data set from BMW dealership
- Kept track of how people walk through the dealership and showroom, what cars they look at, and how often they make purchases
- 100 rows of data
- Each column describes the steps that customers reached in their BMW experience

Example - BMW: Clusters
- Cluster 0 - "Dreamers": wander around the dealership, don't purchase anything
- Cluster 1 - "M5 Lovers": walk straight to M5s, not a high purchase rate
- Cluster 2 - "Throw-Aways": small group, not statistically relevant
- Cluster 3 - "BMW Babies": always end up purchasing a car and always end up financing it; walk around, then turn to computer search at the dealership; always buy an M5 or Z4
- Cluster 4 - "Starting Out With BMW": always look at the 3-series, never the more expensive M5; walk to the showroom, not the lot; only 32% ultimately finish the transaction
Clustering: More Information
Keywords to search for:
- Lloyd's algorithm
- Manhattan distance
- Chebyshev distance
- sum of squared errors
- cluster centroids

Data Mining with WEKA
- Part 1: Introduction and regression
- Part 2: Classification and clustering
- Part 3: Nearest neighbor and server-side library
Nearest Neighbor
- aka collaborative filtering or instance-based learning
- Use past data instances, with known output values, to predict an unknown output value of a new data instance
- Different from regression, as regression can only be used for numerical outputs

Nearest Neighbor: Basic Math
- Taking the unknown data point, the distance between it and every known data point is computed
- Algorithm can be expanded beyond the closest match to include any number of closest matches: n-nearest neighbors
- Can also be used to predict a Yes/No output
- How many neighbors to use? Need to experiment
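The nearest-neighbor procedure above can be sketched as: compute the distance from the unknown point to every known point, keep the n closest, and let them vote. Euclidean distance and majority voting are assumptions made for this illustration, the attributes are not normalized, and the function names and data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_predict(known, query, n=3):
    """known is a list of (features, label) pairs; return the majority label of the n closest."""
    ranked = sorted(known, key=lambda item: euclidean(item[0], query))
    votes = Counter(label for _, label in ranked[:n])
    return votes.most_common(1)[0][0]


# Hypothetical past instances with a Yes/No output.
known = [
    ((1, 1), "No"), ((1, 2), "No"), ((2, 1), "No"),
    ((8, 8), "Yes"), ((8, 9), "Yes"), ((9, 8), "Yes"),
]

prediction_a = nearest_neighbor_predict(known, (7, 8), n=3)
prediction_b = nearest_neighbor_predict(known, (2, 2), n=3)
print(prediction_a, prediction_b)
```

Changing `n` is the experiment the slide refers to: small values follow individual neighbors closely, larger values smooth over them.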
Nearest Neighbor Example - BMW
- Use data set from BMW dealership
- Goal: try to push a two-year extended warranty to its past customers
- 4,500 past sales of extended warranties
- Attributes:
  - income bracket
  - year/month first BMW bought
  - year/month most recent BMW bought
  - whether they responded to extended warranty in the past

Nearest Neighbor: More Information
Keywords to search for:
- distance weighting
- Hamming distance
- Mahalanobis distance
Remember
- Data mining models aren't always simple input-output mechanisms
- Data must be examined to determine the right model to choose
- Output must be analyzed and accurate before you're ready to move on
- Server-Side WEKA: we won't cover this, but article 3 introduces how to use the WEKA API for Java
More informationSOCIAL MEDIA MINING. Data Mining Essentials
SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate
More informationData Mining. Vera Goebel. Department of Informatics, University of Oslo
Data Mining Vera Goebel Department of Informatics, University of Oslo 2012 1 Lecture Contents Knowledge Discovery in Databases (KDD) Definition and Applications OLAP Architectures for OLAP and KDD KDD
More informationLecture 25: Review I
Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,
More informationData Mining and Machine Learning: Techniques and Algorithms
Instance based classification Data Mining and Machine Learning: Techniques and Algorithms Eneldo Loza Mencía eneldo@ke.tu-darmstadt.de Knowledge Engineering Group, TU Darmstadt International Week 2019,
More informationFormal Methods of Software Design, Eric Hehner, segment 1 page 1 out of 5
Formal Methods of Software Design, Eric Hehner, segment 1 page 1 out of 5 [talking head] Formal Methods of Software Engineering means the use of mathematics as an aid to writing programs. Before we can
More informationGENERAL MATH FOR PASSING
GENERAL MATH FOR PASSING Your math and problem solving skills will be a key element in achieving a passing score on your exam. It will be necessary to brush up on your math and problem solving skills.
More information2012 Fall, CENG 514 Data Mining, Homework 3 Key by Dilek Önal
2012 Fall, CENG 514 Data Mining, Homework 3 Key by Dilek Önal SOLUTIONS Task 1 (Data conversion 15 points, Weka commands 10 points = 25 points) You should have implemented a piece of code which converts
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Slides From Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Slides From Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining
More informationData Mining Course Overview
Data Mining Course Overview 1 Data Mining Overview Understanding Data Classification: Decision Trees and Bayesian classifiers, ANN, SVM Association Rules Mining: APriori, FP-growth Clustering: Hierarchical
More informationUVA CS 6316/4501 Fall 2016 Machine Learning. Lecture 15: K-nearest-neighbor Classifier / Bias-Variance Tradeoff. Dr. Yanjun Qi. University of Virginia
UVA CS 6316/4501 Fall 2016 Machine Learning Lecture 15: K-nearest-neighbor Classifier / Bias-Variance Tradeoff Dr. Yanjun Qi University of Virginia Department of Computer Science 11/9/16 1 Rough Plan HW5
More informationUnderstanding Rule Behavior through Apriori Algorithm over Social Network Data
Global Journal of Computer Science and Technology Volume 12 Issue 10 Version 1.0 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc. (USA) Online ISSN: 0975-4172
More informationData Mining and Knowledge Discovery: Practice Notes
Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2013/12/09 1 Practice plan 2013/11/11: Predictive data mining 1 Decision trees Evaluating classifiers 1: separate
More informationPractical Data Mining COMP-321B. Tutorial 5: Article Identification
Practical Data Mining COMP-321B Tutorial 5: Article Identification Shevaun Ryan Mark Hall August 15, 2006 c 2006 University of Waikato 1 Introduction This tutorial will focus on text mining, using text
More informationMachine Learning: Algorithms and Applications Mockup Examination
Machine Learning: Algorithms and Applications Mockup Examination 14 May 2012 FIRST NAME STUDENT NUMBER LAST NAME SIGNATURE Instructions for students Write First Name, Last Name, Student Number and Signature
More information1. Inroduction to Data Mininig
1. Inroduction to Data Mininig 1.1 Introduction Universe of Data Information Technology has grown in various directions in the recent years. One natural evolutionary path has been the development of the
More informationIntroduction to Data Mining CS 584 Data Mining (Fall 2016)
Introduction to Data Mining CS 584 Data Mining (Fall 2016) Huzefa Rangwala AssociateProfessor, Computer Science George Mason University Email: rangwala@cs.gmu.edu Website: www.cs.gmu.edu/~hrangwal Slides
More informationData Mining. ❷Chapter 2 Basic Statistics. Asso.Prof.Dr. Xiao-dong Zhu. Business School, University of Shanghai for Science & Technology
❷Chapter 2 Basic Statistics Business School, University of Shanghai for Science & Technology 2016-2017 2nd Semester, Spring2017 Contents of chapter 1 1 recording data using computers 2 3 4 5 6 some famous
More informationRedefining and Enhancing K-means Algorithm
Redefining and Enhancing K-means Algorithm Nimrat Kaur Sidhu 1, Rajneet kaur 2 Research Scholar, Department of Computer Science Engineering, SGGSWU, Fatehgarh Sahib, Punjab, India 1 Assistant Professor,
More informationDATA ANALYSIS WITH WEKA. Author: Nagamani Mutteni Asst.Professor MERI
DATA ANALYSIS WITH WEKA Author: Nagamani Mutteni Asst.Professor MERI Topic: Data Analysis with Weka Course Duration: 2 Months Objective: Everybody talks about Data Mining and Big Data nowadays. Weka is
More informationWEKA Explorer User Guide for Version 3-4
WEKA Explorer User Guide for Version 3-4 Richard Kirkby Eibe Frank July 28, 2010 c 2002-2010 University of Waikato This guide is licensed under the GNU General Public License version 2. More information
More informationCase Study: SAP BW Data Mining (Association Analysis)
Case Study: SAP BW Data Mining (Association Analysis) Product SAP Netweaver Release 2004s Level Undergraduate Focus BW Data Mining Author Paul Hawking Robert Jovanovic Version 1.0 MOTIVATION The management
More informationPredictive Analysis: Evaluation and Experimentation. Heejun Kim
Predictive Analysis: Evaluation and Experimentation Heejun Kim June 19, 2018 Evaluation and Experimentation Evaluation Metrics Cross-Validation Significance Tests Evaluation Predictive analysis: training
More informationData Mining. Introduction. Piotr Paszek. (Piotr Paszek) Data Mining DM KDD 1 / 44
Data Mining Piotr Paszek piotr.paszek@us.edu.pl Introduction (Piotr Paszek) Data Mining DM KDD 1 / 44 Plan of the lecture 1 Data Mining (DM) 2 Knowledge Discovery in Databases (KDD) 3 CRISP-DM 4 DM software
More informationData Science Essentials
Data Science Essentials Lab 6 Introduction to Machine Learning Overview In this lab, you will use Azure Machine Learning to train, evaluate, and publish a classification model, a regression model, and
More informationKnowledge Discovery. URL - Spring 2018 CS - MIA 1/22
Knowledge Discovery Javier Béjar cbea URL - Spring 2018 CS - MIA 1/22 Knowledge Discovery (KDD) Knowledge Discovery in Databases (KDD) Practical application of the methodologies from machine learning/statistics
More informationBBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler
BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Classification Classification systems: Supervised learning Make a rational prediction given evidence There are several methods for
More information