10/5/2017 MIST.6060 Business Intelligence and Data Mining
- Wilfred Gilbert
Nearest Neighbors

Distance Measures

In a p-dimensional space, the Euclidean distance between two records, $a = (a_1, a_2, \ldots, a_p)$ and $b = (b_1, b_2, \ldots, b_p)$, is defined as:

$$d(a, b) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots + (a_p - b_p)^2}.$$

It is not necessary to perform the square root operation if the purpose is to compare distances. The Euclidean distance is typically calculated based on normalized values.

The Euclidean distance measure implicitly assumes data are numeric. When applied to categorical data, the difference between two categorical values is defined as zero if they are the same, and one otherwise.

Example: Consider two customer records, a and b, in a customer dataset. Suppose attribute 1 is gender. If $a_1 = b_1$ (both customers are male or both are female), then $a_1 - b_1 = 0$; otherwise (one is male and the other is female), $a_1 - b_1 = 1$ [also, $(a_1 - b_1)^2 = 1$].

K-Nearest Neighbors (k-NN) Method

0. Input: an integer value k.
1. To classify a new record, find the nearest k neighboring records in the training set, based on a distance measure (e.g., normalized Euclidean distance).
2. Classify the record as a member of the majority class of the k neighbors. If the problem is to predict a numeric value of an outcome variable, then take the average outcome value of the k neighbors as the predicted value.

Drawback of K-Nearest Neighbors

The k-NN method does not provide explicit structures or models for classification or learning.
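The distance measure and the k-NN steps above can be sketched in a few lines of base R. This is only an illustrative sketch, not the handout's own code: the function names `sq_dist` and `knn_predict` and the toy customer data are hypothetical, numeric attributes are assumed to be normalized already, and tied votes are resolved by simply taking the first class in alphabetical order.

```r
# Squared Euclidean distance between two records (no square root, which is
# enough for comparing distances). Numeric attributes are assumed to be
# normalized; a categorical attribute contributes 0 if equal and 1 otherwise.
sq_dist <- function(a, b) {
  total <- 0
  for (j in seq_along(a)) {
    if (is.numeric(a[[j]])) {
      total <- total + (a[[j]] - b[[j]])^2
    } else {
      total <- total + as.numeric(as.character(a[[j]]) != as.character(b[[j]]))
    }
  }
  total
}

# k-NN prediction for one new record.
# train: data frame of predictors; outcome: one outcome value per training row.
knn_predict <- function(train, outcome, new_record, k) {
  d <- vapply(seq_len(nrow(train)),
              function(i) sq_dist(train[i, ], new_record),
              numeric(1))
  nearest <- order(d)[seq_len(k)]          # indices of the k nearest records
  if (is.numeric(outcome)) {
    mean(outcome[nearest])                 # numeric outcome: average
  } else {
    votes <- table(as.character(outcome[nearest]))
    names(votes)[which.max(votes)]         # class outcome: majority vote
  }
}

# Hypothetical toy data: one normalized numeric attribute plus gender.
train <- data.frame(income = c(0.10, 0.20, 0.80, 0.90),
                    gender = c("male", "female", "male", "female"),
                    stringsAsFactors = FALSE)
buys  <- c("no", "no", "yes", "yes")       # class outcome
spend <- c(10, 20, 80, 90)                 # numeric outcome
new_record <- data.frame(income = 0.85, gender = "male",
                         stringsAsFactors = FALSE)

knn_predict(train, buys, new_record, k = 3)   # majority class of 3 neighbors
knn_predict(train, spend, new_record, k = 2)  # average outcome of 2 neighbors
```

Because every prediction scans the whole training set, nothing is learned ahead of time, which is the lack of an explicit model noted above.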
An Illustrative Example: College Admission

The dataset includes 24 college application records (rows 2-25 below), with 2 predictors, GPA and SAT, and a class attribute, Accept? (with 2 classes: yes, no).

Let us use the record with {GPA = 3.32, SAT = 2060} (highlighted) as a validation record. That is, the validation set has only this record, while the training set includes the other 23 records.

We first calculate the Euclidean distances (without taking the square root) between this validation record and each of the 23 training records, based on the normalized GPA and SAT values. Normalized values are shown in columns D and E, and the distances are shown in column F. Their Excel calculations are shown in the Formula sheet below.

If k = 1, the nearest neighbor is the record right below this validation record, which has a "no" value. Therefore, the validation record is classified as "no". But we know the actual class of this record is "yes", so it is misclassified by 1-NN.

If k = 3, the 3 nearest neighbors are indicated in column H; they include 2 "yes" records and 1 "no" record. By the majority rule, the validation record is classified as "yes". Therefore, 3-NN correctly classifies the record.
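The spreadsheet calculation described above can be reproduced with a short base R sketch. It uses the 24 records from the handout's Admission data and assumes min-max normalization over all 24 records, as in the handout's R commands; the variable names (`admission`, `minmax`, `d2`, `ranked`) are illustrative choices, not part of the original.

```r
# The 24 application records; row 12 is the validation record
# {GPA = 3.32, SAT = 2060}, whose actual class is "yes".
admission <- data.frame(
  GPA = c(2.83, 3.43, 2.94, 2.87, 3.46, 4.00, 3.95, 3.36, 3.04, 3.60, 2.62, 3.32,
          3.18, 2.66, 2.94, 2.44, 3.39, 2.58, 2.82, 2.97, 2.54, 2.20, 2.62, 2.90),
  SAT = c(1910, 1760, 2210, 2140, 2400, 1990, 1840, 2290, 2060, 2140, 2250, 2060,
          2030, 2140, 1800, 2100, 1840, 1840, 1690, 1910, 1730, 1950, 1500, 1580),
  Accept = c(rep("yes", 12), rep("no", 12)),
  stringsAsFactors = FALSE
)

# Min-max normalization to [0, 1] (assumed to match the handout's columns D, E).
minmax <- function(v) (v - min(v)) / (max(v) - min(v))
norm_gpa <- minmax(admission$GPA)
norm_sat <- minmax(admission$SAT)

val <- 12                          # index of the validation record
train_idx <- setdiff(1:24, val)    # the other 23 records form the training set

# Squared Euclidean distances (no square root) to each training record.
d2 <- (norm_gpa[train_idx] - norm_gpa[val])^2 +
      (norm_sat[train_idx] - norm_sat[val])^2

ranked <- train_idx[order(d2)]     # training records, nearest first

class_1nn <- admission$Accept[ranked[1]]
class_3nn <- names(which.max(table(admission$Accept[ranked[1:3]])))

admission[ranked[1:3], ]   # the three nearest neighbors
class_1nn                  # "no"  (misclassified)
class_3nn                  # "yes" (correctly classified)
```

The nearest record is {GPA = 3.18, SAT = 2030, no}, followed by two "yes" records, which agrees with the 1-NN and 3-NN results stated above.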
The scatter plot shows graphically how k-NN works in this example. The solid-lined loop shows the 1-NN result, and the dash-lined loop shows the 3-NN result. Note that although the chart is plotted using the original values, the axis scales of the chart are adjusted so that it is very close to a square. In this sense, the values are approximately normalized.

k-NN works in the same way when applied to a new record. The only difference is, of course, that the true class of the new record is unknown.

Choice of k

Run k-NN multiple times, each using a different k value. Choose the k with the lowest validation error rate for future classification of any new record.

The above procedure is computationally very expensive and is practically prohibitive for large data. There are many approaches to reduce the computational cost; see the WFHP book for more detail (not required).

K-Nearest Neighbors in Weka

The Admission.arff file:

@relation Admission
% numeric attribute specification
@attribute GPA numeric
% numeric attribute specification
@attribute SAT numeric
% categorical attribute
@attribute Accept {yes,no}
@data
2.83, 1910, yes
3.43, 1760, yes
2.94, 2210, yes
2.87, 2140, yes
3.46, 2400, yes
4.00, 1990, yes
3.95, 1840, yes
3.36, 2290, yes
3.04, 2060, yes
3.60, 2140, yes
2.62, 2250, yes
3.32, 2060, yes
3.18, 2030, no
2.66, 2140, no
2.94, 1800, no
2.44, 2100, no
3.39, 1840, no
2.58, 1840, no
2.82, 1690, no
2.97, 1910, no
2.54, 1730, no
2.20, 1950, no
2.62, 1500, no
2.90, 1580, no
1. Click Open file, then find and open the Admission.arff file. By default, the last attribute is the class attribute.
2. Click Classify / Choose / lazy / IBk. The default is 1-NN. Click Start. The output results show that the total validation error rate is 9/24 = 37.5%.
3. Next, let's try k = 3. Click the long horizontal box to the right of the Choose button. A pop-up weka.gui.GenericObjectEditor window appears. Enter 3 in the KNN box. Click OK.
4. Click Start to get the results. The total validation error rate with 3-NN is 6/24 = 25%.
K-Nearest Neighbors in R

R commands:

> data <- read.table("c:/courses/mist.6060(63.755)/datasets/admission.csv", sep=',', header=TRUE)
> x <- data[, 1:2]
> y <- data[1:23, 3]
> normgpa <- (x[,1] - min(x[,1])) / (max(x[,1]) - min(x[,1]))
> normsat <- (x[,2] - min(x[,2])) / (max(x[,2]) - min(x[,2]))
> normx <- cbind(normgpa, normsat)
> trainx <- normx[1:23, ]
> testx <- normx[24, ]
> library(class)
> knn(trainx, testx, y, k=1)
> knn(trainx, testx, y, k=3)

R commands with results:

> data <- read.table("c:/courses/mist.6060(63.755)/datasets/admission.csv", sep=',', header=TRUE)
> data
[output: the 24 records with columns GPA, SAT, and Accept; Accept is yes for rows 1-11, no for rows 12-23, and yes for row 24, the validation record]
> x <- data[, 1:2]
> x
[output: the GPA and SAT columns of the 24 records]
> y <- data[1:23, 3]
> y
 [1] yes yes yes yes yes yes yes yes yes yes yes no no no no no no no no no no no no
Levels: no yes
> normgpa <- (x[,1] - min(x[,1])) / (max(x[,1]) - min(x[,1]))
> normgpa
[output: the 24 normalized GPA values]
> normsat <- (x[,2] - min(x[,2])) / (max(x[,2]) - min(x[,2]))
> normsat
[output: the 24 normalized SAT values]
> normx <- cbind(normgpa, normsat)
> normx
[output: 24 rows with columns normgpa and normsat]
> trainx <- normx[1:23, ]
> trainx
[output: rows 1-23 of normx]
> testx <- normx[24, ]
> testx
[output: the normgpa and normsat values of record 24]
> library(class)
> knn(trainx, testx, y, k=1)
[1] no
Levels: no yes
> knn(trainx, testx, y, k=3)
[1] yes
Levels: no yes
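The "Choice of k" procedure described earlier (run k-NN with several k values and keep the one with the lowest validation error rate) can also be sketched in base R. The sketch below is an assumption-laden illustration rather than the handout's method: it uses leave-one-out validation, in which each of the 24 records takes a turn as the validation record, and the helper name `loo_error` and the candidate values in `ks` are hypothetical. Its error rates need not match the Weka figures reported in the handout, which depend on Weka's own test options.

```r
# The 24 Admission records from the handout.
admission <- data.frame(
  GPA = c(2.83, 3.43, 2.94, 2.87, 3.46, 4.00, 3.95, 3.36, 3.04, 3.60, 2.62, 3.32,
          3.18, 2.66, 2.94, 2.44, 3.39, 2.58, 2.82, 2.97, 2.54, 2.20, 2.62, 2.90),
  SAT = c(1910, 1760, 2210, 2140, 2400, 1990, 1840, 2290, 2060, 2140, 2250, 2060,
          2030, 2140, 1800, 2100, 1840, 1840, 1690, 1910, 1730, 1950, 1500, 1580),
  Accept = c(rep("yes", 12), rep("no", 12)),
  stringsAsFactors = FALSE
)

# Min-max normalization over all records, as in the handout's R commands.
minmax <- function(v) (v - min(v)) / (max(v) - min(v))
X <- cbind(minmax(admission$GPA), minmax(admission$SAT))
y <- admission$Accept

# Leave-one-out error rate for a given k: each record in turn is the
# validation record and the remaining records form the training set.
loo_error <- function(X, y, k) {
  n <- nrow(X)
  wrong <- 0
  for (i in seq_len(n)) {
    others <- X[-i, , drop = FALSE]
    target <- matrix(X[i, ], n - 1, ncol(X), byrow = TRUE)
    d2 <- rowSums((others - target)^2)          # squared distances
    votes <- table(y[-i][order(d2)[seq_len(k)]])
    pred <- names(votes)[which.max(votes)]      # majority class
    wrong <- wrong + (pred != y[i])
  }
  wrong / n
}

ks <- c(1, 3, 5, 7)   # odd values avoid tied votes with two classes
errors <- vapply(ks, function(k) loo_error(X, y, k), numeric(1))
names(errors) <- paste0("k=", ks)

errors                            # validation error rate for each k
best_k <- ks[which.min(errors)]   # k with the lowest error rate
best_k
```

Each call to `loo_error` compares every record with every other record, which illustrates why the procedure becomes expensive for large data. The `class` package used above also provides `knn.cv` for leave-one-out classification.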