Forgot to compute the new centroids (-1); error in centroid computations (-1); incorrect clustering results (-2 points); more than 2 errors: 0 points.

Similar documents
A Robust Sign Language Recognition System with Sparsely Labeled Instances Using Wi-Fi Signals

Nearest Neighbor Learning

A New Supervised Clustering Algorithm Based on Min-Max Modular Network with Gaussian-Zero-Crossing Functions

Automatic Grouping for Social Networks CS229 Project Report

Language Identification for Texts Written in Transliteration

Sensitivity Analysis of Hopfield Neural Network in Classifying Natural RGB Color Space

Research of Classification based on Deep Neural Network

Neural Network Enhancement of the Los Alamos Force Deployment Estimator

Lecture outline Graphics and Interaction Scan Converting Polygons and Lines. Inside or outside a polygon? Scan conversion.

A Petrel Plugin for Surface Modeling

Outline. Introduce yourself!! What is Machine Learning? What is CAP-5610 about? Class information and logistics

Backing-up Fuzzy Control of a Truck-trailer Equipped with a Kingpin Sliding Mechanism

SIMILAR objects are ubiquitous in both natural and artificial

JOINT IMAGE REGISTRATION AND EXAMPLE-BASED SUPER-RESOLUTION ALGORITHM

Image Segmentation Using Semi-Supervised k-means

A Comparison of a Second-Order versus a Fourth- Order Laplacian Operator in the Multigrid Algorithm

Distance Weighted Discrimination and Second Order Cone Programming

Further Concepts in Geometry

understood as processors that match AST patterns of the source language and translate them into patterns in the target language.

Computer Graphics. - Shading & Texturing -

Sparse Representation based Face Recognition with Limited Labeled Samples

Fast Methods for Kernel-based Text Analysis

Response Surface Model Updating for Nonlinear Structures

Stereo. CS 510 May 2 nd, 2014

Lecture Notes for Chapter 4 Part III. Introduction to Data Mining

Navigating and searching theweb

Exact fan-beam image reconstruction algorithm for truncated projection data acquired from an asymmetric half-size detector

Diagnosing Breast Cancer with a Neural Network

RDF Objects 1. Alex Barnell Information Infrastructure Laboratory HP Laboratories Bristol HPL November 27 th, 2002*

Outerjoins, Constraints, Triggers

ACTIVE LEARNING ON WEIGHTED GRAPHS USING ADAPTIVE AND NON-ADAPTIVE APPROACHES. Eyal En Gad, Akshay Gadde, A. Salman Avestimehr and Antonio Ortega

MACHINE learning techniques can, automatically,

Agent architectures. Francesco Amigoni

AN EVOLUTIONARY APPROACH TO OPTIMIZATION OF A LAYOUT CHART

Intro to Programming & C Why Program? 1.2 Computer Systems: Hardware and Software. Why Learn to Program?

Outline. Parallel Numerical Algorithms. Forward Substitution. Triangular Matrices. Solving Triangular Systems. Back Substitution. Parallel Algorithm

A Memory Grouping Method for Sharing Memory BIST Logic

Automatic Hidden Web Database Classification

A NEW APPROACH FOR BLOCK BASED STEGANALYSIS USING A MULTI-CLASSIFIER

Interpreting Individual Classifications of Hierarchical Networks

A probabilistic fuzzy method for emitter identification based on genetic algorithm

AUTOMATIC gender classification based on facial images

Chapter Multidimensional Direct Search Method

Searching, Sorting & Analysis

Shape-based Object Recognition in Videos Using 3D Synthetic Object Models

Reference trajectory tracking for a multi-dof robot arm

Functions. 6.1 Modular Programming. 6.2 Defining and Calling Functions. Gaddis: 6.1-5,7-10,13,15-16 and 7.7

Dipartimento di Elettronica, Informazione e Bioingegneria Robotics

Timing-Driven Hierarchical Global Routing with Wire-Sizing and Buffer-Insertion for VLSI with Multi-Routing-Layer

InnerSpec: Technical Report

Improvement of Nearest-Neighbor Classifiers via Support Vector Machines

Extracting semistructured data from the Web: An XQuery Based Approach

Chapter 5 Combinational ATPG

Semi-Supervised Learning with Sparse Distributed Representations

Relative Positioning from Model Indexing

Solutions to the Final Exam

TRANSFORMATIONS AND SYMMETRY

Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, January 1997, pp. 777{ Partial Matching of Planar Polylines

A HYBRID FEATURE SELECTION METHOD BASED ON FISHER SCORE AND GENETIC ALGORITHM

An Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc.

An Alternative Approach for Solving Bi-Level Programming Problems

Modification of Support Vector Machine for Microarray Data Analysis

Slide 1 Lecture 18 Copyright

Interpreting Individual Classifications of Hierarchical Networks

Neural Networks. Aarti Singh. Machine Learning Nov 3, Slides Courtesy: Tom Mitchell

Complex Human Activity Searching in a Video Employing Negative Space Analysis

Transformation Invariance in Pattern Recognition: Tangent Distance and Propagation

CSE 5243 INTRO. TO DATA MINING

Unsupervised Learning : Clustering

Optimization and Application of Support Vector Machine Based on SVM Algorithm Parameters

GPU Implementation of Parallel SVM as Applied to Intrusion Detection System

Fuzzy Equivalence Relation Based Clustering and Its Use to Restructuring Websites Hyperlinks and Web Pages

Name Class Date. Exploring Reflections

Endoscopic Motion Compensation of High Speed Videoendoscopy

Online Learning for Hierarchical Networks of Locally Arranged Models using a Support Vector Domain Model

Mobile App Recommendation: Maximize the Total App Downloads

Hiding secrete data in compressed images using histogram analysis

Neural Networks. Aarti Singh & Barnabas Poczos. Machine Learning / Apr 24, Slides Courtesy: Tom Mitchell

Handling Outliers in Non-Blind Image Deconvolution

Collinearity and Coplanarity Constraints for Structure from Motion

Load Balancing by MPLS in Differentiated Services Networks

Generalizing the Dubins and Reeds-Shepp cars: fastest paths for bounded-velocity mobile robots

Further Optimization of the Decoding Method for Shortened Binary Cyclic Fire Code

CSE 5243 INTRO. TO DATA MINING

A METHOD FOR GRIDLESS ROUTING OF PRINTED CIRCUIT BOARDS. A. C. Finch, K. J. Mackenzie, G. J. Balsdon, G. Symonds

Constellation Models for Recognition of Generic Objects

Evolving Role Definitions Through Permission Invocation Patterns

A Design Method for Optimal Truss Structures with Certain Redundancy Based on Combinatorial Rigidity Theory

Research on the overall optimization method of well pattern in water drive reservoirs

TRANSFORMATIONS AND SYMMETRY

Application of Image Fusion Techniques on Medical Images

DATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm

A Fast Block Matching Algorithm Based on the Winner-Update Strategy

Computer Networks. College of Computing. Copyleft 2003~2018

Ganit Learning Guides. Basic Geometry-2. Polygons, Triangles, Quadrilaterals. Author: Raghu M.D.

Comparative Analysis of Relevance for SVM-Based Interactive Document Retrieval

Professor: Alvin Chao

数据挖掘 Introduction to Data Mining

M. Badent 1, E. Di Giacomo 2, G. Liotta 2

Intro to Programming & C Why Program? 1.2 Computer Systems: Hardware and Software. Hardware Components Illustrated

Transcription:

Probem 1 a. K means is ony capabe of discovering shapes that are convex poygons [1] Cannot discover X shape because X is not convex. [1] DBSCAN can discover X shape. [1] b. K-means is prototype based and it doesn t use actua object in the dataset as centroids. PAM uses actua objects in the data set as representatives [2]; PAM uses an expicit objective function that is minimized whereas K-means minimizes an objective function impicity without ever ca it. [1] c. Od centroids: (3,3) (4,3) (5,5) New custers: C1 (3,3): (2,2), (0,3) C2 (4,3): (4,4), (8,0) C3 (5,5): (4,6), (5,5) New centroids: (1,5/2) (6,2) (9/2,11/2) Forgot to compute the new centroids (-1); error in centroid computations (-1); incorrect custering resuts (-2 points); more than 2 errors: 0 points. d. i. Reativey efficient: O(t*k*n*d), where n is # objects, k is # custers, and t is # iterations, d is the # dimensions. ii. Storage ony O(n) in contrast to other representative-based agorithms, ony computes distances between centroids and objects in the dataset, and not between objects in the dataset; therefore, the distance matrix does not need to be stored. iii. Easy to use; we studied iv. Finds oca minimum of the SSE fitness function. v. Impicity uses a fitness function Each strength worth 1 point. Maximum of 3 points.

Probem 2 a. 1 custer {A, C, D, E} Each core point worth 1 point b. B & F are outiers, since they are not core or border points [2] A is border point because it is within the neighbor of a core point, but has ony one other point in its - neighborhood [1] No partia credit here! c. If Eps= 10, then there wi be one custer containing a the objects. d. 1. Seect a random point P 2. If P is a core point, retrieve a points density reachabe from P with Epsion and Midpoints 3. If P is a core point, a custer is formed 4. Visit the next point P that is not in a custer yet; if there is no such point terminate! 5. Continue with step 2 Verba descriptions how custers are formed are aso fine e. The two points wi be in the same custer [2]

Probem 3 One possibe answer: Ignore SSN as it is not important. Normaize Rate using Z-score and find distance by L-1 norm We convert the Oph rating vaues good, medium, and poor : 2:0 using a function Find distance by taking L-1 norm and dividing by range i.e. 3 Assign weights 0.4 to Rate, 0.4 to Services and 0.2 to Oph use the Jaccard distance function for the services: d services(ser1,ser2)= 1- ( ser1 ser2 )/( ser1 ser2 ) Now: d(u,v) = 0.4* (u.rate)/40 (v.rate)/40 + 0.2* (u.oph) (v.oph) /2 + 0.4*d services(u.services, v.services) 2 customers: c1=(111111111, good, 120, {A,B,C}) and c2=(222222222, poor, 80, {C,D,E})! = 0.4[( 120-80 )/40] + 0.2*[ (2-0)/2] + 0.4*4/5= 0.3+4+0.2=0.92 If distance functions do not make much sense give 2 points or ess Distance functions are not defined propery [-4-7] One error [-2 to -3]; two errors [-5 to -6]; more than 2 errors at most 1 point!

Probem 4 a. The box represents IQR/distribution of data [1] The size of the box is a good estimation for the standard deviation/iqr of the dataset [1] The ine across the box indicates median [0.5]; it has a vaue of 6 [0.5] for the dataset The whiskers indicate the minimum and maximum in the data without the outiers [1] They have the vaue of 1 [0.5] and 18 [0.5] in the above box pot The circe above the uppers whisker indicates outier [0.5] The ocation of the box pot between whiskers tes us distribution of data/information about spread of data/ data is skewed [2] According to the boxpot the foowing attribute vaues are outiers: 22 [0.5] b. The two attributes have no inear reationship. [2] c. There is overap between Benign and Maignant casses, Norma is easy to identify. Norma has horizonta and vertica spread whereas Benign has a horizonta spread. Function 1 is usefu for cassifying Norma vs Function 2 is more usefu for cassifying Benign. The distribution is unimoda [Not mentioned this point: -2] Verba descriptions how custers are formed are aso fine d. Men are taer than women [Not mentioned this point: -2] It has no significant gaps The number of men is more than number of women Femae height is skewed towards right There are no outiers. Verba descriptions how custers are formed are aso fine

Probem 6 a. Goas: 1) Find representatives for homogeneous groups Data Compression 2) Find natura custers and describe their properties natura Data Types 3) Find suitabe and usefu grouping usefu Data Casses 4) Find unusua data object Outier Detection Cassification is supervised earning. Datasets consists of attributes and cass abes. Here goa is to predict casses from the attribute vaues[4] b. Cassification: Find a mode for cass attribute as a function of the vaues of other attributes Prediction: Use some variabes to predict unknown or future vaues of other variabes Association Rue Discovery: Given a set of records each of which contain some number of items from a given coection, produce dependency rues which wi predict occurrence of an item based on occurrences of other items Sequentia Pattern Discovery: Given is a set of objects, with each object associated with its own timeine of events, find rues that predict strong sequentia dependencies among different events. Regression: Predict a vaue of a given continuous vaued variabe based on the vaues of other variabes, assuming a inear or noninear mode of dependency Deviation Detection / Anomay Detection: Detect significant deviations from norma behavior Each task worth 1 point. Maximum of 7.5 points.