DATA MINING II - 1DL460

1 DATA MINING II - 1DL460, Spring 2017. A second course in data mining. Kjell Orsborn, Uppsala Database Laboratory, Department of Information Technology, Uppsala University, Uppsala, Sweden

2 Anomaly Detection (Tan, Steinbach, Kumar ch. 10). Kjell Orsborn, Department of Information Technology, Uppsala University, Uppsala, Sweden

3 What is an anomaly or outlier? Anomalies/outliers are single data points or sets of data points that are considerably different from the remainder of the data (i.e. the normal data). E.g. an unusual credit card purchase; in sports: Usain Bolt, Leo Messi. Outliers are different from noise: noise is random error or variance in a measured variable, and noise should be removed before outlier detection. Outliers are interesting: they violate the mechanism that generates the normal data. Outlier detection vs. novelty detection: at an early stage a novel pattern is an outlier, but later it is merged into the model. Applications: credit card fraud detection, telecommunication fraud detection, network intrusion detection, fault detection, customer segmentation, medical analysis.

4 Anomaly/outlier detection. Variants of anomaly/outlier detection problems: (1) Given a database $D$, find all the data points $x \in D$ with anomaly scores greater than some threshold $t$. (2) Given a database $D$, find all the data points $x \in D$ having the top-$n$ largest anomaly scores $f(x)$. (3) Given a database $D$, containing mostly normal (but unlabeled) data points, and a test point $x$, compute the anomaly score of $x$ with respect to $D$.
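
The first two problem variants differ only in how the anomaly scores are consumed. A minimal sketch, assuming the scores $f(x)$ have already been computed and are held in a plain list (the function names `above_threshold` and `top_n` are illustrative, not from the slides):

```python
import heapq

def above_threshold(scores, t):
    """Variant 1: indices of points whose anomaly score is greater than t."""
    return [i for i, s in enumerate(scores) if s > t]

def top_n(scores, n):
    """Variant 2: indices of the n points with the largest anomaly scores."""
    return heapq.nlargest(n, range(len(scores)), key=scores.__getitem__)

# Hypothetical anomaly scores for five data points.
scores = [0.1, 2.5, 0.3, 4.0, 0.2]
print(above_threshold(scores, 1.0))  # [1, 3]
print(top_n(scores, 2))              # [3, 1]
```

The threshold variant needs a meaningful scale for the scores, whereas the top-n variant only needs their ranking.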

5 Types of outliers (I). Three kinds: global, contextual and collective outliers. Global outlier (or point anomaly): an object $O_g$ is a global outlier if it significantly deviates from the rest of the data set. Ex.: auditing stock trading transactions. Issue: find an appropriate measurement of deviation. Contextual outlier (or conditional outlier; note: a special case is the local outlier): an object $O_c$ is a contextual outlier if it deviates significantly based on a selected context. Ex.: is -20 °C in Uppsala an outlier? (It depends on whether it is summer or winter.) Attributes of data objects should be divided into two groups. Contextual attributes define the context, e.g. time and location. Behavioral attributes are characteristics of the object used in outlier evaluation, e.g. temperature. Contextual outliers can be viewed as a generalization of local outliers, whose density significantly deviates from that of their local area. Issue: how to define or formulate a meaningful context? (Figure: global outlier.)

6 Types of outliers (II). Collective outliers: a subset of data objects collectively deviates significantly from the whole data set, even if the individual data objects may not be outliers. Applications, e.g. intrusion detection: when a number of computers keep sending denial-of-service packets to each other. (Figure: collective outlier.) Detection of collective outliers: consider not only the behavior of individual objects, but also that of groups of objects. Need to have background knowledge on the relationship among data objects, such as a distance or similarity measure on objects. A data set may have multiple types of outliers. One object may belong to more than one type of outlier.

7 Challenges of outlier detection. Modeling normal objects and outliers properly: it is hard to enumerate all possible normal behaviors in an application, and the border between normal and outlier objects is often a gray area. Application-specific outlier detection: the choice of distance measure among objects and the model of relationship among objects are often application-dependent. E.g. in clinical data a small deviation could be an outlier, while marketing analysis tolerates larger fluctuations. Handling noise in outlier detection: noise may distort the normal objects and blur the distinction between normal objects and outliers. It may help hide outliers and reduce the effectiveness of outlier detection.

8 Challenges of outlier detection (cont.). Understandability: understand why these are outliers (justification of the detection), and specify the degree of an outlier, i.e. the unlikelihood of the object being generated by a normal mechanism. How many outliers are there in the data? When the method is unsupervised, validation can be quite challenging (just like for clustering). Outlier detection can be compared to finding a needle in a haystack. Working assumption: there are considerably more normal observations than abnormal observations (outliers/anomalies) in the data.

9 Ozone depletion history: importance of anomaly detection. In 1985 three researchers (Farman, Gardiner and Shanklin) were puzzled by data gathered by the British Antarctic Survey showing that ozone levels for Antarctica had dropped 10% below normal levels. Why did the Nimbus 7 satellite, which had instruments aboard for recording ozone levels, not record similarly low ozone concentrations? The ozone concentrations recorded by the satellite were so low they were being treated as outliers by a computer program and discarded! Sources:

10 Anomaly detection schemes. General steps: (1) Build a profile of the normal behavior; the profile can be patterns or summary statistics for the overall population. (2) Use the normal profile to detect anomalies; anomalies are observations whose characteristics differ significantly from the normal profile. Types of anomaly detection schemes: graphical and statistical based, proximity based, density based, clustering based.

11 Graphical approaches. Boxplot (1-D), scatter plot (2-D), spin plot (3-D). Limitations: time consuming, subjective.

12 Convex hull method. Extreme points are assumed to be outliers. Use the convex hull method to detect extreme values: data are assigned to layers of convex hulls that are peeled off to detect outliers. What if the outlier occurs in the middle of the data?

13 Statistical approaches. Assume a parametric model describing the distribution of the data (e.g. normal distribution). Apply a statistical test that depends on: the data distribution, the parameters of the distribution (e.g. mean, variance), and the number of expected outliers (confidence limit).

14 The Grubbs test. Detects outliers in univariate data (i.e. data including only one attribute), assuming the data sample comes from a normal distribution. Grubbs' test is also called the maximum normed residual test. The outlier condition is defined as $G_{exp} > G_{critical}$. For each object $x$ in a data set, compute its z-score (i.e. $G_{exp}$): $z = \frac{|x - \bar{x}|}{s}$, where $s$ is the standard deviation and $\bar{x}$ is the mean. $x$ is an outlier if $G_{exp} > \frac{N-1}{\sqrt{N}} \sqrt{\frac{t^2_{\alpha/(2N),\,N-2}}{N - 2 + t^2_{\alpha/(2N),\,N-2}}}$ (the right-hand side is also termed $G_{critical}$), where $t_{\alpha/(2N),\,N-2}$ is the value taken by a two-sided t-distribution at a significance level of $\alpha/(2N)$, and $N$ is the number of objects in the data set.
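
The two sides of the Grubbs condition can be computed separately. The following is a minimal sketch using only the Python standard library; it assumes the t-distribution value is looked up elsewhere (a table or a statistics package) and passed in as the hypothetical parameter `t_crit`, and the function names and sample data are illustrative rather than taken from the slides.

```python
import math
import statistics

def grubbs_statistic(values):
    """Return (G_exp, index): the largest absolute z-score and where it occurs."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    z = [abs(v - mean) / s for v in values]
    idx = max(range(len(values)), key=z.__getitem__)
    return z[idx], idx

def grubbs_critical(n, t_crit):
    """G_critical for n objects.

    t_crit is assumed to be the t-distribution value at significance
    alpha/(2n) with n-2 degrees of freedom, supplied by the caller.
    """
    t2 = t_crit ** 2
    return (n - 1) / math.sqrt(n) * math.sqrt(t2 / (n - 2 + t2))

# Hypothetical measurements with one suspicious value at the end.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 14.5]
g_exp, suspect = grubbs_statistic(data)

def is_outlier(g_exp, n, t_crit):
    """Apply the outlier condition G_exp > G_critical."""
    return g_exp > grubbs_critical(n, t_crit)
```

Note that $G_{exp}$ can never exceed $(N-1)/\sqrt{N}$, so for very small samples the test has little room to flag anything.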

15 Statistical-based likelihood approach. Identify outliers by calculating the change in likelihood when moving a point from one distribution to another in a mixture of two distributions. The overall probability distribution of the data: $D = (1 - \lambda) M + \lambda A$, where $\lambda$ is the expected fraction of outliers. $M$ is a probability distribution estimated from data, usually Gaussian; it can be based on any modeling method (naïve Bayes, maximum entropy, etc.). $A$ is assumed to be a uniform distribution. Likelihood and log likelihood at time $t$:
$L_t(D) = \prod_{i=1}^{N} P_D(x_i) = \left( (1-\lambda)^{|M_t|} \prod_{x_i \in M_t} P_{M_t}(x_i) \right) \left( \lambda^{|A_t|} \prod_{x_i \in A_t} P_{A_t}(x_i) \right)$
$LL_t(D) = |M_t| \log(1-\lambda) + \sum_{x_i \in M_t} \log P_{M_t}(x_i) + |A_t| \log \lambda + \sum_{x_i \in A_t} \log P_{A_t}(x_i)$

16 Statistical-based likelihood approach. Assume the data set $D$ contains samples from a mixture of two probability distributions: $M$ (majority distribution, typically Gaussian) and $A$ (anomalous distribution, typically uniform). General approach of Algorithm 10.1 (Tan et al.): Initially, assume all the data points belong to $M$. Let $LL_t(D)$ be the log likelihood of $D$ at time $t$. For each point $x_t$ that belongs to $M$, move it to $A$. Let $LL_{t+1}(D)$ be the new log likelihood. Compute the difference, $\Delta = LL_t(D) - LL_{t+1}(D)$. If $\Delta > c$ (some threshold), then $x_t$ is declared an anomaly and moved permanently from $M$ to $A$.

17 Statistical-based likelihood approach. Algorithm 10.1 (Tan et al.):

18 Limitations of statistical approaches. Most of the tests are for a single attribute. In many cases, the data distribution may not be known. For high-dimensional data, it may be difficult to estimate the true distribution.

19 Proximity-based outlier detection. In proximity-based outlier detection an object is an outlier if it is distant from most points; such objects are called distance-based outliers. More general and more easily applied than statistical approaches, since it is usually easier to define a proximity measure. There are various ways to define outliers: (1) data points for which there are fewer than p neighboring points within a distance D; (2) data points whose distance to the k-th nearest neighbor is greatest, which can be sensitive to the value of k; (3) data points whose average distance to the k nearest neighbors is greatest, which is more robust than using only the distance to the k-th nearest neighbor. Computing the distance between every pair of data points can make it expensive, $O(m^2)$. Grid-based methods and indexing can improve performance and complexity. Does not handle widely varying densities well, since it uses global thresholds.
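
Definitions (2) and (3) can be sketched with a brute-force scan over all pairs, which is exactly the $O(m^2)$ cost mentioned above. This is an illustrative sketch assuming Euclidean distance and a small in-memory list of points; the function name `knn_distance_scores` and the sample points are not from the slides.

```python
import math

def knn_distance_scores(points, k, average=False):
    """Outlier score per point.

    By default the score is the distance to the k-th nearest neighbour;
    with average=True it is the average distance to the k nearest neighbours.
    Requires k <= len(points) - 1.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / k if average else nearest[-1])
    return scores

# Four points on a unit square plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
kth_scores = knn_distance_scores(points, k=2)
avg_scores = knn_distance_scores(points, k=2, average=True)
most_outlying = max(range(len(points)), key=kth_scores.__getitem__)
```

With these points the four square corners all score 1.0 under the k-th-distance definition, while the distant point scores about 6.4 and is ranked first.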

20 Nearest-neighbor based approach. Example where the outlier score is given by the distance to the k-th nearest neighbor.

21 Density-based outlier detection. Outliers are points in regions of low density. The outlier score of an object is the inverse of the density around the object. Inverse distance density (inverse of the averaged distance to the k nearest neighbours): $density(x, k) = \left( \frac{\sum_{y \in N(x,k)} distance(x, y)}{|N(x,k)|} \right)^{-1}$, where $N(x,k)$ is the set of k nearest neighbors of $x$, $|N(x,k)|$ is the size of that set and $y$ is a nearest neighbor. Number of points within a region (as in DBSCAN): the density around an object is equal to the number of objects that are within a specified distance $d$ of the object.
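
Both density definitions can be written in a few lines. The sketch below assumes Euclidean distance and excludes the point itself from its own neighbourhood count, which is a choice made here for illustration; the function names are hypothetical.

```python
import math

def inverse_distance_density(points, i, k):
    """Inverse of the average distance from points[i] to its k nearest neighbours.

    Raises ZeroDivisionError if the k nearest neighbours all coincide with the point.
    """
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return 1.0 / (sum(dists[:k]) / k)

def count_density(points, i, d):
    """Number of other points within distance d of points[i]."""
    return sum(
        1 for j, q in enumerate(points)
        if j != i and math.dist(points[i], q) <= d
    )

points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
dense = inverse_distance_density(points, 0, k=2)   # a corner of the unit square
sparse = inverse_distance_density(points, 4, k=2)  # the distant point
# Outlier score as the inverse of the density.
outlier_scores = [1.0 / inverse_distance_density(points, i, k=2)
                  for i in range(len(points))]
```

The count-based density depends strongly on the chosen radius `d`, while the inverse-distance density depends on `k` instead.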

22 Density-based outlier detection (the LOF approach). For each point, compute the density of its local neighborhood. Compute the local outlier factor (LOF) of a sample $p$ as the average of the ratios of the density of sample $p$ and the density of its nearest neighbors. Outliers are points with the largest LOF value. (Figure: points $p_1$ and $p_2$.) In the nearest-neighbor approach, $p_2$ is not considered an outlier, while the LOF approach finds both $p_1$ and $p_2$ as outliers.

23 Density-based outlier detection using relative density. The average relative density (ard) is e.g. given as the ratio of the density of a point $x$ and the average density of its nearest neighbors as follows: $ard(x, k) = \frac{density(x, k)}{\sum_{y \in N(x,k)} density(y, k) / |N(x,k)|}$ (Eq. 10.7). A simplified version of the LOF technique using $ard(x, k)$ is given by:
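
As an illustration of Eq. 10.7 only (not a reproduction of the slide's algorithm), the sketch below computes the average relative density with brute-force neighbour search. It assumes Euclidean distance and the inverse-average-distance density; the helper names `knn`, `density` and `ard` are chosen here for readability.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (ties broken by index)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def density(points, i, k):
    """Inverse of the average distance to the k nearest neighbours."""
    nbrs = knn(points, i, k)
    return k / sum(math.dist(points[i], points[j]) for j in nbrs)

def ard(points, i, k):
    """Average relative density of points[i], following Eq. 10.7."""
    nbrs = knn(points, i, k)
    avg_nbr_density = sum(density(points, j, k) for j in nbrs) / len(nbrs)
    return density(points, i, k) / avg_nbr_density

points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
values = [ard(points, i, k=2) for i in range(len(points))]
```

A value near 1 means the point is about as dense as its neighbours; a value well below 1 means the point lies in a sparser region than its neighbours do, which is the signature of a local outlier.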

24 Example of relative desity (LOF) approach (usig k = 10)

25 Clustering-based outlier detection. Clustering-based outlier: an object is a cluster-based outlier if the object does not strongly belong to any cluster. Basic idea: cluster the data into groups of different density; choose points in small clusters as candidate outliers; compute the distance between candidate points and non-candidate clusters. If candidate points are far from all other non-candidate points, they are outliers.

26 Clustering-based outlier example

27 Outliers in lower-dimensional projection (a grid-based approach). In high-dimensional space, data is sparse and the notion of proximity becomes meaningless: every point is an almost equally good outlier from the perspective of proximity-based definitions. Lower-dimensional projection methods: a point is an outlier if, in some lower-dimensional projection, it is present in a local region of abnormally low density.

28 Outliers in lower-dimensional projection (a grid-based approach). Divide each attribute into $\phi$ equal-depth intervals; each interval contains a fraction $f = 1/\phi$ of the records. Consider a k-dimensional cube created by picking grid ranges from k different dimensions. If attributes are independent, we expect a region to contain a fraction $f^k$ of the records. If there are $N$ points, we can measure the sparsity of a cube $D$ containing $n(D)$ points by the sparsity coefficient $S(D) = \frac{n(D) - N f^k}{\sqrt{N f^k (1 - f^k)}}$, where the expected number and the standard deviation of the points in a k-dimensional cube are given by $N f^k$ and $\sqrt{N f^k (1 - f^k)}$ respectively. Negative sparsity indicates that the cube contains a smaller number of points than expected. Ref: Outlier Detection for High Dimensional Data, Charu C. Aggarwal and Philip S. Yu, ACM SIGMOD 2001, May 21-24, Santa Barbara, California, USA.

29 Example for sparsity coefficient: $N = 100$, $\phi = 5$, $f = 1/5 = 0.2$, $N f^2 = 4$ (expected number of points in a 2-dimensional cube).
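
The sparsity coefficient for the example values can be checked numerically. This is a small sketch under the independence assumption stated on the previous slide; the function name and the counts of 0 and 4 points per cube are illustrative inputs, not data from the slides.

```python
import math

def sparsity_coefficient(n_in_cube, n_total, phi, k):
    """Sparsity coefficient of a k-dimensional cube holding n_in_cube points.

    n_total is N, phi is the number of equal-depth intervals per attribute,
    so each interval holds a fraction f = 1/phi of the records.
    """
    f_k = (1.0 / phi) ** k
    expected = n_total * f_k                       # N * f^k
    std = math.sqrt(n_total * f_k * (1.0 - f_k))   # sqrt(N * f^k * (1 - f^k))
    return (n_in_cube - expected) / std

# N = 100, phi = 5, k = 2: the expected count per cube is 4.
empty_cube = sparsity_coefficient(0, 100, 5, 2)    # roughly -2.04
typical_cube = sparsity_coefficient(4, 100, 5, 2)  # roughly 0
```

An empty cube in this setting lies about two standard deviations below the expected count, so points falling in such sparse cubes are the candidates for outliers.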


More information

Wavelet Transform. CSE 490 G Introduction to Data Compression Winter Wavelet Transformed Barbara (Enhanced) Wavelet Transformed Barbara (Actual)

Wavelet Transform. CSE 490 G Introduction to Data Compression Winter Wavelet Transformed Barbara (Enhanced) Wavelet Transformed Barbara (Actual) Wavelet Trasform CSE 49 G Itroductio to Data Compressio Witer 6 Wavelet Trasform Codig PACW Wavelet Trasform A family of atios that filters the data ito low resolutio data plus detail data high pass filter

More information

An Efficient Algorithm for Graph Bisection of Triangularizations

An Efficient Algorithm for Graph Bisection of Triangularizations A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045 Oe Brookigs Drive St. Louis, Missouri 63130-4899, USA jaegerg@cse.wustl.edu

More information

Basic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000.

Basic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000. 5-23 The course that gives CM its Zip Memory Maagemet II: Dyamic Storage Allocatio Mar 6, 2000 Topics Segregated lists Buddy system Garbage collectio Mark ad Sweep Copyig eferece coutig Basic allocator

More information

Elementary Educational Computer

Elementary Educational Computer Chapter 5 Elemetary Educatioal Computer. Geeral structure of the Elemetary Educatioal Computer (EEC) The EEC coforms to the 5 uits structure defied by vo Neuma's model (.) All uits are preseted i a simplified

More information

Task scenarios Outline. Scenarios in Knowledge Extraction. Proposed Framework for Scenario to Design Diagram Transformation

Task scenarios Outline. Scenarios in Knowledge Extraction. Proposed Framework for Scenario to Design Diagram Transformation 6-0-0 Kowledge Trasformatio from Task Scearios to View-based Desig Diagrams Nima Dezhkam Kamra Sartipi {dezhka, sartipi}@mcmaster.ca Departmet of Computig ad Software McMaster Uiversity CANADA SEKE 08

More information
