An Optimised Density Based Clustering Algorithm


International Journal of Computer Applications (0975-8887), Volume 6, No. 9, September 2010

J. Hencil Peter, Department of Computer Science, St. Xavier's College, Palayamkottai, India
A. Antonysamy, Department of Mathematics, St. Xavier's College, Kathmandu, Nepal

ABSTRACT
The DBSCAN [1] algorithm is popular in the data mining field because it can mine noise-free clusters of arbitrary shape in an elegant way. Because the original DBSCAN algorithm uses a distance measure to compute the distance between objects, it consumes a great deal of processing time, and its computational complexity is $O(N^2)$. In this paper we propose a new algorithm to improve the performance of DBSCAN. Two existing algorithms, A Fast DBSCAN Algorithm [6] and Memory effect in DBSCAN algorithm [7], are combined in the new solution to speed up processing as well as improve the quality of the output. Because the RegionQuery operation takes a long time to process the objects, only a few objects are considered for expansion, and the remaining border objects that would otherwise be missed are handled differently during cluster expansion. The performance analysis and the cluster output show that the proposed solution performs better than the existing algorithms.

Keywords
Optimised DBSCAN, Density Cluster, Optimised RegionQuery, RegionQuery.

1. INTRODUCTION
Data mining is a fast-growing field in which clustering plays a very important role. Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects [2]. Among the many algorithms proposed in the clustering field, DBSCAN is one of the most popular because of the high quality of its noise-free output clusters. Because the RegionQuery function of the original DBSCAN algorithm is very expensive in terms of time, we propose a solution that minimises the number of RegionQuery calls while still covering the maximum number of neighbours.
The RegionQuery on selected seed objects used by the Fast DBSCAN algorithm [6] has been improved to give better output in less time by using the approach of Memory effect in DBSCAN algorithm [7]. The remaining objects in the border area are examined separately during cluster expansion, which is not done in the Fast DBSCAN algorithm. The new algorithm is therefore able to perform better than the existing DBSCAN algorithms.

The rest of the paper is organised as follows. Section 2 gives a brief history of related work in the same area. Section 3 introduces the original DBSCAN algorithm and Section 4 explains the proposed algorithm. Section 5 shows the experimental results, and Section 6 presents the conclusion and the future work associated with this algorithm.

2. RELATED WORK
DBSCAN (Density Based Spatial Clustering of Applications with Noise) [1] is the basic clustering algorithm for mining clusters based on object density. The algorithm first computes the number of objects present within the neighbour region (Eps) of an object. If the neighbour count is below the given threshold value, the object is marked as NOISE. Otherwise a new cluster is formed from the core object by finding the group of density-connected objects that is maximal w.r.t. density-reachability.

CHAMELEON [3] is a two-phase algorithm. It generates a k-nearest-neighbour graph in the first phase, and a hierarchical clustering algorithm is used in the second phase to find the clusters by combining the sub-clusters.

The OPTICS [4] algorithm adapts the original DBSCAN algorithm to deal with clusters of varying density. It computes an ordering of the objects based on the reachability distance to represent the intrinsic hierarchical clustering structure. The valleys in the plot indicate the clusters, but the input parameter ξ is critical for identifying the valleys as ξ clusters.

The DENCLUE [5] algorithm uses kernel density estimation.
The result of the density function gives the local density maxima, and this local density value is used to form the clusters. If the local density value is very small, the objects of the cluster are discarded as NOISE.

A Fast DBSCAN (FDBSCAN) algorithm [6] was devised to improve the speed of the original DBSCAN algorithm. The improvement is achieved by considering only a few selected representative objects inside a core object's neighbour region as seed objects for further expansion. This algorithm is therefore faster than the basic version of DBSCAN, but it suffers from a loss of result accuracy.

The MEDBSCAN [7] algorithm was proposed recently to improve the performance of DBSCAN without losing result accuracy. It uses three queues: the first stores the neighbours of the core object that lie within Eps distance, the second stores the neighbours of the core object that lie within 2 * Eps distance, and the third is the seeds queue, which stores the unhandled objects for further expansion. This algorithm guarantees a notable performance improvement if the Eps value is not very sensitive.

Although the complexity of DBSCAN can be reduced to $O(N \log N)$ using spatial trees, constructing and organising the tree takes extra effort, and the tree requires additional memory to hold the objects. The new algorithm achieves good performance with the original computational complexity of $O(N^2)$.

3. INTRODUCTION TO DBSCAN ALGORITHM
The following definitions use a database D of points in a k-dimensional space S. To find the neighbours of an object that lie within the given radius (Eps), the Euclidean distance function dist(p, q) is used, where p and q are two objects. The function takes two objects and returns the distance between them.
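As a minimal sketch of the distance function just described, and of a brute-force neighbourhood search built on it, the Python below may help. The names `dist` and `region_query` and the representation of objects as tuples are illustrative assumptions, not the authors' implementation.

```python
import math


def dist(p, q):
    """Euclidean distance between two k-dimensional objects given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def region_query(points, p, eps):
    """Brute-force search: every object of `points` within `eps` of `p`.

    One call scans all N objects, so querying every object costs O(N^2)
    distance computations when no spatial index is used.
    """
    return [q for q in points if dist(p, q) <= eps]


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
    print(dist((0.0, 0.0), (3.0, 4.0)))
    print(region_query(data, (0.0, 0.0), eps=1.5))
```

The object itself is returned as part of its own neighbourhood, since its distance to itself is zero.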

Definition 1: Eps Neighbourhood of an object p
The Eps neighbourhood of an object p, denoted $N_{Eps}(p)$, is defined as $N_{Eps}(p) = \{q \in D \mid dist(p, q) \le Eps\}$.

Definition 2: Core Object Condition
An object p is a core object if its neighbour count is at least the given threshold value MinObjs, i.e. $|N_{Eps}(p)| \ge MinObjs$, where MinObjs is the minimum number of neighbour objects needed to satisfy the core object condition. In other words, if the number of neighbours of p within the Eps radius is at least MinObjs, p is a core object.

Definition 3: Directly Density Reachable Object
An object p is directly density reachable from another object q w.r.t. Eps and MinObjs if $p \in N_{Eps}(q)$ and $|N_{Eps}(q)| \ge MinObjs$ (core object condition).

Definition 4: Density Reachable Object
An object p is density reachable from another object q w.r.t. Eps and MinObjs if there is a chain of objects $p_1, \ldots, p_n$ with $p_1 = q$ and $p_n = p$ such that $p_{i+1}$ is directly density reachable from $p_i$.

Definition 5: Density Connected Object
An object p is density connected to another object q if there is an object o such that both p and q are density reachable from o w.r.t. Eps and MinObjs.

Definition 6: Cluster
A cluster C is a non-empty subset of a database D w.r.t. Eps and MinObjs that satisfies the following conditions.
For every p and q, if $p \in C$ and q is density reachable from p w.r.t. Eps and MinObjs, then $q \in C$.
For every $p, q \in C$, p is density connected to q w.r.t. Eps and MinObjs.

Definition 7: Noise
An object that does not belong to any cluster is called noise.

The DBSCAN algorithm finds the Eps neighbourhood of each object in the database during the clustering process. Before cluster expansion, any non-core object that the algorithm finds is marked as NOISE. From a core object, the algorithm initiates a cluster and adds the surrounding objects to a queue for further expansion. Each object is popped from the queue in turn and its Eps neighbours are found. When the popped object is a core object, all its neighbours are assigned the current cluster id and its unprocessed neighbours are pushed onto the queue for further processing. This process is repeated until the queue is empty.

4. PROPOSED SOLUTION
This paper proposes a new algorithm to overcome the performance problem of density based clustering algorithms. The algorithm reduces the number of RegionQuery calls and also speeds up some of the RegionQuery calls. To reduce the number of RegionQuery calls, the solution adopts the FDBSCAN algorithm's [6] approach of using selected representative objects as seed objects during cluster expansion, and this approach is justified theoretically by Lemmas 1 and 2 below. Because RegionQuery retrieves the neighbour objects that lie inside the Eps radius, the lemmas are stated for circles and can be used directly in the RegionQuery optimisation.

Lemma 1: The minimum number of identical circles required to cover the circumference of a circle of the same radius that passes through the centres of the other circles is three.

Proof: Let $C$ and $C_1$ be identical circles of radius $r$ with centres $O$ and $O_1$ respectively. Assume that $C$ passes through the centre $O_1$ of $C_1$ and that $C_1$ passes through the centre $O$ of $C$. Let the circles intersect at P and Q.

Figure 1 Two Identical Circles Intersection with respect to first circle's Centre Point.

Clearly, $OP = OQ = r$, $O_1P = O_1Q = r$ and $OO_1 = r$. Hence the triangles $O_1OP$ and $O_1QO$ are equilateral, so $\angle POO_1 = \angle QOO_1 = 60^\circ$ and $\angle POQ = \angle POO_1 + \angle QOO_1 = 120^\circ$. The length of arc $PO_1Q$ is therefore $\frac{120}{360} \cdot 2\pi r = \frac{2\pi r}{3}$. Thus an arc of length $\frac{2\pi r}{3}$ of the circumference of the given circle $C$ is covered by $C_1$.
To cover the remaining part of the circumference of circle $C$, draw a circle $C_2$ of the same radius with centre $O_2$ that passes through O and P. Let $C_2$ intersect $C$ at another point R.

Figure 2 Four Identical Circles intersection w.r.t. first Circle's centre point.

Proceeding as above, the length of arc $PO_2R$ is $\frac{2\pi r}{3}$, i.e. an arc of length $\frac{2\pi r}{3}$ of the circumference of the given circle $C$ is covered by $C_2$. Thus the circles $C_1$ and $C_2$ together cover only $\frac{2}{3}$ of the circumference of $C$. To cover the complete circumference of $C$, one more circle $C_3$ of radius $r$ with centre $O_3$ is required, passing through O, Q and R. The length of arc $RO_3Q$ is $\frac{2\pi r}{3}$. Now,

length of arc $PO_1Q$ + length of arc $PO_2R$ + length of arc $RO_3Q$ = $3 \cdot \frac{2\pi r}{3} = 2\pi r$,

which is the circumference of the circle $C$. Hence a minimum of three identical circles is required to cover the circumference of a circle of the same radius that passes through the centres of the other circles.

Lemma 1 establishes the minimum requirement for covering the circumference of the centre circle, and selecting these circles is equivalent to making RegionQuery calls in the DBSCAN algorithm. In a real scenario, three RegionQuery calls are not sufficient to cover most of the neighbours of the centre object's neighbours when the objects in the dataset are distributed uniformly (assume that the objects are distributed uniformly and that the distance between an object and its neighbour is 1). Moreover, three RegionQuery calls are not sufficient to cover the immediate neighbours of the centre object's neighbours. This problem is explained below.

Figure 3 Missing immediate neighbour.

The picture above shows a circle (the "Original Circle") intersected by three other identical circles. Even though the three circles cover the full circumference of the original circle, they do not cover the centre circle's immediate neighbours, which are marked in red ($p_1$, $p_2$ and $p_3$). That is, even if the distance between the intersection point and the immediate neighbour point is 1, this arrangement cannot cover all the immediate neighbours. Lemma 2 is therefore introduced to establish the minimum number of circles required to cover all the immediate neighbours.

Lemma 2: Four identical circles are sufficient to cover all the immediate neighbour objects of the original circle when the objects are distributed uniformly.

Figure 4 Minimum Circles to cover the immediate neighbours.

Proof: Clearly, $O_1OO_2P$ is a square of side $r$, so OP is the diagonal of a square of side $r$ and $OP = \sqrt{2}\,r$. Taking A as the point where OP crosses the circumference of $C$ (so that $OA = r$), the distance $AP = \sqrt{2}\,r - r = (\sqrt{2} - 1)\,r \approx 0.414\,r$. Thus four circles are able to cover objects that lie at most $0.414\,r$ beyond the circumference of the original circle $C$.

A minimum of four RegionQuery calls is therefore needed to cover all the immediate neighbours of the centre object's neighbours, and these calls cover more than 80% of the neighbour objects of the centre object's neighbours. The coverage claim is established by Lemma 3.

Lemma 3: Four identical circles are sufficient to cover more than 80% of the neighbour objects of the centre circle when the objects are distributed uniformly.

Proof:

Figure 5 Neighbours unreachable area.

Area of the outer circle (with radius $2r$) = $\pi (2r)^2 = 4\pi r^2$.
Area of the square PQRS (with side $2r$) = $(2r)^2 = 4r^2$.

Total area of the four semicircles (each of radius $r$) = $4 \cdot \frac{\pi r^2}{2} = 2\pi r^2$.
Hence the area of the unmarked region = $4r^2 + 2\pi r^2$.
Area of the marked region = $4\pi r^2 - (4r^2 + 2\pi r^2) = 2\pi r^2 - 4r^2 = 2r^2(\pi - 2)$.
Percentage of the area of the outer circle covered by the marked area = $\frac{2r^2(\pi - 2)}{4\pi r^2} \times 100\% = 50\% - \frac{100}{\pi}\% = 18.169\%$.
Hence the area occupied by the marked region is less than 20 percent.

So in a real scenario we can conclude that if four seed objects are selected from the centre object's neighbours for cluster expansion, about 20% of the objects present in the border region may be ignored, and the earlier FDBSCAN algorithm does ignore these objects. This solution rectifies the problem: all the border objects are considered in the clustering operation.

To improve the performance of the algorithm, the approach of the MEDBSCAN algorithm [7] has been applied. Two types of RegionQuery function are introduced in this algorithm: LongRegionQuery and ShortRegionQuery. First, LongRegionQuery is called to get the objects within Eps as well as within 2 * Eps of the given object. Objects within Eps distance of the centre object are stored in the InnerRegion queue, and objects farther than Eps but no farther than 2 * Eps from the centre object are stored in the OuterRegion queue. Later, the selected seed objects in the Eps neighbour region are processed using ShortRegionQuery. ShortRegionQuery is always faster than LongRegionQuery because it only needs to process the few objects present in the InnerRegion and OuterRegion, not all the objects in the data set.

Another change in the proposed solution that improves speed is a modification of the queue structure: the InnerRegion and OuterRegion queues are each a combination of four sub-queues.

RegionQueue
{
    TopRightQueue;
    RightBottomQueue;
    BottomLeftQueue;
    LeftTopQueue;
}

4.1 Proposed Queue Structure
The InnerRegion and OuterRegion queues thus maintain their region objects internally in four queues. The following diagram shows each queue's object storage area.

Figure 6 RegionQueue's storage area classification.

This separation helps to minimise unwanted distance computation while processing the border objects: while processing the unprocessed objects of the OuterRegion queue, only the adjacent portions of the InnerRegion queue need to be considered, and the objects in non-adjacent portions can be ignored. This concept is explained as follows.

4.2 Neighbour computation Ignore Case
Let O be an outer circle with radius $2r$ and I an inner circle with radius $r$. Both circles share the same centre point C, and both are divided equally into four parts as shown in the picture below (to perform the RegionQuery operation).

Figure 7 Inner and Outer Region.

Here the neighbours of the inner circle's objects lie in the marked area of the outer circle (shown in brown) and in the inner circle itself. Any object in one of the inner circle's quarter areas ($I_1$, $I_2$, $I_3$ or $I_4$) therefore has its neighbours in the three adjacent quarter parts of the outer circle and in the inner circle itself (all four quarter parts). Thus the non-adjacent quarter part of the outer circle can be excluded from the computation.
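The queue structure and the ignore rule described above can be sketched in Python as follows. This is a hedged illustration for two-dimensional points, not the authors' Visual C++ code: the function names, the list-of-lists representation of the four sub-queues, and the way boundary points are assigned to a quadrant are all assumptions.

```python
import math

# Order follows the RegionQueue sub-queues named in the text.
QUADRANTS = ("TopRight", "RightBottom", "BottomLeft", "LeftTop")


def quadrant(centre, p):
    """Index into QUADRANTS of the quarter around `centre` that holds `p`.

    Points on a dividing axis are assigned by the >= tests below (assumption).
    """
    dx, dy = p[0] - centre[0], p[1] - centre[1]
    if dx >= 0 and dy >= 0:
        return 0  # TopRight
    if dx >= 0:
        return 1  # RightBottom (dy < 0)
    if dy < 0:
        return 2  # BottomLeft (dx < 0, dy < 0)
    return 3      # LeftTop (dx < 0, dy >= 0)


def long_region_query(points, centre, eps):
    """Split the objects within 2 * eps of `centre` into InnerRegion
    (distance <= eps) and OuterRegion (eps < distance <= 2 * eps),
    each kept as four sub-queues indexed by quadrant."""
    inner = [[] for _ in QUADRANTS]
    outer = [[] for _ in QUADRANTS]
    for p in points:
        d = math.dist(centre, p)
        if d <= eps:
            inner[quadrant(centre, p)].append(p)
        elif d <= 2 * eps:
            outer[quadrant(centre, p)].append(p)
    return inner, outer


def inner_candidates(inner, outer_quadrant):
    """InnerRegion objects worth comparing with an OuterRegion object taken
    from `outer_quadrant`: the diagonally opposite sub-queue is skipped."""
    opposite = (outer_quadrant + 2) % 4
    return [p for q, queue in enumerate(inner) if q != opposite for p in queue]


if __name__ == "__main__":
    grid = [(x, y) for x in range(-4, 5) for y in range(-4, 5)]
    inner, outer = long_region_query(grid, (0, 0), eps=2.0)
    print({name: len(q) for name, q in zip(QUADRANTS, outer)})
    print(len(inner_candidates(inner, outer_quadrant=0)))
```

Skipping the opposite sub-queue is safe under these assumptions because an inner object and an outer object in diagonally opposite quarters are always more than Eps apart, so no distance computation between them can yield a neighbour.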

Figure 8 Availability of Neighbour in the Circle's Portion.

For example, in the above diagram the inner circle's $I_3$ quarter portion is considered for the neighbour computation. An object in the $I_3$ quarter portion has its neighbours in $O_4$, $O_3$, $O_2$ and the inner circle itself ($I_1$, $I_2$, $I_3$ and $I_4$), but not in the $O_1$ portion. The maximum valid distance for neighbour computation is $r$, whereas an object in $I_3$ needs a distance of at least $r + 1$ to reach an object in the $O_1$ portion, which is not possible (the invalid condition is shown in red in the diagram). Similarly, while processing the border objects in the OuterRegion, only the objects in the adjacent quarter portions of the inner region are needed to determine whether a border object is density reachable from any object in the InnerRegion. This is another optimisation in the new algorithm that speeds up the computation and improves the accuracy of the output. In the FDBSCAN algorithm both core objects and border objects may be missed, whereas in this new approach all the border objects are covered. It has also been shown that the loss of core objects is a very rare case, so the new solution is better in most real scenarios.

4.3 Algorithm
1. Read D, Eps and MinObjs.
2. Initialize the Cluster ID field of all objects as UNCLASSIFIED.
3. For each UNCLASSIFIED object o ∈ D
4.    Call the LongRegionQuery function with D, Eps and o as parameters to obtain InnerRegion and OuterRegion.
5.    IF o is a core object Then
6.       Get the ClusterID for the new cluster.
7.       Select four UNCLASSIFIED objects, one from each of the InnerRegion's TopRight, RightBottom, BottomLeft and LeftTop queues, for further cluster expansion and push the selected objects to FourQueue. Each selected object should have the maximum distance from the centre object o.
8.       Assign ClusterID to all the UNCLASSIFIED and NOISE objects present in the InnerRegion.
9.       For each object T ∈ FourQueue
10.         Call the ShortRegionQuery function with InnerRegion, OuterRegion, Eps and object T to obtain the ShortRegion.
11.         Select four UNCLASSIFIED objects, one from each of the ShortRegion's TopRight, RightBottom, BottomLeft and LeftTop queues, for further cluster expansion. Each selected object should have the maximum distance from the centre object T. Push the selected objects to SeedQueue for further processing.
12.         Assign ClusterID to all the UNCLASSIFIED and NOISE objects present in the ShortRegion.
13.      End For
14.      Remove the clustered objects from the OuterRegion and process the remaining (UNCLASSIFIED and NOISE) objects to determine whether any of them has a neighbour in the InnerRegion, i.e. if any remaining object in the OuterRegion is density reachable from a neighbour of the centre object o, assign ClusterID to that object.
15.      Pop an object s from SeedQueue and repeat steps 4 to 14 until the SeedQueue is empty. In these steps, replace the object o with the SeedQueue object s wherever applicable.
16.   Else
17.      Mark o as NOISE
18.   End If
19. End For

This algorithm reads the same input as the original DBSCAN, and all the objects are initialized as UNCLASSIFIED at the beginning. The UNCLASSIFIED objects are then processed one by one. The algorithm starts with a LongRegionQuery call to obtain the neighbour objects (InnerRegion and OuterRegion), and cluster expansion happens only if the current object is a core object; otherwise the current object is marked as NOISE. During cluster expansion, a new Cluster ID is created and four UNCLASSIFIED objects are selected, one from each of the InnerRegion's four queues; each of these objects should have the maximum distance from the centre object. After the Cluster ID has been assigned to all the objects present in the InnerRegion queue, the selected four objects are processed. Four is the maximum count: if one or more of the queues contain no UNCLASSIFIED object, fewer than four objects are selected.
To process these objects, ShortRegionQuery is used. In each ShortRegionQuery operation, at most four seed objects that meet the above condition are selected and pushed into the seed queue for further cluster expansion. ShortRegionQuery takes the arrays of objects returned by LongRegionQuery and does not process the whole data set in the subsequent iteration. A performance improvement is therefore guaranteed when the Eps value is reasonably insensitive. The Cluster ID is assigned to each object in the output of ShortRegionQuery if the object is either UNCLASSIFIED or NOISE. Next, the remaining UNCLASSIFIED or NOISE objects in the OuterRegion queue are processed, using the "Neighbour computation Ignore Case" approach to minimise the computation and speed up processing. After these steps have been repeated as described in the algorithm and the SeedQueue has become empty, the current cluster expansion stops and control moves on to the next UNCLASSIFIED object in the parent For loop. The whole clustering process is over once the main loop has visited all N objects in the data set.
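Two building blocks of the expansion, ShortRegionQuery (step 10) and the selection of at most four seed objects (steps 7 and 11), can be sketched as below. This is a minimal illustration under stated assumptions rather than the authors' code: objects are two-dimensional tuples, InnerRegion and OuterRegion are passed as flat lists, labels live in a dictionary, the `UNCLASSIFIED` value is arbitrary, and the centre object is never chosen as its own seed. It does not implement the full algorithm of Section 4.3.

```python
import math

UNCLASSIFIED = 0  # hypothetical label value, chosen for this sketch


def quadrant(centre, p):
    """0 = TopRight, 1 = RightBottom, 2 = BottomLeft, 3 = LeftTop."""
    dx, dy = p[0] - centre[0], p[1] - centre[1]
    if dx >= 0 and dy >= 0:
        return 0
    if dx >= 0:
        return 1
    if dy < 0:
        return 2
    return 3


def short_region_query(inner, outer, eps, t):
    """Eps-neighbourhood of `t`, searched only among the InnerRegion and
    OuterRegion objects of the current centre instead of the whole data set.
    The result is returned as four sub-queues around `t`."""
    short = [[] for _ in range(4)]
    for p in inner + outer:
        if math.dist(t, p) <= eps:
            short[quadrant(t, p)].append(p)
    return short


def select_seeds(sub_queues, centre, labels):
    """At most one UNCLASSIFIED object per sub-queue: the one farthest
    from `centre`. Excluding `centre` itself is an assumption."""
    seeds = []
    for queue in sub_queues:
        candidates = [p for p in queue
                      if p != centre and labels[p] == UNCLASSIFIED]
        if candidates:
            seeds.append(max(candidates, key=lambda p: math.dist(centre, p)))
    return seeds


if __name__ == "__main__":
    grid = [(x, y) for x in range(-5, 6) for y in range(-5, 6)]
    labels = {p: UNCLASSIFIED for p in grid}
    o, eps = (0, 0), 2.0
    inner = [p for p in grid if math.dist(o, p) <= eps]
    outer = [p for p in grid if eps < math.dist(o, p) <= 2 * eps]
    t = (2, 0)  # a seed object inside the InnerRegion of o
    short = short_region_query(inner, outer, eps, t)
    print(select_seeds(short, t, labels))
```

Because `t` lies within Eps of the centre, every object within Eps of `t` lies within 2 * Eps of the centre, so searching only `inner + outer` returns the same neighbourhood as a scan of the whole data set.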

5. PERFORMANCE ANALYSIS
The basic DBSCAN, Fast DBSCAN and proposed Optimised DBSCAN (ODBSCAN) algorithms were implemented in Visual C++ (2008) on Windows Vista. To measure the real performance difference achieved by the new algorithm, no additional data structures (such as a spatial tree) were used to improve performance. The algorithms were tested using a two-dimensional synthetic dataset, and the performance results are shown below.

Table 1 Running time of Algorithms in Seconds

Number of    DBSCAN                FDBSCAN               ODBSCAN
objects      Running time  Loss    Running time  Loss    Running time  Loss
300          0.096         0       0.078         3       0.064         0
500          0.74          0       0.185         11      0.18          1
700          0.483         0       0.56          6       0.177         3
100          1.04          0       0.581         34      0.345         7
500          4.850         0       1.01          77      0.66          13

The table shows that the new algorithm performs better than the existing algorithms in terms of computation time, and that it loses fewer objects than the Fast DBSCAN algorithm.

6. CONCLUSION AND FUTURE WORK
In this paper we have proposed the ODBSCAN algorithm to improve performance with a smaller amount of object loss. The new algorithm uses the approaches of the FDBSCAN and MEDBSCAN algorithms to improve performance, and introduces some new techniques to minimise the distance computation during RegionQuery calls. The performance analysis and the output show that the newly proposed ODBSCAN algorithm gives better output while also performing well. In this algorithm all the border objects are considered in the clustering process. However, core objects can still be missed in a few cases, which causes some loss of objects. Although the new algorithm gives better results than the earlier FDBSCAN algorithm, this problem needs to be resolved in future work so that accurate results are obtained with the same performance.

7. REFERENCES
[1] Ester M., Kriegel H.-P., Sander J., and Xu X. (1996) "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise", in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD 96), Portland, Oregon, pp. 226-231.
[2] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2006.
[3] G. Karypis, E. H. Han, and V. Kumar, "CHAMELEON: A hierarchical clustering algorithm using dynamic modeling", Computer, vol. 32, no. 8, pp. 68-75, 1999.
[4] M. Ankerst, M. Breunig, H. P. Kriegel, and J. Sander, "OPTICS: Ordering Points to Identify the Clustering Structure", in Proc. ACM SIGMOD International Conference on Management of Data, 1999, pp. 49-60.
[5] A. Hinneburg and D. Keim, "An efficient approach to clustering in large multimedia data sets with noise", in 4th International Conference on Knowledge Discovery and Data Mining, 1998, pp. 58-65.
[6] SHOU Shui-geng, ZHOU Ao-ying, JIN Wen, FAN Ye and QIAN Wei-ning (2000) "A Fast DBSCAN Algorithm", Journal of Software: 735-744.
[7] Li Jian, Yu Wei, Yan Bao-Ping, "Memory effect in DBSCAN algorithm", Computer Science & Education, 2009. ICCSE '09. 4th International Conference on, pp. 31-36, 25-28 July 2009.

AUTHOR PROFILES
J. Hencil Peter is a Research Scholar at St. Xavier's College (Autonomous), Palayamkottai, Tirunelveli, India. He earned his MCA (Master of Computer Applications) degree from Manonmaniam Sundaranar University, Tirunelveli. He is now pursuing a Ph.D. in Computer Applications and Mathematics (Interdisciplinary) at Manonmaniam Sundaranar University, Tirunelveli. His research interest is the invention of algorithms in data mining.

Dr. A. Antonysamy is Principal of St. Xavier's College, Kathmandu, Nepal. He completed his Ph.D. in Mathematics with research on "An algorithmic study of some classes of intersection graphs". He has guided, and continues to guide, many research students in Computer Science and Mathematics. He has published many research papers in national and international journals, and has organised seminars and conferences at state and national level.