Distributed Weighted Fuzzy C-Means Clustering Method with Encoding Based Search Structure for Large Multidimensional Dynamic Indexes

Size: px
Start display at page:

Download "Distributed Weighted Fuzzy C-Means Clustering Method with Encoding Based Search Structure for Large Multidimensional Dynamic Indexes"

Transcription

1 World Engineering & Applied Sciences Journal 8 (2): 57-65, 2017 ISSN IDOSI Publications, 2017 DOI: /idosi.weasj Distributed Weighted Fuzzy C-Means Clustering Meod wi Encoding Based Search Structure for Large Multidimensional Dynamic Indexes Sunil Sunny Chalakkal, M. Rajalakshmi and R. Vijayakumar 1 Department of Computer Science, St Thomas College, Trichur, Kerala, India 2 Department of CSE / IT, Coimbatore Institute of Technology, Coimbatore, India 3 School of Computer Science, M.G.University, Kottayam, India Abstract: Searching of data in huge dimensional space is a major issue now days. In is paper different index structure and improved cluster Tree++ structures are used to speed up e search process. Wi e help of high dimensional spaces wi Distributed Weighted Fuzzy C-Means (DWFCM) clustering algorim. All data points are grouped into clusters rough a DWFCM clustering algorim. Dual distance Driven encoding (DDE) approach is used to acquire e uniform ID number of every data point. Where every cluster is partitioned in to two based on e centroid distance. Finally, a uniform index key is obtained by integrating e unique ID number and centroid displacement. In addition, encoding based indexing approach is added to e improved Cluster Tree++ for distributed multidimensional data point. Evaluation on e effectiveness and efficiency of e proposed scheme is done rough effective assessments, correspondingly e results states at is meod is much better an e existing algorims of Distributed Weighted Possibility-Means (DWPCM) and Weighted Possiblistic C-Means (WPCM). Key words: Clustering Indexing Encoding Fuzzy c-means Index key Dual-distance-Driven B+-tree Cluster Tree++ distributed Multi-dimensional data sets INTRODUCTION Latest review states, large volumes of data wi high dimensionality are generated in different fields. A lot of meods have been proposed for effectively indexing high-dimensional datasets for organized querying. When dimensionality scales up higher, e performance reduces wi e present meods which can support nearest neighbor search for low dimensional datasets. Along wi it, e process of adding new data can lead original structures to no longer maintain e dataset effectively, since it might considerably increase e amount of data accessed for a particular query. Amongst all ese meods, Cluster Tree [1] is e startup work towards e direction of creating an efficient index structure from clustering for high dimensional datasets. An analysis scheme for finding out e interesting data distributions and patterns in e underlying dataset is termed as clustering. Given a set of ndata points in a d dimensional space, e clustering meod provides e data points to groups in relation to e computation of e degree of similarity among data points in a group are extremely similar to each oer compared to e data points in different groups. Each and every group is a cluster. In order to help well-organized queries e Cluster Tree scheme constructs an index structure on e cluster structures. At e present time, a huge part of e data set is time associated; in addition to it availability of e outdated or unnecessary data in e data set shall critically degrade e dataset processing. On e oer hand, few schemes are formulated in order to device e management of e complete data set to keep away from e outdated data and maintain e dataset constantly in a better and updated position for e easiness and effectiveness of dataset process such as query and insertion. Wiout having to linearly search for e high-dimensional dataset The Cluster Tree can support e retrieval of e Corresponding Auor: Sunil Sunny Chalakkal, Department of Computer Science, St Thomas College, Trichur, Kerala, India. 57

2 neighbors effectively. The key aim is to reduce e dimensional point data structures and its variants are: The response time to e entire users query. It is e first hb-tree [6], e BD-Tree [7], e hybrid tree [8] and e scheme at uses e Cluster Tree for e purpose of quad tree. generating effective index Structure for high dimensional The binary search tree model includes The K-D Tree datasets. which initiates a recursive subdivision of e data spaceto In is paper, a symmetrical encoding meod is partitions rough (d-1)-dimensional hyper planes.the created by double partitioning. To deliver a key division most common problem to all e K-D schemes is at for of e uniform index key e uniform ID number of each some distributions hyper planes at partitions e data point can be gained. In order to facilitate highly efficient objects equally are not found. Using e is-oriented hyper DWFCM search a symmetrical encoding based Improved planes e K-D-Tree, e quad-tree [9] decomposes e cluster Tree++ (EIC-Tree++) is proposed. Using a universe. A major difference is at e quad-trees are not DWFCM clustering algorim all data are partitioned in to binary trees.until e amount of objects in every partition different clusters. Where all cluster spheres is segmented is under a given reshold e subspaces are e twice in relation wi e two distances of start and subspaces are disintegrated. Quad trees are not stabilized centroid distance. and all e sub trees of populated trees need to deeper Section 2 reviews e related work cluster tree and an sparingly populated areas leading a bad worst case index structure designs and clustering meods. Section behavior [10]. 3 describes innovative symmetrical encoding-based index Pyramid Technique [11] is e transformation based structure wi clustering approach suitable to distributed high dimensional indexing meod. Works better incase of multidimensional datasets. Section 4 focuses on e window based queries, here NB-tree [12, 13] and idistance processing of Cluster Tree results and section 5 presents [14] are used to manage e search process. For example e conclusion and future research progress. NB-Tree is a kind of single reference point based meod. A B+-tree is used for indexing e distances on Related Work: Different types of schemes are being which all e operations are carried out. framed in order to manage effectively wi The drawback of NB-Tree is at its inability to multidimensional data. In particular, Space Filling Curve reduce e region of search, when e volume of data (SFC) schemes take part an important role. Space Filling increase as well as e dimensionality shoot-up it will Curve (SFC) meods include Z-order curve [2], Hilbert directly affect e searching process, e efficiency also Curve [3] and gray codes [4] where data is divided onto decrees. idistance [14] is framed by selecting certain multidimensional pixels in relation wi e bottom reference points wi e intention of furer pruning. granularity and uses a curve at passes rough all e To increase e efficiency we are using e M-Tree [10], pixels in multidimensional space. An order of pixels in Omni-family [12] and empirically [14]. The efficiency of space is generated by is curve. This meod (ordering) data searching is mainly depends on e portion permits usage existing one dimensional access structures. mechanism, segmenting and clustering. For example, The B++Trees. Data representation on a part of e curve is enabled by e leaf pages of e access Proposed Clustering Based Indexing Structure: In is structure, generating a primary index where close by data section, e proposed clustering based indexed structure are clustered wi a high probability. The key drawback of is explained.the purpose of clustering e similar data e SFC scheme is at e CPU utilization rate is very points is done by e Distributed Weighted Fuzzy high and ey have issues such as high overlap among C-Means (DWFCM) algorim which includes encoding pages and e query interval. based indexing scheme employed for serving search For multidimensional data The UB-Tree [5] integrates query results are detailed. a space filling curve and a B+-Tree generates e primary The major concept of Clustering-based indexing index. Each leaf node indicates a block of data on a part of structures initially make use of clustering schemes wi e curve since it s a kind of paginated index. It divides e idea of clustering data points and make use of e space into linear segments of a Z Curve(or any estimation during search phase in such a way at e SFC).The problem of e UB-Tree is at it requires search shall be completed in e obtained clusters which amendments to e DBMS kernel for e purpose of likely includes e nearest neighbor of e query point. integration and like oer SFC s e partitions are not The architecture proposed for clustering based naturally not hyper cubic and ey may even represent structures is shown in e below figure.1 which includes disjoint space. The K-D-Tree is e most famous d- two phases e clustering phase and e search phase. 58

3 The proposed architecture of e clustering-based structures is shown in e following Fig. 1 Clusteringbased indexing structures includes two phases: clustering and search phase. Clustering Phase Data Distributed Multi dimensional Database Distributed weighted fuzzy c-means Clustering algorim (DWFCM) User Query World Eng. & Appl. Sci. J., 8 (2): 57-65, 2017 Clustered Result Search Phase (Indexed rough eir Centroid and indexing structure done using Tree approach) Cluster Tree++ Indexing approach wi encoding scheme Query based Result Weighted Point Calculation: The FCM is defined as; Calculation can be done using v i Presents e i cluster centroid and n presents e amount of examples and c indicates e volume of 2 clusters. d ik (x k, v i) = x k v i Presents e rule.at is stage e Euclidean distance meod is used. A weighted FCM algorim is taken into account which takes e weight of examples, clusters data into c partitions. Using Fig. 1: Overall architecture diagram of proposed system n d examples examples which are loaded in memory in e d as partial data access (PDA). Clustering Phase: In is section, a Distributed WFCM Case 1: d=1: When e value of d is equitable to one, algorim (DWFCM) is formulated in accordance wi in e first PDA, in is situation, As FCM WFCM is Map Reduce in is section. Based on e steps of identical. Successively after clustering, consider vi DWFCM, ere are two most important operations: indicate e cluster centroids, where 1 i c. Considering computing e degree of membership µ ij and computing U ij shows e membership values, where 1 i c and e clustering centers v i. In e map phase, e Map 1 i n d. Consider w shows e weights of e points in function is intended to compute e degree of membership e memory. In such a situation, all e entire n d points µ ij. have weight 1 since weighted points from previous PDC Loading a huge amount of data in to memory makes do not exist.at is point of time e memory is let free some problems as same as in [15]. In e case only a rough reducing e clustering result in to c weighted particular amount of data is loaded in to memory. points, which are indicated by e c cluster centroids v i, Suppose we are loading 1% of data into memory, we have where 1 i c and eir weights are shown below: to search 100 times scan to complete e entire search process. This meod is called partial data access (PDA). (3) The volume of data loaded for a given time will decide e amount of partial data access. In is meod subsequent to e first PDA, Using Fuzzy C Means (FCM) data is (4) grouped in to different clusters and reduced its weighted points. The additional mechanism we are going to Weights of e c points, after e process of implement is PDA. PDA is directly connected wi reduction e clustering results, in memory is shown weighted points. The major difference between e crisp below: followinge process of condensing e clustering [15] scenario and is meod is e absence of clusteringresults, in memory is as given below: fuzzy membership. For every PDA, new singleton points are added into memory. This process will continue till e (5) data is searched ones. The function of FCM is to accommodate weights. To get accurate clustering Result d It is observed at when n, In all e succeeding each data set will be closely connecting wi centroids. PDA (d > 1) new singleton points are added. Their indices This process of knowledge transmission permits for faster related wi w shall begin at c + 1 and finally end at n d + merging. c i.e. (1) (2) 59

4 For each n we are assigning a value of one. d World Eng. & Appl. Sci. J., 8 (2): 57-65, 2017 (6) Case 2: d > 1: In is case, On singleton points clustering will be applied by newly loading in e d PDA along wi (12) cweighted points following e condensation from (d 1) PDC. Resulting in n d + c points in e memory for For e purpose of e encoding scheme data point clustering by means of WFCM. The derived new nd clustering results are taken. In is meod, uniform, ID singleton points have weight of one.after e clustering number of all data point is taken wi e help of a process, e data in e memory (togeersingletons encoding mechanism. wi e help of unique ID and and weighted points) is reduced to c new weighted distance from e centroid, we create unique index key points. All e new weighted points are indicated rough B+tree. e c cluster v i, where 1 i c and eir weights are formulated as below: Symmetrical Dual Distance Encoding Scheme: Many observations has led to e initiation on design of improved cluster tree++. First we will measure e (7) unlikeness of e data points in association wi eire distance. Using e value we can create B+tree structure. To formulate e improved version of cluster tree++ we Subsequently e memory is going to free, we can can combine two distance metrics. Through e use of a revise wi is equation: novel symmetrical encoding approach for e purpose of getting a uniform index key expression, to furer diminish (8) e search region. Distance function, shown as d(v i, V j) starting Weighted Fuzzy C Mean: To take into consideration of distance is e distance between it and V o(0,0,..0), e weighted points e main objective of FCM is naturally given as; modified [16]. Each clustering group should have e c weighted point to accomplish perfect PDC. The closed SD(V i) = d(v i, V o) centroids and e WFCM calculated as follows: It is observed at n data points are clustered into T clusters, e O j of each and every cluster C j is derived in (9) e beginning, where j [1,T]. For calculating centroid distance, V i is e data point and distance from e centriod is O j. CD(V ) = d(v, O ) i i j where i [1, C ] and j [1, T]. (10) Dual distance encoding scheme DDE is used to get Wi e focus of reviewing cluster wi e objective a uniform index key by means of double slicing (t) (t) of cluster centers in parallel, two parameters, i and i is mechanism. Wi e assistance of a DWFCM clustering introduced, where t shows e serial number of data node. algorim, e start and centroid distances of every data Following After e process of calculating e point is computed, as a result V i can be designed as a four (t) (t) membership µ ij, Map function calculates i and i by tuple: using equation (11) and (12); V i :: = < i, CID, SD, CD > Here i indicates to e i data pointand CID shows (11) e cluster id. 60

5 Start Slice: Fora given cluster sphere (O, j CR) j e start slice of it is shown as SS(, j) centroid slice is a cluster sphere (O j, CR j) where µ [1, ] here, V :: = < i, CID, SD_ID, CD_ID > i In e given equation i is representing e ID number Vi, CID is e cluster ID, SD-ID means e starting slice ID number and CD-ID is e centoid distance ID number. From at we can form e equation given below; when shows an even integer and /2+1=. This results, a uniform ID number of V which can be indicated as linearly integrating esd_id and CD_ID as shown i below: UID(V ) = c SD_ID(V ) + CD_ID(V ) i i i In relation to e encoding scheme, e uniform encoding identifier of each data point can be obtained rough Algorim 1. Algorim 1. Problem: Calculate e points DDE (1 to n) from e set, Starting cluster is, e centroid slice is. Steps: 1. Using DWFCM clustering algorim, e data points in e set are grouped into T number of clusters. 2. Set j=1 3. Repeat steps 4 rough 9 until j<=t 4. Repeat steps 5 rough 8 while V i (O j, CR j) 5. Compute e centroid distance and start distance of each point. 6. finding out e result of e Eq.(13) 7. construct e unique ID of V i(dde(v i)). 8. Take next V i 9. Increment j by one 8. End The data structure is given in relation wi e above algorim.wi e above definitions, obtain e index key of V as given below: i (13) (14) The above Eq.(14) shows e index key expression of V. On e oer side, in relation wi e symmetrical encoding i scheme, ere exists a chance of availability f two different data points (e.g., V i and V j) wiin e identical cluster sphere, where its index keys are equal. Bo e two data points must meet e condition at: (SD(O ) SD(V )) (SD(O ) j i j SD(V )) < 0. j 61

6 In order to keep away from e overlapping of e index we using e equation below: (15) Finally, n index keys obtained by means of Eq. (15) is inserted into a partitioned B+- tree, which will be discussed in e search phase. L field of e entries in e leaf node of improved Cluster Tree++ must be pointed to e created leaf node in simple prefix B+ tree. Search Phase: In is section, e hierarchical structure Algorim 2: Improved clustertree++ Index Construction of e improved cluster++ is presented. Then a scheme is Problem: Given e data point set, e algorim presented for breaking up a cluster inosubclusters and e constructs e index bt(1 to T) for Improved clustertree++ scheme for e purpose generating a clustertree++ by means of decomposing a cluster into subclusters and a Steps: meod for e purpose of building a ClusterTree++ by 1. Using DWFCM clustering algorim, e data points emans of decomposing clusters repeatedly. The enhanced in e set µ are grouped into T number of clusters. Cluster Tree++ is used to encode for e purpose of 2. Set j = 1 reducing e dead space and searching time. The key 3. Repeat steps 4 rough 11 until j<=t advantage of is scheme is at e loading process goes 4. Create index header file, bt(j) new tree File() rapidly because e output can be given sequentially and 5. Repeat steps 6 rough 9 for each data point Vi in e it makes only one pass over e data and no blocks are j- cluster required to be rearranged. Benefits in e tree are loaded 6. In is step we are calculating e centroid distance. and en e blocks are 100% complete. Sequential 7. Using algorim 1 obtain e ID number of Vi. loading generates a degree of spatial locality inside e 8. Set Key (V i) = TransValue(V i) file==>. 9. Insert it to B+-tree using e function Binsert Seeking can be reduced. The key ree conditions (Key(V i), bt(j)) are: When blocks are split in e sequence set, a new 10. Return bt(j) separator has to be inserted into e index set. Then when 11. Increment j by one blocks are combined in e sequence set, a separator has 12. End to be eliminated from e index set. When records are re-distributed among blocks in e sequence set, e value Processing of e ClusterTree: Insertion, Query and of separator in e index set has to be changed. Deletion are e ree main process in e ClusterTree++: Construction: The building of improved Cluster Tree++is based on e generation of Cluster Tree++ and e construction of simple prefix B+ tree in parallel wi encoding. Using a hierarchical structure and T sliced indexes. During e construction of ClusterTree++ all internal node and leaf node must set e ct (current time) and ht (historic time) as e current time as similar to e current time. Since e insertion time of e original data points are not known in e dataset, resulting in it is necessary to set e insertion times of e entire original data points as e current one. When e new data points are inserted into e structure, different times can be recorded into e structure. During e meantime, a leaf node is developed in simple prefix B+tree which includes e current time data. The entire L field of e entries in e leaf node of improved clustertree ++ must be pointed to e created leaf node in simple prefix+ tree. The entire Insertion: For new incoming data point, its classified into one ree categories: Clusterpoints: In a cluster inside a given reshold extremely near to certain data pointsclose by points: The data points which are neighbors to certain points in e clusters inside a specific reshold. Random points: are e data points which are far from all e clusters and cannot be bounded rough any bounding sphere of e cluster Tree, or shall be included in e bounding spheres of e clustering during every level, but ey do not have any nearest cluster points inside a particular reshold.as a result in agreement wi e category of e new coming data point, e insertion algorim of Cluster Tree ++ shall be applied repeatedly to put in data point to a specific leaf node of cluster Tree++along wi e insertion time in e T(time) field of e new entry in e leaf node, at e same time including a new entry which includes e insertion time details into 62

7 e simple prefix B+tree. Subsequently, point e L(link) field of e new entry in e certain leaf node in ClusterTree++ to e new entry in e leaf node of simple prefix B+ tree and point e L(link) field of e new entry in e particular leaf node in simple prefix B+ tree to e new entry in e leaf node of ClusterTree++. Query: The time-irrelevant queries and time-associated queries are e two categories of queries. The timeirrelevant queries comprise e range queries and p Nearest Neighbor queries. The time-associated queries comprise certain time period condition.for instance, certain clients possibly will request e nearby clients to a particular data point at are added into e structure, approximately e same time of at data point insertion. The original Cluster Tree query algorim can be employed to resolve e previous category of queries. The second category of queries is solved by using e algorim time-related-queries. Deletion: Wiin several systems e superseded or unnecessary data must be removed once in a while. Managers might desire to remove e dataset at are included, during e past, or ey desire to remove ose data inserted at some stage in a certain period. We can use time -related-deletion algorim to solve is problem. In addition, ey can just point out to e data system to automatically regulate itself. The ClusterTree+ can support such user specified deletions. Using e recursive automatic adjustment algorim we can make new ClusterTree. EXPERIMENTAL RESULTS AND DISCUSSION Here, distributed multi dimensional dataset is taken as input. The real-world datasets are utilized for carrying out e tests: CAR comprises 2, 249, 727 road segments of California obtained from Tiger/Line datasets; (b) HYD comprises 40, 995, 718 line segments symbolizing China rivers and (c) TLK includes up to 157, 425, 887 points obtained from e China s elevation data. Memory Usage: Given at DWFCM process data as amount of chunks calculated as e memory utilization of every chunk independently and get e largest value as e closing memory consumption for DWFCM. In view of e fact at e dataset is streaming in character, it is not necessary for DWFCM to access over one chunk at a time. Figure 2 shows e percentage of improvement in terms of memory consumption by proposed (DWFCM) as compared against e Baseline Algorim (DWPCM) and (WPCM). Fig. 2: Memory Usage Comparison It is clear from Table 1, e improvement is in excess of 97% for e entire ree datasets. WPCM makes use of e complete dataset at a time and at's why it needs adequate memory to hold e entire dataset. This is e cause why WPCM needs much higher memory an e proposed algorim. Table 1: Memory Usage Comparison DWPCM wi Input Dataset DWFCM (%) Map reduce (%) WPCM (%) CAR HYD TLK Selectivity: The time and e quantity of page writes for e purpose of bounding boxes insertion by means of several data chunks are illustrated in Figure 3. Since, only a predetermined size dataset undergoes partitioning, e size of e data chunk reduces, ereby increasing e amount of leaf nodes in e indexing trees. If e data chunk size is (or ), approximately 40, 000 data chunks are inserted into e indexing trees, however when e data chunk size is , approximately 560, 000 data chunks can be inserted. As per fact, file pages accession reduces e response time to a definite query and increase e selectivity. Hence multi-dimensional indexing structures should supply low amount during e explore for accessing a file. It is clear from e table 2 at e proposed indexing scheme of DWFCM wi cluster tree++ encoding is perform well an e DWPCM wi cluster tree++ and WPCM wi cluster tree+. 63

8 Fig. 3: Selectivity Comparison Table 2: Selectivity Comparison Number of Data Chunks DWFCM wi Cluster++ encoding DWPCM wi Cluster++2 WPCM wi Cluster Scalability wi Query: Figure 4 demonstrates e query experiment result. It is obvious from e results at Cluster Tree+ can resolve e multi-dimensional query inefficient setback; however Cluster Tree++ encoding execute much better an Cluster Tree++. Fig. 4: Scalability Comparison Table 3 shows e values of scalability of e meods. It proves at e multi-dimensional distributed index scales almost linearly wi e number of nodes in e system. Table 3: Scalability Comparison Number of Data Chunks DWFCM wi Cluster++ encoding DWPCM wi Cluster++ WPCM wi Cluster Table 3 shows e values of scalability of e meods. Wi e number of nodes in e system, e multidimensional distributed index confirms e rank wi approximate linear. 64

9 CONCLUSION 5. Berchtold, S. and C. Bohm, Implementation of multidimensional index structures for knowledge This paper presents a novel high-dimensional discovery in relational databases-international indexing in e midst of clustering scheme. This is conference on Data Warehousing (). phrased as symmetrical Encoding-based improved 6. Lomet, D. and B. Salzberg, The Hb-Tree: A cluster-tree++ (EIC-Tree++). To build an EIC-Tree++, robust multiattribute search structure Proceedings process carries out various steps. In e first step, wi in IEEE international conference on Data engineering e support of a DWFCM algorim, e complete (n) data (1989) Y. Ohsawa, M.S.: Bd-tree: A new n- points are clustered into T clusters. In e second step, dimensional data structure wi efficient dynamic rough an undersized encoding effort, e uniform ID characteristics. Proceedings of e Nin World number of each point is gained, where every cluster Computer Congress, IFIP, pp: sphere is segregated two times in accordance wi e 7. Chakrabarti, K., The Hybrid tree-an index dual distances of start- and centroid-distance. Through structure for high dimensional feature spacese merging process of uique ID and centroid distanc Proceedings of e fifteen international conference Finally we getting e unique index key. The uniform index on Data Engineering, pp: keys are rough a partitioned B+-tree. Consequently, 8. Samet, H., The quadtree and related an advanced encoded ClusterTree++ will be capable of hierarchical data structures. ACM Computing unvaryingly maintain e most renewed status of e Surveys 16(2): dataset in order to sustain e competence and efficiency 9. Ciaccia, P., M. Patella and P. Zezula, M-treesof data manipulations like insertion, modification and Efficent access meod for similarity search in metric deletion. An advanced encoded ClusterTree++of rd space. Proceedings of e 23 VLDB Conference, hierarchy of clusters and sub clusters at integrates e pp: representation of clusters into e index structure in order 10. Berchtold, S., C. Bohm and C. Bohm, The to accomplish e retrieval effectually and proficiently. pyramid technique: towards breaking e curse of The experimental outcomes reveal at e proposed dimensionaliy. Proceedings of SIGMOD conference. meod do one better an e existing Distributed 11. Filho, R., A. Traina and C. Faloutsos, Similarity Weighted Possibilistic C-Means (DWPCM) Algorims search wiout tears-the omni family of all purpose and Weighted Possibilistic C-Means (WPCM) access meods.in proceedings of ICDE Conference, Algorims. Future research must be concentrated on e pp: storage and also must enhance e performance of e 12. Fonseca, M.J. and J.A. George, Indexing Hightree structure. Dimensional data for Content Based retrieval in large databases.in proceedings of e DASSFA REFERENCES Conference, Kyoto, Japan, pp: Jagadish, H.V., B.C. Ooi and K.L. Tani, Yu, Dantong and Aidong Zhang, Integration of Distance: An adaptive B+-Tree based Indexing Cluster representation and Nearest Neighbor search Meod for Nearest Neighbor search, ACM for Image Databases. IEEE international Conference Transactions on Database Systems, pp: on Multimedia, New York. 14. Fredrik Farnstorm, Jameslewis and Charles Elkan, 2. Orenstein and J.A. Merrett, A class of data Scalability for Clustering algorims revisisted, structures for associative searching, Proceedings of ACM SIGKDD Explorations, 2: rd 3 ACM symposium on principles of database 15. Steven Eschrich, Jingweike Lawrence, O. Hall and systems, New York. B. Dmitry, Fast accurate Fuzzy Clustering 3. Faloutsos, C. and S. Roseman, Fractals for secondary rough data reduction, IEEE Transactions on Fuzzy Key retrieval in e proceedings of e 8 ACM Systems, 11: SIGACT-SIGMOD symposium on principles of data base. 4. Faloutsos, C., Mutihashing using gray codes, SIGMOD 86 Proceedings of e 1986 ACM international conference on Management of Data, NewYork, ACM Press, pp:

An encoding-based dual distance tree high-dimensional index

An encoding-based dual distance tree high-dimensional index Science in China Series F: Information Sciences 2008 SCIENCE IN CHINA PRESS Springer www.scichina.com info.scichina.com www.springerlink.com An encoding-based dual distance tree high-dimensional index

More information

Indexing High-Dimensional Data in Dual Distance Spaces: A Symmetrical Encoding Approach

Indexing High-Dimensional Data in Dual Distance Spaces: A Symmetrical Encoding Approach Indexing High-Dimensional Data in Dual Distance Spaces: A Symmetrical Encoding Approach Yi Zhuang 1 Yueting Zhuang 1 Qing Li Lei Chen 3 Yi Yu 4 1 College of Computer Science, Zheiang University, P.R.China

More information

Clustering For Similarity Search And Privacyguaranteed Publishing Of Hi-Dimensional Data Ashwini.R #1, K.Praveen *2, R.V.

Clustering For Similarity Search And Privacyguaranteed Publishing Of Hi-Dimensional Data Ashwini.R #1, K.Praveen *2, R.V. Clustering For Similarity Search And Privacyguaranteed Publishing Of Hi-Dimensional Data Ashwini.R #1, K.Praveen *2, R.V.Krishnaiah *3 #1 M.Tech, Computer Science Engineering, DRKIST, Hyderabad, Andhra

More information

Striped Grid Files: An Alternative for Highdimensional

Striped Grid Files: An Alternative for Highdimensional Striped Grid Files: An Alternative for Highdimensional Indexing Thanet Praneenararat 1, Vorapong Suppakitpaisarn 2, Sunchai Pitakchonlasap 1, and Jaruloj Chongstitvatana 1 Department of Mathematics 1,

More information

Using the Holey Brick Tree for Spatial Data. in General Purpose DBMSs. Northeastern University

Using the Holey Brick Tree for Spatial Data. in General Purpose DBMSs. Northeastern University Using the Holey Brick Tree for Spatial Data in General Purpose DBMSs Georgios Evangelidis Betty Salzberg College of Computer Science Northeastern University Boston, MA 02115-5096 1 Introduction There is

More information

X-tree. Daniel Keim a, Benjamin Bustos b, Stefan Berchtold c, and Hans-Peter Kriegel d. SYNONYMS Extended node tree

X-tree. Daniel Keim a, Benjamin Bustos b, Stefan Berchtold c, and Hans-Peter Kriegel d. SYNONYMS Extended node tree X-tree Daniel Keim a, Benjamin Bustos b, Stefan Berchtold c, and Hans-Peter Kriegel d a Department of Computer and Information Science, University of Konstanz b Department of Computer Science, University

More information

Benchmarking the UB-tree

Benchmarking the UB-tree Benchmarking the UB-tree Michal Krátký, Tomáš Skopal Department of Computer Science, VŠB Technical University of Ostrava, tř. 17. listopadu 15, Ostrava, Czech Republic michal.kratky@vsb.cz, tomas.skopal@vsb.cz

More information

1st Quarter (41 Days)

1st Quarter (41 Days) DUA Nor 4 Grade Ma Scope & Sequence 1st Quarter (41 Days) st 1 : Aug 8-10 (3 days) nd 2 : Aug 13-17 Reporting Categories ( TEKS SEs) Welcome Survey- Getting to know you Collect and Log supplies received

More information

DARUL ARQAM Scope & Sequence Mathematics Grade 5. Revised 05/31/ st Quarter (43 Days) Resources: 1 : Aug 9-11 (3 days)

DARUL ARQAM Scope & Sequence Mathematics Grade 5. Revised 05/31/ st Quarter (43 Days) Resources: 1 : Aug 9-11 (3 days) Maematics Grade 5 1st Quarter (43 Days) st 1 : Aug 9-11 (3 days) Orientation Topic 1: Place Value Lesson 1 Lesson 5 nd 2 : Aug 14-18 rd 3 : Aug 21-25 4 : Aug 28-30 (3 days) Topic 2: Adding and Subtracting

More information

Texture Image Segmentation using FCM

Texture Image Segmentation using FCM Proceedings of 2012 4th International Conference on Machine Learning and Computing IPCSIT vol. 25 (2012) (2012) IACSIT Press, Singapore Texture Image Segmentation using FCM Kanchan S. Deshmukh + M.G.M

More information

Minoru SASAKI and Kenji KITA. Department of Information Science & Intelligent Systems. Faculty of Engineering, Tokushima University

Minoru SASAKI and Kenji KITA. Department of Information Science & Intelligent Systems. Faculty of Engineering, Tokushima University Information Retrieval System Using Concept Projection Based on PDDP algorithm Minoru SASAKI and Kenji KITA Department of Information Science & Intelligent Systems Faculty of Engineering, Tokushima University

More information

A Scalable Index Mechanism for High-Dimensional Data in Cluster File Systems

A Scalable Index Mechanism for High-Dimensional Data in Cluster File Systems A Scalable Index Mechanism for High-Dimensional Data in Cluster File Systems Kyu-Woong Lee Hun-Soon Lee, Mi-Young Lee, Myung-Joon Kim Abstract We address the problem of designing index structures that

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

Summary. 4. Indexes. 4.0 Indexes. 4.1 Tree Based Indexes. 4.0 Indexes. 19-Nov-10. Last week: This week:

Summary. 4. Indexes. 4.0 Indexes. 4.1 Tree Based Indexes. 4.0 Indexes. 19-Nov-10. Last week: This week: Summary Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Last week: Logical Model: Cubes,

More information

Nearest Neighbor Search on Vertically Partitioned High-Dimensional Data

Nearest Neighbor Search on Vertically Partitioned High-Dimensional Data Nearest Neighbor Search on Vertically Partitioned High-Dimensional Data Evangelos Dellis, Bernhard Seeger, and Akrivi Vlachou Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Straße,

More information

Using Natural Clusters Information to Build Fuzzy Indexing Structure

Using Natural Clusters Information to Build Fuzzy Indexing Structure Using Natural Clusters Information to Build Fuzzy Indexing Structure H.Y. Yue, I. King and K.S. Leung Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin, New Territories,

More information

Proceedings of the Twenty-Third Australasian Database Conference (ADC 2012)

Proceedings of the Twenty-Third Australasian Database Conference (ADC 2012) Indexing RFID data using the VG-curve Author Terry, Justin, Stantic, Bela, Sattar, Abdul Published 2012 Conference Title Proceedings of the Twenty-Third Australasian Database Conference (ADC 2012) Copyright

More information

Fast Fuzzy Clustering of Infrared Images. 2. brfcm

Fast Fuzzy Clustering of Infrared Images. 2. brfcm Fast Fuzzy Clustering of Infrared Images Steven Eschrich, Jingwei Ke, Lawrence O. Hall and Dmitry B. Goldgof Department of Computer Science and Engineering, ENB 118 University of South Florida 4202 E.

More information

doc. RNDr. Tomáš Skopal, Ph.D. Department of Software Engineering, Faculty of Information Technology, Czech Technical University in Prague

doc. RNDr. Tomáš Skopal, Ph.D. Department of Software Engineering, Faculty of Information Technology, Czech Technical University in Prague Praha & EU: Investujeme do vaší budoucnosti Evropský sociální fond course: Searching the Web and Multimedia Databases (BI-VWM) Tomáš Skopal, 2011 SS2010/11 doc. RNDr. Tomáš Skopal, Ph.D. Department of

More information

Pivoting M-tree: A Metric Access Method for Efficient Similarity Search

Pivoting M-tree: A Metric Access Method for Efficient Similarity Search Pivoting M-tree: A Metric Access Method for Efficient Similarity Search Tomáš Skopal Department of Computer Science, VŠB Technical University of Ostrava, tř. 17. listopadu 15, Ostrava, Czech Republic tomas.skopal@vsb.cz

More information

Fast Similarity Search for High-Dimensional Dataset

Fast Similarity Search for High-Dimensional Dataset Fast Similarity Search for High-Dimensional Dataset Quan Wang and Suya You Computer Science Department University of Southern California {quanwang,suyay}@graphics.usc.edu Abstract This paper addresses

More information

International Journal of Computer Engineering and Applications, Volume VIII, Issue III, Part I, December 14

International Journal of Computer Engineering and Applications, Volume VIII, Issue III, Part I, December 14 International Journal of Computer Engineering and Applications, Volume VIII, Issue III, Part I, December 14 DESIGN OF AN EFFICIENT DATA ANALYSIS CLUSTERING ALGORITHM Dr. Dilbag Singh 1, Ms. Priyanka 2

More information

CHAPTER 4 K-MEANS AND UCAM CLUSTERING ALGORITHM

CHAPTER 4 K-MEANS AND UCAM CLUSTERING ALGORITHM CHAPTER 4 K-MEANS AND UCAM CLUSTERING 4.1 Introduction ALGORITHM Clustering has been used in a number of applications such as engineering, biology, medicine and data mining. The most popular clustering

More information

A Real Time GIS Approximation Approach for Multiphase Spatial Query Processing Using Hierarchical-Partitioned-Indexing Technique

A Real Time GIS Approximation Approach for Multiphase Spatial Query Processing Using Hierarchical-Partitioned-Indexing Technique International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 6 ISSN : 2456-3307 A Real Time GIS Approximation Approach for Multiphase

More information

So, we want to perform the following query:

So, we want to perform the following query: Abstract This paper has two parts. The first part presents the join indexes.it covers the most two join indexing, which are foreign column join index and multitable join index. The second part introduces

More information

Rule-Based Method for Entity Resolution Using Optimized Root Discovery (ORD)

Rule-Based Method for Entity Resolution Using Optimized Root Discovery (ORD) American-Eurasian Journal of Scientific Research 12 (5): 255-259, 2017 ISSN 1818-6785 IDOSI Publications, 2017 DOI: 10.5829/idosi.aejsr.2017.255.259 Rule-Based Method for Entity Resolution Using Optimized

More information

Datenbanksysteme II: Multidimensional Index Structures 2. Ulf Leser

Datenbanksysteme II: Multidimensional Index Structures 2. Ulf Leser Datenbanksysteme II: Multidimensional Index Structures 2 Ulf Leser Content of this Lecture Introduction Partitioned Hashing Grid Files kdb Trees kd Tree kdb Tree R Trees Example: Nearest neighbor image

More information

The Effects of Dimensionality Curse in High Dimensional knn Search

The Effects of Dimensionality Curse in High Dimensional knn Search The Effects of Dimensionality Curse in High Dimensional knn Search Nikolaos Kouiroukidis, Georgios Evangelidis Department of Applied Informatics University of Macedonia Thessaloniki, Greece Email: {kouiruki,

More information

Indexing High-Dimensional Data for Content-Based Retrieval in Large Databases

Indexing High-Dimensional Data for Content-Based Retrieval in Large Databases Indexing High-Dimensional Data for Content-Based Retrieval in Large Databases Manuel J. Fonseca, Joaquim A. Jorge Department of Information Systems and Computer Science INESC-ID/IST/Technical University

More information

CLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM

CLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

Searching of Nearest Neighbor Based on Keywords using Spatial Inverted Index

Searching of Nearest Neighbor Based on Keywords using Spatial Inverted Index Searching of Nearest Neighbor Based on Keywords using Spatial Inverted Index B. SATYA MOUNIKA 1, J. VENKATA KRISHNA 2 1 M-Tech Dept. of CSE SreeVahini Institute of Science and Technology TiruvuruAndhra

More information

Data Warehousing & Data Mining

Data Warehousing & Data Mining Data Warehousing & Data Mining Wolf-Tilo Balke Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Summary Last week: Logical Model: Cubes,

More information

Creating Time-Varying Fuzzy Control Rules Based on Data Mining

Creating Time-Varying Fuzzy Control Rules Based on Data Mining Research Journal of Applied Sciences, Engineering and Technology 4(18): 3533-3538, 01 ISSN: 040-7467 Maxwell Scientific Organization, 01 Submitted: April 16, 01 Accepted: May 18, 01 Published: September

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

Lesson 3. Prof. Enza Messina

Lesson 3. Prof. Enza Messina Lesson 3 Prof. Enza Messina Clustering techniques are generally classified into these classes: PARTITIONING ALGORITHMS Directly divides data points into some prespecified number of clusters without a hierarchical

More information

A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data

A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data Journal of Computational Information Systems 11: 6 (2015) 2139 2146 Available at http://www.jofcis.com A Fuzzy C-means Clustering Algorithm Based on Pseudo-nearest-neighbor Intervals for Incomplete Data

More information

Foundations of Multidimensional and Metric Data Structures

Foundations of Multidimensional and Metric Data Structures Foundations of Multidimensional and Metric Data Structures Hanan Samet University of Maryland, College Park ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE

More information

Detecting Clusters and Outliers for Multidimensional

Detecting Clusters and Outliers for Multidimensional Kennesaw State University DigitalCommons@Kennesaw State University Faculty Publications 2008 Detecting Clusters and Outliers for Multidimensional Data Yong Shi Kennesaw State University, yshi5@kennesaw.edu

More information

Nearest Neighbors Classifiers

Nearest Neighbors Classifiers Nearest Neighbors Classifiers Raúl Rojas Freie Universität Berlin July 2014 In pattern recognition we want to analyze data sets of many different types (pictures, vectors of health symptoms, audio streams,

More information

Indexing High-Dimensional Data for. Content-Based Retrieval in Large Databases

Indexing High-Dimensional Data for. Content-Based Retrieval in Large Databases Indexing High-Dimensional Data for Content-Based Retrieval in Large Databases Manuel J. Fonseca, Joaquim A. Jorge Department of Information Systems and Computer Science INESC-ID/IST/Technical University

More information

Code Transformation of DF-Expression between Bintree and Quadtree

Code Transformation of DF-Expression between Bintree and Quadtree Code Transformation of DF-Expression between Bintree and Quadtree Chin-Chen Chang*, Chien-Fa Li*, and Yu-Chen Hu** *Department of Computer Science and Information Engineering, National Chung Cheng University

More information

Accelerating Ray Tracing

Accelerating Ray Tracing Accelerating Ray Tracing Ray Tracing Acceleration Techniques Faster Intersections Fewer Rays Generalized Rays Faster Ray-Object Intersections Object bounding volumes Efficient intersection routines Fewer

More information

Spatial Index Keyword Search in Multi- Dimensional Database

Spatial Index Keyword Search in Multi- Dimensional Database Spatial Index Keyword Search in Multi- Dimensional Database Sushma Ahirrao M. E Student, Department of Computer Engineering, GHRIEM, Jalgaon, India ABSTRACT: Nearest neighbor search in multimedia databases

More information

KEYWORDS: Clustering, RFPCM Algorithm, Ranking Method, Query Redirection Method.

KEYWORDS: Clustering, RFPCM Algorithm, Ranking Method, Query Redirection Method. IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY IMPROVED ROUGH FUZZY POSSIBILISTIC C-MEANS (RFPCM) CLUSTERING ALGORITHM FOR MARKET DATA T.Buvana*, Dr.P.krishnakumari *Research

More information

Advances in Data Management Principles of Database Systems - 2 A.Poulovassilis

Advances in Data Management Principles of Database Systems - 2 A.Poulovassilis 1 Advances in Data Management Principles of Database Systems - 2 A.Poulovassilis 1 Storing data on disk The traditional storage hierarchy for DBMSs is: 1. main memory (primary storage) for data currently

More information

Fuzzy Ant Clustering by Centroid Positioning

Fuzzy Ant Clustering by Centroid Positioning Fuzzy Ant Clustering by Centroid Positioning Parag M. Kanade and Lawrence O. Hall Computer Science & Engineering Dept University of South Florida, Tampa FL 33620 @csee.usf.edu Abstract We

More information

Research Article Does an Arithmetic Coding Followed by Run-length Coding Enhance the Compression Ratio?

Research Article Does an Arithmetic Coding Followed by Run-length Coding Enhance the Compression Ratio? Research Journal of Applied Sciences, Engineering and Technology 10(7): 736-741, 2015 DOI:10.19026/rjaset.10.2425 ISSN: 2040-7459; e-issn: 2040-7467 2015 Maxwell Scientific Publication Corp. Submitted:

More information

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams Mining Data Streams Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction Summarization Methods Clustering Data Streams Data Stream Classification Temporal Models CMPT 843, SFU, Martin Ester, 1-06

More information

ISSUES IN SPATIAL DATABASES AND GEOGRAPHICAL INFORMATION SYSTEMS (GIS) HANAN SAMET

ISSUES IN SPATIAL DATABASES AND GEOGRAPHICAL INFORMATION SYSTEMS (GIS) HANAN SAMET zk0 ISSUES IN SPATIAL DATABASES AND GEOGRAPHICAL INFORMATION SYSTEMS (GIS) HANAN SAMET COMPUTER SCIENCE DEPARTMENT AND CENTER FOR AUTOMATION RESEARCH AND INSTITUTE FOR ADVANCED COMPUTER STUDIES UNIVERSITY

More information

Introduction to Indexing R-trees. Hong Kong University of Science and Technology

Introduction to Indexing R-trees. Hong Kong University of Science and Technology Introduction to Indexing R-trees Dimitris Papadias Hong Kong University of Science and Technology 1 Introduction to Indexing 1. Assume that you work in a government office, and you maintain the records

More information

Research Article A Novel Steganalytic Algorithm based on III Level DWT with Energy as Feature

Research Article A Novel Steganalytic Algorithm based on III Level DWT with Energy as Feature Research Journal of Applied Sciences, Engineering and Technology 7(19): 4100-4105, 2014 DOI:10.19026/rjaset.7.773 ISSN: 2040-7459; e-issn: 2040-7467 2014 Maxwell Scientific Publication Corp. Submitted:

More information

Partition Based Perturbation for Privacy Preserving Distributed Data Mining

Partition Based Perturbation for Privacy Preserving Distributed Data Mining BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 17, No 2 Sofia 2017 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2017-0015 Partition Based Perturbation

More information

Novel Intuitionistic Fuzzy C-Means Clustering for Linearly and Nonlinearly Separable Data

Novel Intuitionistic Fuzzy C-Means Clustering for Linearly and Nonlinearly Separable Data Novel Intuitionistic Fuzzy C-Means Clustering for Linearly and Nonlinearly Separable Data PRABHJOT KAUR DR. A. K. SONI DR. ANJANA GOSAIN Department of IT, MSIT Department of Computers University School

More information

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey G. Shivaprasad, N. V. Subbareddy and U. Dinesh Acharya

More information

Image Compression Using BPD with De Based Multi- Level Thresholding

Image Compression Using BPD with De Based Multi- Level Thresholding International Journal of Innovative Research in Electronics and Communications (IJIREC) Volume 1, Issue 3, June 2014, PP 38-42 ISSN 2349-4042 (Print) & ISSN 2349-4050 (Online) www.arcjournals.org Image

More information

Chapter 12: Query Processing. Chapter 12: Query Processing

Chapter 12: Query Processing. Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join

More information

A mining method for tracking changes in temporal association rules from an encoded database

A mining method for tracking changes in temporal association rules from an encoded database A mining method for tracking changes in temporal association rules from an encoded database Chelliah Balasubramanian *, Karuppaswamy Duraiswamy ** K.S.Rangasamy College of Technology, Tiruchengode, Tamil

More information

Cluster Analysis. Angela Montanari and Laura Anderlucci

Cluster Analysis. Angela Montanari and Laura Anderlucci Cluster Analysis Angela Montanari and Laura Anderlucci 1 Introduction Clustering a set of n objects into k groups is usually moved by the aim of identifying internally homogenous groups according to a

More information

idistance: An Adaptive B + -tree Based Indexing Method for Nearest Neighbor Search

idistance: An Adaptive B + -tree Based Indexing Method for Nearest Neighbor Search : An Adaptive B + -tree Based Indexing Method for Nearest Neighbor Search H.V. Jagadish University of Michigan Beng Chin Ooi National University of Singapore Kian-Lee Tan National University of Singapore

More information

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA METANAT HOOSHSADAT, SAMANEH BAYAT, PARISA NAEIMI, MAHDIEH S. MIRIAN, OSMAR R. ZAÏANE Computing Science Department, University

More information

Performance Based Study of Association Rule Algorithms On Voter DB

Performance Based Study of Association Rule Algorithms On Voter DB Performance Based Study of Association Rule Algorithms On Voter DB K.Padmavathi 1, R.Aruna Kirithika 2 1 Department of BCA, St.Joseph s College, Thiruvalluvar University, Cuddalore, Tamil Nadu, India,

More information

Spatial Data Management

Spatial Data Management Spatial Data Management [R&G] Chapter 28 CS432 1 Types of Spatial Data Point Data Points in a multidimensional space E.g., Raster data such as satellite imagery, where each pixel stores a measured value

More information

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN: IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 20131 Improve Search Engine Relevance with Filter session Addlin Shinney R 1, Saravana Kumar T

More information

International Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015

International Journal of Modern Trends in Engineering and Research   e-issn No.: , Date: 2-4 July, 2015 International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Privacy Preservation Data Mining Using GSlicing Approach Mr. Ghanshyam P. Dhomse

More information

Collaborative Rough Clustering

Collaborative Rough Clustering Collaborative Rough Clustering Sushmita Mitra, Haider Banka, and Witold Pedrycz Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India {sushmita, hbanka r}@isical.ac.in Dept. of Electrical

More information

A Review: Content Base Image Mining Technique for Image Retrieval Using Hybrid Clustering

A Review: Content Base Image Mining Technique for Image Retrieval Using Hybrid Clustering A Review: Content Base Image Mining Technique for Image Retrieval Using Hybrid Clustering Gurpreet Kaur M-Tech Student, Department of Computer Engineering, Yadawindra College of Engineering, Talwandi Sabo,

More information

3-Dimensional Object Modeling with Mesh Simplification Based Resolution Adjustment

3-Dimensional Object Modeling with Mesh Simplification Based Resolution Adjustment 3-Dimensional Object Modeling with Mesh Simplification Based Resolution Adjustment Özgür ULUCAY Sarp ERTÜRK University of Kocaeli Electronics & Communication Engineering Department 41040 Izmit, Kocaeli

More information

Spatial Data Management

Spatial Data Management Spatial Data Management Chapter 28 Database management Systems, 3ed, R. Ramakrishnan and J. Gehrke 1 Types of Spatial Data Point Data Points in a multidimensional space E.g., Raster data such as satellite

More information

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Bumjoon Jo and Sungwon Jung (&) Department of Computer Science and Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul 04107,

More information

S. Sreenivasan Research Scholar, School of Advanced Sciences, VIT University, Chennai Campus, Vandalur-Kelambakkam Road, Chennai, Tamil Nadu, India

S. Sreenivasan Research Scholar, School of Advanced Sciences, VIT University, Chennai Campus, Vandalur-Kelambakkam Road, Chennai, Tamil Nadu, India International Journal of Civil Engineering and Technology (IJCIET) Volume 9, Issue 10, October 2018, pp. 1322 1330, Article ID: IJCIET_09_10_132 Available online at http://www.iaeme.com/ijciet/issues.asp?jtype=ijciet&vtype=9&itype=10

More information

Benchmarking Access Structures for High-Dimensional Multimedia Data

Benchmarking Access Structures for High-Dimensional Multimedia Data Benchmarking Access Structures for High-Dimensional Multimedia Data by Nathan G. Colossi and Mario A. Nascimento Technical Report TR 99-05 December 1999 DEPARTMENT OF COMPUTING SCIENCE University of Alberta

More information

Extended R-Tree Indexing Structure for Ensemble Stream Data Classification

Extended R-Tree Indexing Structure for Ensemble Stream Data Classification Extended R-Tree Indexing Structure for Ensemble Stream Data Classification P. Sravanthi M.Tech Student, Department of CSE KMM Institute of Technology and Sciences Tirupati, India J. S. Ananda Kumar Assistant

More information

Parallel string matching for image matching with prime method

Parallel string matching for image matching with prime method International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 10, Issue 6 (June 2014), PP.42-46 Chinta Someswara Rao 1, 1 Assistant Professor,

More information

TrajStore: an Adaptive Storage System for Very Large Trajectory Data Sets

TrajStore: an Adaptive Storage System for Very Large Trajectory Data Sets TrajStore: an Adaptive Storage System for Very Large Trajectory Data Sets Philippe Cudré-Mauroux Eugene Wu Samuel Madden Computer Science and Artificial Intelligence Laboratory Massachusetts Institute

More information

Indexing by Shape of Image Databases Based on Extended Grid Files

Indexing by Shape of Image Databases Based on Extended Grid Files Indexing by Shape of Image Databases Based on Extended Grid Files Carlo Combi, Gian Luca Foresti, Massimo Franceschet, Angelo Montanari Department of Mathematics and ComputerScience, University of Udine

More information

Enhanced Hemisphere Concept for Color Pixel Classification

Enhanced Hemisphere Concept for Color Pixel Classification 2016 International Conference on Multimedia Systems and Signal Processing Enhanced Hemisphere Concept for Color Pixel Classification Van Ng Graduate School of Information Sciences Tohoku University Sendai,

More information

SILC: Efficient Query Processing on Spatial Networks

SILC: Efficient Query Processing on Spatial Networks Hanan Samet hjs@cs.umd.edu Department of Computer Science University of Maryland College Park, MD 20742, USA Joint work with Jagan Sankaranarayanan and Houman Alborzi Proceedings of the 13th ACM International

More information

Dimension Induced Clustering

Dimension Induced Clustering Dimension Induced Clustering Aris Gionis Alexander Hinneburg Spiros Papadimitriou Panayiotis Tsaparas HIIT, University of Helsinki Martin Luther University, Halle Carnegie Melon University HIIT, University

More information

Unsupervised Learning : Clustering

Unsupervised Learning : Clustering Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex

More information

Performance impact of dynamic parallelism on different clustering algorithms

Performance impact of dynamic parallelism on different clustering algorithms Performance impact of dynamic parallelism on different clustering algorithms Jeffrey DiMarco and Michela Taufer Computer and Information Sciences, University of Delaware E-mail: jdimarco@udel.edu, taufer@udel.edu

More information

Analyzing Outlier Detection Techniques with Hybrid Method

Analyzing Outlier Detection Techniques with Hybrid Method Analyzing Outlier Detection Techniques with Hybrid Method Shruti Aggarwal Assistant Professor Department of Computer Science and Engineering Sri Guru Granth Sahib World University. (SGGSWU) Fatehgarh Sahib,

More information

Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

More information

CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES

CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES 70 CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES 3.1 INTRODUCTION In medical science, effective tools are essential to categorize and systematically

More information

Chapter 13: Query Processing

Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Improving the Efficiency of Fast Using Semantic Similarity Algorithm International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year

More information

Processing a multimedia join through the method of nearest neighbor search

Processing a multimedia join through the method of nearest neighbor search Information Processing Letters 82 (2002) 269 276 Processing a multimedia join through the method of nearest neighbor search Harald Kosch a,, Solomon Atnafu b a Institute of Information Technology, University

More information

Keywords Clustering, Goals of clustering, clustering techniques, clustering algorithms.

Keywords Clustering, Goals of clustering, clustering techniques, clustering algorithms. Volume 3, Issue 5, May 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Survey of Clustering

More information

CS301 - Data Structures Glossary By

CS301 - Data Structures Glossary By CS301 - Data Structures Glossary By Abstract Data Type : A set of data values and associated operations that are precisely specified independent of any particular implementation. Also known as ADT Algorithm

More information

ISSN: (Online) Volume 2, Issue 7, July 2014 International Journal of Advance Research in Computer Science and Management Studies

ISSN: (Online) Volume 2, Issue 7, July 2014 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 2, Issue 7, July 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Packet Classification Using Dynamically Generated Decision Trees

Packet Classification Using Dynamically Generated Decision Trees 1 Packet Classification Using Dynamically Generated Decision Trees Yu-Chieh Cheng, Pi-Chung Wang Abstract Binary Search on Levels (BSOL) is a decision-tree algorithm for packet classification with superior

More information

Clustering Algorithms for Data Stream

Clustering Algorithms for Data Stream Clustering Algorithms for Data Stream Karishma Nadhe 1, Prof. P. M. Chawan 2 1Student, Dept of CS & IT, VJTI Mumbai, Maharashtra, India 2Professor, Dept of CS & IT, VJTI Mumbai, Maharashtra, India Abstract:

More information

Clustering Billions of Images with Large Scale Nearest Neighbor Search

Clustering Billions of Images with Large Scale Nearest Neighbor Search Clustering Billions of Images with Large Scale Nearest Neighbor Search Ting Liu, Charles Rosenberg, Henry A. Rowley IEEE Workshop on Applications of Computer Vision February 2007 Presented by Dafna Bitton

More information

Hierarchical Clustering 4/5/17

Hierarchical Clustering 4/5/17 Hierarchical Clustering 4/5/17 Hypothesis Space Continuous inputs Output is a binary tree with data points as leaves. Useful for explaining the training data. Not useful for making new predictions. Direction

More information

Fractal Dimension and Space Filling Curve

Fractal Dimension and Space Filling Curve Fractal Dimension and Space Filling Curve Ana Karina Gomes anakgomes@ist.utl.pt Instituto Superior Técnico, Lisboa, Portugal May 25 Abstract Space-filling curves are computer generated fractals that can

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

More information

HISTORICAL BACKGROUND

HISTORICAL BACKGROUND VALID-TIME INDEXING Mirella M. Moro Universidade Federal do Rio Grande do Sul Porto Alegre, RS, Brazil http://www.inf.ufrgs.br/~mirella/ Vassilis J. Tsotras University of California, Riverside Riverside,

More information

DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li

DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: KH 116 Fall 2017 First Grading for Reading Assignment Weka v 6 weeks v https://weka.waikato.ac.nz/dataminingwithweka/preview

More information

I. INTRODUCTION II. RELATED WORK.

I. INTRODUCTION II. RELATED WORK. ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: A New Hybridized K-Means Clustering Based Outlier Detection Technique

More information

OUTLIER DETECTION FOR DYNAMIC DATA STREAMS USING WEIGHTED K-MEANS

OUTLIER DETECTION FOR DYNAMIC DATA STREAMS USING WEIGHTED K-MEANS OUTLIER DETECTION FOR DYNAMIC DATA STREAMS USING WEIGHTED K-MEANS DEEVI RADHA RANI Department of CSE, K L University, Vaddeswaram, Guntur, Andhra Pradesh, India. deevi_radharani@rediffmail.com NAVYA DHULIPALA

More information