Using Novel Method ProMiSH Search Nearest keyword Set In Multidimensional Dataset

Miss. Shilpa Bhaskar Thakare (1), Prof. Jayshree V. Shinde (2)
(1) Department of Computer Engineering, Late G.N.Sapkal C.O.E, Nasik
(2) Department of Computer Engineering, Late G.N.Sapkal C.O.E, Nasik

Abstract
In this paper, we propose a novel method, ProMiSH (Projection and Multi-Scale Hashing), which uses random projection and a hash-based index structure. We consider objects that are embedded in a vector space and tagged with keywords. Using this algorithm, we find the tightest groups of points that cover the query keywords, and we use five different methods, i.e., Euclidean distance, Manhattan distance, Jaccard distance, cosine similarity, and correlation distance, to obtain more accurate results. ProMiSH is up to 60 times faster than state-of-the-art tree-based techniques. In this system, we study nearest keyword set queries on text-rich multidimensional datasets.

Keywords: Multi-dimensional data, Indexing, Hashing, Querying, Projection

I. INTRODUCTION
The proposed system considers nearest keyword set (NKS) queries on text-rich multidimensional datasets. An NKS query is a set of keywords, and its result is the top-k sets of data points that together contain all the query keywords and form the tightest clusters in the multidimensional space. NKS queries are useful in many applications, such as graph pattern search, geo-location search in GIS systems, and photo sharing in social networks. In a photo-sharing social network, photos are tagged with people's names and locations, and they can be embedded in a high-dimensional feature space of texture, color, or shape; an NKS query can then find a group of similar photos that together contain a given set of people. NKS queries retrieve the top-k candidates ranked by their diameter, smallest first; if two candidates have the same diameter, they are ranked by their cardinality. Previous systems used tree-based indexing techniques for NKS queries, but the performance of these algorithms deteriorates sharply as the dataset size or dimensionality increases.
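To make the ranking rule concrete, the following brute-force sketch enumerates every group that covers the query keywords by picking one point per keyword, then ranks the groups by diameter and breaks ties by cardinality. It assumes that the diameter of a group is its largest pairwise Euclidean distance and that smaller groups win ties; the data layout (`data` mapping a point id to its coordinates and keyword set) and the function names are hypothetical. Its cost grows exponentially with the number of query keywords, which is why index-based methods such as ProMiSH are needed.

```python
from itertools import combinations, product
from math import dist


def diameter(points):
    """Largest pairwise Euclidean distance within a group (0.0 for one point)."""
    return max((dist(p, q) for p, q in combinations(points, 2)), default=0.0)


def nks_brute_force(data, query, k):
    """Return the top-k (sorted point ids, diameter) groups covering all query keywords.

    data:  dict point_id -> (coordinate tuple, set of keywords)  (hypothetical layout)
    query: iterable of keywords
    Ranking: diameter first, then cardinality (fewer points first, an assumption),
    then point ids, so the order is deterministic.
    """
    # For each query keyword, the ids of the points tagged with it.
    holders = [[pid for pid, (_, kws) in data.items() if v in kws] for v in sorted(set(query))]
    if any(not h for h in holders):
        return []  # some keyword tags no point, so no group can cover the query
    # Choosing one holder per keyword yields every minimal covering group (plus supersets).
    groups = {frozenset(choice) for choice in product(*holders)}
    scored = sorted(
        (diameter([data[pid][0] for pid in g]), len(g), sorted(g)) for g in groups
    )
    return [(ids, d) for d, _, ids in scored[:k]]
```

For example, with point 1 at (0, 0) tagged `a`, point 2 at (3, 4) tagged `b`, and point 3 at (0, 1) tagged `b`, the query {a, b} with k = 2 returns `[([1, 3], 1.0), ([1, 2], 5.0)]`.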
These algorithms take a long time to terminate when a multidimensional dataset contains millions of points. Therefore, an efficient algorithm is required that performs well on large datasets and scales with the dataset dimensionality. In this paper, we propose ProMiSH, which enables fast query processing. It has two variants: ProMiSH-E, which always retrieves the exact top-k results, and ProMiSH-A, which is more efficient in terms of time and space and obtains near-optimal results. ProMiSH-E uses hash tables and inverted indexes to perform a localized search. Hashing-based techniques are the state-of-the-art methods for nearest neighbor search in high-dimensional spaces, and the index structure of ProMiSH-E supports accurate search. ProMiSH-E creates hash tables at multiple bin widths, called index levels. ProMiSH-A is an approximate variant of ProMiSH-E; empirical results show that ProMiSH-A is up to 16 times faster than ProMiSH-E while obtaining near-optimal results.
Extending this system, we assign weights to the keywords of a point using the tf-idf technique, so that each group of points can be scored based on the distances between its points and the weights of its keywords. ProMiSH is up to 60 times faster than state-of-the-art tree-based index techniques. The advantage of the proposed system is its efficient search algorithms, which work with the multi-scale indexes for fast query processing.
NKS queries can be used in many applications, such as:
(1) Geographic pattern search: a region can be characterized by a high-dimensional set of attributes, such as pressure, humidity, and soil type, and can also be tagged with information such as diseases. An epidemiologist can formulate NKS queries to discover patterns by finding a set of similar regions that together cover all the diseases of interest.
(2) Photo sharing in social networks.
(3) Graph pattern search.

DOI : /IJRTER T00Y5
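The tf-idf keyword weighting mentioned above can be sketched by treating each data point as a small document made of its keywords. The text does not specify the exact tf and idf variants or how the weights are combined with distances, so this sketch assumes tf = (occurrences of the keyword in the point) / (number of keywords of the point) and idf = log(N / df); the function name and input layout are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(point_keywords):
    """Map each point id to {keyword: tf-idf weight}.

    point_keywords: dict point_id -> list of keywords (repeats allowed; hypothetical layout)
    tf  = count of the keyword in the point / number of keywords of the point  (assumed)
    idf = log(N / df), N = number of points, df = number of points with the keyword
    """
    n = len(point_keywords)
    df = Counter()
    for kws in point_keywords.values():
        df.update(set(kws))  # count each keyword at most once per point
    weights = {}
    for pid, kws in point_keywords.items():
        counts = Counter(kws)
        total = len(kws)
        weights[pid] = {v: (c / total) * math.log(n / df[v]) for v, c in counts.items()}
    return weights
```

A keyword that tags every point receives weight 0, so rarer keywords dominate a point's weighted profile.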

Our contributions are summarized as follows: (1) we propose a novel multi-scale index for exact and approximate NKS query processing; (2) we develop search algorithms for fast query processing; (3) we use five different distance methods (Jaccard, Manhattan, cosine, correlation, and Euclidean) to obtain more accurate results in the subset search.

II. RELATED WORK
Different types of queries on text-rich multidimensional datasets have been studied in the literature. Locating Mapped Resources in Web 2.0 [9] proposed an efficient tag-centric query processing strategy for locating geographic locations: it finds the set of nearest co-located objects that together match the query tags, and its search algorithms scale up in terms of the number of objects and tags. Felipe et al. [1] presented an efficient method to answer top-k spatial keyword queries using the IR2-Tree (Information Retrieval R-Tree), an index structure that combines superimposed text signatures with an R-Tree. The method is based on a tight integration of the data structures and algorithms used in spatial database search and information retrieval: an IR2-Tree is maintained, and at query time an incremental algorithm uses it to answer top-k spatial keyword queries. Aggregate Nearest Keyword Search in Spatial Databases [4] retrieves the k objects from Q with the minimum sum of distances to their nearest points in D, such that each nearest point matches at least one query keyword; several algorithms were proposed for this query using the IR2-Tree as the index structure. Another track of related work deals with m-closest keyword queries [2].
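The superimposed text signatures of the IR2-Tree [1] act as a cheap filter: if a node's signature lacks any bit of the query signature, the node cannot contain all the query keywords, while a match may still be a false positive. A minimal sketch of such a signature, assuming a 64-bit signature with three MD5-derived bits per keyword (illustrative values, not parameters from [1]); it shows the filtering idea only, not the IR2-Tree itself, and all names are hypothetical.

```python
import hashlib

SIG_BITS = 64      # signature length in bits (assumed)
BITS_PER_WORD = 3  # bits set per keyword (assumed)


def word_signature(word):
    """Set up to BITS_PER_WORD bit positions derived from a stable hash of the word."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    sig = 0
    for i in range(BITS_PER_WORD):
        sig |= 1 << (digest[i] % SIG_BITS)  # two positions may coincide
    return sig


def superimpose(words):
    """Superimposed signature: bitwise OR of the individual word signatures."""
    sig = 0
    for w in words:
        sig |= word_signature(w)
    return sig


def may_contain(node_sig, query_words):
    """True if every query bit is set: false positives are possible, misses are not."""
    q = superimpose(query_words)
    return (node_sig & q) == q
```

A tree node would store the superimposed signature of all keywords in its subtree, so whole subtrees failing `may_contain` can be skipped.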
In [2], the bR*-Tree was developed based on the R*-tree [3]; it stores bitmaps and minimum bounding rectangles (MBRs) of keywords in every node along with the points' MBRs. The bR*-Tree suffers from a high storage cost; therefore, Zhang et al. modified it to create a virtual bR*-tree in memory at run time. The virtual bR*-tree is created from a pre-stored R*-tree, which indexes all the points, and an inverted index, which stores the keyword information and the path from the root node of the R*-tree for each point. Both the bR*-tree and the virtual bR*-tree share similar performance weaknesses. Tree-based indexes such as the M-tree [5] were proposed to organize and search large datasets in generic metric spaces; the M-tree is always balanced, and several heuristic split alternatives were considered and experimentally evaluated. Such tree-based indexes have been extensively investigated for nearest neighbor search in high-dimensional spaces, but they fail to scale to dimensions greater than 10 because of the curse of dimensionality. Random projection with hashing [6][7][8] has become the state-of-the-art method for nearest neighbor search in high-dimensional datasets. Jon M. Kleinberg [8] developed a new approach to the nearest-neighbor problem based on a method for combining randomly chosen one-dimensional projections of the underlying point set. Two algorithms are introduced: the first finds ε-approximate nearest neighbors, and the second is an ε-approximate nearest-neighbor algorithm with near-linear storage whose query time improves asymptotically on linear search in all dimensions. Aristides Gionis [6] examined a novel scheme for approximate similarity search based on hashing. The basic idea is to hash the points of the database so that the probability of collision is much higher for points that are close to each other than for points that are far apart. The method gives a significant improvement in running time over other methods for searching in high-dimensional spaces that are based on hierarchical tree decomposition, and the scheme scales well even for a relatively large number of dimensions (more than 50).
The previous technique [6] solves this problem efficiently only for the approximate case. Accurate and Efficient Near Neighbor Search in High Dimensional Spaces [7] is designed to answer r-near neighbor queries for a fixed query range, or for a set of query ranges, with probabilistic guarantees, and is then extended to nearest neighbor queries. Vishwakarma Singh introduced this novel indexing and querying scheme, called Spatial Intersection and Metric Pruning (SIMP). An empirical study of this method on three real datasets with dimensions between 32 and 256 and sizes up to 10 million shows a superior performance of SIMP over […]

All Rights Reserved
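Random projection with hashing [6][7][8], which ProMiSH-E also builds on with hash tables at multiple bin widths (index levels), can be illustrated with the sketch below. It projects each point onto m random unit vectors and quantizes every projection into a bin of width w, so points that share all m bin ids land in the same bucket; one table is built per index level. The Gaussian construction of the unit vectors, the non-overlapping bins, and the bin width w0 · 2^s at level s are illustrative assumptions, not the exact ProMiSH scheme (which uses overlapping bins), and all names are hypothetical.

```python
import math
import random
from collections import defaultdict


def random_unit_vectors(m, d, seed=0):
    """m random directions in R^d: Gaussian samples normalized to unit length."""
    rng = random.Random(seed)
    vecs = []
    for _ in range(m):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        vecs.append([x / norm for x in v])
    return vecs


def bucket_key(point, vectors, width):
    """Quantize the projection on every unit vector into a (non-overlapping) bin."""
    return tuple(
        math.floor(sum(p * z for p, z in zip(point, vec)) / width) for vec in vectors
    )


def build_multiscale_tables(points, vectors, w0, levels):
    """One hash table per index level s, with assumed bin width w0 * 2**s.

    points: dict point_id -> coordinate tuple (hypothetical layout)
    Returns a list of dicts mapping bucket key -> list of point ids.
    """
    tables = []
    for s in range(levels):
        table = defaultdict(list)
        for pid, p in points.items():
            table[bucket_key(p, vectors, w0 * 2 ** s)].append(pid)
        tables.append(dict(table))
    return tables
```

With the axis vectors (1, 0) and (0, 1) and w0 = 1, the points (0.5, 0.5) and (1.5, 0.5) fall into different buckets at level 0 but share bucket (0, 0) at level 1, which is why searching several levels helps when the tightest group straddles a bin boundary.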

III. SYSTEM ARCHITECTURE
In the existing system, Euclidean distance is used to create the subsets. However, Euclidean distance alone is not enough to obtain an accurate nearest keyword set search result. Therefore, we use Euclidean distance together with Manhattan distance, cosine distance, correlation distance, and Jaccard distance for accurate nearest keyword set search.

Figure 1. System architecture

Manhattan distance: The Manhattan distance between two points is the sum of the absolute differences of their coordinates; in two dimensions, it is the sum of the horizontal and vertical distances between them. BFS (breadth-first search) is used to find the closest point, with the update outweight = outweight + (distance - existing).

Cosine Distance: Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0° is 1, and it is less than 1 for any other angle. It is thus a judgment of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two diametrically opposed vectors have a similarity of -1, independent of their magnitude. Cosine similarity is particularly used in positive space, where the outcome is bounded in [0, 1]. The name derives from the term direction cosine: unit vectors are maximally similar if they are parallel and maximally dissimilar if they are orthogonal (perpendicular). This is analogous to the cosine, which is unity (its maximum value) when the segments subtend a zero angle and zero (uncorrelated) when the segments are perpendicular. Given two vectors of attributes, A and B, the cosine similarity, cos θ, is represented using a dot product and magnitudes as

$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$

where $A_i$ and $B_i$ are the components of A and B; the cosine distance is then $1 - \cos\theta$.
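The five distance measures used by this system can be sketched as plain Python functions over equal-length coordinate vectors, with Jaccard operating on the rows of a count matrix. The function names are hypothetical, and two choices are assumptions: the correlation distance is taken as 1 minus the Pearson correlation (the definition used by SciPy's `scipy.spatial.distance.correlation`), which is not the same quantity as the distance correlation described in the Correlation Distance paragraph of this section; and the Jaccard function follows the behavior given in the Jaccard Distance paragraph (threshold-based presence, one triangle computed and mirrored, similarity transformed to dissimilarity), but returns None instead of FALSE for non-rectangular input.

```python
import math


def _check(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def euclidean(a, b):
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    _check(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cos(theta), with cos(theta) = A.B / (|A| |B|); undefined for zero vectors."""
    _check(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norms


def correlation_distance(a, b):
    """Assumed definition: 1 - Pearson r, i.e. cosine distance of mean-centred vectors."""
    _check(a, b)
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_distance([x - mean_a for x in a], [y - mean_b for y in b])


def jaccard_dissimilarity_matrix(rows, threshold=0.0):
    """Pairwise Jaccard dissimilarities between the rows of a (count) matrix.

    Values <= threshold count as absence, larger values as presence.
    Returns None (instead of FALSE) if the matrix is not rectangular.
    """
    if any(len(r) != len(rows[0]) for r in rows):
        return None
    present = [{j for j, x in enumerate(r) if x > threshold} for r in rows]
    n = len(rows)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # diagonal and upper triangle only
            union = present[i] | present[j]
            # Assumption: two all-absent rows are treated as identical.
            s = len(present[i] & present[j]) / len(union) if union else 1.0
            sim[i][j] = sim[j][i] = s  # mirror across the diagonal
    return [[1.0 - s for s in row] for row in sim]
```

Because cosine and correlation distances ignore magnitude, (1, 1) and (2, 2) are at cosine distance 0 although their Euclidean distance is about 1.41; comparing several measures, as Step 16 of the algorithm does, exposes such disagreements.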

Correlation Distance: The distance correlation is derived from a number of other quantities that are used in its specification, specifically the distance variance, the distance standard deviation, and the distance covariance. These quantities take the same roles as the ordinary moments with corresponding names in the specification of the Pearson product-moment correlation coefficient.

Jaccard Distance: The algorithm checks whether the input data matrix is rectangular. If not, the function returns FALSE and a defined but empty output matrix. When the matrix is rectangular, the Jaccard similarity is calculated: the dimensions of the output matrix are set, and the titles of its rows and columns are assigned. Because the result is a square matrix that is mirrored along the diagonal, only the values of one triangular part and of the diagonal are computed. When errors occur during computation, the function returns FALSE. For practical reasons, the implementation does not require true binary data: a value that is 0 or within a given threshold of 0 is interpreted as logical FALSE (absence), and a value larger than the threshold is interpreted as logical TRUE (presence). Thus, a count matrix can be passed to the function without further preparation. Because the threshold affects all values equally, it does not alter the metric characteristic. To calculate the Jaccard dissimilarity, the Jaccard similarity matrix is computed first and then transformed (dissimilarity = 1 - similarity).

Figure 1 shows the architecture of the proposed system. It consists of the following modules:

3.1. Search Algorithm Module
The approximate variant of ProMiSH is referred to as ProMiSH-A. We start with the algorithm description of ProMiSH-A, and then analyze its approximation quality.
The search algorithm finds the top-k results from a subset of the data points, and the efficiency of ProMiSH-E depends heavily on this search algorithm.

3.2. HI Construction Module
The index consists of multiple hash tables and inverted indexes, referred to as HI. HI is controlled by three parameters:
(1) Index Level (L): hash tables are built at L index levels. If the search does not find the top-k results using HI at any of the index levels, it performs a search in the complete dataset D.
(2) Number of Random Unit Vectors (m): all points are projected onto each of the m random unit vectors.
(3) Projection Space: the projection space of each random unit vector is considered as a segment [0, p_max]. At index level s, the segment is partitioned into $2^{L-s+1}+1$ overlapping bins, where all bins have the same width and each bin is equally overlapped with two other bins.
Given a dictionary V and a hash table $H^{(s)}$, we create the keyword-bucket inverted index $I^{(s)}_{khb}$, whose keys are again the keywords.
Figure: HI with one pair of hash table and inverted index (the inverted index is shown in the dotted rectangle).

3.3. Dataset
Our evaluation employs synthetic datasets. The data generation process is governed by the following parameters:
(1) Dimension d: the dimensionality of each data point;
(2) Dataset size N: the total number of multidimensional points in a synthetic dataset;
(3) Keywords per point t: the number of keywords of each data point;
(4) Dictionary size U: the total number of keywords in a dataset.
For each data point, its coordinate in each dimension is randomly sampled between 0 and 10,000, and its keywords are randomly selected following a uniform distribution. We create multiple synthetic datasets to investigate how these parameters affect the performance of ProMiSH.

IV. ALGORITHM
The following steps show the execution of the proposed system.
Input: Q: query keywords; HI: hash index; Ikhb: keyword-bucket inverted index; V: a dictionary of unique keywords in D; v: a keyword
Process:
Step 1: Load the dataset.
Step 2: Enter the query keywords.
Step 3: Build the keyword-point inverted index Ikp.
Step 4: For each keyword v ∈ V, create a key entry in Ikp that points to the set of data points Dv tagged with v.
Step 5: Repeat until all keywords in V are processed.
Step 6: Build the keyword-bucket inverted index Ikhb.
Step 7: Get HI at index level s.
Step 8: E[] ← ∅ /* list of hash buckets */
Step 9: for all vQ ∈ Q do
Step 10: for all bId ∈ Ikhb[vQ] do
Step 11: E[bId] ← E[bId] + 1
Step 12: end for
Step 13: end for
Step 14: Subset search.
Step 15: Find the Euclidean, Jaccard, correlation, cosine, and Manhattan distances of each subset.

Step 16: Compare the results of all five distances.
Step 17: Output the accurate nearest keyword set.

International Journal of Recent Trends in Engineering & Research (IJRTER)

V. RESULT
In this section, we evaluate the tightest groups of nearest data points.
Accuracy: The result table shows the top-k tightest groups of nearest data points and the number of methods with which each tightest group is obtained. For example, as shown in the result table, the 1st tightest group 1,2,3,1,2 is obtained by all five distance calculation methods, which is why it is the most accurate tightest group of nearest data points.

Table 1. Nearest data point accuracy
Figure 2. Accuracy of nearest data points

VI. CONCLUSION
In this paper, we proposed a solution to the problem of nearest keyword set search in multidimensional datasets. We proposed a novel method called ProMiSH, based on random projection and hashing, for finding the nearest keyword set. Based on this index, we developed ProMiSH-E, which finds optimal results efficiently. In addition, we use five different distance calculation methods to obtain a more accurate subset of nearest data points, and our results show that the subset obtained is more accurate. We plan to explore the extension of ProMiSH to disk. ProMiSH-E sequentially reads only the required buckets from Ikp to find points containing at least one query keyword; therefore, Ikp can be stored on disk using a dictionary file.
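Steps 3 to 13 of the algorithm in Section IV build the two inverted indexes and count, for each hash bucket, how many query keywords it contains. The sketch below assumes a hypothetical in-memory layout: `data` maps a point id to (coordinates, keyword set), and one hash table of HI maps a bucket id to a list of point ids. The final filter, keeping only buckets whose counter E[bId] equals the number of query keywords, is an assumption about how the counts feed the subset search of Step 14; the steps themselves do not spell it out.

```python
from collections import defaultdict


def build_ikp(data):
    """Steps 3-5: keyword-point inverted index, keyword -> ids of points tagged with it."""
    ikp = defaultdict(set)
    for pid, (_, keywords) in data.items():
        for v in keywords:
            ikp[v].add(pid)
    return dict(ikp)


def build_ikhb(table, data):
    """Step 6: keyword-bucket inverted index for one hash table.

    table: dict bucket_id -> list of point ids (hypothetical layout)
    Returns keyword -> ids of buckets holding at least one point tagged with it.
    """
    ikhb = defaultdict(set)
    for bid, pids in table.items():
        for pid in pids:
            for v in data[pid][1]:
                ikhb[v].add(bid)
    return dict(ikhb)


def candidate_buckets(query, ikhb):
    """Steps 8-13: E[bId] counts the query keywords present in each bucket.

    Assumed filter: only buckets containing every query keyword are returned.
    """
    q = set(query)
    e = defaultdict(int)
    for v in q:
        for bid in ikhb.get(v, ()):
            e[bid] += 1
    return sorted(bid for bid, count in e.items() if count == len(q))
```

For example, with points 1 and 2 tagged `a` and `b` in bucket `"b0"` and point 3 tagged `a` in bucket `"b1"`, the query {a, b} yields only `["b0"]`, so the subset search would examine just that bucket.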

REFERENCES
1. I. De Felipe, V. Hristidis, and N. Rishe, "Keyword search on spatial databases," in ICDE, 2008, pp. 656–…
2. D. Zhang, Y. M. Chee, A. Mondal, A. K. H. Tung, and M. Kitsuregawa, "Keyword search in spatial databases: Towards searching by document," in ICDE.
3. N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, "The R*-tree: An efficient and robust access method for points and rectangles," in SIGMOD.
4. Z. Li, H. Xu, Y. Lu, and A. Qian, "Aggregate nearest keyword search in spatial databases," in Asia-Pacific Web Conference.
5. P. Ciaccia, M. Patella, and P. Zezula, "M-tree: An efficient access method for similarity search in metric spaces," in VLDB.
6. A. Gionis, P. Indyk, and R. Motwani, "Similarity search in high dimensions via hashing," in VLDB.
7. V. Singh and A. K. Singh, "SIMP: Accurate and efficient near neighbour search in high dimensional spaces," in EDBT.
8. J. M. Kleinberg, "Two algorithms for nearest-neighbour search in high dimensions," in STOC.
9. D. Zhang, B. C. Ooi, and A. K. H. Tung, "Locating mapped resources in Web 2.0," in ICDE, 2010.
10. V. Singh, S. Venkatesha, and A. K. Singh, "Geo-clustering of images with missing geotags," in GRC.
11. H.-H. Park, G.-H. Cha, and C.-W. Chung, "Multi-way spatial joins using R-trees: Methodology and performance evaluation," in SASD.
12. D. Papadias, N. Mamoulis, and Y. Theodoridis, "Processing and optimization of multiway spatial joins using R-trees," in PODS.
13. T. Ibaraki and T. Kameda, "On the optimal nesting order for computing n-relational joins," ACM Trans. Database Syst., vol. 9.
14. W. Li and C. X. Chen, "Efficient data modeling and querying system for multi-dimensional spatial data," in GIS.
15. V. Singh, A. Bhattacharya, and A. K. Singh, "Querying spatial patterns," in EDBT.
16. C. Long, R. C.-W. Wong, K. Wang, and A. W.-C. Fu, "Collective spatial keyword queries: A distance owner-driven approach," in SIGMOD.
17. N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, "The R*-tree: An efficient and robust access method for points and rectangles," in SIGMOD.


More information

Lecture 24: Image Retrieval: Part II. Visual Computing Systems CMU , Fall 2013

Lecture 24: Image Retrieval: Part II. Visual Computing Systems CMU , Fall 2013 Lecture 24: Image Retrieval: Part II Visual Computing Systems Review: K-D tree Spatial partitioning hierarchy K = dimensionality of space (below: K = 2) 3 2 1 3 3 4 2 Counts of points in leaf nodes Nearest

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mining Massive Datasets Jure Leskovec, Stanford University http://cs46.stanford.edu /7/ Jure Leskovec, Stanford C46: Mining Massive Datasets Many real-world problems Web Search and Text Mining Billions

More information

Cluster Analysis. Angela Montanari and Laura Anderlucci

Cluster Analysis. Angela Montanari and Laura Anderlucci Cluster Analysis Angela Montanari and Laura Anderlucci 1 Introduction Clustering a set of n objects into k groups is usually moved by the aim of identifying internally homogenous groups according to a

More information

On Indexing High Dimensional Data with Uncertainty

On Indexing High Dimensional Data with Uncertainty On Indexing High Dimensional Data with Uncertainty Charu C. Aggarwal Philip S. Yu Abstract In this paper, we will examine the problem of distance function computation and indexing uncertain data in high

More information

Parallelizing String Similarity Join Algorithms

Parallelizing String Similarity Join Algorithms Parallelizing String Similarity Join Algorithms Ling-Chih Yao and Lipyeow Lim University of Hawai i at Mānoa, Honolulu, HI 96822, USA {lingchih,lipyeow}@hawaii.edu Abstract. A key operation in data cleaning

More information

Exploratory data analysis for microarrays

Exploratory data analysis for microarrays Exploratory data analysis for microarrays Jörg Rahnenführer Computational Biology and Applied Algorithmics Max Planck Institute for Informatics D-66123 Saarbrücken Germany NGFN - Courses in Practical DNA

More information

Locality-Sensitive Hashing

Locality-Sensitive Hashing Locality-Sensitive Hashing & Image Similarity Search Andrew Wylie Overview; LSH given a query q (or not), how do we find similar items from a large search set quickly? Can t do all pairwise comparisons;

More information

Secure and Advanced Best Keyword Cover Search over Spatial Database

Secure and Advanced Best Keyword Cover Search over Spatial Database Secure and Advanced Best Keyword Cover Search over Spatial Database Sweety Thakare 1, Pritam Patil 2, Tarade Priyanka 3, Sonawane Prajakta 4, Prof. Pathak K.R. 4 B. E Student, Dept. of Computer Engineering,

More information

An Empirical Analysis of Communities in Real-World Networks

An Empirical Analysis of Communities in Real-World Networks An Empirical Analysis of Communities in Real-World Networks Chuan Sheng Foo Computer Science Department Stanford University csfoo@cs.stanford.edu ABSTRACT Little work has been done on the characterization

More information

Supporting Fuzzy Keyword Search in Databases

Supporting Fuzzy Keyword Search in Databases I J C T A, 9(24), 2016, pp. 385-391 International Science Press Supporting Fuzzy Keyword Search in Databases Jayavarthini C.* and Priya S. ABSTRACT An efficient keyword search system computes answers as

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Features and Patterns The Curse of Size and

More information

Lesson 3. Prof. Enza Messina

Lesson 3. Prof. Enza Messina Lesson 3 Prof. Enza Messina Clustering techniques are generally classified into these classes: PARTITIONING ALGORITHMS Directly divides data points into some prespecified number of clusters without a hierarchical

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

A Novel Framework to Measure the Degree of Difficulty on Keyword Query Routing

A Novel Framework to Measure the Degree of Difficulty on Keyword Query Routing ISSN (Online): 2349-7084 GLOBAL IMPACT FACTOR 0.238 DIIF 0.876 A Novel Framework to Measure the Degree of Difficulty on Keyword Query Routing 1 Kallem Rajender Reddy, 2 Y.Sunitha 1 M.Tech (CS),Department

More information

Branch and Bound. Algorithms for Nearest Neighbor Search: Lecture 1. Yury Lifshits

Branch and Bound. Algorithms for Nearest Neighbor Search: Lecture 1. Yury Lifshits Branch and Bound Algorithms for Nearest Neighbor Search: Lecture 1 Yury Lifshits http://yury.name Steklov Institute of Mathematics at St.Petersburg California Institute of Technology 1 / 36 Outline 1 Welcome

More information

Tree Based Index (TBI) System. Getting Started with TBI

Tree Based Index (TBI) System. Getting Started with TBI Tree Based Index (TBI) System Getting Started with TBI Jia Xu 1 Zhenjie Zhang 2 Anthony K. H. Tung 2 Ge Yu 1 1 {xujia,yuge}@ise.neu.edu.cn 2 {zhenjie,atung}@comp.nus.edu.sg May 5, 2010 1 System Introduction

More information

Near Neighbor Search in High Dimensional Data (1) Dr. Anwar Alhenshiri

Near Neighbor Search in High Dimensional Data (1) Dr. Anwar Alhenshiri Near Neighbor Search in High Dimensional Data (1) Dr. Anwar Alhenshiri Scene Completion Problem The Bare Data Approach High Dimensional Data Many real-world problems Web Search and Text Mining Billions

More information

Edge Classification in Networks

Edge Classification in Networks Charu C. Aggarwal, Peixiang Zhao, and Gewen He Florida State University IBM T J Watson Research Center Edge Classification in Networks ICDE Conference, 2016 Introduction We consider in this paper the edge

More information

Algorithms for Nearest Neighbors

Algorithms for Nearest Neighbors Algorithms for Nearest Neighbors State-of-the-Art Yury Lifshits Steklov Institute of Mathematics at St.Petersburg Yandex Tech Seminar, April 2007 1 / 28 Outline 1 Problem Statement Applications Data Models

More information

CLUSTERING. CSE 634 Data Mining Prof. Anita Wasilewska TEAM 16

CLUSTERING. CSE 634 Data Mining Prof. Anita Wasilewska TEAM 16 CLUSTERING CSE 634 Data Mining Prof. Anita Wasilewska TEAM 16 1. K-medoids: REFERENCES https://www.coursera.org/learn/cluster-analysis/lecture/nj0sb/3-4-the-k-medoids-clustering-method https://anuradhasrinivas.files.wordpress.com/2013/04/lesson8-clustering.pdf

More information

Clustering part II 1

Clustering part II 1 Clustering part II 1 Clustering What is Cluster Analysis? Types of Data in Cluster Analysis A Categorization of Major Clustering Methods Partitioning Methods Hierarchical Methods 2 Partitioning Algorithms:

More information

On Processing Location Based Top-k Queries in the Wireless Broadcasting System

On Processing Location Based Top-k Queries in the Wireless Broadcasting System On Processing Location Based Top-k Queries in the Wireless Broadcasting System HaRim Jung, ByungKu Cho, Yon Dohn Chung and Ling Liu Department of Computer Science and Engineering, Korea University, Seoul,

More information

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,

More information

University of Florida CISE department Gator Engineering. Clustering Part 5

University of Florida CISE department Gator Engineering. Clustering Part 5 Clustering Part 5 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville SNN Approach to Clustering Ordinary distance measures have problems Euclidean

More information

Distance-based Outlier Detection: Consolidation and Renewed Bearing

Distance-based Outlier Detection: Consolidation and Renewed Bearing Distance-based Outlier Detection: Consolidation and Renewed Bearing Gustavo. H. Orair, Carlos H. C. Teixeira, Wagner Meira Jr., Ye Wang, Srinivasan Parthasarathy September 15, 2010 Table of contents Introduction

More information

A Survey on Nearest Neighbor Search with Keywords

A Survey on Nearest Neighbor Search with Keywords A Survey on Nearest Neighbor Search with Keywords Shimna P. T 1, Dilna V. C 2 1, 2 AWH Engineering College, KTU University, Department of Computer Science & Engineering, Kuttikkatoor, Kozhikode, India

More information

Evaluating Classifiers

Evaluating Classifiers Evaluating Classifiers Charles Elkan elkan@cs.ucsd.edu January 18, 2011 In a real-world application of supervised learning, we have a training set of examples with labels, and a test set of examples with

More information

Automatic Cluster Number Selection using a Split and Merge K-Means Approach

Automatic Cluster Number Selection using a Split and Merge K-Means Approach Automatic Cluster Number Selection using a Split and Merge K-Means Approach Markus Muhr and Michael Granitzer 31st August 2009 The Know-Center is partner of Austria's Competence Center Program COMET. Agenda

More information

Gene Clustering & Classification

Gene Clustering & Classification BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering

More information

DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li

DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li Welcome to DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: KH 116 Fall 2017 First Grading for Reading Assignment Weka v 6 weeks v https://weka.waikato.ac.nz/dataminingwithweka/preview

More information

Document Clustering: Comparison of Similarity Measures

Document Clustering: Comparison of Similarity Measures Document Clustering: Comparison of Similarity Measures Shouvik Sachdeva Bhupendra Kastore Indian Institute of Technology, Kanpur CS365 Project, 2014 Outline 1 Introduction The Problem and the Motivation

More information

Towards a hybrid approach to Netflix Challenge

Towards a hybrid approach to Netflix Challenge Towards a hybrid approach to Netflix Challenge Abhishek Gupta, Abhijeet Mohapatra, Tejaswi Tenneti March 12, 2009 1 Introduction Today Recommendation systems [3] have become indispensible because of the

More information

ECS 234: Data Analysis: Clustering ECS 234

ECS 234: Data Analysis: Clustering ECS 234 : Data Analysis: Clustering What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed

More information

Detect tracking behavior among trajectory data

Detect tracking behavior among trajectory data Detect tracking behavior among trajectory data Jianqiu Xu, Jiangang Zhou Nanjing University of Aeronautics and Astronautics, China, jianqiu@nuaa.edu.cn, jiangangzhou@nuaa.edu.cn Abstract. Due to the continuing

More information

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Team 2 Prof. Anita Wasilewska CSE 634 Data Mining All Sources Used for the Presentation Olson CF. Parallel algorithms

More information

Cluster Analysis for Microarray Data

Cluster Analysis for Microarray Data Cluster Analysis for Microarray Data Seventh International Long Oligonucleotide Microarray Workshop Tucson, Arizona January 7-12, 2007 Dan Nettleton IOWA STATE UNIVERSITY 1 Clustering Group objects that

More information

Cloud-Based Multimedia Content Protection System

Cloud-Based Multimedia Content Protection System Cloud-Based Multimedia Content Protection System Abstract Shivanand S Rumma Dept. of P.G. Studies Gulbarga University Kalaburagi Karnataka, India shivanand_sr@yahoo.co.in In day to day life so many multimedia

More information

COMP 465: Data Mining Still More on Clustering

COMP 465: Data Mining Still More on Clustering 3/4/015 Exercise COMP 465: Data Mining Still More on Clustering Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Describe each of the following

More information

Nearest Neighbor Search by Branch and Bound

Nearest Neighbor Search by Branch and Bound Nearest Neighbor Search by Branch and Bound Algorithmic Problems Around the Web #2 Yury Lifshits http://yury.name CalTech, Fall 07, CS101.2, http://yury.name/algoweb.html 1 / 30 Outline 1 Short Intro to

More information

Lecture 6: Multimedia Information Retrieval Dr. Jian Zhang

Lecture 6: Multimedia Information Retrieval Dr. Jian Zhang Lecture 6: Multimedia Information Retrieval Dr. Jian Zhang NICTA & CSE UNSW COMP9314 Advanced Database S1 2007 jzhang@cse.unsw.edu.au Reference Papers and Resources Papers: Colour spaces-perceptual, historical

More information

Using Statistics for Computing Joins with MapReduce

Using Statistics for Computing Joins with MapReduce Using Statistics for Computing Joins with MapReduce Theresa Csar 1, Reinhard Pichler 1, Emanuel Sallinger 1, and Vadim Savenkov 2 1 Vienna University of Technology {csar, pichler, sallinger}@dbaituwienacat

More information

Chapter 4: Text Clustering

Chapter 4: Text Clustering 4.1 Introduction to Text Clustering Clustering is an unsupervised method of grouping texts / documents in such a way that in spite of having little knowledge about the content of the documents, we can

More information

Machine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016

Machine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016 Machine Learning 10-701, Fall 2016 Nonparametric methods for Classification Eric Xing Lecture 2, September 12, 2016 Reading: 1 Classification Representing data: Hypothesis (classifier) 2 Clustering 3 Supervised

More information

Clustering Billions of Images with Large Scale Nearest Neighbor Search

Clustering Billions of Images with Large Scale Nearest Neighbor Search Clustering Billions of Images with Large Scale Nearest Neighbor Search Ting Liu, Charles Rosenberg, Henry A. Rowley IEEE Workshop on Applications of Computer Vision February 2007 Presented by Dafna Bitton

More information

Unsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing

Unsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing Unsupervised Data Mining: Clustering Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 1. Supervised Data Mining Classification Regression Outlier detection

More information

Understanding Clustering Supervising the unsupervised

Understanding Clustering Supervising the unsupervised Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data

More information

CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp

CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp Chris Guthrie Abstract In this paper I present my investigation of machine learning as

More information

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data Nian Zhang and Lara Thompson Department of Electrical and Computer Engineering, University

More information

Text Documents clustering using K Means Algorithm

Text Documents clustering using K Means Algorithm Text Documents clustering using K Means Algorithm Mrs Sanjivani Tushar Deokar Assistant professor sanjivanideokar@gmail.com Abstract: With the advancement of technology and reduced storage costs, individuals

More information

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Bumjoon Jo and Sungwon Jung (&) Department of Computer Science and Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul 04107,

More information

Diversity in Skylines

Diversity in Skylines Diversity in Skylines Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong Sha Tin, New Territories, Hong Kong taoyf@cse.cuhk.edu.hk Abstract Given an integer k, a diverse

More information

International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014.

International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014. A B S T R A C T International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Information Retrieval Models and Searching Methodologies: Survey Balwinder Saini*,Vikram Singh,Satish

More information

Comparative Study of Subspace Clustering Algorithms

Comparative Study of Subspace Clustering Algorithms Comparative Study of Subspace Clustering Algorithms S.Chitra Nayagam, Asst Prof., Dept of Computer Applications, Don Bosco College, Panjim, Goa. Abstract-A cluster is a collection of data objects that

More information

Nearest Neighbour Expansion Using Keyword Cover Search

Nearest Neighbour Expansion Using Keyword Cover Search Nearest Neighbour Expansion Using Keyword Cover Search [1] P. Sai Vamsi Aravind MTECH(CSE) Institute of Aeronautical Engineering, Hyderabad [2] P.Anjaiah Assistant Professor Institute of Aeronautical Engineering,

More information

Introduction to Indexing R-trees. Hong Kong University of Science and Technology

Introduction to Indexing R-trees. Hong Kong University of Science and Technology Introduction to Indexing R-trees Dimitris Papadias Hong Kong University of Science and Technology 1 Introduction to Indexing 1. Assume that you work in a government office, and you maintain the records

More information