Using Novel Method ProMiSH Search Nearest keyword Set In Multidimensional Dataset
- Grant Whitehead
Miss. Shilpa Bhaskar Thakare 1, Prof. Jayshree V. Shinde 2
1 Department of Computer Engineering, Late G.N.Sapkal C.O.E, Nasik
2 Department of Computer Engineering, Late G.N.Sapkal C.O.E, Nasik

Abstract
In this paper we propose a novel method, ProMiSH (Projection and Multi-Scale Hashing), which uses random projection and a hash-based index structure. We consider objects that are embedded in a vector space and tagged with keywords. With this algorithm we find the tightest groups of points that cover the query keywords, and we use five different distance measures (Euclidean distance, Manhattan distance, Jaccard distance, cosine similarity and correlation distance) to obtain more accurate results. ProMiSH is up to 60 times faster than state-of-the-art tree-based techniques. The system studies nearest keyword set queries on text-rich multi-dimensional datasets.

Keywords: Multi-dimensional data, Indexing, Hashing, Querying, Projection

I. INTRODUCTION
The proposed system considers nearest keyword set (NKS) queries on text-rich multi-dimensional datasets. An NKS query is a set of keywords, and its result consists of k sets of data points, each of which contains all the query keywords and forms one of the top-k tightest clusters in the multi-dimensional space. NKS queries are useful in many applications, such as graph pattern search, geo-location search in GIS systems, and photo sharing in social networks. In a photo-sharing social network, photos are tagged with people's names and locations. These photos can be embedded in a high-dimensional feature space of texture, color or shape. An NKS query can then find a group of similar photos that contains a given set of people. NKS queries retrieve the top-k candidates with the least diameter. If two candidates have the same diameter, they are ranked by their cardinality. Previous systems use tree-based index techniques for NKS queries, but the performance of these algorithms deteriorates sharply as the size or the dimensionality of the dataset increases.
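The ranking rule described above (least diameter first, ties broken by cardinality) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the representation of a data point as a `(coordinates, keywords)` pair, the use of Euclidean distance for the diameter, and the choice to rank smaller sets first on ties are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def diameter(points):
    """Largest pairwise Euclidean distance within a candidate set."""
    return max((dist(p, q) for p, q in combinations(points, 2)), default=0.0)


def covers_query(candidate, query):
    """True if the points of the candidate together carry every query keyword."""
    found = set()
    for _, keywords in candidate:
        found |= set(keywords)
    return set(query) <= found


def rank_candidates(candidates, query, k):
    """Keep candidates that cover the query; sort by diameter, then by size.

    Ranking smaller sets first on equal diameter is an assumption; the text
    only says that ties are ranked by cardinality.
    """
    valid = [c for c in candidates if covers_query(c, query)]
    valid.sort(key=lambda c: (diameter([p for p, _ in c]), len(c)))
    return valid[:k]
```

For example, with points `a = ((0.0, 0.0), {"x"})`, `b = ((3.0, 4.0), {"y"})` and `c = ((1.0, 0.0), {"y"})`, the query `{"x", "y"}` ranks the set `[a, c]` ahead of `[a, b]`, because its diameter is smaller.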
These algorithms take a long time to terminate on multi-dimensional datasets of millions of points. An efficient algorithm is therefore required that performs well on large datasets and scales with the dataset dimension. In this paper we propose ProMiSH, which offers fast query processing. ProMiSH-E always retrieves the optimal top-k results, while ProMiSH-A is more efficient in terms of time and space and obtains near-optimal results. ProMiSH-E uses hash tables and inverted indexes to perform a localized search. Hashing is used in the state-of-the-art methods for nearest neighbor search in high-dimensional spaces, and the index structure in ProMiSH-E supports accurate search. ProMiSH-E creates hash tables at multiple bin-widths, called index levels. ProMiSH-A is an approximate variation of ProMiSH-E. Empirical results show that ProMiSH-A is up to 16 times faster than ProMiSH-E while obtaining near-optimal results. The system can also be extended to assign weights to the keywords of a point using tf-idf techniques; each group of points can then be scored based on the distances between the points and the weights of the keywords. ProMiSH is up to 60 times faster than state-of-the-art tree-based index techniques. The advantage of the proposed system is its efficient search algorithms, which work with the multi-scale indexes for fast query processing.

NKS queries can be used in many applications, such as:
(1) Geographic patterns: a region can be characterized by a high-dimensional set of attributes, such as pressure, humidity and soil types. These regions can also be tagged with information such as diseases. An epidemiologist can formulate NKS queries to discover patterns by finding a set of similar regions with all the diseases of her interest.
(2) Photo sharing in social networks.
(3) Graph pattern search.

Our contributions are summarized as follows:
(1) We propose a novel multi-scale index for exact and approximate NKS query processing.
(2) We develop search algorithms for fast query processing.
(3) We use five different distance measures (Jaccard, Manhattan, cosine, correlation and Euclidean) to obtain more accurate results for the subset search.

DOI : /IJRTER T00Y5

II. RELATED WORK
Different types of queries on text-rich multi-dimensional datasets have been studied in the literature. Locating Mapped Resources in Web 2.0 [9] proposes an efficient tag-centric query processing strategy for locating geographic resources: it finds the set of nearest co-located objects that together match the query tags, using search algorithms that scale with the number of objects and tags.

Felipe et al. [1] present an efficient method to answer top-k spatial keyword queries. Their indexing structure, the IR2-Tree (Information Retrieval R-Tree), combines an R-Tree with superimposed text signatures. The method is based on a tight integration of data structures and algorithms used in spatial database search and in information retrieval: an IR2-Tree is maintained, and at query time an incremental algorithm uses it to answer top-k spatial keyword queries.

Aggregate Nearest Keyword Search in Spatial Databases [4] retrieves k objects from Q with the minimum sum of distances to their nearest points in D, such that each nearest point matches at least one query keyword. Several algorithms that use the IR2-Tree as the index structure are proposed for processing this query. Another track of related work deals with m-closest keyword queries [2].
In [2], the bR*-Tree is developed based on the R*-tree [3]; it stores bitmaps and minimum bounding rectangles (MBRs) of keywords in every node, along with the MBRs of the points. The bR*-Tree suffers from a high storage cost; therefore Zhang et al. modified the bR*-Tree to create a virtual bR*-Tree in memory at run time. The virtual bR*-Tree is created from a pre-stored R*-Tree, which indexes all the points, and an inverted index, which stores the keyword information and the path from the root node in the R*-Tree for each point. The virtual bR*-Tree shares similar performance weaknesses with the bR*-Tree.

Tree-based indexes such as the M-tree [5] have been proposed to organize and search large datasets in generic metric spaces; the tree is always balanced, and several heuristic split alternatives are considered and experimentally evaluated. Such indexes have been extensively investigated for nearest neighbor search in high-dimensional spaces, but they fail to scale to dimensions greater than 10 because of the curse of dimensionality.

Random projection with hashing [6][7][8] has become the state-of-the-art method for nearest neighbor search in high-dimensional datasets. Jon M. Kleinberg [8] develops a new approach to the nearest-neighbor problem, based on a method for combining randomly chosen one-dimensional projections of the underlying point set. Two algorithms are introduced: the first finds ε-approximate nearest neighbors, and the second is an ε-approximate nearest-neighbor algorithm with near-linear storage and a query time that improves asymptotically on linear search in all dimensions. Aristides Gionis et al. [6] examine a novel scheme for approximate similarity search based on hashing. The basic idea is to hash the points from the database. The method gives a significant improvement in running time over other methods for searching in high-dimensional spaces based on hierarchical tree decomposition. This scheme scales well even for a relatively large number of dimensions (more than 50).
According to Accurate and Efficient Near Neighbor Search in High Dimensional Spaces [7], previous techniques such as [6] solve this problem efficiently only for the approximate case: they are designed to solve r-near neighbor queries for a fixed query range or for a set of query ranges with probabilistic guarantees, and are then extended to nearest neighbor queries. Vishwakarma Singh introduces a novel indexing and querying scheme called Spatial Intersection and Metric Pruning (SIMP). An empirical study of this method on three real datasets, with dimensions between 32 and 256 and sizes of up to 10 million, shows superior performance for SIMP.
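The general idea of random projection with hashing discussed in this section can be sketched as follows. This is a generic illustration under stated assumptions, not the exact scheme of [6], [7], [8] or of ProMiSH: the function names, the Gaussian sampling of directions, the single fixed `bin_width`, and the use of the tuple of bin numbers as the bucket key are all hypothetical choices.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a random direction by normalizing Gaussian samples."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project(point, unit_vector):
    """Scalar projection of a point onto a unit vector."""
    return sum(p * z for p, z in zip(point, unit_vector))


def bucket_key(point, unit_vectors, bin_width):
    """Tuple of bin numbers of the point, one per projection line."""
    return tuple(math.floor(project(point, z) / bin_width) for z in unit_vectors)


def build_hash_table(points, unit_vectors, bin_width):
    """Map each bucket key to the ids of the points that fall into it."""
    table = {}
    for idx, point in enumerate(points):
        key = bucket_key(point, unit_vectors, bin_width)
        table.setdefault(key, []).append(idx)
    return table
```

Points that are close in the original space tend to receive the same key, so a query only needs to inspect a few buckets instead of the whole dataset. Nearby points can still be split by a bin boundary, which is one motivation for overlapping bins and multiple bin widths.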
III. SYSTEM ARCHITECTURE
In the existing system, only Euclidean distance is used to create subsets. This is not enough to obtain an accurate nearest keyword set search, because the result of a single distance measure cannot confirm accuracy. We therefore use Euclidean distance together with Manhattan distance, cosine distance, correlation distance and Jaccard distance for an accurate nearest keyword set search.

Figure 1. System architecture

Manhattan distance: The Manhattan distance is the sum of the vertical and horizontal distances from the current node to the goal node (tile). The system uses it together with the number of moves needed to reach the goal node from the initial position. BFS is used to find the closest point.
outweight = outweight + (distance - existing)

Cosine distance: Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0° is 1, and it is less than 1 for any other angle. It is thus a judgment of orientation and not of magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two diametrically opposed vectors have a similarity of -1, independent of their magnitude. Cosine similarity is particularly used in positive space, where the outcome is neatly bounded in [0, 1]. The name derives from the term "direction cosine": unit vectors are maximally similar if they are parallel and maximally dissimilar if they are orthogonal (perpendicular). This is analogous to the cosine, which is unity (its maximum value) when the segments subtend a zero angle and zero (uncorrelated) when the segments are perpendicular. Given two vectors of attributes, A and B, the cosine similarity, cos θ, is represented using a dot product and the magnitudes of the vectors.
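For reference, the standard textbook forms of the two measures just described are given below. They are not taken from the paper, whose formulas did not survive in this text; A and B are assumed to be n-dimensional vectors with components A_i and B_i.

```latex
% Manhattan (L1) distance between A and B
d_{1}(A, B) = \sum_{i=1}^{n} \lvert A_i - B_i \rvert

% Cosine similarity: dot product divided by the product of the magnitudes
\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
           = \frac{\sum_{i=1}^{n} A_i B_i}
                  {\sqrt{\sum_{i=1}^{n} A_i^{2}} \, \sqrt{\sum_{i=1}^{n} B_i^{2}}}
```

A cosine distance is commonly obtained as 1 - cos θ, so that identical orientations have distance 0.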
Correlation distance: The distance correlation is derived from a number of other quantities that are used in its specification, namely distance variance, distance standard deviation and distance covariance. These quantities take the same roles as the ordinary moments with the corresponding names in the specification of the Pearson product-moment correlation coefficient.

Jaccard distance: The algorithm first checks whether the input data matrix is rectangular. If it is not, the function returns FALSE and a defined but empty output matrix. If the matrix is rectangular, the Jaccard similarity is calculated. To do so, the dimensions of the respective arrays of the output matrix are set, as are the titles of the rows and columns. Because the result is a square matrix that is mirrored along the diagonal, only the values for one triangular part and the diagonal are computed. If errors occur during the computation, the function returns FALSE. For practical reasons, the implementation of the algorithm does not necessarily need true binary data. It checks whether a value is 0 or within a certain threshold close to it; in this case the value is interpreted as logical FALSE, that is, absence. Values larger than the given threshold are interpreted as logical TRUE, that is, presence. It is thus possible to pass a count matrix to the function without further preparation. As the given threshold affects all values equally, it does not alter the metric characteristics. To calculate the Jaccard dissimilarity, the Jaccard similarity matrix is computed first and then transformed.

Figure 1 shows the architecture of the proposed system. It consists of the following modules:

3.1. Search Algorithm Module
The approximate version of ProMiSH is referred to as ProMiSH-A. We start with the algorithm description of ProMiSH-A, and then analyze its approximation quality.
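Because the search module compares candidate subsets under the five distance measures described in this section, a compact sketch of those measures may help. It is written in plain Python under several assumptions: vectors are equal-length sequences of numbers, cosine and correlation inputs are non-zero and non-constant, correlation distance is taken as one minus the Pearson correlation (a common simplification, not the distance correlation statistic described above), and the Jaccard threshold default of 0.0 is hypothetical.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine of the angle between a and b (both non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def correlation_distance(a, b):
    """Assumed form: cosine distance of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_distance([x - mean_a for x in a], [y - mean_b for y in b])


def jaccard_distance(a, b, threshold=0.0):
    """Values above the threshold count as present, all others as absent."""
    present_a = {i for i, x in enumerate(a) if x > threshold}
    present_b = {i for i, y in enumerate(b) if y > threshold}
    union = present_a | present_b
    if not union:
        return 0.0  # two all-absent vectors are treated as identical
    return 1.0 - len(present_a & present_b) / len(union)
```

For the points (0, 0) and (3, 4), the Euclidean distance is 5 and the Manhattan distance is 7, which shows how the measures can rank the same pair differently.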
The search algorithm finds the top-k results from a subset of the data points; ProMiSH-E depends heavily on an efficient search algorithm.

3.2. HI Construction Module
The index consists of multiple hash tables and inverted indexes, referred to as HI. HI is controlled by three parameters:

Index Level (L): Once HI has been searched at all the index levels, a search is performed in the complete dataset D.

Number of Random Unit Vectors (m): For each of the m random unit vectors, we consider its projection space as a segment [0, pmax]. We partition the segment into 2(L-s+1) + 1 overlapping bins, where the bins have equal width and each bin overlaps equally with two other bins.

Given a dictionary V and a hash table H(s), we create the inverted index I(s)khb. The keys of this inverted index are again keywords. The inverted index is shown in the dotted rectangle, and HI is shown with one hash table and inverted index pair.

3.3. Dataset
Our evaluation employs synthetic datasets. The data generation process is governed by the following parameters: (1) dimension d specifies the dimensionality of each data point; (2) dataset size N is the total number of multi-dimensional points in a synthetic dataset; (3) keywords per point t is the number of keywords for each data point; and (4) dictionary size U is the total number of keywords in a dataset. For each data point, the coordinate in each dimension is randomly sampled between 0 and 10,000, and its keywords are randomly selected following a uniform distribution. We create multiple synthetic datasets to investigate how these parameters affect the performance of ProMiSH.
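The data generation process described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' generator: the function name, the seed parameter, the representation of keywords as integer ids in range(u), and the choice to draw the t keywords of a point without repetition are assumptions.

```python
import random


def generate_dataset(n, d, t, u, seed=0):
    """Return n synthetic points, each a (coordinates, keywords) pair.

    n: dataset size N, d: dimension, t: keywords per point,
    u: dictionary size U (requires t <= u).
    """
    rng = random.Random(seed)
    dataset = []
    for _ in range(n):
        # Each coordinate is sampled uniformly between 0 and 10,000.
        coords = tuple(rng.uniform(0.0, 10000.0) for _ in range(d))
        # Keywords are drawn uniformly from the dictionary (distinct per point).
        keywords = frozenset(rng.sample(range(u), t))
        dataset.append((coords, keywords))
    return dataset
```

Calling `generate_dataset(1000, 16, 2, 100)` would give 1,000 points in 16 dimensions, each tagged with 2 of 100 possible keywords; varying one argument at a time mirrors the kind of parameter study the section describes.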
IV. ALGORITHM
The following steps show the execution of the proposed system.

Input:
Q : query keywords
HI : hash index
Ikhb : keyword-bucket inverted index
V : a dictionary of unique keywords in D
v : a keyword

Process:
Step 1: Load the dataset
Step 2: Enter the query keywords
Step 3: Build the keyword-point inverted index Ikp
Step 4: For each keyword v in V, create a key entry in Ikp; this key entry points to the set of data points Dv
Step 5: Repeat until all keywords in V are processed
Step 6: Build the keyword-bucket inverted index Ikhb
Step 7: Get HI at index level s
Step 8: E[ ] ← 0 /* list of hash buckets */
Step 9: for all vQ ∈ Q do
Step 10: for all bId ∈ Ikhb[vQ] do
Step 11: E[bId] ← E[bId] + 1
Step 12: end for
Step 13: end for
Step 14: Subset search
Step 15: Find the Euclidean distance, Jaccard distance, correlation distance, cosine distance and Manhattan distance of each subset
Step 16: Compare the results of all five distances
Step 17: Output the accurate nearest keyword set

International Journal of Recent Trends in Engineering & Research (IJRTER)

V. RESULT
In this section we evaluate the tightest groups of nearest data points.

Accuracy: The result table shows the top-k tightest groups of nearest data points and the number of methods by which each group is obtained. For example, as shown in the result table, the first tightest group (1, 2, 3, 1, 2) is obtained with all five distance calculation methods, which is why it is the most accurate tightest group of nearest data points.

Table 1. Nearest Data point Accuracy

Figure 2. Accuracy of Nearest Data point

VI. CONCLUSION
In this paper we proposed a solution to the problem of nearest keyword set search in multi-dimensional datasets. We proposed a novel method called ProMiSH, based on random projection and hashing, for finding the nearest keyword set. Based on this index, we developed ProMiSH-E, which finds an optimal result with better efficiency. We also use five different distance calculation methods to obtain a more accurate subset of nearest data points, and our results show that the subsets obtained are more accurate. We plan to explore the extension of ProMiSH to disk. ProMiSH-E sequentially reads only the required buckets from Ikp to find the points containing at least one query keyword. Therefore, Ikp can be stored on disk using a dictionary file.
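The two inverted indexes and the bucket-counting loop (Steps 3 to 13 of the algorithm) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the `(coordinates, keywords)` point representation and the `hash_table` layout (bucket id mapped to a list of point ids) are hypothetical, and the final filter, which keeps only buckets in which every query keyword occurs, is an assumption about how the counts in E are used.

```python
from collections import defaultdict


def build_keyword_point_index(dataset):
    """Ikp: keyword -> ids of the points tagged with it (Steps 3-5)."""
    ikp = defaultdict(list)
    for point_id, (_, keywords) in enumerate(dataset):
        for v in keywords:
            ikp[v].append(point_id)
    return ikp


def build_keyword_bucket_index(hash_table, dataset):
    """Ikhb: keyword -> ids of the buckets holding a point tagged with it (Step 6)."""
    ikhb = defaultdict(set)
    for bucket_id, point_ids in hash_table.items():
        for point_id in point_ids:
            for v in dataset[point_id][1]:
                ikhb[v].add(bucket_id)
    return ikhb


def candidate_buckets(query, ikhb):
    """Count, per bucket, how many query keywords occur in it (Steps 8-13)."""
    counts = defaultdict(int)  # plays the role of E[ ]
    for v in query:
        for bucket_id in ikhb.get(v, ()):
            counts[bucket_id] += 1
    # Assumption: only buckets containing every query keyword go on to subset search.
    return [b for b, c in counts.items() if c == len(query)]
```

Keeping Ikp as a plain keyword-to-points map is also what makes the disk layout mentioned in the conclusion plausible: a query only needs the entries for its own keywords.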
REFERENCES
1. I. De Felipe, V. Hristidis, and N. Rishe, Keyword search on spatial databases, in ICDE, 2008, pp. 656?
2. D. Zhang, Y. M. Chee, A. Mondal, A. K. H. Tung, and M. Kitsuregawa, Keyword search in spatial databases: Towards searching by document, in ICDE.
3. N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, The R*-tree: An efficient and robust access method for points and rectangles, in SIGMOD.
4. Z. Li, H. Xu, Y. Lu, and A. Qian, Aggregate nearest keyword search in spatial databases, in Asia-Pacific Web Conference.
5. P. Ciaccia, M. Patella, and P. Zezula, M-tree: An efficient access method for similarity search in metric spaces, in VLDB.
6. A. Gionis, P. Indyk, and R. Motwani, Similarity search in high dimensions via hashing, in VLDB.
7. V. Singh and A. K. Singh, Simp: accurate and efficient near neighbour search in high dimensional spaces, in EDBT.
8. J. M. Kleinberg, Two algorithms for nearest-neighbour search in high dimensions, in STOC.
9. D. Zhang, B. C. Ooi, and A. K. H. Tung, Locating mapped resources in web 2.0, in ICDE, 2010.
10. V. Singh, S. Venkatesha, and A. K. Singh, Geo-clustering of images with missing geotags, in GRC.
11. H.-H. Park, G.-H. Cha, and C.-W. Chung, Multi-way spatial joins using r-trees: Methodology and performance evaluation, in SASD.
12. D. Papadias, N. Mamoulis, and Y. Theodoridis, Processing and optimization of multiway spatial joins using r-trees, in PODS.
13. T. Ibaraki and T. Kameda, On the optimal nesting order for computing n-relational joins, ACM Trans. Database Syst., vol. 9.
14. W. Li and C. X. Chen, Efficient data modeling and querying system for multi-dimensional spatial data, in GIS.
15. V. Singh, A. Bhattacharya, and A. K. Singh, Querying spatial patterns, in EDBT.
16. C. Long, R. C.-W. Wong, K. Wang, and A. W.-C. Fu, Collective spatial keyword queries: a distance owner-driven approach, in SIGMOD.
17. N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, The R*-tree: An efficient and robust access method for points and rectangles, in SIGMOD.
More informationMining Web Data. Lijun Zhang
Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems
More informationA Novel Framework to Measure the Degree of Difficulty on Keyword Query Routing
ISSN (Online): 2349-7084 GLOBAL IMPACT FACTOR 0.238 DIIF 0.876 A Novel Framework to Measure the Degree of Difficulty on Keyword Query Routing 1 Kallem Rajender Reddy, 2 Y.Sunitha 1 M.Tech (CS),Department
More informationBranch and Bound. Algorithms for Nearest Neighbor Search: Lecture 1. Yury Lifshits
Branch and Bound Algorithms for Nearest Neighbor Search: Lecture 1 Yury Lifshits http://yury.name Steklov Institute of Mathematics at St.Petersburg California Institute of Technology 1 / 36 Outline 1 Welcome
More informationTree Based Index (TBI) System. Getting Started with TBI
Tree Based Index (TBI) System Getting Started with TBI Jia Xu 1 Zhenjie Zhang 2 Anthony K. H. Tung 2 Ge Yu 1 1 {xujia,yuge}@ise.neu.edu.cn 2 {zhenjie,atung}@comp.nus.edu.sg May 5, 2010 1 System Introduction
More informationNear Neighbor Search in High Dimensional Data (1) Dr. Anwar Alhenshiri
Near Neighbor Search in High Dimensional Data (1) Dr. Anwar Alhenshiri Scene Completion Problem The Bare Data Approach High Dimensional Data Many real-world problems Web Search and Text Mining Billions
More informationEdge Classification in Networks
Charu C. Aggarwal, Peixiang Zhao, and Gewen He Florida State University IBM T J Watson Research Center Edge Classification in Networks ICDE Conference, 2016 Introduction We consider in this paper the edge
More informationAlgorithms for Nearest Neighbors
Algorithms for Nearest Neighbors State-of-the-Art Yury Lifshits Steklov Institute of Mathematics at St.Petersburg Yandex Tech Seminar, April 2007 1 / 28 Outline 1 Problem Statement Applications Data Models
More informationCLUSTERING. CSE 634 Data Mining Prof. Anita Wasilewska TEAM 16
CLUSTERING CSE 634 Data Mining Prof. Anita Wasilewska TEAM 16 1. K-medoids: REFERENCES https://www.coursera.org/learn/cluster-analysis/lecture/nj0sb/3-4-the-k-medoids-clustering-method https://anuradhasrinivas.files.wordpress.com/2013/04/lesson8-clustering.pdf
More informationClustering part II 1
Clustering part II 1 Clustering What is Cluster Analysis? Types of Data in Cluster Analysis A Categorization of Major Clustering Methods Partitioning Methods Hierarchical Methods 2 Partitioning Algorithms:
More informationOn Processing Location Based Top-k Queries in the Wireless Broadcasting System
On Processing Location Based Top-k Queries in the Wireless Broadcasting System HaRim Jung, ByungKu Cho, Yon Dohn Chung and Ling Liu Department of Computer Science and Engineering, Korea University, Seoul,
More informationStatistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte
Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,
More informationUniversity of Florida CISE department Gator Engineering. Clustering Part 5
Clustering Part 5 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville SNN Approach to Clustering Ordinary distance measures have problems Euclidean
More informationDistance-based Outlier Detection: Consolidation and Renewed Bearing
Distance-based Outlier Detection: Consolidation and Renewed Bearing Gustavo. H. Orair, Carlos H. C. Teixeira, Wagner Meira Jr., Ye Wang, Srinivasan Parthasarathy September 15, 2010 Table of contents Introduction
More informationA Survey on Nearest Neighbor Search with Keywords
A Survey on Nearest Neighbor Search with Keywords Shimna P. T 1, Dilna V. C 2 1, 2 AWH Engineering College, KTU University, Department of Computer Science & Engineering, Kuttikkatoor, Kozhikode, India
More informationEvaluating Classifiers
Evaluating Classifiers Charles Elkan elkan@cs.ucsd.edu January 18, 2011 In a real-world application of supervised learning, we have a training set of examples with labels, and a test set of examples with
More informationAutomatic Cluster Number Selection using a Split and Merge K-Means Approach
Automatic Cluster Number Selection using a Split and Merge K-Means Approach Markus Muhr and Michael Granitzer 31st August 2009 The Know-Center is partner of Austria's Competence Center Program COMET. Agenda
More informationGene Clustering & Classification
BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering
More informationDS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li
Welcome to DS504/CS586: Big Data Analytics Data Management Prof. Yanhua Li Time: 6:00pm 8:50pm R Location: KH 116 Fall 2017 First Grading for Reading Assignment Weka v 6 weeks v https://weka.waikato.ac.nz/dataminingwithweka/preview
More informationDocument Clustering: Comparison of Similarity Measures
Document Clustering: Comparison of Similarity Measures Shouvik Sachdeva Bhupendra Kastore Indian Institute of Technology, Kanpur CS365 Project, 2014 Outline 1 Introduction The Problem and the Motivation
More informationTowards a hybrid approach to Netflix Challenge
Towards a hybrid approach to Netflix Challenge Abhishek Gupta, Abhijeet Mohapatra, Tejaswi Tenneti March 12, 2009 1 Introduction Today Recommendation systems [3] have become indispensible because of the
More informationECS 234: Data Analysis: Clustering ECS 234
: Data Analysis: Clustering What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed
More informationDetect tracking behavior among trajectory data
Detect tracking behavior among trajectory data Jianqiu Xu, Jiangang Zhou Nanjing University of Aeronautics and Astronautics, China, jianqiu@nuaa.edu.cn, jiangangzhou@nuaa.edu.cn Abstract. Due to the continuing
More informationData Clustering Hierarchical Clustering, Density based clustering Grid based clustering
Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Team 2 Prof. Anita Wasilewska CSE 634 Data Mining All Sources Used for the Presentation Olson CF. Parallel algorithms
More informationCluster Analysis for Microarray Data
Cluster Analysis for Microarray Data Seventh International Long Oligonucleotide Microarray Workshop Tucson, Arizona January 7-12, 2007 Dan Nettleton IOWA STATE UNIVERSITY 1 Clustering Group objects that
More informationCloud-Based Multimedia Content Protection System
Cloud-Based Multimedia Content Protection System Abstract Shivanand S Rumma Dept. of P.G. Studies Gulbarga University Kalaburagi Karnataka, India shivanand_sr@yahoo.co.in In day to day life so many multimedia
More informationCOMP 465: Data Mining Still More on Clustering
3/4/015 Exercise COMP 465: Data Mining Still More on Clustering Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Describe each of the following
More informationNearest Neighbor Search by Branch and Bound
Nearest Neighbor Search by Branch and Bound Algorithmic Problems Around the Web #2 Yury Lifshits http://yury.name CalTech, Fall 07, CS101.2, http://yury.name/algoweb.html 1 / 30 Outline 1 Short Intro to
More informationLecture 6: Multimedia Information Retrieval Dr. Jian Zhang
Lecture 6: Multimedia Information Retrieval Dr. Jian Zhang NICTA & CSE UNSW COMP9314 Advanced Database S1 2007 jzhang@cse.unsw.edu.au Reference Papers and Resources Papers: Colour spaces-perceptual, historical
More informationUsing Statistics for Computing Joins with MapReduce
Using Statistics for Computing Joins with MapReduce Theresa Csar 1, Reinhard Pichler 1, Emanuel Sallinger 1, and Vadim Savenkov 2 1 Vienna University of Technology {csar, pichler, sallinger}@dbaituwienacat
More informationChapter 4: Text Clustering
4.1 Introduction to Text Clustering Clustering is an unsupervised method of grouping texts / documents in such a way that in spite of having little knowledge about the content of the documents, we can
More informationMachine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016
Machine Learning 10-701, Fall 2016 Nonparametric methods for Classification Eric Xing Lecture 2, September 12, 2016 Reading: 1 Classification Representing data: Hypothesis (classifier) 2 Clustering 3 Supervised
More informationClustering Billions of Images with Large Scale Nearest Neighbor Search
Clustering Billions of Images with Large Scale Nearest Neighbor Search Ting Liu, Charles Rosenberg, Henry A. Rowley IEEE Workshop on Applications of Computer Vision February 2007 Presented by Dafna Bitton
More informationUnsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing
Unsupervised Data Mining: Clustering Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 1. Supervised Data Mining Classification Regression Outlier detection
More informationUnderstanding Clustering Supervising the unsupervised
Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data
More informationCS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp
CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp Chris Guthrie Abstract In this paper I present my investigation of machine learning as
More informationAn Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data
An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data Nian Zhang and Lara Thompson Department of Electrical and Computer Engineering, University
More informationText Documents clustering using K Means Algorithm
Text Documents clustering using K Means Algorithm Mrs Sanjivani Tushar Deokar Assistant professor sanjivanideokar@gmail.com Abstract: With the advancement of technology and reduced storage costs, individuals
More informationQuadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase
Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Bumjoon Jo and Sungwon Jung (&) Department of Computer Science and Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul 04107,
More informationDiversity in Skylines
Diversity in Skylines Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong Sha Tin, New Territories, Hong Kong taoyf@cse.cuhk.edu.hk Abstract Given an integer k, a diverse
More informationInternational Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014.
A B S T R A C T International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Information Retrieval Models and Searching Methodologies: Survey Balwinder Saini*,Vikram Singh,Satish
More informationComparative Study of Subspace Clustering Algorithms
Comparative Study of Subspace Clustering Algorithms S.Chitra Nayagam, Asst Prof., Dept of Computer Applications, Don Bosco College, Panjim, Goa. Abstract-A cluster is a collection of data objects that
More informationNearest Neighbour Expansion Using Keyword Cover Search
Nearest Neighbour Expansion Using Keyword Cover Search [1] P. Sai Vamsi Aravind MTECH(CSE) Institute of Aeronautical Engineering, Hyderabad [2] P.Anjaiah Assistant Professor Institute of Aeronautical Engineering,
More informationIntroduction to Indexing R-trees. Hong Kong University of Science and Technology
Introduction to Indexing R-trees Dimitris Papadias Hong Kong University of Science and Technology 1 Introduction to Indexing 1. Assume that you work in a government office, and you maintain the records
More information