Clustering for Similarity Search and Privacy-Guaranteed Publishing of Hi-Dimensional Data
Ashwini.R #1, K.Praveen *2, R.V.Krishnaiah *3
#1 M.Tech, Computer Science Engineering, DRKIST, Hyderabad, Andhra Pradesh, India
*2 Associate Professor, Department of CSE, DRKIST, Hyderabad, Andhra Pradesh, India
*3 Principal, Department of CSE, DRKIST, Hyderabad, Andhra Pradesh, India

Abstract--Data mining discovers the knowledge required for decision making. Real-world applications frequently use high-dimensional data, so data mining techniques, and clustering algorithms in particular, must work well on such data. In this paper we explore similarity search mechanisms for high-dimensional data. Existing indexing techniques do not consider dependencies between dimensions, and for this reason their performance is suboptimal. Clustering requires finding correlations among the different dimensions. Pruning, the process of removing unnecessary data from consideration, is part of these techniques. Bounding hyperspheres and bounding rectangles are the main techniques used for pruning, but they are not efficient for Nearest Neighbor (NN) search. In this paper we propose an algorithm to overcome this problem. Our technique, known as cluster-adaptive bounding, makes use of a cluster-based index. The algorithm also features spatial filtering to reduce computational and storage overhead. Similarity measures such as Euclidean and Mahalanobis distance can be used with our approach. We also built an application as a proof of concept. The empirical results indicate that the proposed approach is effective for NN search on high-dimensional data.

Keywords: Data mining, high-dimensional data, similarity measures, indexing

I. INTRODUCTION
With new technologies in signal processing, it has become common practice to process huge amounts of data.
In other words, huge amounts of data are mined in order to extract actionable knowledge. Innovative electronic devices for processing multimedia data are increasingly being developed and released. Apart from storage improvements, data mining has become feasible because of the increase in the computational power of systems. With these capabilities, technologies such as GIS, CAD, and CAM came into existence, along with techniques for processing medical images. Data sets measured in terabytes are now being processed.

In such multimedia applications, indexing the data speeds up query retrieval. Spatial queries, in particular nearest neighbor queries, are used with high-dimensional data. However, such queries with Euclidean distance (ED) as the measure are not well suited to high-dimensional data. The reason is the curse of dimensionality, together with the use of pessimistic metrics [1]. Search performance is of paramount importance in multimedia applications [2]. Prior work made assumptions such as uniform data distribution and independent attributes, but real data does not behave this way: it exhibits irregular distributions and dependencies among attributes. Such data suffers from the curse of dimensionality [3]. Indexing such data using nearest neighbors and farthest neighbors helps to improve query speed. Euclidean distance can be used to index real-world datasets. CBIR applications can also use a related metric known as weighted Euclidean distance [4]. In this paper, we throw light on real-world high-dimensional datasets that are indexed for NN searches.

II. RELATED WORK
Indexing has long been a popular technique for solving search problems in applications. Many indexing techniques are available to speed up search over multi-dimensional data. For low-dimensional data, the R-tree structure can be used for recursive partitioning [5].
Other widely used indexing structures include the SR-Tree [7] and the SS-Tree [6]. There are instances where a combination of these two trees is used for better performance. These structures are specialized for ED as the similarity measure. M-trees are good candidates for general distance functions [8]. Multidimensional indexes are used with low-dimensional spaces, where they perform well compared with sequential approaches. However, their performance degrades when the number of dimensions increases beyond a certain threshold. The sequential approach showed inferior performance. When dimensionality exceeds 10, these methods perform poorly because of the curse of dimensionality [9], [10], as the search space grows exponentially.

ISSN: Page3711

Fig. 1 Performance comparison

New approaches came into existence to overcome the dimensionality problem. One such technique is the vector approximation file (VA-file) [9], which became very popular. It divides the space into small pieces known as hyper-rectangles, which are subjected to quantized approximation. A separate approximation file, which holds the encoded data, is maintained in secondary storage. When an NN search is performed, the VA-file is searched sequentially with respect to lower and upper bounds on the distance measure, which lets the query be processed faster. In the end, the vectors are accessed from secondary storage in order to find the nearest neighbors. With respect to the VA-file, the techniques that proved not appropriate include vector approximation and scalar quantization. In [11], datasets are transformed to change their dimensions before they are approximated across multiple dimensions. Adaptive spacing of the approximation cells is used according to the data distribution. Several approximation techniques were recently proposed in [13], [12] with the common aim of outperforming the sequential scan.

Hybrid methods that combine two or more methods also exist; they include the A-Tree and the IQ-Tree. Feature vectors and distance functions are used as approximations for finding similarity. Exact similarity search is achieved through approximation of perceptions, at the cost of overhead in query rounds. Savings in query processing are obtained with strategies such as MMDR [15], PAC-NN [14], and VA-LOW [16], [17]. These approaches used hashing mechanisms such as LSH (Locality-Sensitive Hashing) [18]. Approximation techniques need further research for a better understanding [19]. Approximate indexing has limitations, as it trades off search quality against the time taken to complete the search, as explored in [20].

III. CLUSTER DISTANCE BOUNDING
In this section, the estimation of distances to clusters is discussed. For making an effective cluster distance bound, the following equation is used.

The distance function used for Mahalanobis distance is computed as follows.

IV. CLUSTERING AND INDEX STRUCTURE
Index construction is essential for processing queries on high-dimensional data. Such indexing is used with NN search and Voronoi clusters in real-world applications. Many techniques have come into existence for clustering data, including BIRCH [23], GLA [22], and the fast K-means algorithm [21]. The output of such a generic clustering algorithm provides the pivot points. The dataset is then scanned, and each element is mapped to its nearest pivot. Voronoi clusters are formed by this mapping of data to pivots. The process is presented in Algorithm 1.
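The mapping of data elements to their nearest pivots can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of Algorithm 1 in [24]: the function names (`euclidean`, `mahalanobis`, `voronoi_clusters`) and the sample points are hypothetical, the pivots are assumed to come from an earlier clustering step such as K-means, and the Mahalanobis distance is written in its standard textbook form sqrt((x - y)^T S^-1 (x - y)) with the inverse covariance matrix S^-1 supplied by the caller.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mahalanobis(x, y, inv_cov):
    """Mahalanobis distance sqrt((x - y)^T * inv_cov * (x - y)).

    inv_cov is the inverse covariance matrix as a list of rows
    (assumed to be precomputed by the caller).
    """
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product inv_cov * diff, then dot product with diff.
    projected = [sum(m * d for m, d in zip(row, diff)) for row in inv_cov]
    return math.sqrt(sum(d * p for d, p in zip(diff, projected)))

def voronoi_clusters(points, pivots, dist=euclidean):
    """Single scan: map every point to the index of its nearest pivot."""
    clusters = {i: [] for i in range(len(pivots))}
    for p in points:
        nearest = min(range(len(pivots)), key=lambda i: dist(p, pivots[i]))
        clusters[nearest].append(p)
    return clusters

# Hypothetical toy data: two pivots, four points.
points = [(0.0, 0.0), (1.0, 0.5), (9.0, 9.0), (8.0, 10.0)]
pivots = [(0.0, 0.0), (10.0, 10.0)]
clusters = voronoi_clusters(points, pivots)
```

Passing the distance as a parameter keeps the scan itself independent of the metric; for example, `dist=lambda a, b: mahalanobis(a, b, inv_cov)` would form the clusters under a Mahalanobis distance instead of ED.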
Algorithm 1 [24]

As can be seen in Algorithm 1, clusters are rearranged in order to make them more precise. The centroid is used as the pivot. With Voronoi clustering on top of a generic clustering approach, results can be obtained with a single scan. The indexing scheme requires only one scan, so building the index takes less time.

Algorithm 3 [24]

V. EVALUATION OF RESULTS
We built a prototype application to demonstrate the proof of concept. The results are analyzed and presented in this section. The environment used to build the application was a PC with 4 GB RAM and a Core 2 Duo processor, running the Windows XP operating system.

ALGORITHM 2 KNN-SEARCH (Q) [24]

Fig. 2 - IO Performance of Distance Bounds (BIO-RETINA)
As shown in Figure 2, the results for the BIO-RETINA data set are expressed in pages.
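As a rough illustration of how a cluster distance bound can reduce the number of clusters read during a k-NN query, the Python sketch below visits clusters in order of a lower bound on their distance to the query and stops once no remaining cluster can contain a closer neighbor. It rests on several assumptions: the bound is a simple bounding-sphere bound (distance to the centroid minus the cluster radius), not the hyperplane-based bound of [24]; the function name `knn_search`, the in-memory cluster layout, and the sample data are hypothetical; and every cluster is assumed to be non-empty.

```python
import heapq
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_search(query, clusters, k):
    """clusters: list of (centroid, members). Returns (neighbors, clusters_read)."""
    # Lower bound on the distance from the query to any member of a cluster:
    # distance to the centroid minus the cluster radius, never below zero.
    # (Sphere bound chosen for illustration; it follows from the triangle inequality.)
    bounded = []
    for centroid, members in clusters:
        radius = max(euclidean(centroid, m) for m in members)
        bound = max(0.0, euclidean(query, centroid) - radius)
        bounded.append((bound, members))
    bounded.sort(key=lambda item: item[0])

    best = []  # max-heap via negated distances; holds the k closest so far
    clusters_read = 0
    for lower_bound, members in bounded:
        # With k candidates in hand, a cluster whose bound is not below the
        # current k-th best distance cannot improve the result; neither can
        # any later cluster, because bounds are visited in ascending order.
        if len(best) == k and lower_bound >= -best[0][0]:
            break
        clusters_read += 1
        for m in members:
            d = euclidean(query, m)
            if len(best) < k:
                heapq.heappush(best, (-d, m))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, m))
    neighbors = sorted((-neg_d, m) for neg_d, m in best)
    return neighbors, clusters_read

# Hypothetical toy index with three clusters.
index = [
    ((0.5, 0.5), [(0.0, 0.0), (1.0, 1.0)]),
    ((10.0, 10.0), [(9.0, 9.0), (11.0, 11.0)]),
    ((20.0, 0.0), [(19.0, 0.0), (21.0, 0.0)]),
]
neighbors, clusters_read = knn_search((0.2, 0.1), index, k=2)
```

In this toy run only the first cluster is read, because the bounds of the other two already exceed the distance to the second-nearest neighbor found so far. A tighter bound prunes more clusters, which is the effect an I/O comparison of distance bounds sets out to measure.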
Fig. 3 - IO Performance of Distance Bounds (SENSORS)
As shown in Figure 3, the results for the SENSORS data set are expressed in pages.

Fig. 4 - IO Performance of Distance Bounds (AERIAL)
As shown in Figure 4, the results for the AERIAL data set are expressed in pages.

Fig. 5 - IO Performance of Distance Bounds (HISTOGRAM)
As shown in Figure 5, the results for the HISTOGRAM data set are expressed in pages.

VI. CONCLUSIONS
In this paper we studied indexing for high-dimensional data. Such data exhibits significant correlations and non-uniform distributions. The VA-file became very popular for indexing such data for search performance. However, due to the curse of dimensionality, this technique produces suboptimal results. In this paper we overcome this problem by proposing a new indexing method based on vector quantization. In this approach, a dataset is divided into multiple Voronoi clusters before further processing. Afterwards, cluster distance bounds are built using the hyperplane boundaries. The new search technique can make use of distance measures such as Mahalanobis and ED. The proposed indexing method also results in lower IO cost and memory usage. It is scalable and provides better performance than MBS and MBR bounds. Our prototype demonstrates this, and the empirical results confirm it.

REFERENCES
[1] C.C. Aggarwal, A. Hinneburg, and D.A. Keim, On the Surprising Behavior of Distance Metrics in High Dimensional Spaces, Proc. Int'l Conf. Database Theory (ICDT).
[2] B.U. Pagel, F. Korn, and C. Faloutsos, Deflating the Dimensionality Curse Using Multiple Fractal Dimensions, Proc. Int'l Conf. Data Eng. (ICDE).
[3] K.S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, When is Nearest Neighbor Meaningful?, Proc. Int'l Conf. Database Theory (ICDT).
[4] J. Davis, B. Kulis, P. Jain, S. Sra, and I. Dhillon, Information-Theoretic Metric Learning, Proc. Int'l Conf. Machine Learning (ICML).
[5] A. Guttman, R-Trees: A Dynamic Index Structure for Spatial Searching, Proc. ACM SIGMOD Int'l Conf. Management of Data.
[6] D.A. White and R. Jain, Similarity Indexing with the SS-Tree, Proc. Int'l Conf. Data Eng. (ICDE).
[7] N. Katayama and S. Satoh, The SR-Tree: An Index Structure for High-Dimensional Nearest Neighbor Queries, Proc. ACM SIGMOD Int'l Conf. Management of Data, May.
[8] P. Ciaccia, M. Patella, and P. Zezula, M-Tree: An Efficient Access Method for Similarity Search in Metric Spaces, Proc. Int'l Conf. Very Large Databases (VLDB).
[9] R. Weber, H. Schek, and S. Blott, A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces, Proc. Int'l Conf. Very Large Data Bases (VLDB), Aug.
[10] R. Bellman, Adaptive Control Processes: A Guided Tour. Princeton Univ. Press.
[11] H. Ferhatosmanoglu, E. Tuncel, D. Agrawal, and A.E. Abbadi, Vector Approximation Based Indexing for Non-Uniform High Dimensional Data Sets, Proc. Int'l Conf. Information and Knowledge Management (CIKM).
[12] K. Vu, K. Hua, H. Cheng, and S. Lang, A Non-Linear Dimensionality-Reduction Technique for Fast Similarity Search in Large Databases, Proc. ACM SIGMOD Int'l Conf. Management of Data.
[13] K. Chakrabarti and S. Mehrotra, Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces, Proc. Int'l Conf. Very Large Databases (VLDB), Sept.
[14] P. Ciaccia and M. Patella, PAC Nearest Neighbor Queries: Approximate and Controlled Search in High-Dimensional and Metric Spaces, Proc. Int'l Conf. Data Eng. (ICDE).
[15] H. Jin, B.C. Ooi, H.T. Shen, C. Yu, and A. Zhou, An Adaptive and Efficient Dimensionality Reduction Algorithm for High-Dimensional Indexing, Proc. Int'l Conf. Data Eng. (ICDE), Mar.
[16] R. Weber and K. Böhm, Trading Quality for Time with Nearest Neighbor Search, Proc. Seventh Int'l Conf. Extending Database Technology (EDBT): Advances in Database Technology.
[17] E. Tuncel, H. Ferhatosmanoglu, and K. Rose, VQ-Index: An Index Structure for Similarity Searching in Multimedia Databases, Proc. ACM Int'l Conf. Multimedia.
[18] A. Gionis, P. Indyk, and R. Motwani, Similarity Search in High Dimensions via Hashing, Proc. Int'l Conf. Very Large Databases (VLDB), Sept.
[19] P. Ciaccia and M. Patella, Approximate Similarity Queries: A Survey, Technical Report CSITE-08-01, May.
[20] E. Tuncel, P. Koulgi, and K. Rose, Rate-Distortion Approach to Databases: Storage and Content-Based Retrieval, IEEE Trans. Information Theory, vol. 50, no. 6, June.
[21] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis. John Wiley & Sons.
[22] A. Gersho and R.M. Gray, Vector Quantization and Signal Compression. Kluwer Academic Publishers.
[23] T. Zhang, R. Ramakrishnan, and M. Livny, BIRCH: An Efficient Data Clustering Method for Very Large Databases, Proc. ACM SIGMOD Int'l Conf. Management of Data.
[24] S. Ramaswamy and K. Rose, Adaptive Cluster Distance Bounding for High-Dimensional Indexing, IEEE Trans. Knowledge and Data Eng., vol. 23, no. 6, June.

AUTHORS
Ashwini is a student at DRK Institute of Science and Technology, Hyderabad, AP, India. She has received a B.Tech degree in computer science and engineering and an M.Tech degree in computer science and engineering. Her main research interests include data mining, databases, and DWH.

K.Praveen is working as an Associate Professor at DRK Institute of Science and Technology, JNTUH, Hyderabad, Andhra Pradesh, India. He completed his M.Tech (C.S.E) at Osmania University, Hyderabad. His main research interests include databases, web methods, and computer networks.

Dr.R.V.Krishnaiah (Ph.D) is working as Principal at DRK Institute of Science & Technology, Hyderabad, AP, India. He has received M.Tech degrees in EIE and CSE. His main research interests include data mining and software engineering.
CHAPTER 7. PAPER 3: EFFICIENT HIERARCHICAL CLUSTERING OF LARGE DATA SETS USING P-TREES 7.1. Abstract Hierarchical clustering methods have attracted much attention by giving the user a maximum amount of
More informationBest Keyword Cover Search
Vennapusa Mahesh Kumar Reddy Dept of CSE, Benaiah Institute of Technology and Science. Best Keyword Cover Search Sudhakar Babu Pendhurthi Assistant Professor, Benaiah Institute of Technology and Science.
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK REAL TIME DATA SEARCH OPTIMIZATION: AN OVERVIEW MS. DEEPASHRI S. KHAWASE 1, PROF.
More informationTDT- An Efficient Clustering Algorithm for Large Database Ms. Kritika Maheshwari, Mr. M.Rajsekaran
TDT- An Efficient Clustering Algorithm for Large Database Ms. Kritika Maheshwari, Mr. M.Rajsekaran M-Tech Scholar, Department of Computer Science and Engineering, SRM University, India Assistant Professor,
More informationIndexing High Dimensional Rectangles for Fast Multimedia Identification
Indexing High Dimensional Rectangles for Fast Multimedia Identification Jonathan Goldstein John C. Platt Christopher J. C. Burges 10/28/2003 Technical Report MSR-TR-2003-38 Microsoft Research Microsoft
More informationCOLOR AND SHAPE BASED IMAGE RETRIEVAL
International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR) ISSN 2249-6831 Vol.2, Issue 4, Dec 2012 39-44 TJPRC Pvt. Ltd. COLOR AND SHAPE BASED IMAGE RETRIEVAL
More informationCHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION
CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant
More informationOnline Document Clustering Using the GPU
Online Document Clustering Using the GPU Benjamin E. Teitler, Jagan Sankaranarayanan, Hanan Samet Center for Automation Research Institute for Advanced Computer Studies Department of Computer Science University
More informationLocal Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces
Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces Kaushik Chakrabarti Department of Computer Science University of Illinois Urbana, IL 61801 kaushikc@cs.uiuc.edu Sharad
More information1722 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 3, MARCH X/$ IEEE
1722 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 3, MARCH 2010 Fusion Coding of Correlated Sources for Storage and Selective Retrieval Sharadh Ramaswamy, Member, IEEE, Jayant Nayak, Member, IEEE,
More informationPivoting M-tree: A Metric Access Method for Efficient Similarity Search
Pivoting M-tree: A Metric Access Method for Efficient Similarity Search Tomáš Skopal Department of Computer Science, VŠB Technical University of Ostrava, tř. 17. listopadu 15, Ostrava, Czech Republic tomas.skopal@vsb.cz
More informationSecure and Advanced Best Keyword Cover Search over Spatial Database
Secure and Advanced Best Keyword Cover Search over Spatial Database Sweety Thakare 1, Pritam Patil 2, Tarade Priyanka 3, Sonawane Prajakta 4, Prof. Pathak K.R. 4 B. E Student, Dept. of Computer Engineering,
More informationAutomated Information Retrieval System Using Correlation Based Multi- Document Summarization Method
Automated Information Retrieval System Using Correlation Based Multi- Document Summarization Method Dr.K.P.Kaliyamurthie HOD, Department of CSE, Bharath University, Tamilnadu, India ABSTRACT: Automated
More informationAn Investigation on Multi-Token List Based Proximity Search in Multi- Dimensional Massive Database
An Investigation on Multi-Token List Based Proximity Search in Multi- Dimensional Massive Database Haiying Shen, Ze Li, Ting Li Department of Computer Science and Engineering University of Arkansas, Fayetteville,
More informationNearest Neighbor Search by Branch and Bound
Nearest Neighbor Search by Branch and Bound Algorithmic Problems Around the Web #2 Yury Lifshits http://yury.name CalTech, Fall 07, CS101.2, http://yury.name/algoweb.html 1 / 30 Outline 1 Short Intro to
More informationTransforming Range Queries To Equivalent Box Queries To Optimize Page Access
Transforming Range Queries To Equivalent Box Queries To Optimize Page ccess Sakti Pramanik lok Watve Chad R. Meiners lex Liu Department of Computer Science and Engineering Michigan State University East
More informationDynamically Optimizing High-Dimensional Index Structures
Proc. 7th Int. Conf. on Extending Database Technology (EDBT), 2000. Dynamically Optimizing High-Dimensional Index Structures Christian Böhm and Hans-Peter Kriegel University of Munich, Oettingenstr. 67,
More informationUsing Association Rules for Better Treatment of Missing Values
Using Association Rules for Better Treatment of Missing Values SHARIQ BASHIR, SAAD RAZZAQ, UMER MAQBOOL, SONYA TAHIR, A. RAUF BAIG Department of Computer Science (Machine Intelligence Group) National University
More informationDOCUMENT CLUSTERING USING HIERARCHICAL METHODS. 1. Dr.R.V.Krishnaiah 2. Katta Sharath Kumar. 3. P.Praveen Kumar. achieved.
DOCUMENT CLUSTERING USING HIERARCHICAL METHODS 1. Dr.R.V.Krishnaiah 2. Katta Sharath Kumar 3. P.Praveen Kumar ABSTRACT: Cluster is a term used regularly in our life is nothing but a group. In the view
More informationOUTLIER DETECTION FOR DYNAMIC DATA STREAMS USING WEIGHTED K-MEANS
OUTLIER DETECTION FOR DYNAMIC DATA STREAMS USING WEIGHTED K-MEANS DEEVI RADHA RANI Department of CSE, K L University, Vaddeswaram, Guntur, Andhra Pradesh, India. deevi_radharani@rediffmail.com NAVYA DHULIPALA
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK REVIEW ON CONTENT BASED IMAGE RETRIEVAL BY USING VISUAL SEARCH RANKING MS. PRAGATI
More informationA LOSSLESS INDEX CODING ALGORITHM AND VLSI DESIGN FOR VECTOR QUANTIZATION
A LOSSLESS INDEX CODING ALGORITHM AND VLSI DESIGN FOR VECTOR QUANTIZATION Ming-Hwa Sheu, Sh-Chi Tsai and Ming-Der Shieh Dept. of Electronic Eng., National Yunlin Univ. of Science and Technology, Yunlin,
More informationHigh Dimensional Indexing by Clustering
Yufei Tao ITEE University of Queensland Recall that, our discussion so far has assumed that the dimensionality d is moderately high, such that it can be regarded as a constant. This means that d should
More informationA Novel Method to Estimate the Route and Travel Time with the Help of Location Based Services
A Novel Method to Estimate the Route and Travel Time with the Help of Location Based Services M.Uday Kumar Associate Professor K.Pradeep Reddy Associate Professor S Navaneetha M.Tech Student Abstract Location-based
More informationAnalyzing Outlier Detection Techniques with Hybrid Method
Analyzing Outlier Detection Techniques with Hybrid Method Shruti Aggarwal Assistant Professor Department of Computer Science and Engineering Sri Guru Granth Sahib World University. (SGGSWU) Fatehgarh Sahib,
More informationChapter 1, Introduction
CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from
More informationMamatha Nadikota et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (4), 2011,
Hashing and Pipelining Techniques for Association Rule Mining Mamatha Nadikota, Satya P Kumar Somayajula,Dr. C. P. V. N. J. Mohan Rao CSE Department,Avanthi College of Engg &Tech,Tamaram,Visakhapatnam,A,P..,India
More informationData Partitioning Method for Mining Frequent Itemset Using MapReduce
1st International Conference on Applied Soft Computing Techniques 22 & 23.04.2017 In association with International Journal of Scientific Research in Science and Technology Data Partitioning Method for
More information2. Department of Electronic Engineering and Computer Science, Case Western Reserve University
Chapter MINING HIGH-DIMENSIONAL DATA Wei Wang 1 and Jiong Yang 2 1. Department of Computer Science, University of North Carolina at Chapel Hill 2. Department of Electronic Engineering and Computer Science,
More informationDynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering
Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Abstract Mrs. C. Poongodi 1, Ms. R. Kalaivani 2 1 PG Student, 2 Assistant Professor, Department of
More informationFUFM-High Utility Itemsets in Transactional Database
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 3, March 2014,
More informationImage Similarity Measurements Using Hmok- Simrank
Image Similarity Measurements Using Hmok- Simrank A.Vijay Department of computer science and Engineering Selvam College of Technology, Namakkal, Tamilnadu,india. k.jayarajan M.E (Ph.D) Assistant Professor,
More informationTHE CLASSIFICATION OF HIGH DIMENSIONAL INDICES FOR SPATIAL DATA SIMILARITY SEARCH
THE CLASSIFICATION OF HIGH DIMENSIONAL INDICES FOR SPATIAL DATA SIMILARITY SEARCH Yu XIA a, *, Xinyan ZHU b, Chang LI a a School of Remote Sensing and Information Engineering, Wuhan University - geoxy@26.com
More informationA NOVEL APPROACH ON SPATIAL OBJECTS FOR OPTIMAL ROUTE SEARCH USING BEST KEYWORD COVER QUERY
A NOVEL APPROACH ON SPATIAL OBJECTS FOR OPTIMAL ROUTE SEARCH USING BEST KEYWORD COVER QUERY S.Shiva Reddy *1 P.Ajay Kumar *2 *12 Lecterur,Dept of CSE JNTUH-CEH Abstract Optimal route search using spatial
More information