Index and Search for Hierarchical Key-value Data Store


Journal of Computational Information Systems 11: 4 (2015)

Yongqing ZHU, Yang YU, Willie NG, Samsudin JUNIARTO
Data Storage Institute, A*STAR (Agency for Science, Technology and Research), Singapore

Abstract

Existing key-value stores support web applications with high scalability and availability. However, restricting data access to primary keys limits how data can be retrieved from key-value stores. In this paper, we propose HierKV, a searchable key-value store with a hierarchical data model that supports fast data retrieval and search by secondary attributes. HierKV accelerates data retrieval by attributes through its hierarchical data model and index structure. A hierarchical TF-IDF ranking mechanism is proposed for HierKV to properly reflect keyword matches in different hierarchies. Experiments show that the proposed HierKV system achieves good search performance.

Keywords: Key-value Store; Data Retrieval; Hierarchy; Keyword Search; Ranking

1 Introduction

Key-value stores [1-3] have emerged in recent years to provide distributed data storage for large-scale web applications. The high scalability and availability offered by key-value stores are valuable properties that traditional database systems cannot provide. However, in most key-value store systems, data access is provided at the granularity of primary keys with the simple APIs get, put, and delete. Since modern applications need to retrieve data by attributes other than the primary keys, it is necessary to index key-value data by secondary attributes and enable rich query features in key-value stores.

In many applications, data are correlated by attributes and the correlated data are normally retrieved together. A good example is an online forum, where an original post can have multiple reply posts, and a reply post can have its own reply posts as well.
All these posts are correlated by the same topic and form a hierarchical tree, with the original post at the root and the reply posts at the intermediate and leaf nodes. Traditional database systems store these correlated posts as individual data records. When the posts are retrieved by topic, multiple disk reads are needed to read the individual data records one by one. If the correlated posts were stored as a single data record, a single disk read would retrieve all of them, which is much faster. There is therefore a need to enclose all correlated data properly in a single data record.

This paper presents a high-performance searchable key-value store system, HierKV, which keeps data in a hierarchical data model and provides rich search features as well. HierKV groups the correlated data together and encloses them in a single hierarchical data record, which speeds up data retrieval by attributes. An index structure is presented to index HierKV data, so that HierKV can provide quick search by the secondary attributes. In addition, a hierarchical TF-IDF ranking mechanism is proposed for HierKV to rank the search results so that they properly reflect the different relevance degrees of keyword matches in different hierarchies.

The rest of the paper is organized as follows: Part 2 describes the HierKV data model and API; Part 3 elaborates HierKV data retrieval, including index and search, as well as the novel ranking mechanism; Part 4 presents experiments and evaluation; and Part 5 concludes.

Corresponding author. Email address: yqzhu@dsi.a-star.edu.sg (Yongqing ZHU). Copyright 2015 Binary Information Press. DOI: /jcis13141. February 15, 2015.

2 HierKV Data Model and API

Most key-value stores deploy either a ring-based architecture [2-4] or a tablet-based architecture [1, 5]. These systems generally operate via the basic key-value interfaces and access data by primary keys only. The proposed HierKV system provides not only the basic interfaces for key-value data access, but also a rich query interface for search with secondary attributes.

HierKV stores data as records in tables. Each record consists of a primary key as identifier and one or more secondary attributes. To accelerate data retrieval by attributes, HierKV groups the correlated data by the secondary attributes and encloses them within a single data record. We introduce a hierarchical structure to organize the value of a data record, since the value may contain different levels (hierarchies) of information, e.g. the different posts in a hierarchical tree. Each hierarchy includes one or more attributes and the corresponding values.

Consider a table named Posts that keeps information for all original posts and their reply posts in an online forum. An original post and its reply posts form a hierarchical tree structure as shown in Fig. 1.
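The record layout described in this part can be illustrated with a minimal sketch. It is not the HierKV implementation: the class ToyTable, the helper walk, the attribute name Text for a post's own text, and the key post_1 are all assumptions made for illustration. It models one table whose values are nested attribute maps, so that a single get returns an original post together with every reply.

```python
from typing import Any, Dict, Iterator, Tuple

Path = Tuple[str, ...]


class ToyTable:
    """Hypothetical in-memory stand-in for one HierKV table (names are illustrative)."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, value: Dict[str, Any]) -> None:
        # Write or update the whole hierarchical record under its primary key.
        self._records[key] = value

    def get(self, key: str) -> Dict[str, Any]:
        # One lookup returns the original post together with all of its replies.
        return self._records[key]

    def delete(self, key: str) -> None:
        del self._records[key]


def walk(value: Dict[str, Any], path: Path = ()) -> Iterator[Tuple[Path, int, str]]:
    """Yield (relative path, hierarchy level, text) for every text attribute."""
    for attr, child in value.items():
        here = path + (attr,)
        if isinstance(child, dict):
            yield from walk(child, here)  # descend into the next hierarchy
        else:
            yield here, len(here), child


posts = ToyTable()
posts.put("post_1", {  # hypothetical key and content
    "Topic": "Which provider is better",
    "Reply1": {"Text": "Starhub is not bad",
               "Reply1": {"Text": "Is it cheaper"}},
})
for path, level, text in walk(posts.get("post_1")):
    print(level, "-".join(path), "->", text)
```

In this sketch the hierarchy level of a piece of text is simply the length of its relative path, so the text of a direct reply sits in the second hierarchy and the text of a reply to that reply in the third.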
Any recommendation on iphone? Is there anybody having experience in iphone? Which provider is better: SingTel, Starhub or M1?
    Re: Any recommendation on iphone? I only have experience with Starhub, not bad.
        Re: Re: Any recommendation on iphone? I heard Starhub is cheaper than SingTel, right?
    Re: Any recommendation on iphone? Can iphone use the same number?
        Re: Re: Any recommendation on iphone? Prepaid does not enjoy the portability from one SIM to another.
        Re: Re: Any recommendation on iphone? You'd better check with service provider on this portability issue.
            Re: Re: Re: Any recommendation on iphone? Which department is responsible for this?

Fig. 1: Hierarchical tree formed by correlated data

Fig. 2 illustrates how the group of posts in Fig. 1 is organized as a single data record in table Posts with the hierarchical model. The record is identified by a primary key beginning with post_. The value includes four hierarchies of information, representing the four levels of posts in the hierarchical tree. The higher hierarchy resides in the value of the previous lower hierarchy. In the example record, the first hierarchy contains attributes such as Topic, Reply1, and Reply2 that correspond to the original post and its direct reply posts in the tree, followed by the second hierarchy and so on. With this hierarchy model, all posts correlated by the same topic can be enclosed within a single data record. When data is retrieved by topic, all these posts can be retrieved quickly with a single disk read.

Primary key: post_
Value (nesting depth corresponds to the 1st to 4th hierarchy):
    Topic: Any recommendation on iphone
    Is there anybody having experience in iphone? Which provider is better: SingTel, Starhub or M1?
    Reply1:
        I only have experience with Starhub, not bad.
        Reply1: I heard Starhub is cheaper than SingTel, right?
    Reply2:
        Can iphone use the same number?
        Reply1: Prepaid does not enjoy the portability from one SIM to another.
        Reply2:
            You'd better check with service provider on this portability issue.
            Reply1: Which department is responsible for this?

Fig. 2: Example data record with hierarchical model

HierKV provides rich APIs for data access and search. The basic APIs include get, put, and delete to read, write/update, and delete data by primary key. In addition, a search API is provided to search and retrieve data based on the secondary attributes. It defines a set of operations (e.g. $equal, $more, $less, $has, $phrase, etc.) to support full-text keyword search by point search, range search, and phrase search. The search API provides two attributes, keyword and path, to define the query command. An example query command that searches for data records containing a given keyword within attribute Topic can be expressed in JSON format as: { "search": { "keyword": { "$has": "<keyword>" }, "path": { "$equal": "Topic" } } }. The detailed search features are described in Part 3.

3 Data Retrieval for HierKV

Generally, data retrieval from key-value stores is based on primary keys only. If the application needs to retrieve data by other attributes, it takes a long time to scan the whole data collection and match against the attributes.
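To make the cost of such a scan concrete, the following is a minimal sketch, not HierKV's implementation, that evaluates a search command with $has and $equal by walking every record in a table. The function names, the sample records, the attribute name Text, and the convention of joining attribute names with "-" to form a path are assumptions for illustration.

```python
import json
import re
from typing import Any, Dict, Iterator, List, Optional, Tuple


def walk(value: Dict[str, Any], path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, str]]:
    """Yield (relative path joined with '-', text) for every text attribute."""
    for attr, child in value.items():
        here = path + (attr,)
        if isinstance(child, dict):
            yield from walk(child, here)
        else:
            yield "-".join(here), child


def scan_search(table: Dict[str, Dict[str, Any]], command: Dict[str, Any]) -> List[str]:
    """Evaluate a $has / $equal search command by scanning every record."""
    query = command["search"]
    keyword = query["keyword"]["$has"].lower()
    wanted: Optional[str] = query.get("path", {}).get("$equal")
    hits = []
    for key, value in table.items():  # full scan: cost grows with the table size
        for path, text in walk(value):
            if wanted is not None and path != wanted:
                continue
            if keyword in re.findall(r"[a-z0-9]+", text.lower()):
                hits.append(key)
                break  # one match is enough for this record
    return hits


# Hypothetical sample data.
table = {
    "post_1": {"Topic": "Starhub or SingTel", "Reply1": {"Text": "Starhub is fine"}},
    "post_2": {"Topic": "Prepaid cards", "Reply1": {"Text": "Try Starhub"}},
}
command = json.loads(
    '{"search": {"keyword": {"$has": "starhub"}, "path": {"$equal": "Topic"}}}'
)
print(scan_search(table, command))
```

Every query touches every attribute of every record here, which is the behavior an index on the secondary attributes is meant to avoid.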
Recently, some key-value stores [6-8] have appeared that support search with secondary attributes. However, neither [6] nor [7] indexes data by the secondary attributes, so they need to traverse the partitions to identify potential search results. In HierKV, data records are indexed by the secondary attributes when they are first inserted. Search and data retrieval are accelerated by matching the query keywords against the index records instead of scanning the whole data collection.

HierKV uses a hierarchical structure to organize the correlated data, with different levels of information stored in different hierarchies. Query commands can use this hierarchy information to search for data records with given keywords inside specific attributes. Hence the index record must keep the hierarchy information to fulfill such query requirements. Keyword matches in different hierarchies have different degrees of relevance to the query. We propose a hierarchical TF-IDF ranking mechanism for HierKV that ranks the search results with hierarchy taken into account.

3.1 Index structure

An inverted index is used to index the hierarchical data in HierKV to accelerate full-text keyword search. Each unique term in the value of the data records is indexed. Index records are stored in dedicated index tables to keep the link between indexed terms and original data records. Here we use data record to denote the original data in the data tables, and index record to denote the index of the original data in the index tables. The hierarchy information of each term in the data record is maintained as a relative path in the index record.

Like the data record, the index record consists of a key and a value. The key of the index record is the stemmed version of the term. The value of the index record is organized in hierarchies, each with one or more attributes and the corresponding values. Besides the relative path indicating hierarchy information, other necessary information (term frequency, document frequency, etc.) is also maintained in the index record for ranking purposes.

Fig. 3 shows how the example data record in Fig. 2 is indexed under the stemmed form of one of its terms. The index record maintains the link between the actual term and all data records in table Posts that include it, and the keys of these data records are stored in the value. Table Posts has a total of 28 data records that include the actual term. One of them is the example data record (primary key beginning with post_), where the actual term appears 4 times in different relative paths/hierarchies. The term appears at position 4 in relative path Topic, at position 7 in the path of the original post's text, at position 2 in Reply2, and at position 1 in Reply2-Reply1. The relative path lists the attribute names of all ancestor hierarchies of the term and thereby records the hierarchy information of the term in the data record. By keeping the hierarchy information in the index records, the system can support search within a specific hierarchy.

Fig. 3: Example index record for HierKV (key: stemmed term; value: DF, then for each data record key its TF and the positions of the term under each relative path, e.g. Topic: 4, Reply2: 2, Reply2-Reply1: 1)

3.2 Search with hierarchical KVS

To accelerate search with the secondary attributes, a search tree can be constructed by sorting and linking all index records to each other. Each index record is a node in the tree: a root node, an intermediate node, or a leaf node.
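The index record layout of Section 3.1 and the sorted lookup described above can be sketched as follows. This is an illustration under stated assumptions, not HierKV's code: lowercasing stands in for a real stemmer, a sorted list of keys searched with bisect stands in for the search tree, and the names build_index and lookup, the attribute name Text, and the sample records are hypothetical.

```python
import bisect
import re
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Optional, Tuple


def walk(value: Dict[str, Any], path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, str]]:
    """Yield (relative path joined with '-', text) for every text attribute."""
    for attr, child in value.items():
        here = path + (attr,)
        if isinstance(child, dict):
            yield from walk(child, here)
        else:
            yield "-".join(here), child


def build_index(table: Dict[str, Dict[str, Any]]):
    """Map term -> data record key -> relative path -> list of 1-based positions."""
    index = defaultdict(lambda: defaultdict(dict))
    for key, value in table.items():
        for path, text in walk(value):
            # Lowercasing is a placeholder for stemming (assumption).
            for pos, term in enumerate(re.findall(r"[a-z0-9]+", text.lower()), start=1):
                index[term][key].setdefault(path, []).append(pos)
    return index


def lookup(sorted_terms: List[str], index, term: str, path: Optional[str] = None) -> List[str]:
    """Binary-search the sorted index keys, then filter postings by relative path."""
    i = bisect.bisect_left(sorted_terms, term)
    if i == len(sorted_terms) or sorted_terms[i] != term:
        return []
    return [key for key, paths in index[term].items() if path is None or path in paths]


# Hypothetical sample data.
table = {
    "post_1": {"Topic": "Any recommendation on iphone",
               "Reply1": {"Text": "Can iphone use the same number"}},
    "post_2": {"Topic": "Prepaid cards"},
}
index = build_index(table)
terms = sorted(index)

df = len(index["iphone"])                                       # document frequency
tf = sum(len(p) for p in index["iphone"]["post_1"].values())    # term frequency in post_1
print(df, tf, lookup(terms, index, "iphone", path="Topic"))
```

Keeping the positions per relative path is what later allows both path-restricted search and phrase search to be answered from the index record alone.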
Typical search trees for HierKV are the binary tree [9], the B-tree [10], the B+ tree [11], etc. A search starts from the root node and recursively traverses toward the leaf nodes until the query keyword matches the key of a node (index record).

Before searching, the query keywords are split into individual terms and each term is stemmed. These stemmed terms are then matched against the keys of the nodes (index records) while traversing the search tree. After finding the index record whose key matches the stemmed term, the value of the index record is checked to get the keys of the data records that include the actual term.

Besides the keyword, the query command can include a relative path that specifies which hierarchy the matched actual term should reside in. When checking the value of the index record, the relative path of the actual term must match the path in the query. Take the query command { "search": { "keyword": { "$has": "<keyword>" }, "path": { "$equal": "Topic" } } } as an example. HierKV first finds the index record whose key matches the stemmed keyword. Then the value of this index record is examined. Only the keys of data records that include the actual term in the relative path Topic are extracted as search results.

HierKV supports phrase search by utilizing the position information of the actual terms kept in the index records. Since the keywords of a phrase appear contiguously, the returned data records must contain the terms adjacently in the same hierarchy. For example, a query for a two-word phrase ending in iphone requires returning the keys of data records that contain the two terms adjacently in the same hierarchy. HierKV matches each of the two terms against the index and checks the hierarchy and position information for them. Since the two terms appear in contiguous positions in several hierarchies of the example data record (including Topic and Reply2), its key is returned as one of the search results.

3.3 Hierarchical TF-IDF ranking mechanism

The ranking mechanism plays an important role in a search system: it decides how relevant the retrieved data are to the query. Many search systems use the TF-IDF ranking mechanism [12-13] to calculate scores for retrieved data records. In the literature, concerns have arisen about the effect of inappropriate term weighting on the final score, and term weighting schemes have been proposed in [14-15] to improve the TF-IDF score for document retrieval.

We propose a hierarchical TF-IDF ranking mechanism for HierKV to support data retrieval with consideration of hierarchy. The score of each retrieved data record is determined by several parameters: term frequency, document frequency, hierarchy factor, etc. As keyword matches in different hierarchies have different degrees of relevance to the query, the TF of each term is weighted by a hierarchy factor to improve the accuracy of the TF-IDF score. In HierKV, Eq. (1) defines the enhanced TF-IDF score s_{t,d,k} for a match of term t in data record d at a specific hierarchy k:

\[ s_{t,d,k} = h_{t,k} \cdot \mathrm{tf\text{-}idf}_{t,d,k} = h_{t,k} \cdot tf_{t,d,k} \cdot \log\frac{N}{df_t} \tag{1} \]

where h_{t,k} is the hierarchy factor indicating that term t occurs in the k-th hierarchy within data record d, df_t is the document frequency, i.e. the total number of data records containing term t in the data collection, tf_{t,d,k} is the term frequency, i.e. the number of times term t occurs in the k-th hierarchy of data record d, and N is the total number of data records in the data collection. The hierarchy factor h_{t,k} is a function that reflects the relevance degree of the hierarchy to the query. Since there may be multiple matches of term t in data record d, each with a different hierarchy factor h_{t,k}, the accumulated TF-IDF score S_{t,d} for term t in data record d is defined in Eq. (2):

\[ S_{t,d} = \sum_k s_{t,d,k} = \sum_k \left( h_{t,k} \cdot tf_{t,d,k} \right) \log\frac{N}{df_t} \tag{2} \]

With the hierarchical TF-IDF ranking mechanism, the search results reflect relevance to the query more accurately than ranking with traditional TF-IDF scores. Fig. 4 shows the pseudo code of a search in HierKV, including both searching and ranking.

Q: query command including a list of keywords {K_1, K_2, ..., K_n}
R: query result including a list of data record keys
R = ∅;
for (each keyword K_i in Q) {
    stem K_i into stemmed term t_i;
    while (search the index tree) {
        match t_i with the key of each index record;
        if (t_i matched with the key of index record r_i) {
            check the value part of index record r_i;
            get keys of data records {d_i1, d_i2, ..., d_im} each containing the actual term k_i;
            for (each data record key d_ij) {
                get the value of term frequency tf and document frequency df;
                calculate the accumulated TF-IDF score: s_{i,j} = sum over k of h_{i,k} * tf_{i,j,k} * log(N / df_i);
            }
            merge {d_i1, d_i2, ..., d_im} with R by intersection;
        }
    }
}
for (each data record key d_j in R) {
    calculate the ranking score: s_j = sum over i of s_{i,j};
}
rank the data record keys in R in descending order of ranking score s_j;
return the ranked result R;

Fig. 4: Search and rank of hierarchical KVS in HierKV

4 Experiments and Evaluation

We have conducted experiments to evaluate the performance of the proposed HierKV system. Two kinds of performance are evaluated: searching and ranking. The first experiment evaluates Precision and Recall for the HierKV system when searching with keywords. Precision and Recall are expressed in Eq. (3) and (4), respectively.

\[ \mathrm{Precision} = \frac{|\{\mathrm{RelevantData}\} \cap \{\mathrm{RetrievedData}\}|}{|\{\mathrm{RetrievedData}\}|} \tag{3} \]

\[ \mathrm{Recall} = \frac{|\{\mathrm{RelevantData}\} \cap \{\mathrm{RetrievedData}\}|}{|\{\mathrm{RelevantData}\}|} \tag{4} \]

The test data collection is downloaded from CACM and includes 3204 records of articles from the Communications of the ACM. We sorted all records by size and chose the 100 largest records for testing. Each record contains several fields about the article (e.g. title, abstract, authors, etc.). We formatted each record into a two-hierarchy structure and selected the five most common terms, computer, program, storage, structure, and system, as query keywords. All five query commands were issued to the search system with the hierarchical TF-IDF ranking mechanism. One of the commands is: { "search": { "keyword": { "$has": "computer" } } }. The accumulated TF-IDF score (Eq. (2)) is used to rank the search results.
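The accumulated score of Eq. (2) can be illustrated with a short sketch. The hierarchy factor is passed in as a function; the inverse-level factor 1/k used below, the record names, and the counts are assumptions chosen for illustration and are not measurements from the experiments.

```python
import math
from typing import Callable, Dict, List


def accumulated_score(tf_by_level: Dict[int, int], df: int, n: int,
                      h: Callable[[int], float]) -> float:
    """S_{t,d} = sum_k h(k) * tf_{t,d,k} * log(N / df_t), following Eq. (2)."""
    idf = math.log(n / df)  # natural log; the base only rescales every score
    return sum(h(k) * tf for k, tf in tf_by_level.items()) * idf


def inverse_level(k: int) -> float:
    return 1.0 / k  # assumed hierarchy factor: level 1 counts most


def flat(k: int) -> float:
    return 1.0  # no hierarchy weighting, i.e. plain TF-IDF


# Hypothetical term counts per hierarchy level for one query term.
docs = {
    "doc_a": {1: 2},  # two matches in the first hierarchy
    "doc_b": {2: 3},  # three matches in the second hierarchy
}


def ranking(h: Callable[[int], float]) -> List[str]:
    score = lambda d: accumulated_score(docs[d], df=10, n=100, h=h)
    return sorted(docs, key=score, reverse=True)


print(ranking(inverse_level))  # doc_a first: 2 * 1 beats 3 * 1/2
print(ranking(flat))           # doc_b first: 3 beats 2
```

With the flat factor the record with more raw occurrences wins; with a level-dependent factor the record whose matches sit in the more heavily weighted hierarchy can overtake it, which is the effect the ranking mechanism is designed to produce.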
Here the hierarchy factor for hierarchy i is defined as h_i = i^{-1}, which reflects that higher hierarchies have higher degrees of relevance to the query. After searching and ranking, five query results are returned by the HierKV system. We verified the query results against the test data and found that the retrieved data set is exactly the same as the relevant data set for all query commands. This means the HierKV system achieves the highest value, 1, for both Precision and Recall for all queries.

Another experiment is used to evaluate the ranking efficiency of the proposed hierarchical TF-IDF ranking mechanism. It contains 16 test data each with two hierarchies as shown below. The term computer appears in either the first or the second hierarchy with different numbers of occurrences. Two query commands with keyword computer are issued to the HierKV system, using the proposed hierarchical TF-IDF ranking mechanism and the normal TF-IDF ranking mechanism (with Hierarchy set to 1 and 0), respectively. The accumulated TF-IDF score (Eq. (2)) is calculated with the hierarchy factor for hierarchy i defined as h_i = i^{-1}. After searching and ranking, the results returned by the HierKV system for the two query commands are as follows.

From the results, we find that the sequences of the returned keys differ for these two queries. The hierarchical TF-IDF ranking mechanism gives higher rank to the data with more keyword matches in the higher hierarchy, which reflects that different hierarchies have different degrees of relevance to the query. The normal TF-IDF ranking mechanism ranks data according to keyword occurrences only, without reflecting the different degrees of relevance of different hierarchies to the query.

5 Conclusions

In this paper, we have proposed a searchable key-value store, HierKV, to provide rich search features and fast data retrieval. The correlated data are grouped and organized in a hierarchical data object in HierKV to speed up data retrieval by attributes. HierKV indexes data objects by the secondary attributes and keeps the hierarchy information in the index to support rich search features. A hierarchical TF-IDF ranking mechanism has been proposed for HierKV that applies hierarchy factors to the TF-IDF scores and thus properly reflects the different relevance degrees of keyword matches in different hierarchies. Experiments have been conducted to test the HierKV search system and evaluate the hierarchical TF-IDF ranking mechanism. According to the experimental results, the proposed HierKV system achieves good performance with relatively high values of Precision and Recall. Compared to the normal TF-IDF ranking mechanism, our hierarchical TF-IDF ranking mechanism properly reflects that different hierarchies have different degrees of relevance to the query.

References

[1] F. Chang, et al. BigTable: A Distributed Storage System for Structured Data. In Proc. of OSDI, Nov. 2006.
[2] G. DeCandia, et al. Dynamo: Amazon's Highly Available Key-Value Store. In Proc. of SOSP, Oct. 2007.
[3] A. Lakshman and P. Malik. Cassandra: A Decentralized Structured Storage System. SIGOPS Oper. Syst. Rev., Vol. 44, No. 2, Apr. 2010.
[4] Project Voldemort.
[5] Apache HBase.
[6] R. Escriva, B. Wong, and E. G. Sirer. HyperDex: A Distributed, Searchable Key-Value Store. In Proc. of ACM SIGCOMM, Aug. 2012.
[7] M. T. Najaran and N. C. Hutchinson. Innesto: A Searchable Key/Value Store for Highly Dimensional Data. In Proc. of IEEE CloudCom, Dec. 2013.
[8] Bin Liang, Yiqun Liu, Min Zhang, Shaoping Ma, Liyun Ru, Kuo Zhang. THUIR-DB: A Large-scale, Highly-efficient Index, Fast-access Key-value Store. Journal of Computational Information Systems 9: 6 (2013).
[9] Rowan Garnier and John Taylor. Discrete Mathematics: Proofs, Structures and Applications. Third Edition, CRC Press.
[10] Douglas Comer. The Ubiquitous B-Tree. ACM Computing Surveys, Vol. 11, Issue 2, June 1979.
[11] Ramez Elmasri and Shamkant B. Navathe. Fundamentals of Database Systems. 6th Edition, Addison-Wesley.
[12] G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York, NY, USA.
[13] G. Salton, A. Wong, and C. S. Yang. A vector space model for automatic indexing. Commun. ACM, 18 (11), Nov.
[14] Jiaul H. Paik. A Novel TF-IDF Weighting Scheme for Effective Ranking. In Proc. of ACM SIGIR, July 2013.
[15] Ho Chung Wu, et al. Interpreting TF-IDF Term Weights as Making Relevance Decisions. ACM Transactions on Information Systems, Vol. 26, No. 3, Article 13, June 2008.


Encoding Words into String Vectors for Word Categorization Int'l Conf. Artificial Intelligence ICAI'16 271 Encoding Words into String Vectors for Word Categorization Taeho Jo Department of Computer and Information Communication Engineering, Hongik University,

More information

A Review to the Approach for Transformation of Data from MySQL to NoSQL

A Review to the Approach for Transformation of Data from MySQL to NoSQL A Review to the Approach for Transformation of Data from MySQL to NoSQL Monika 1 and Ashok 2 1 M. Tech. Scholar, Department of Computer Science and Engineering, BITS College of Engineering, Bhiwani, Haryana

More information

CANDIDATE LINK GENERATION USING SEMANTIC PHEROMONE SWARM

CANDIDATE LINK GENERATION USING SEMANTIC PHEROMONE SWARM CANDIDATE LINK GENERATION USING SEMANTIC PHEROMONE SWARM Ms.Susan Geethu.D.K 1, Ms. R.Subha 2, Dr.S.Palaniswami 3 1, 2 Assistant Professor 1,2 Department of Computer Science and Engineering, Sri Krishna

More information

Large Scale Chinese News Categorization. Peng Wang. Joint work with H. Zhang, B. Xu, H.W. Hao

Large Scale Chinese News Categorization. Peng Wang. Joint work with H. Zhang, B. Xu, H.W. Hao Large Scale Chinese News Categorization --based on Improved Feature Selection Method Peng Wang Joint work with H. Zhang, B. Xu, H.W. Hao Computational-Brain Research Center Institute of Automation, Chinese

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 12 Google Bigtable

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 12 Google Bigtable CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 12 Google Bigtable References Bigtable: A Distributed Storage System for Structured Data. Fay Chang et. al. OSDI

More information

Query Evaluation Strategies

Query Evaluation Strategies Introduction to Search Engine Technology Term-at-a-Time and Document-at-a-Time Evaluation Ronny Lempel Yahoo! Research (Many of the following slides are courtesy of Aya Soffer and David Carmel, IBM Haifa

More information
