Delta-K²-tree for Compact Representation of Web Graphs


Yu Zhang 1,2, Gang Xiong 1,*, Yanbing Liu 1, Mengya Liu 1,2, Ping Liu 1, and Li Guo 1
1 Institute of Information Engineering, Chinese Academy of Sciences
2 University of Chinese Academy of Sciences
{zhangyu,xionggang,liuyanbing,liumengya,liuping,guoli}@iie.ac.cn
* Corresponding author.

Abstract. The structure of the World Wide Web can be represented by a directed graph called the web graph. Web graphs are used in a wide range of applications. However, increasingly large web graphs pose great challenges to traditional memory-resident graph algorithms. In the literature, K²-tree can efficiently compress web graphs while supporting fast queries on the compressed data. Inspired by K²-tree, we propose the Delta-K²-tree compression approach, which exploits the similarity between neighboring nodes in web graphs. In addition, we design a node reordering algorithm to further improve the compression ratio. We compare our approach with state-of-the-art algorithms, including K²-tree, WebGraph and the BFS-based method of Apostolico and Drovandi. Experimental results of web graph compression on four datasets show that our Delta-K²-tree approach outperforms the other three in compression ratio (measured in bits per link), while supporting fast forward and reverse queries on the graphs.

Keywords: Web graphs, Compact data structures, Graph compression, Adjacency matrix

1 Introduction

In web management and mining applications, the structure of the World Wide Web can be represented by a directed graph in which each web page corresponds to a node and each hyperlink corresponds to an edge. Such a directed graph is known as a web graph. Many basic algorithms and operations rely on web graphs to analyze and mine the inner structure of the web. For example, well-known webpage ranking algorithms used in major search engines, such as PageRank [1] and HITS [2], are based on the web graph structure.
Their key techniques are computing the out-degree and in-degree of each node and analyzing the connections between different nodes. With the explosive development of the Internet, the scale of web graphs is growing at an amazing speed. To meet the need for large-scale graph data management, there has been a trend in recent years towards studying efficient compression techniques and fast querying algorithms. Traditional methods store a web graph as an adjacency matrix or an adjacency list, and to guarantee efficient querying, the entire matrix or list must be loaded into memory. However, keeping increasingly large graphs with millions of nodes and edges memory-resident is not practical. According to the official report of CNNIC (China Internet Network Information Center) [3], the numbers of web pages and hyperlinks in China were about 86.6 billion and 1 trillion, respectively, by the end of 2012. Stored as an adjacency list, this web graph would take over 16TB. Such a huge memory footprint poses a great challenge to traditional storage methods.

There are three lines of research addressing this excessive storage problem: (1) storing the graph in external memory, which is much cheaper and larger than main memory [4, 5]; (2) using a distributed system to partition the graph into small subgraphs that are processed on different computers [6-8]; (3) converting the graph into a compact form that requires less space while still supporting fast querying [9-12]. In our research, we focus on the third line and aim to represent web graphs in a highly compact form, so that huge graphs can be manipulated in main memory. In practice, such compression also benefits the other two lines of research. For the external memory scheme, locality of access improves because much more of the compressed graph fits in main memory at one time. For the distributed scheme, a highly compact structure allows fewer computers to do the same work and reduces network traffic. Among all the algorithms for compressing graphs, K²-tree [11] is a representative algorithm with a high compression ratio and fast querying performance.
This algorithm represents a graph by its adjacency matrix and exploits the sparsity of the matrix for effective compression. However, K²-tree ignores an important characteristic, the similarity between adjacent rows or columns of the adjacency matrix, which can be exploited to improve the compression ratio. In this paper, we propose a new tree-form structure named Delta-K²-tree. A series of experiments indicates that our approach outperforms K²-tree in compression ratio while still supporting fast querying. Furthermore, we propose a node reordering algorithm that makes better use of the similarity between nonadjacent rows or columns, which further improves the compression ratio of Delta-K²-tree.

2 Related Works

Researchers in the field of web graph compression are mostly interested in compact representations that support efficient querying operations, such as checking whether two pages are connected or extracting the successors of a page. The most influential representative of this trend is the WebGraph framework [9]. When WebGraph is used to compress web graphs whose nodes are URLs (Uniform Resource Locators), the URLs are first sorted in lexicographical order so that similar URLs appear in adjacent positions. By exploiting the similarity between adjacent URLs, the method achieves a good trade-off between compression ratio and querying speed. Variants of WebGraph [13-16] keep reducing the storage space through more effective encoding and reordering techniques. Using the same reordering process as WebGraph, [12] further exploits the structural characteristics of the web graph adjacency matrix: six kinds of regular subgraphs are extracted and compressed to achieve a high compression ratio. However, finding all neighbors of a given page is particularly slow, since the query requires numerous accesses to all the subgraphs. Instead of lexicographical order, the algorithm proposed by Apostolico and Drovandi [10] reorders the web graph nodes following a Breadth First Search (BFS). By taking advantage of the similarity between adjacent nodes in the adjacency list after reordering, this BFS-based method is competitive with WebGraph in compression efficiency and querying speed. All the approaches mentioned above provide only forward querying, although they can be converted into schemes that support bidirectional querying. In [16], a web graph is divided into two subgraphs, one containing all bidirectional edges and the other containing all unidirectional edges. The method compresses both subgraphs as well as the transposed graph of the unidirectional subgraph. However, such extended methods require extra space to store the transposed graph. In [11], Brisaboa et al. present the K²-tree structure, which offers forward and reverse queries without constructing the transposed graph. It exploits the large empty areas of the graph adjacency matrix and achieves a very good compression ratio.
In this paper, we improve the performance of K²-tree by exploiting the similarity between adjacent nodes in the graph adjacency matrix and by reordering nonadjacent nodes to further improve the compression ratio. We compare our method with the best alternatives in the literature and provide a space/time analysis based on the experimental results.

3 Preliminaries

3.1 Notation

Herein, a directed graph G = (V, E) denotes a web graph, where V is the set of nodes and E is the set of edges. Each node corresponds to a page and each edge corresponds to a link. We write $n = |V|$ for the number of nodes and $m = |E|$ for the number of edges. A square $n \times n$ matrix $\{a_{i,j}\}$ containing only 0s and 1s denotes the adjacency matrix: $a_{i,j}$ is 1 if there is an edge from $v_i$ to $v_j$, and 0 otherwise.

3.2 K²-tree

In [11], an unbalanced tree structure named K²-tree represents an adjacency matrix. Each node of the K²-tree stores 1 bit of information, 0 or 1. Every node in the last level of the K²-tree represents an element of the matrix, and every other node represents a sub-matrix. Except in the last level, a node storing 1 corresponds to a sub-matrix containing at least one 1, and a node storing 0 corresponds to a sub-matrix containing only 0s.

To construct the K²-tree, the $n \times n$ adjacency matrix is divided into $K^2$ equal parts, each an $\frac{n}{K} \times \frac{n}{K}$ sub-matrix. Each sub-matrix corresponds to a child of the root of the K²-tree: the child is 1 if and only if the sub-matrix contains at least one 1, and 0 otherwise. Every child that is 1 is again divided into $K^2$ equal parts, recursively, until the sub-matrix contains only 0s or a single element. In real web graphs, m is far less than $n^2$, so the adjacency matrix is extremely sparse. Because of this sparsity, K²-tree achieves a high compression ratio on web graphs by using a single node to represent a sub-matrix full of 0s. [11] proves that, in the worst case, the total space of the K²-tree is $K^2 m\left(\log_{K^2}\frac{n^2}{m} + O(1)\right)$ bits, which is asymptotically twice the information-theoretic lower bound necessary to represent all $n \times n$ matrices with m 1s.

To determine whether $a_{i,j}$ is 0 or 1 for two given nodes $v_i$ and $v_j$, start with the root as the current node and find its child that represents the sub-matrix containing $a_{i,j}$. If that child stores 0, $a_{i,j}$ is 0. Otherwise, the child becomes the current node and the search continues, until a child storing 0 is found or the current node has no children. $a_{i,j}$ is 0 if the last node found stores 0, and 1 otherwise.
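To make the construction and query above concrete, here is a minimal C++ sketch under stated assumptions: the input is a small dense 0/1 matrix held in memory (a practical implementation would build the tree from adjacency lists), the matrix is padded with 0s to a power of K, nodes are emitted level by level into the bit arrays T and L described next in this section, and Rank is computed naively. The names K2Tree, rank and hasEdge are hypothetical and are not taken from [11].

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Minimal K^2-tree sketch (illustrative, not the code of [11]).
// T holds the bits of all levels except the last, L the bits of the last
// level, both in level-by-level, left-to-right order.
struct K2Tree {
    int k;                   // K: each sub-matrix is split into K x K parts
    std::size_t side;        // matrix side after padding with 0s to a power of K
    std::vector<bool> T, L;

    K2Tree(const std::vector<std::vector<int>>& a, int k_) : k(k_), side(k_) {
        assert(k_ >= 2);
        const std::size_t n = a.size();
        while (side < n) side *= k;
        auto cell = [&](std::size_t r, std::size_t c) {
            return r < n && c < n && a[r][c] != 0;   // padded area reads as 0
        };
        auto anyOne = [&](std::size_t r0, std::size_t c0, std::size_t len) {
            for (std::size_t r = r0; r < r0 + len; ++r)
                for (std::size_t c = c0; c < c0 + len; ++c)
                    if (cell(r, c)) return true;
            return false;
        };
        struct Box { std::size_t r, c, len; };
        std::queue<Box> pending;      // FIFO order = level-by-level traversal
        pending.push({0, 0, side});
        while (!pending.empty()) {
            const Box b = pending.front();
            pending.pop();
            const std::size_t sub = b.len / k;
            for (std::size_t x = 0; x < static_cast<std::size_t>(k); ++x)
                for (std::size_t y = 0; y < static_cast<std::size_t>(k); ++y) {
                    const std::size_t r = b.r + x * sub, c = b.c + y * sub;
                    if (sub == 1) {               // last level: single elements
                        L.push_back(cell(r, c));
                    } else {
                        const bool bit = anyOne(r, c, sub);
                        T.push_back(bit);
                        if (bit) pending.push({r, c, sub});  // only 1-nodes get children
                    }
                }
        }
    }

    // Naive Rank(T, i): number of 1s in T[0..i], inclusive.
    std::size_t rank(std::size_t i) const {
        std::size_t count = 0;
        for (std::size_t p = 0; p <= i; ++p) count += T[p] ? 1 : 0;
        return count;
    }

    // Returns a[i][j] by descending from the root; the children of the node at
    // position pos start at Rank(T, pos) * K^2 in the concatenation T:L.
    bool hasEdge(std::size_t i, std::size_t j) const {
        if (i >= side || j >= side) return false;
        const std::size_t kk = static_cast<std::size_t>(k);
        std::size_t len = side, base = 0;   // base: first child of the current node
        while (true) {
            len /= kk;
            const std::size_t pos = base + (i / len) * kk + (j / len);
            if (pos >= T.size()) return L[pos - T.size()];  // reached the last level
            if (!T[pos]) return false;                      // all-zero sub-matrix
            base = rank(pos) * kk * kk;
            i %= len;
            j %= len;
        }
    }
};
```

For example, K2Tree(a, 2) over the 3 × 3 matrix {{0,1,0},{0,0,1},{1,0,0}} pads it to 4 × 4; hasEdge(2, 0) then returns true, while hasEdge(3, 3) falls in the padded all-zero quadrant and returns false after a single step. The brute-force anyOne scan is chosen for readability, not speed.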
In practice, if n is not a power of K, the matrix is extended to $K^{\lceil \log_K n \rceil} \times K^{\lceil \log_K n \rceil}$ by adding 0s at the right and the bottom. The K²-tree is stored in two bit arrays, T and L. T stores all nodes except those in the last level, traversing the K²-tree level by level from left to right; L stores the nodes of the last level from left to right. Fig. 1 shows an adjacency matrix and the corresponding K²-tree for K = 2, where the 0s in the grey area are added because n is not a power of K.

[Fig. 1: the adjacency matrix (K = 2), its K²-tree, and the bit arrays T and L]
Fig. 1. The adjacency matrix and the corresponding K²-tree.

To find the children of a given node efficiently using T and L, T must support Rank queries. Rank(T, i) ($0 \le i < |T|$) counts the number of 1s in T from position 0 up to and including position i; positions in T start at 0. If a node of the K²-tree represented by T[i] has children (i.e., T[i] = 1), its s-th child ($0 \le s < K^2$) is at position $\mathrm{Rank}(T, i) \cdot K^2 + s$ of T:L, where T:L denotes the concatenation of T and L. [17] proves that Rank can be computed in constant time using sub-linear extra space.

3.3 Rank

The implementation of Rank in [17] achieves very good theoretical results, but it is complicated to realize. [18] proposes a simple implementation and shows that in many practical cases the simpler solutions are better in terms of time and extra space. For a bit array T, the method uses an array R that samples the rank every B positions, where $R[\lfloor i/B \rfloor]$ is the number of 1s in $T[0, \lfloor i/B \rfloor B - 1]$ (the 1s before the block containing position i), and an array popc that stores the number of 1s in every possible b-bit array. Then

$$\mathrm{Rank}(T,i) = R\left[\left\lfloor \tfrac{i}{B} \right\rfloor\right] + \sum_{k=0}^{\lfloor (i \bmod B)/b \rfloor - 1} \mathit{popc}\left[T\left[\left\lfloor \tfrac{i}{B} \right\rfloor B + kb,\ \left\lfloor \tfrac{i}{B} \right\rfloor B + (k+1)b - 1\right]\right] + \mathit{popc}\left[T\left[\left\lfloor \tfrac{i}{b} \right\rfloor b,\ \left\lfloor \tfrac{i}{b} \right\rfloor b + b - 1\right] \,\&\, \underbrace{1\cdots1}_{(i \bmod b)+1}\,\underbrace{0\cdots0}_{b-(i \bmod b)-1}\right],$$

where T[i, j] denotes the bits of T from position i to position j, and B is a multiple of b. When the length of T is t, R has $\lceil t/B \rceil$ entries and popc has $2^b$ entries. Since _mm_popcnt_u64 in SSE (Streaming SIMD Extensions) counts the 1s in a 64-bit integer directly, in practice we set b to 64 and store the bit array in t, an array of 64-bit integers each holding 64 consecutive bits, as in Fig. 2. For programming convenience, B is set to $2^w b$. As w increases, the computation increases and the space decreases.

    procedure Rank(T, i)
        result := R[i >> (6+w)]                   // b = 2^6 = 64, B = 2^(6+w)
        for (k := (i >> (6+w)) << w; k < (i >> 6); k++)
            result += _mm_popcnt_u64(t[k])        // whole words inside the block
        // 0x3f is 63; with position 0 stored in the most significant bit,
        // the shift keeps positions 0..(i mod 64) of the word holding i
        result += _mm_popcnt_u64(t[k] >> (0x3f - (i & 0x3f)))
        return result
Fig. 2. The Rank algorithm.

4 Delta-K²-tree

4.1 Motivation

By taking advantage of the sparsity of the adjacency matrix, K²-tree compresses web graphs efficiently, and its space is $K^2 m\left(\log_{K^2}\frac{n^2}{m} + O(1)\right)$ bits in the worst case. We prove in Theorem 1 that, with K and n fixed, this worst-case space decreases as m decreases. Therefore, if we can reduce the number of 1s while keeping the size of the matrix unchanged, the space of the K²-tree is reduced.
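For the Rank procedure of Fig. 2 in Section 3.3, a minimal C++ sketch follows. It is an illustration under assumptions, not the authors' code: bit position p is stored most-significant-bit first in 64-bit word p >> 6 (the layout under which the final shift of Fig. 2 keeps positions 0 to i mod 64), R[j] holds the number of 1s before block j, and the GCC/Clang builtin __builtin_popcountll stands in for _mm_popcnt_u64 so the code compiles without SSE4.2 flags. RankDirectory and packMsbFirst are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Packs bits MSB-first: position p goes to bit 63 - (p & 63) of word p >> 6.
std::vector<std::uint64_t> packMsbFirst(const std::vector<bool>& bits) {
    std::vector<std::uint64_t> words((bits.size() + 63) / 64, 0);
    for (std::size_t p = 0; p < bits.size(); ++p)
        if (bits[p]) words[p >> 6] |= std::uint64_t{1} << (63 - (p & 63));
    return words;
}

// Sampled Rank directory with b = 64 and B = 2^w * 64 bits per block.
class RankDirectory {
public:
    RankDirectory(std::vector<std::uint64_t> words, std::size_t nbits, unsigned w)
        : t_(std::move(words)), nbits_(nbits), w_(w) {
        assert(t_.size() * 64 >= nbits_);
        const std::size_t wordsPerBlock = std::size_t{1} << w_;
        std::uint64_t ones = 0;
        for (std::size_t k = 0; k < t_.size(); ++k) {
            if (k % wordsPerBlock == 0) R_.push_back(ones);  // 1s before this block
            ones += popcount64(t_[k]);
        }
    }

    // Rank(T, i): number of 1s in positions 0..i, inclusive (Fig. 2).
    std::uint64_t rank(std::size_t i) const {
        assert(i < nbits_);
        std::uint64_t result = R_[i >> (6 + w_)];
        std::size_t k = (i >> (6 + w_)) << w_;                  // first word of i's block
        for (; k < (i >> 6); ++k) result += popcount64(t_[k]);  // whole words
        // 0x3f is 63: keep the top (i mod 64) + 1 bits of the word holding i.
        result += popcount64(t_[k] >> (0x3f - (i & 0x3f)));
        return result;
    }

private:
    // GCC/Clang builtin used in place of _mm_popcnt_u64.
    static std::uint64_t popcount64(std::uint64_t x) {
        return static_cast<std::uint64_t>(__builtin_popcountll(x));
    }

    std::vector<std::uint64_t> t_;   // the bit array, 64 bits per word
    std::size_t nbits_;
    unsigned w_;
    std::vector<std::uint64_t> R_;   // R[j] = number of 1s before block j
};
```

With w = 1 each block spans two words (B = 128), so rank(200) reads R[1], adds the popcount of word 2, and finishes with the top nine bits of word 3. A larger w shrinks R at the cost of more popcounts per query, which is the space/computation trade-off noted in Section 3.3.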

Theorem 1. With n and K unchanged, the worst-case space of the K²-tree of a sparse matrix decreases as the number of 1s decreases.

Proof. Let $y = K^2 m\left(\log_{K^2}\frac{n^2}{m} + O(1)\right)$, and let $a = K^2$, $b = n^2$, $c = O(1)$ and $x = \frac{n^2}{m}$. Then $y = \frac{ab}{x}(\log_a x + c)$, and its derivative with respect to x is

$$y' = \frac{ab}{x^2 \ln a}\left(\ln\frac{e}{a^c} - \ln x\right).$$

When $x > \frac{e}{a^c}$, $y' < 0$, so y decreases as x increases. Because the matrix is sparse, $x = \frac{n^2}{m}$ is clearly greater than $\frac{e}{a^c}$. Finally, with n fixed, x increases as m decreases, so y decreases as m decreases.

4.2 Construction and Query

The similarity between the neighbors of different pages is well known and widely used in compression algorithms such as WebGraph and the BFS-based method. We also use this characteristic, here to reduce the number of 1s. We use a matrix named the Delta-matrix to store the differences between adjacent rows or columns of the adjacency matrix. Taking rows as an example, the Delta-matrix can be constructed with the method in Fig. 3, where Count1s(matrix[i]) is the number of 1s in the i-th row of the matrix, and CountDif(matrix[i], matrix[j]) is the number of positions at which the i-th and j-th rows differ. D in Fig. 3 is an n-bit array recording which rows of the Delta-matrix store differences. By construction, the number of 1s in the Delta-matrix is not greater than that in the adjacency matrix.

    procedure Delta-matrix_Construction(matrix[n][n])
        D[0] := 0
        Delta-matrix[0] := matrix[0]
        for (i := 1; i < n; i++)
            if (Count1s(matrix[i]) < CountDif(matrix[i], matrix[i-1]))
                D[i] := 0
                Delta-matrix[i] := matrix[i]
            else
                D[i] := 1
                create an n-bit array R
                for (k := 0; k < n; k++)
                    if (matrix[i][k] == matrix[i-1][k]) R[k] := 0
                    else R[k] := 1
                Delta-matrix[i] := R
        return Delta-matrix, D
Fig. 3. The construction of the Delta-matrix.

The Delta-matrix and the n-bit array D can be used instead of the adjacency matrix to represent a web graph. We use $\{a_{i,j}\}$ to denote the adjacency matrix and $\{a'_{i,j}\}$ to denote the Delta-matrix.
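A minimal C++ sketch of Fig. 3, together with the element recovery that formula (1) formalizes, is given below. It assumes a dense in-memory 0/1 matrix whose rows are the nodes; buildDeltaMatrix, recoverEntry and countOnes are hypothetical helper names. Tie-breaking follows Fig. 3: a row is stored as a difference whenever its number of 1s is not smaller than its number of differences from the previous row.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<int>>;  // dense 0/1 matrix, one row per node

std::size_t countOnes(const Matrix& m) {
    std::size_t c = 0;
    for (const auto& row : m)
        for (int v : row) c += (v != 0) ? 1 : 0;
    return c;
}

// Fig. 3: returns the Delta-matrix and the n-bit array D (D[i] = 1 marks a
// row stored as the XOR of rows i and i-1 of the adjacency matrix).
std::pair<Matrix, std::vector<int>> buildDeltaMatrix(const Matrix& m) {
    const std::size_t n = m.size();
    Matrix delta = m;                    // row 0 and kept rows stay unchanged
    std::vector<int> D(n, 0);
    for (std::size_t i = 1; i < n; ++i) {
        assert(m[i].size() == m[i - 1].size());
        std::size_t ones = 0, diff = 0;  // Count1s and CountDif in Fig. 3
        for (std::size_t k = 0; k < m[i].size(); ++k) {
            ones += (m[i][k] != 0) ? 1 : 0;
            diff += (m[i][k] != m[i - 1][k]) ? 1 : 0;
        }
        if (ones < diff) continue;       // D[i] = 0: store the row itself
        D[i] = 1;
        for (std::size_t k = 0; k < m[i].size(); ++k)
            delta[i][k] = (m[i][k] != m[i - 1][k]) ? 1 : 0;
    }
    return {delta, D};
}

// Formula (1): XOR the Delta entries from row i back to the nearest row with
// D = 0; D[0] is always 0, so the loop terminates.
int recoverEntry(const Matrix& delta, const std::vector<int>& D,
                 std::size_t i, std::size_t j) {
    int value = delta[i][j];
    while (D[i] == 1) {
        --i;
        value ^= delta[i][j];
    }
    return value;
}
```

For the 4 × 4 matrix {{0,1,1,0},{0,1,1,1},{0,1,1,1},{1,0,0,0}}, rows 1 and 2 become difference rows (D = 0110) and the number of 1s drops from 9 to 4, while recoverEntry(delta, D, 2, 3) XORs rows 2, 1 and 0 of the Delta-matrix to get back $a_{2,3} = 1$.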
Elements of the adjacency matrix can be obtained from the Delta-matrix and D by formula (1), where $\oplus$ denotes exclusive-or and s is the number of consecutive 1s in D ending at position i (that is, $D[i] = D[i-1] = \cdots = D[i-s+1] = 1$ and $D[i-s] = 0$).

$$a_{i,j} = \begin{cases} a'_{i,j}, & \text{if } D[i] = 0 \\ a'_{i,j} \oplus a'_{i-1,j} \oplus \cdots \oplus a'_{i-s,j}, & \text{if } D[i] = 1 \end{cases} \qquad (1)$$

We use the K²-tree to compress the Delta-matrix instead of the adjacency matrix in order to reduce space. However, obtaining one element of the adjacency matrix may then require several accesses to the K²-tree of the Delta-matrix, so if the runs of consecutive 1s in D are very long, queries become very time-consuming. To resolve this problem, we propose two methods: (1) We replace the nodes in the last level of the K²-tree of the Delta-matrix with the elements at the same positions in the adjacency matrix, and call the modified K²-tree the Delta-K²-tree. In Fig. 4, for example, the dotted line marks the replaced nodes. (2) When querying the Delta-K²-tree, if we reach a node storing 0 that is not in the last level, all elements of the sub-matrix represented by that node are 0s, so a single access yields several elements. Using these two methods in practice, one query for an element of the adjacency matrix needs only about 2 accesses to the Delta-K²-tree on average. In addition, Delta-K²-tree can exploit the similarity between adjacent columns in the same way as between adjacent rows; the choice can be made according to the actual data.

[Fig. 4 panels: D; Delta-matrix; K²-tree for the Delta-matrix; Delta-K²-tree]
Fig. 4. The K²-tree for the Delta-matrix and the corresponding Delta-K²-tree.

4.3 Nodes Reordering

Delta-K²-tree exploits the similarity between adjacent nodes in web graphs. However, similar nodes are not necessarily adjacent. A node reordering method can change the order of the nodes in the web graph to make better use of this characteristic, that is, to find an order of the nodes that yields a Delta-matrix with the minimum number of 1s. We use a directed graph G = (V, E) to represent the similarity of nodes in the matrix. In this subsection, G does not represent the web graph.
A vertex $v_i$ in V represents the i-th node, and for every two different vertices the weight of $e(v_i, v_j)$ is the minimum of the number of neighbors of the i-th node and the number of differences between the neighbors of the i-th node and those of the j-th node. For a web graph with n nodes, G therefore contains n vertices and $n(n-1)$ edges. Every Hamiltonian path in G corresponds to an order of the nodes in the web graph, and the weight of the path is the number of 1s in the resulting Delta-matrix. The problem thus reduces to the shortest Hamiltonian path problem. Since the shortest Hamiltonian path problem is NP-complete, we propose a heuristic algorithm: it randomly selects a starting vertex and repeatedly moves from the current vertex along its minimum-weight edge to an unvisited vertex, until every vertex has been visited once. The order of the vertices on the resulting path is the new order of the nodes in the web graph.

5 Experiments

5.1 Experimental Environment and Test Data

Our test datasets are real web graphs obtained from the Laboratory for Web Algorithmics [9]. Table 1 lists the numbers of nodes and edges and the filenames on their website [19]. Our experiments run on Red Hat Enterprise Linux 6.0 Server (64 bits) with an Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz and 32GB RAM. All tests use only one CPU core. The code is compiled with gcc (C++) and Java.

Table 1. Description of the real web graphs used for testing.

Web graph   Nodes       Edges        Filename
uk          100,000     3,050,615    uk-2007-05@100000
cnr         325,557     3,216,152    cnr-2000
eu          862,664     19,235,140   eu-2005
in          1,382,908   16,917,053   in-2004

We compare Delta-K²-tree with the state-of-the-art algorithms, including K²-tree, WebGraph and the BFS-based method, in memory space and querying speed on the test data. We implement K²-tree and Delta-K²-tree in C++. The version of WebGraph we use is publicly available at [19]. The version of the BFS-based method we use is 0.3.2, publicly available at [20]. WebGraph and the BFS-based method are both implemented in Java.

5.2 Memory Space Comparison with Different Options

Table 2 compares the memory space of K²-tree and Delta-K²-tree with different options. Space is measured in bpe (bits per edge), obtained by dividing the total space of the compressed data by the number of edges in the web graph.
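The greedy reordering heuristic of Section 4.3 can be sketched in C++ as follows, assuming a dense in-memory 0/1 adjacency matrix. Edge weights follow the definition given in Section 4.3, the start vertex is passed in by the caller rather than drawn at random, and ties go to the smallest unvisited index; greedyOrder, edgeWeight, permute and deltaOnes are hypothetical names. Because a node ID indexes both a row and a column, permute relabels both, which leaves the number of differences between any two rows unchanged.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<int>>;  // dense 0/1 adjacency matrix

// CountDif: number of positions at which two rows differ.
std::size_t hamming(const std::vector<int>& x, const std::vector<int>& y) {
    std::size_t d = 0;
    for (std::size_t k = 0; k < x.size(); ++k) d += (x[k] != y[k]) ? 1 : 0;
    return d;
}

std::size_t rowOnes(const std::vector<int>& row) {
    std::size_t c = 0;
    for (int v : row) c += (v != 0) ? 1 : 0;
    return c;
}

// Weight of e(v_i, v_j) in Section 4.3: min(#neighbors of i, CountDif(i, j)).
std::size_t edgeWeight(const Matrix& m, std::size_t i, std::size_t j) {
    return std::min(rowOnes(m[i]), hamming(m[i], m[j]));
}

// Greedy heuristic: from `start`, repeatedly follow the minimum-weight edge
// to an unvisited vertex. order[p] is the old ID of the node placed at p.
std::vector<std::size_t> greedyOrder(const Matrix& m, std::size_t start) {
    const std::size_t n = m.size();
    assert(start < n);
    std::vector<bool> visited(n, false);
    std::vector<std::size_t> order{start};
    visited[start] = true;
    while (order.size() < n) {
        const std::size_t cur = order.back();
        std::size_t best = n, bestWeight = std::numeric_limits<std::size_t>::max();
        for (std::size_t j = 0; j < n; ++j) {
            if (visited[j]) continue;
            const std::size_t w = edgeWeight(m, cur, j);
            if (w < bestWeight) { bestWeight = w; best = j; }  // ties: smallest j
        }
        visited[best] = true;
        order.push_back(best);
    }
    return order;
}

// Relabels nodes; rows and columns move together since a node ID indexes both.
Matrix permute(const Matrix& m, const std::vector<std::size_t>& order) {
    const std::size_t n = order.size();
    Matrix out(n, std::vector<int>(n, 0));
    for (std::size_t p = 0; p < n; ++p)
        for (std::size_t q = 0; q < n; ++q) out[p][q] = m[order[p]][order[q]];
    return out;
}

// Number of 1s in the Delta-matrix that Fig. 3 would produce for m.
std::size_t deltaOnes(const Matrix& m) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < m.size(); ++i)
        total += (i == 0) ? rowOnes(m[i])
                          : std::min(rowOnes(m[i]), hamming(m[i], m[i - 1]));
    return total;
}
```

On the matrix {{0,1,1,1},{1,0,0,0},{0,1,1,1},{1,0,0,0}}, where identical rows are interleaved, greedyOrder(m, 0) returns 0, 2, 1, 3, and deltaOnes falls from 8 for the original order to 4 after relabeling. Each step scans all unvisited vertices, so this sketch takes O(n³) time on a dense matrix and is only meant for small examples.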
We configure K²-tree and Delta-K²-tree with K = 2 and K = 4, and Rank with B = 512. For Delta-K²-tree, we test four different options. Options that exploit the similarity between adjacent rows or adjacent columns of the adjacency matrix are labeled row and column, respectively, and reordering nodes before compression is labeled reorder. The results show that our proposal reduces space by about 40% compared with K²-tree. Among the options, exploiting the similarity of columns compresses better than exploiting rows, and our node reordering method improves compression significantly.

Table 2. Memory space comparison (in bpe) between K²-tree and Delta-K²-tree with different options.

                      uk          cnr         eu          in
                      K=2   K=4   K=2   K=4   K=2   K=4   K=2   K=4
Reduction in space    44%   38%   34%   42%   39%   33%   41%   35%

Rows compared: K²-tree, Delta-K²-tree(row), Delta-K²-tree(column), Delta-K²-tree(row+reorder), Delta-K²-tree(column+reorder).

5.3 Memory Space Comparison with Other Approaches

Table 3 compares the memory space of K²-tree, WebGraph, the BFS-based method and Delta-K²-tree. Space is measured in bpe. To favor compression over speed, we configure WebGraph with w = 7 and m = 3, the BFS-based method with l = 1, and K²-tree and Delta-K²-tree with K = 2 and B = 512. As WebGraph and the BFS-based method are based on adjacency lists, they support only forward querying; we add reverse querying at the cost of some extra space using the technique proposed in [16], introduced in the related work. The results show that our proposal uses the least space of all the algorithms while supporting both forward and reverse querying.

Table 3. Memory space comparison (in bpe) with other approaches (WebGraph, the BFS-based method, K²-tree and Delta-K²-tree) on uk, cnr, eu and in.

5.4 Space/Speed Trade-off Comparison with Other Approaches

In this experiment, WebGraph and the BFS-based method support only forward querying, without any extra space. We test querying speed in two respects: query for link and query for neighbors. A query for link checks whether two given nodes are connected; a query for neighbors obtains all neighbors of a given node. Space is measured in bpe and speed in nspe (nanoseconds per edge). The speed of a query for link is the time of one query; the speed of a query for neighbors is the time of one query divided by the number of neighbors returned. Fig. 5 shows the space/speed trade-off of query for link, and Fig. 6 shows that of query for neighbors. We configure WebGraph with (w, m) = (1, 1), (3, 3), (7, 3), the BFS-based method with l = 4, 8, 16, 1, and K²-tree and Delta-K²-tree with K = 2 and B = 64, 128, 256, 512. In querying speed alone, our proposal has no advantage: when querying for link, K²-tree is the fastest, and when querying for neighbors, one of the two adjacency-list-based methods is the fastest. However, our proposal shows a better space/speed trade-off, especially when querying for link. When querying for link, Delta-K²-tree is the best choice if high compression and fast speed are needed at the same time.

[Fig. 5 panels: Querying for link over uk; over cnr; over eu; over in]
Fig. 5. Space/speed trade-off of querying for link over uk, cnr, eu and in.

[Fig. 6 panels: Querying for neighbors over uk; over cnr; over eu; over in]
Fig. 6. Space/speed trade-off of querying for neighbors over uk, cnr, eu and in.

6 Conclusions and Future Work

We have presented Delta-K²-tree, a new compression method for web graphs that takes advantage of the similarity of hyperlinks and the sparsity of adjacency matrices, together with a node reordering algorithm that further improves compression. We compared it with commonly used alternatives in the field [9-11]. Our experiments show that it achieves a high compression ratio while supporting fast forward and reverse querying. For checking whether two given pages are connected, it is a competitive method when both high compression and fast querying are required. The node reordering algorithm improves the compression of Delta-K²-tree, but it cannot obtain the optimal solution. Thus, designing new heuristic node reordering algorithms is one of our future directions. Improving the querying speed of Delta-K²-tree is also under consideration.

Acknowledgement. This research was supported by the National Natural Science Foundation of China (No ); the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA6362); the National High Technology Research and Development Program of China (863 Program) (No. 211AA175, 212AA1252).

References

1. Brin, S., Page, L.: The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems 30(1) (1998)
2. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM) 46(5) (1999)
3. China Internet Network Information Center, research/bgxz/tjbg/2121/t212116_23668.html
4. Vitter, J.S.: External memory algorithms and data structures: Dealing with massive data. ACM Computing Surveys (CSUR) 33(2) (2001)
5. Vitter, J.S.: Algorithms and data structures for external memory. Foundations and Trends in Theoretical Computer Science 2(4) (2008)
6. Badue, C., Baeza-Yates, R., Ribeiro-Neto, B., Ziviani, N.: Distributed query processing using partitioned inverted files. In: Proceedings of the Eighth International Symposium on String Processing and Information Retrieval (SPIRE 2001), pp. 10-20. IEEE (2001)
7. Tomasic, A., Garcia-Molina, H.: Performance of inverted indices in shared-nothing distributed text document information retrieval systems. In: Proceedings of the Second International Conference on Parallel and Distributed Information Systems. IEEE (1993)
8. Yu, G., Gu, Y., Bao, Y.B., Wang, Z.G.: Large scale graph data processing on cloud computing environments. Chinese Journal of Computers 34(10) (2011)
9. Boldi, P., Vigna, S.: The WebGraph Framework I: Compression techniques. In: Proceedings of the 13th International Conference on World Wide Web. ACM (2004)
10. Apostolico, A., Drovandi, G.: Graph compression by BFS. Algorithms 2(3) (2009)
11. Brisaboa, N.R., Ladra, S., Navarro, G.: k2-trees for compact web graph representation. In: String Processing and Information Retrieval. Springer Berlin Heidelberg (2009)
12. Asano, Y., Miyawaki, Y., Nishizeki, T.: Efficient compression of web graphs. In: Computing and Combinatorics. Springer Berlin Heidelberg (2008)
13. Boldi, P., Vigna, S.: The WebGraph Framework II: Codes for the World-Wide Web. In: Proceedings of the Data Compression Conference, p. 528. IEEE Computer Society (2004)
14. Boldi, P., Santini, M., Vigna, S.: A large time-aware web graph. ACM SIGIR Forum 42(2) (2008)
15. Boldi, P., Santini, M., Vigna, S.: Permuting web graphs. In: Algorithms and Models for the Web-Graph. Springer Berlin Heidelberg (2009)
16. Boldi, P., Rosa, M., Santini, M., Vigna, S.: Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In: Proceedings of the 20th International Conference on World Wide Web. ACM (2011)
17. Jacobson, G.: Space-efficient static trees and graphs. In: 30th Annual Symposium on Foundations of Computer Science. IEEE (1989)
18. Gonzalez, R., Grabowski, S., Makinen, V., Navarro, G.: Practical implementation of rank and select queries. In: Poster Proceedings Volume of the 4th Workshop on Efficient and Experimental Algorithms (WEA 05) (2005)
19. Homepage
20. Drovandi, G.: PhD Web Site, software.php


International Journal of Scientific & Engineering Research Volume 2, Issue 12, December ISSN Web Search Engine International Journal of Scientific & Engineering Research Volume 2, Issue 12, December-2011 1 Web Search Engine G.Hanumantha Rao*, G.NarenderΨ, B.Srinivasa Rao+, M.Srilatha* Abstract This paper explains

More information

Generating edge covers of path graphs

Generating edge covers of path graphs Generating edge covers of path graphs J. Raymundo Marcial-Romero, J. A. Hernández, Vianney Muñoz-Jiménez and Héctor A. Montes-Venegas Facultad de Ingeniería, Universidad Autónoma del Estado de México,

More information

All-Pairs Nearly 2-Approximate Shortest-Paths in O(n 2 polylog n) Time

All-Pairs Nearly 2-Approximate Shortest-Paths in O(n 2 polylog n) Time All-Pairs Nearly 2-Approximate Shortest-Paths in O(n 2 polylog n) Time Surender Baswana 1, Vishrut Goyal 2, and Sandeep Sen 3 1 Max-Planck-Institut für Informatik, Saarbrücken, Germany. sbaswana@mpi-sb.mpg.de

More information

Indexing Web pages. Web Search: Indexing Web Pages. Indexing the link structure. checkpoint URL s. Connectivity Server: Node table

Indexing Web pages. Web Search: Indexing Web Pages. Indexing the link structure. checkpoint URL s. Connectivity Server: Node table Indexing Web pages Web Search: Indexing Web Pages CPS 296.1 Topics in Database Systems Indexing the link structure AltaVista Connectivity Server case study Bharat et al., The Fast Access to Linkage Information

More information

Report Seminar Algorithm Engineering

Report Seminar Algorithm Engineering Report Seminar Algorithm Engineering G. S. Brodal, R. Fagerberg, K. Vinther: Engineering a Cache-Oblivious Sorting Algorithm Iftikhar Ahmad Chair of Algorithm and Complexity Department of Computer Science

More information

Efficient Second-Order Iterative Methods for IR Drop Analysis in Power Grid

Efficient Second-Order Iterative Methods for IR Drop Analysis in Power Grid Efficient Second-Order Iterative Methods for IR Drop Analysis in Power Grid Yu Zhong Martin D. F. Wong Dept. of Electrical and Computer Engineering Dept. of Electrical and Computer Engineering Univ. of

More information

An Edge-Swap Heuristic for Finding Dense Spanning Trees

An Edge-Swap Heuristic for Finding Dense Spanning Trees Theory and Applications of Graphs Volume 3 Issue 1 Article 1 2016 An Edge-Swap Heuristic for Finding Dense Spanning Trees Mustafa Ozen Bogazici University, mustafa.ozen@boun.edu.tr Hua Wang Georgia Southern

More information

A New Parallel Algorithm for Connected Components in Dynamic Graphs. Robert McColl Oded Green David Bader

A New Parallel Algorithm for Connected Components in Dynamic Graphs. Robert McColl Oded Green David Bader A New Parallel Algorithm for Connected Components in Dynamic Graphs Robert McColl Oded Green David Bader Overview The Problem Target Datasets Prior Work Parent-Neighbor Subgraph Results Conclusions Problem

More information

Inverted List Caching for Topical Index Shards

Inverted List Caching for Topical Index Shards Inverted List Caching for Topical Index Shards Zhuyun Dai and Jamie Callan Language Technologies Institute, Carnegie Mellon University {zhuyund, callan}@cs.cmu.edu Abstract. Selective search is a distributed

More information

An Extended Byte Carry Labeling Scheme for Dynamic XML Data

An Extended Byte Carry Labeling Scheme for Dynamic XML Data Available online at www.sciencedirect.com Procedia Engineering 15 (2011) 5488 5492 An Extended Byte Carry Labeling Scheme for Dynamic XML Data YU Sheng a,b WU Minghui a,b, * LIU Lin a,b a School of Computer

More information

On Compressing Social Networks. Ravi Kumar. Yahoo! Research, Sunnyvale, CA. Jun 30, 2009 KDD 1

On Compressing Social Networks. Ravi Kumar. Yahoo! Research, Sunnyvale, CA. Jun 30, 2009 KDD 1 On Compressing Social Networks Ravi Kumar Yahoo! Research, Sunnyvale, CA KDD 1 Joint work with Flavio Chierichetti, University of Rome Silvio Lattanzi, University of Rome Michael Mitzenmacher, Harvard

More information

Generating All Solutions of Minesweeper Problem Using Degree Constrained Subgraph Model

Generating All Solutions of Minesweeper Problem Using Degree Constrained Subgraph Model 356 Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'16 Generating All Solutions of Minesweeper Problem Using Degree Constrained Subgraph Model Hirofumi Suzuki, Sun Hao, and Shin-ichi Minato Graduate

More information

arxiv: v1 [cs.na] 27 Apr 2012

arxiv: v1 [cs.na] 27 Apr 2012 Revisiting the D-iteration method: runtime comparison Dohy Hong Alcatel-Lucent Bell Labs Route de Villejust 91620 Nozay, France dohy.hong@alcatel-lucent.com Gérard Burnside Alcatel-Lucent Bell Labs Route

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic

More information

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase

Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Quadrant-Based MBR-Tree Indexing Technique for Range Query Over HBase Bumjoon Jo and Sungwon Jung (&) Department of Computer Science and Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul 04107,

More information

Mosaic: Processing a Trillion-Edge Graph on a Single Machine

Mosaic: Processing a Trillion-Edge Graph on a Single Machine Mosaic: Processing a Trillion-Edge Graph on a Single Machine Steffen Maass, Changwoo Min, Sanidhya Kashyap, Woonhak Kang, Mohan Kumar, Taesoo Kim Georgia Institute of Technology Best Student Paper @ EuroSys

More information

Constructions of hamiltonian graphs with bounded degree and diameter O(log n)

Constructions of hamiltonian graphs with bounded degree and diameter O(log n) Constructions of hamiltonian graphs with bounded degree and diameter O(log n) Aleksandar Ilić Faculty of Sciences and Mathematics, University of Niš, Serbia e-mail: aleksandari@gmail.com Dragan Stevanović

More information

Compressed representations for web and social graphs. Cecilia Hernandez and Gonzalo Navarro Presented by Helen Xu 6.

Compressed representations for web and social graphs. Cecilia Hernandez and Gonzalo Navarro Presented by Helen Xu 6. Compressed representations for web and social graphs Cecilia Hernandez and Gonzalo Navarro Presented by Helen Xu 6.886 April 6, 2018 Web graphs and social networks Web graphs represent the link structure

More information

Parallelization of Graph Isomorphism using OpenMP

Parallelization of Graph Isomorphism using OpenMP Parallelization of Graph Isomorphism using OpenMP Vijaya Balpande Research Scholar GHRCE, Nagpur Priyadarshini J L College of Engineering, Nagpur ABSTRACT Advancement in computer architecture leads to

More information

Indexing and Searching

Indexing and Searching Indexing and Searching Introduction How to retrieval information? A simple alternative is to search the whole text sequentially Another option is to build data structures over the text (called indices)

More information

To Index or not to Index: Time-Space Trade-Offs in Search Engines with Positional Ranking Functions

To Index or not to Index: Time-Space Trade-Offs in Search Engines with Positional Ranking Functions To Index or not to Index: Time-Space Trade-Offs in Search Engines with Positional Ranking Functions Diego Arroyuelo Dept. of Informatics, Univ. Técnica F. Santa María. Yahoo! Labs Santiago, Chile darroyue@inf.utfsm.cl

More information

THE WEB SEARCH ENGINE

THE WEB SEARCH ENGINE International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR) Vol.1, Issue 2 Dec 2011 54-60 TJPRC Pvt. Ltd., THE WEB SEARCH ENGINE Mr.G. HANUMANTHA RAO hanu.abc@gmail.com

More information

Data Structures and Algorithms for Counting Problems on Graphs using GPU

Data Structures and Algorithms for Counting Problems on Graphs using GPU International Journal of Networking and Computing www.ijnc.org ISSN 85-839 (print) ISSN 85-847 (online) Volume 3, Number, pages 64 88, July 3 Data Structures and Algorithms for Counting Problems on Graphs

More information

COMP5331: Knowledge Discovery and Data Mining

COMP5331: Knowledge Discovery and Data Mining COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd, Jon M. Kleinberg 1 1 PageRank

More information

Two-Dimensional Block Trees

Two-Dimensional Block Trees Two-Dimensional Block Trees Nieves R. Brisaboa, Travis Gagie, Adrián Gómez-Brandón, and Gonzalo Navarro Database Laboratory EIT Dept. of Computer Science Universidade da Coruña Diego Portales University

More information

An Experimental Investigation into the Rank Function of the Heterogeneous Earliest Finish Time Scheduling Algorithm

An Experimental Investigation into the Rank Function of the Heterogeneous Earliest Finish Time Scheduling Algorithm An Experimental Investigation into the Rank Function of the Heterogeneous Earliest Finish Time Scheduling Algorithm Henan Zhao and Rizos Sakellariou Department of Computer Science, University of Manchester,

More information

An O(n 2.75 ) algorithm for online topological ordering 1

An O(n 2.75 ) algorithm for online topological ordering 1 Electronic Notes in Discrete Mathematics 25 (2006) 7 12 www.elsevier.com/locate/endm An O(n 2.75 ) algorithm for online topological ordering 1 Deepak Ajwani a,2, Tobias Friedrich a and Ulrich Meyer a a

More information

EFFICIENT ATTRIBUTE REDUCTION ALGORITHM

EFFICIENT ATTRIBUTE REDUCTION ALGORITHM EFFICIENT ATTRIBUTE REDUCTION ALGORITHM Zhongzhi Shi, Shaohui Liu, Zheng Zheng Institute Of Computing Technology,Chinese Academy of Sciences, Beijing, China Abstract: Key words: Efficiency of algorithms

More information

Multicasting in the Hypercube, Chord and Binomial Graphs

Multicasting in the Hypercube, Chord and Binomial Graphs Multicasting in the Hypercube, Chord and Binomial Graphs Christopher C. Cipriano and Teofilo F. Gonzalez Department of Computer Science University of California, Santa Barbara, CA, 93106 E-mail: {ccc,teo}@cs.ucsb.edu

More information

HIPRank: Ranking Nodes by Influence Propagation based on authority and hub

HIPRank: Ranking Nodes by Influence Propagation based on authority and hub HIPRank: Ranking Nodes by Influence Propagation based on authority and hub Wen Zhang, Song Wang, GuangLe Han, Ye Yang, Qing Wang Laboratory for Internet Software Technologies Institute of Software, Chinese

More information

Cache Oblivious Matrix Transpositions using Sequential Processing

Cache Oblivious Matrix Transpositions using Sequential Processing IOSR Journal of Engineering (IOSRJEN) e-issn: 225-321, p-issn: 2278-8719 Vol. 3, Issue 11 (November. 213), V4 PP 5-55 Cache Oblivious Matrix s using Sequential Processing korde P.S., and Khanale P.B 1

More information

Fast algorithm for generating ascending compositions

Fast algorithm for generating ascending compositions manuscript No. (will be inserted by the editor) Fast algorithm for generating ascending compositions Mircea Merca Received: date / Accepted: date Abstract In this paper we give a fast algorithm to generate

More information

Ranking web pages using machine learning approaches

Ranking web pages using machine learning approaches University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2008 Ranking web pages using machine learning approaches Sweah Liang Yong

More information

A SURVEY ON WEB FOCUSED INFORMATION EXTRACTION ALGORITHMS

A SURVEY ON WEB FOCUSED INFORMATION EXTRACTION ALGORITHMS INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 A SURVEY ON WEB FOCUSED INFORMATION EXTRACTION ALGORITHMS Satwinder Kaur 1 & Alisha Gupta 2 1 Research Scholar (M.tech

More information

KEYWORD search is a well known method for extracting

KEYWORD search is a well known method for extracting IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 26, NO. 7, JULY 2014 1657 Efficient Duplication Free and Minimal Keyword Search in Graphs Mehdi Kargar, Student Member, IEEE, Aijun An, Member,

More information

Pushing the Envelope in Graph Compression

Pushing the Envelope in Graph Compression Pushing the Envelope in Graph Compression Panagiotis Liakos University of Athens Athens, Greece p.liakos@di.uoa.gr Katia Papakonstantinopoulou University of Athens Athens, Greece katia@di.uoa.gr Michael

More information

Testing Isomorphism of Strongly Regular Graphs

Testing Isomorphism of Strongly Regular Graphs Spectral Graph Theory Lecture 9 Testing Isomorphism of Strongly Regular Graphs Daniel A. Spielman September 26, 2018 9.1 Introduction In the last lecture we saw how to test isomorphism of graphs in which

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

An Improved PageRank Method based on Genetic Algorithm for Web Search

An Improved PageRank Method based on Genetic Algorithm for Web Search Available online at www.sciencedirect.com Procedia Engineering 15 (2011) 2983 2987 Advanced in Control Engineeringand Information Science An Improved PageRank Method based on Genetic Algorithm for Web

More information

Improving Performance of Sparse Matrix-Vector Multiplication

Improving Performance of Sparse Matrix-Vector Multiplication Improving Performance of Sparse Matrix-Vector Multiplication Ali Pınar Michael T. Heath Department of Computer Science and Center of Simulation of Advanced Rockets University of Illinois at Urbana-Champaign

More information

Graph Theory for Modelling a Survey Questionnaire Pierpaolo Massoli, ISTAT via Adolfo Ravà 150, Roma, Italy

Graph Theory for Modelling a Survey Questionnaire Pierpaolo Massoli, ISTAT via Adolfo Ravà 150, Roma, Italy Graph Theory for Modelling a Survey Questionnaire Pierpaolo Massoli, ISTAT via Adolfo Ravà 150, 00142 Roma, Italy e-mail: pimassol@istat.it 1. Introduction Questions can be usually asked following specific

More information

Permuting Web and Social Graphs

Permuting Web and Social Graphs Permuting Web and Social Graphs Paolo Boldi Massimo Santini Sebastiano Vigna Dipartimento di Scienze dell Informazione, Università degli Studi di Milano, Italy Abstract Since the first investigations on

More information

Appropriate Item Partition for Improving the Mining Performance

Appropriate Item Partition for Improving the Mining Performance Appropriate Item Partition for Improving the Mining Performance Tzung-Pei Hong 1,2, Jheng-Nan Huang 1, Kawuu W. Lin 3 and Wen-Yang Lin 1 1 Department of Computer Science and Information Engineering National

More information

DATA STRUCTURE : A MCQ QUESTION SET Code : RBMCQ0305

DATA STRUCTURE : A MCQ QUESTION SET Code : RBMCQ0305 Q.1 If h is any hashing function and is used to hash n keys in to a table of size m, where n

More information

Proximity Prestige using Incremental Iteration in Page Rank Algorithm

Proximity Prestige using Incremental Iteration in Page Rank Algorithm Indian Journal of Science and Technology, Vol 9(48), DOI: 10.17485/ijst/2016/v9i48/107962, December 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Proximity Prestige using Incremental Iteration

More information

Searching the Web [Arasu 01]

Searching the Web [Arasu 01] Searching the Web [Arasu 01] Most user simply browse the web Google, Yahoo, Lycos, Ask Others do more specialized searches web search engines submit queries by specifying lists of keywords receive web

More information

Term-Frequency Inverse-Document Frequency Definition Semantic (TIDS) Based Focused Web Crawler

Term-Frequency Inverse-Document Frequency Definition Semantic (TIDS) Based Focused Web Crawler Term-Frequency Inverse-Document Frequency Definition Semantic (TIDS) Based Focused Web Crawler Mukesh Kumar and Renu Vig University Institute of Engineering and Technology, Panjab University, Chandigarh,

More information

An Improved Computation of the PageRank Algorithm 1

An Improved Computation of the PageRank Algorithm 1 An Improved Computation of the PageRank Algorithm Sung Jin Kim, Sang Ho Lee School of Computing, Soongsil University, Korea ace@nowuri.net, shlee@computing.ssu.ac.kr http://orion.soongsil.ac.kr/ Abstract.

More information

The Constellation Project. Andrew W. Nash 14 November 2016

The Constellation Project. Andrew W. Nash 14 November 2016 The Constellation Project Andrew W. Nash 14 November 2016 The Constellation Project: Representing a High Performance File System as a Graph for Analysis The Titan supercomputer utilizes high performance

More information

Evaluating find a path reachability queries

Evaluating find a path reachability queries Evaluating find a path reachability queries Panagiotis ouros and Theodore Dalamagas and Spiros Skiadopoulos and Timos Sellis Abstract. Graphs are used for modelling complex problems in many areas, such

More information

On Dimensionality Reduction of Massive Graphs for Indexing and Retrieval

On Dimensionality Reduction of Massive Graphs for Indexing and Retrieval On Dimensionality Reduction of Massive Graphs for Indexing and Retrieval Charu C. Aggarwal 1, Haixun Wang # IBM T. J. Watson Research Center Hawthorne, NY 153, USA 1 charu@us.ibm.com # Microsoft Research

More information

LINK GRAPH ANALYSIS FOR ADULT IMAGES CLASSIFICATION

LINK GRAPH ANALYSIS FOR ADULT IMAGES CLASSIFICATION LINK GRAPH ANALYSIS FOR ADULT IMAGES CLASSIFICATION Evgeny Kharitonov *, ***, Anton Slesarev *, ***, Ilya Muchnik **, ***, Fedor Romanenko ***, Dmitry Belyaev ***, Dmitry Kotlyarov *** * Moscow Institute

More information

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments Send Orders for Reprints to reprints@benthamscience.ae 368 The Open Automation and Control Systems Journal, 2014, 6, 368-373 Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing

More information

Improving Suffix Tree Clustering Algorithm for Web Documents

Improving Suffix Tree Clustering Algorithm for Web Documents International Conference on Logistics Engineering, Management and Computer Science (LEMCS 2015) Improving Suffix Tree Clustering Algorithm for Web Documents Yan Zhuang Computer Center East China Normal

More information

Compact and Efficient Representation of General Graph Databases 1

Compact and Efficient Representation of General Graph Databases 1 This is a post-peer-review, pre-copyedit version of an article published in Knowledge and Information Systems. The final authenticated version is available online at: http://dx.doi.org/.7/ s5-8-275-x Compact

More information

Graph Data Management

Graph Data Management Graph Data Management Analysis and Optimization of Graph Data Frameworks presented by Fynn Leitow Overview 1) Introduction a) Motivation b) Application for big data 2) Choice of algorithms 3) Choice of

More information

Graphs (Part II) Shannon Quinn

Graphs (Part II) Shannon Quinn Graphs (Part II) Shannon Quinn (with thanks to William Cohen and Aapo Kyrola of CMU, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford University) Parallel Graph Computation Distributed computation

More information

All About Bitmap Indexes... And Sorting Them

All About Bitmap Indexes... And Sorting Them http://www.daniel-lemire.com/ Joint work (presented at BDA 08 and DOLAP 08) with Owen Kaser (UNB) and Kamel Aouiche (post-doc). February 12, 2009 Database Indexes Databases use precomputed indexes (auxiliary

More information

The application of Randomized HITS algorithm in the fund trading network

The application of Randomized HITS algorithm in the fund trading network The application of Randomized HITS algorithm in the fund trading network Xingyu Xu 1, Zhen Wang 1,Chunhe Tao 1,Haifeng He 1 1 The Third Research Institute of Ministry of Public Security,China Abstract.

More information

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

The Comparative Study of Machine Learning Algorithms in Text Data Classification* The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification

More information

Set Cover with Almost Consecutive Ones Property

Set Cover with Almost Consecutive Ones Property Set Cover with Almost Consecutive Ones Property 2004; Mecke, Wagner Entry author: Michael Dom INDEX TERMS: Covering Set problem, data reduction rules, enumerative algorithm. SYNONYMS: Hitting Set PROBLEM

More information

Double-precision General Matrix Multiply (DGEMM)

Double-precision General Matrix Multiply (DGEMM) Double-precision General Matrix Multiply (DGEMM) Parallel Computation (CSE 0), Assignment Andrew Conegliano (A0) Matthias Springer (A00) GID G-- January, 0 0. Assumptions The following assumptions apply

More information

Code Compaction Using Post-Increment/Decrement Addressing Modes

Code Compaction Using Post-Increment/Decrement Addressing Modes Code Compaction Using Post-Increment/Decrement Addressing Modes Daniel Golovin and Michael De Rosa {dgolovin, mderosa}@cs.cmu.edu Abstract During computation, locality of reference is often observed, and

More information

Histogram-Aware Sorting for Enhanced Word-Aligned Compress

Histogram-Aware Sorting for Enhanced Word-Aligned Compress Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes 1- University of New Brunswick, Saint John 2- Université du Québec at Montréal (UQAM) October 23, 2008 Bitmap indexes SELECT

More information

Ranking Web Pages by Associating Keywords with Locations

Ranking Web Pages by Associating Keywords with Locations Ranking Web Pages by Associating Keywords with Locations Peiquan Jin, Xiaoxiang Zhang, Qingqing Zhang, Sheng Lin, and Lihua Yue University of Science and Technology of China, 230027, Hefei, China jpq@ustc.edu.cn

More information

1 Motivation for Improving Matrix Multiplication

1 Motivation for Improving Matrix Multiplication CS170 Spring 2007 Lecture 7 Feb 6 1 Motivation for Improving Matrix Multiplication Now we will just consider the best way to implement the usual algorithm for matrix multiplication, the one that take 2n

More information

An Cross Layer Collaborating Cache Scheme to Improve Performance of HTTP Clients in MANETs

An Cross Layer Collaborating Cache Scheme to Improve Performance of HTTP Clients in MANETs An Cross Layer Collaborating Cache Scheme to Improve Performance of HTTP Clients in MANETs Jin Liu 1, Hongmin Ren 1, Jun Wang 2, Jin Wang 2 1 College of Information Engineering, Shanghai Maritime University,

More information

n 2 ( ) ( ) + n is in Θ n logn

n 2 ( ) ( ) + n is in Θ n logn CSE Test Spring Name Last Digits of Mav ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. The time to multiply an m n matrix and a n p matrix is in: A. Θ( n) B. Θ( max(

More information

Indexing Variable Length Substrings for Exact and Approximate Matching

Indexing Variable Length Substrings for Exact and Approximate Matching Indexing Variable Length Substrings for Exact and Approximate Matching Gonzalo Navarro 1, and Leena Salmela 2 1 Department of Computer Science, University of Chile gnavarro@dcc.uchile.cl 2 Department of

More information

Information Cloaking Technique with Tree Based Similarity

Information Cloaking Technique with Tree Based Similarity Information Cloaking Technique with Tree Based Similarity C.Bharathipriya [1], K.Lakshminarayanan [2] 1 Final Year, Computer Science and Engineering, Mailam Engineering College, 2 Assistant Professor,

More information

Computer Science 210 Data Structures Siena College Fall Topic Notes: Complexity and Asymptotic Analysis

Computer Science 210 Data Structures Siena College Fall Topic Notes: Complexity and Asymptotic Analysis Computer Science 210 Data Structures Siena College Fall 2017 Topic Notes: Complexity and Asymptotic Analysis Consider the abstract data type, the Vector or ArrayList. This structure affords us the opportunity

More information

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node

A Modified Algorithm to Handle Dangling Pages using Hypothetical Node A Modified Algorithm to Handle Dangling Pages using Hypothetical Node Shipra Srivastava Student Department of Computer Science & Engineering Thapar University, Patiala, 147001 (India) Rinkle Rani Aggrawal

More information

Planarity Algorithms via PQ-Trees (Extended Abstract)

Planarity Algorithms via PQ-Trees (Extended Abstract) Electronic Notes in Discrete Mathematics 31 (2008) 143 149 www.elsevier.com/locate/endm Planarity Algorithms via PQ-Trees (Extended Abstract) Bernhard Haeupler 1 Department of Computer Science, Princeton

More information

A Fast and High Throughput SQL Query System for Big Data

A Fast and High Throughput SQL Query System for Big Data A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190

More information

Acolyte: An In-Memory Social Network Query System

Acolyte: An In-Memory Social Network Query System Acolyte: An In-Memory Social Network Query System Ze Tang, Heng Lin, Kaiwei Li, Wentao Han, and Wenguang Chen Department of Computer Science and Technology, Tsinghua University Beijing 100084, China {tangz10,linheng11,lkw10,hwt04}@mails.tsinghua.edu.cn

More information

Weighted Page Rank Algorithm Based on Number of Visits of Links of Web Page

Weighted Page Rank Algorithm Based on Number of Visits of Links of Web Page International Journal of Soft Computing and Engineering (IJSCE) ISSN: 31-307, Volume-, Issue-3, July 01 Weighted Page Rank Algorithm Based on Number of Visits of Links of Web Page Neelam Tyagi, Simple

More information

Elements of Graph Theory

Elements of Graph Theory Elements of Graph Theory Quick review of Chapters 9.1 9.5, 9.7 (studied in Mt1348/2008) = all basic concepts must be known New topics we will mostly skip shortest paths (Chapter 9.6), as that was covered

More information

A Parallel Algorithm for Finding Sub-graph Isomorphism

A Parallel Algorithm for Finding Sub-graph Isomorphism CS420: Parallel Programming, Fall 2008 Final Project A Parallel Algorithm for Finding Sub-graph Isomorphism Ashish Sharma, Santosh Bahir, Sushant Narsale, Unmil Tambe Department of Computer Science, Johns

More information

Exploiting Computation-Friendly Graph Compression Methods for Adjacency-Matrix Multiplication

Exploiting Computation-Friendly Graph Compression Methods for Adjacency-Matrix Multiplication Exploiting Computation-Friendly Graph Compression Methods for Adjacency-Matrix Multiplication Alexandre P Francisco, Travis Gagie, Susana Ladra, and Gonzalo Navarro INESC-ID / IST Universidade de Lisboa

More information

Matrix Multiplication

Matrix Multiplication Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2013 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2013 1 / 32 Outline 1 Matrix operations Importance Dense and sparse

More information

Solutions to Exam Data structures (X and NV)

Solutions to Exam Data structures (X and NV) Solutions to Exam Data structures X and NV 2005102. 1. a Insert the keys 9, 6, 2,, 97, 1 into a binary search tree BST. Draw the final tree. See Figure 1. b Add NIL nodes to the tree of 1a and color it

More information