Hadoop Based Link Prediction Performance Analysis
Yuxiao Dong, Casey Robinson, Jian Xu
Department of Computer Science and Engineering
University of Notre Dame, Notre Dame, IN 46556, USA

Abstract: Link prediction is an important problem in social network analysis and has been applied in a variety of fields. Link prediction aims to estimate the likelihood that links exist between nodes from the known network structure. The time complexity of link prediction algorithms in huge-scale networks remains an unexplored and unsolved problem, especially for sparse networks. In this project, we explore how parallel computing speeds up link prediction in huge-scale networks. We implemented similarity based link prediction algorithms on MapReduce, which have a time complexity of O(n) in sparse networks. We analyzed the performance of our algorithms on the Data Intensive Science Cluster at the University of Notre Dame. We evaluate the performance with different configurations, monitor the resource utilization of the distributed computation, and optimize accordingly. After analyzing the efficiency of different configurations, we present the fastest approach to performing parallelized link prediction, which is particularly suited to real-world big data.

Index Terms: Social network analysis, link prediction, MapReduce, parallelization

I. INTRODUCTION

Social networks are an important part of our society. These networks are in constant flux, and understanding how nodes relate is of great interest. Barabási and Albert demonstrated that networks expand continuously by the addition of new vertices which preferentially attach to existing, well connected vertices [1]. Many researchers have studied network evolution and modeled dynamic network structure [2], [3], [4]. Link prediction is used to understand and identify the mechanisms of network growth and evolution.
Link prediction aims to estimate the likelihood that links exist between nodes based on the known network structure. The classical problem of link prediction is the prediction of existing yet unknown links, called missing links. Most previous work on link prediction employs cross-validation by splitting the data into two sets: training and testing.

Consider this motivating example. People in the real world meet new friends. Each new relationship is represented by the appearance of a new connection in their social graphs. Through the new relationship, both people's social circles enlarge. Predicting these relationships before they are formed is vital to the success of a social networking service. Link prediction attempts to address the issue of discovering future connections. We can experience the results of link prediction through the friend recommendation engine on Facebook. However, there are now more than 1 billion users on Facebook. The massive scale is an impediment to successful prediction. A scalable and efficient solution is needed to accurately recommend friends.

Challenge: The major challenge of link prediction stems from the sparse, yet gigantic, nature of social networks. A sparse network implies that the existing links between nodes represent a small fraction of the total possible links. Processing data of this size requires either years of computation or many computers. To address the strong imbalance between nonexistent links and existing links, we can undersample the holdout test set [5] or only sample negative instances in the test set [6]. However, modifying the sampling method changes the data distribution so that it no longer presents the same challenges as the real-world distribution. Because the results then no longer reflect the capabilities and limitations of the prediction model, they are uninterpretable [7].
Thus, parallelization is the only feasible and meaningful method for studying link formation, which motivates our work. Three design patterns, based on MapReduce, have been proposed to speed up network analysis algorithms [8]. PEGASUS is a MapReduce based framework for graph mining which implements most of the classical graph mining algorithms [9]. Although a large amount of research has been conducted on MapReduce based graph mining, no MapReduce framework exists for link prediction.

Our goal in this project is to design, implement, and analyze the performance of similarity based link prediction algorithms on the Data Intensive Science Cluster at the University of Notre Dame. The experimental results on several large-scale data sets of various network types show that the MapReduce based link prediction algorithm is more effective and scalable than traditional ones, and that its running time decreases with more compute units for appropriately sized data chunks. More work will be done to improve performance and gain further insight into these findings.

II. RELATED WORK

Our work is related to link prediction and graph mining in huge-scale networks. Link prediction has attracted considerable attention in recent years from both the computer science and physics communities. Existing work can be classified into two categories: unsupervised methods [10] and supervised methods [7], [11], [12], [13]. Most unsupervised link prediction algorithms are based on a similarity measure between the nodes of a graph. A seminal work on unsupervised methods by Liben-Nowell and Kleinberg addresses the problem from an algorithmic point of view. The authors investigate how different proximity features can be exploited to predict the occurrence of new links in social networks [10]. For supervised methods, Lichtenwalter et al. motivated the use of a binary classification framework and vertex collocation profiles [7], [11]. Place features can be incorporated into a supervised model for link prediction on location-based social networks [12]. To recommend friends on Facebook, a supervised random walk was designed for link prediction and recommendation [13]. Existing work focuses on link prediction in a particular network without considering a general parallelized design for large-scale networks.

Recently, the focus of graph mining has shifted to huge-scale networks. In 2004, Google presented its MapReduce framework for large-scale data indexing and mining [14], which set the direction for big data analysis. Three design patterns, based on MapReduce, have been proposed to speed up network analysis algorithms [8]. Moreover, Kang et al. propose two frameworks for huge-scale graph management and analysis: one is GBASE [15], a scalable and general graph management and mining system based on MapReduce; the other is a MapReduce based spectral analysis system for billion-scale graphs [16]. Yang et al. propose a Self Evolving Distributed Graph Management Environment for partition management of large graphs [17]. This existing work provides general frameworks for graph mining in large-scale graphs. Here we use MapReduce based methods to predict links in huge-scale networks.

III. OVERVIEW

In this project, we demonstrate how link prediction algorithms benefit from parallel computing.
We evaluate the performance of our Hadoop implementation on the Data Intensive Science Cluster (DISC) at the University of Notre Dame. We design the parallelization strategy for link prediction algorithms using the MapReduce model, test the validity and performance of our MapReduce implementations on DISC, and demonstrate how the numbers of Mapper tasks and Reducer tasks affect the overall performance. In addition, we explore how graph properties influence the performance of link prediction by testing graphs of different size, type, and clustering coefficient.

Finally, we seek to analyze the microscopic details of our implementations. We will monitor the resource utilization of the program, find computation bottlenecks, and attempt to improve our implementation. We will monitor the load balance by comparing completion times on different nodes. Communication time between nodes is an important factor which will also be explored. Big data is divided into small parts for distributed computation, and parallel computation can be used to reduce the link prediction time. We discover issues that affect the performance and propose the best approach to performing parallelized link prediction for big data using DISC. The core parallelized solution for our framework is shown in Figure 1.

Fig. 1: The core MapReduce process for our algorithms.

DISC contains 26 nodes, each with 32 GB RAM, 12 x 2 TB SATA disks, 2 x 8-core Intel Xeon processors, and Gigabit Ethernet, which is sufficient for big-data manipulation. The software required is Hadoop and the Java runtime environment, both of which are already installed.

IV. EVALUATION SETUP

The main thrust of our research is to investigate the performance of link prediction algorithms. Our evaluation approach is divided into three parts. Each part focuses on reducing the number of variables to explore at the next level down. We start with a macroscopic analysis by treating our link prediction implementation as a black box.
The lowest level looks at individual machines and attempts to find performance bottlenecks.

We use five data sets as the basis for our evaluation. The data sets range from small (12,000 nodes, 237,000 connections, and 3 MB) to large (4.8M nodes, 68M connections, and 1 GB). The data sets represent different types of graphs: citation networks, collaboration networks, social networks, and web networks. See the Appendix for a detailed description of each data set.

We analyze the performance at a variety of levels, each providing a unique perspective of the system. To evaluate the macroscopic behavior of our link predictor we wrapped the entire system with a timer, allowing us to measure the completion time of each run. We chose this metric for performance because it is the most relevant to an end user.

At the next level down, we measure the running time of each Hadoop submission. Our implementation consists of ten consecutive Hadoop jobs, as shown in Figure 2. The breakdown of time provided at this level allows us to focus our detailed analysis at the next level. Focused testing is important because each evaluation run lasts many hours.

The lowest level we are currently exploring is the performance of the most time consuming Hadoop submissions. At this level we analyze the disk I/O, network traffic, and CPU usage. The information gained from these tests will allow us to tune the performance of our implementation to the specific machines we are using.
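The two coarser measurement levels described above (one timer around the whole run and one timer per Hadoop submission) can be sketched as follows. This is a minimal illustration in Python, not the authors' harness: `time_pipeline` is a hypothetical helper, and the placeholder steps only sleep. In an actual run each step would block until one Hadoop job finishes.

```python
import time
from typing import Callable, Dict, List, Tuple


def time_pipeline(
    steps: List[Tuple[str, Callable[[], None]]]
) -> Tuple[float, Dict[str, float]]:
    """Run the steps in order; return (total seconds, seconds per step)."""
    per_step: Dict[str, float] = {}
    run_start = time.perf_counter()
    for name, step in steps:
        step_start = time.perf_counter()
        step()  # placeholder: a real step would wait for one Hadoop job
        per_step[name] = time.perf_counter() - step_start
    total = time.perf_counter() - run_start
    return total, per_step


if __name__ == "__main__":
    # Hypothetical stand-ins for three of the consecutive submissions.
    demo_steps = [
        ("splitdata", lambda: time.sleep(0.01)),
        ("getlpscore", lambda: time.sleep(0.03)),
        ("getauc", lambda: time.sleep(0.02)),
    ]
    total, per_step = time_pipeline(demo_steps)
    for name, seconds in per_step.items():
        share = 100 * seconds / total
        print(f"{name:12s} {seconds:.3f} s ({share:.0f}% of run)")
```

Recording both numbers in one pass keeps the per-job breakdown consistent with the end-to-end time that an end user would observe.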
Fig. 2: Steps involved in link prediction.

V. RESULTS

In this section, we examine the scalability and efficiency of our MapReduce based link prediction framework from three aspects: overall performance, graph influence, and breakdown performance.

A. Overall performance

The highest level analyzes the total running time while varying the number of Reducers as well as the data set under observation. Before discussing the performance of our framework, we first analyze the tradeoff in the number of Reducers.

Fig. 3: Total running time with different numbers of Reducers, for the ND Web data set.

Tradeoff. By varying the number of Reducers from 1 to 50 for the ND Web data set, we see in Figure 3 that the average completion time follows three distinct trends. First is the rapid decrease occurring between 1 and 7 Reducers. Here we witness the benefits of parallelization: with few Reducers the work is under-parallelized. In other words, each Reducer is operating at maximum throughput. The second trend is a steady state where the average time neither increases nor decreases. At this stage the benefit of additional Reducers is approximately equal to the additional overhead. On the right portion of the graph, from 25 to 50 Reducers, we observe an increase in completion time. Here, the amount of data in each chunk is small enough that the setup time dominates the total run time.

The performance for the ND Web data set has a large variance. We attribute the unpredictable behavior to the overhead of disk access and network I/O in this distributed computation platform. This overhead matters more for the smaller data sets, since the data processing portion of our framework is much shorter than for the large data sets. However, while the overhead of Hadoop based link prediction still exists for a larger data set, Live Journal, the variance of performance is much smaller, as shown in Figure 4(a).
Because the Live Journal data set is orders of magnitude larger than the ND Web one, more time is spent on actually computing the scores for potential links, so the variation caused by the overhead of distributed computing is a small percentage of the total run time. As a consequence, our implementation of Hadoop based link prediction has been shown to be suitable for big data.

Scalability. We use the Live Journal data set as a representative of big data to see how our approach to parallel computing can speed up the computation. Ideally, if the overhead of distributed computation can be ignored and the job is evenly distributed to the computing nodes, the job will complete in $t = T/N = T N^{-1}$, where $T$ is the time to complete the job on a single machine and $N$ is the number of Reducers, which is no more than the number of computers. However, because the overhead of distributed computation does exist, such as distributing the job to different computers and collecting the results through the network, the completion time cannot be as good as an inversely proportional function. In other words, the power of $N$ cannot be the ideal $-1$ but should lie between $-1$ and $0$. With this observation, we fit the data points with fewer than 25 Reducers using a power law $t = T N^{\alpha}$, and the fitted power $\alpha$ = [value missing in source] is consistent with our analysis. Note that $-1 < \alpha < 0$ yields a concave speedup curve: the speedup increases rapidly at first and then more slowly as the number of Reducers increases. This indicates that if we have multiple jobs running on the same distributed computing cluster, we can set a relatively small number of Reducers for each job to obtain the best overall performance. For example, if we want to simultaneously run link prediction on four data sets of similar size to the Live Journal data set, we can set the number of Reducers to $R = 6$ for every job, so that every job sees the benefits of parallelization, while $4R = 24 < 25$ minimizes the overhead of running multiple reducing procedures on a single machine.
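The power-law fit described above can be reproduced with an ordinary least-squares fit in log-log space, since $\log t = \log T + \alpha \log N$ is linear in $\log N$. The sketch below is an illustration under stated assumptions rather than the authors' fitting code: `fit_power_law` is a hypothetical helper, and the timings are synthetic values generated from a known power law, not measurements from DISC.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(
    reducers: Sequence[int], times: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of log t = log T + alpha * log N; returns (T, alpha)."""
    xs = [math.log(n) for n in reducers]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                      # slope of the log-log line
    T = math.exp(mean_y - alpha * mean_x)  # intercept, back in seconds
    return T, alpha


if __name__ == "__main__":
    # Synthetic timings from t = 20000 * N**-0.6 (assumed, NOT measured data).
    reducers = [1, 2, 4, 8, 16, 24]
    times = [20000.0 * n ** -0.6 for n in reducers]
    T, alpha = fit_power_law(reducers, times)
    print(f"T = {T:.0f} s, alpha = {alpha:.2f}")
    # Speedup relative to one Reducer is T / t = N ** (-alpha).
    print(f"speedup at N = 6: {6 ** -alpha:.2f}")
```

Restricting the input to Reducer counts below 25, as the text does, keeps the fit away from the regime where setup time dominates.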
We also examine the speedup of our algorithm. Theoretically, we expect the number of Reducers to directly indicate the degree of concurrency. Thus the theoretical overall throughput with $n$ Reducers is $nT$, where $T$ is the throughput of one Reducer. In Figure 4(b), we plot the speedup of our algorithms. We find that our algorithm shows a much better speedup than linear speedup. The experiments on the Live Journal network indicate that our MapReduce based link prediction framework has excellent scalability.

B. The impact of graph properties

Besides the Reducer parameter, we also investigate the impact of graph properties on computation time at the highest level. We control the variables by fixing the number of Reducers at 25 and analyze the performance on small, medium, and large networks, shown in Table I.
Fig. 4: Running time and speedup. (a) Total running time for the Live Journal data set. (b) Speedup on the Live Journal data set.

TABLE I: Running time on different data sets with 25 Reducers

Data set              Time (s)   Nodes   Edges
HepPh Collaboration   94.9 ±
ND Web                1089 ±
LiveJournal           6818 ±

Graph Size. From Table I we see that the time consumed for different graphs is proportional to the number of nodes in the graph; that is, our approach has a time complexity of $O(N)$. The reason is that the link prediction algorithm we used is based on common neighbors. As a consequence, if two nodes do not have a common neighbor, the score of the connection between these two nodes must be 0. In other words, if nodes $a$ and $b$ have a common neighbor $c$, then $a$ and $b$ must both appear in Adj[$c$]; otherwise $a$ and $b$ never coexist in the adjacency list of any node. With this fact, we only have to calculate the scores of pairs that appear together in some adjacency list, rather than the scores of connections between every two nodes.

Time complexity analysis. The observation that we do not have to compute scores for every potential connection significantly reduces the time complexity of our approach. If the average degree of nodes in the network is $k = 2E/N$, then for the adjacency list of every node, the number of pairs to deal with is $k(k-1)/2$, and the total number of scores to compute for the whole network is $k(k-1)N/2$. As a result, the time complexity of link prediction based on common neighbors is $O(Nk^2)$, and the space complexity is $O(Nk)$. Barabási and Albert propose a power law distribution of degrees in real-life networks [1], i.e., the probability that a node has degree $k$ is proportional to $k^{-\gamma}$ for some exponent $\gamma > 0$. As a consequence, the degrees of most nodes in empirical networks are small, giving us a small average degree. Therefore, for real-life networks which are big but sparse, the time complexity of link prediction based on common neighbors is $O(N)$.
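The pair enumeration described above maps naturally onto a map step and a reduce step. The following single-process Python sketch illustrates the idea under stated assumptions: the function names are hypothetical, the graph is undirected and held in memory as a dictionary of neighbor sets, and the code is not the authors' Hadoop implementation.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, Set, Tuple

Pair = Tuple[int, int]


def map_pairs(adj: Dict[int, Set[int]]) -> Iterator[Tuple[Pair, int]]:
    """Map step: every pair inside Adj[c] shares the common neighbor c."""
    for _c, neighbors in adj.items():
        for a, b in combinations(sorted(neighbors), 2):
            yield (a, b), 1


def reduce_counts(emitted: Iterable[Tuple[Pair, int]]) -> Dict[Pair, int]:
    """Reduce step: sum the emitted 1s per pair to get the score."""
    scores: Dict[Pair, int] = defaultdict(int)
    for pair, count in emitted:
        scores[pair] += count
    return dict(scores)


def common_neighbor_scores(adj: Dict[int, Set[int]]) -> Dict[Pair, int]:
    """Scores for pairs that share a neighbor but are not yet linked."""
    scores = reduce_counts(map_pairs(adj))
    return {
        (a, b): s for (a, b), s in scores.items()
        if b not in adj.get(a, set())
    }


if __name__ == "__main__":
    # Toy undirected graph with edges 1-2, 1-3, 2-3, 2-4, 3-4.
    adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
    print(common_neighbor_scores(adj))
```

A node of degree $k$ emits $k(k-1)/2$ pairs, which matches the count used in the complexity analysis. Pairs that never share an adjacency list are never emitted, so their score of 0 is implicit.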
In our Hadoop based link prediction implementation, since the data is distributed to multiple processing units, the time complexity is $O(N/U)$, where $U$ is the number of computing nodes.

C. Breakdown of jobs

To analyze which procedures take the most time in our implementation, we break down the jobs for a more insightful performance analysis. Our implementation consists of ten consecutive Hadoop jobs, which are shown in Figure 2. Again we control the variables by fixing the number of Reducers at 25 and analyze the performance of the ten consecutive procedures for small, medium, and large networks, as shown in Figures 5(a), 5(b), and 5(c).

Heavyweight jobs. In all three of these breakdown graphs, the seventh procedure (light blue), getlpscore, and the tenth procedure (light red), getauc, occupy the majority of the time. The prominence of these two procedures is expected: getlpscore is the procedure that actually computes the scores for potential connections with the algorithm based on common neighbors. As analyzed above, the time complexity of getlpscore is $O(N)$. The reason that getauc takes another big share of the total time is different. In this last step, the scores of potential links that are stored on the 25 machines need to be transferred via the network to the controlling machine, during which the overhead of disk operations and network communication is heavy and non-negligible. Also, after collecting and merging the results from those 25 machines, the calculation of the AUC score must be completed on the controlling machine. This calculation leaves getauc under-parallelized, so this step grows superlinearly with network size. In short, the larger the network, the larger the proportion of time the last step, getauc, takes.
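The text does not spell out how getauc computes its score. As a point of reference, the sketch below uses the standard definition of AUC, which is an assumption here: the probability that a randomly chosen true link receives a higher score than a randomly chosen non-link, with ties counted as one half. The function name and the toy scores are hypothetical.

```python
from typing import Sequence


def auc(positive_scores: Sequence[float],
        negative_scores: Sequence[float]) -> float:
    """P(random positive outscores random negative); ties count as 0.5."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Toy common-neighbor scores (made up for illustration).
    held_out_links = [3, 2, 1]   # pairs that are true links in the test set
    non_links = [2, 0, 0]        # pairs that are not links
    print(f"AUC = {auc(held_out_links, non_links):.3f}")
```

This direct form compares every positive with every negative, so it is quadratic in the number of scored pairs. A rank-based computation after sorting gives the same value in $O(n \log n)$, but either way the scores must first be gathered in one place, which is consistent with the single-machine bottleneck described above.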
Procedures 1 to 6 can be completed in constant time, such as splitdata, or in sublinear time, so getlpscore takes a larger proportion of time compared with procedures 1 to 6 when dealing with bigger data sets.

Lightweight jobs. Based on the analysis above, procedures 1 to 6 can be completed in constant time or in sublinear time, which coincides with the actual running times in our experiments. We can read more from the breakdown of jobs. For example, in the HepPh Collaboration and Live Journal data sets, getadjlist (Procedure 6, the grey bar) is slightly higher than getdegreestats (Procedure 6, the purple bar), while in ND Web, getadjlist is noticeably lower than getdegreestats. This is because the time complexity of getadjlist is $O(E)$, while the complexity of getdegreestats is $O(N)$. As the average degree is proportional to $E/N$, we can infer from the breakdown that the average degree in the ND Web data set is smaller than that of the other two data sets. This inference is confirmed by the data: the average degree of nodes in the ND Web data set (approximately 4.6) is indeed less than that of HepPh Collaboration (approximately 19.7) and that of Live Journal (approximately 14.2). As we can see, the breakdown graphs reflect the nature of the algorithms we used as well as the intrinsic properties of the networks.

Last but not least, the breakdown graphs confirm that our approach works better on big data sets than on small ones. The error bars for the medium-sized network ND Web and the large network Live Journal are small, while for the small network HepPh Collaboration the variation is unacceptably large. Also, the two steps that actually perform link prediction (Procedures 7 and 10) account for less than half (44%) of the total time on the small data set, indicating that the efficiency of Hadoop based link prediction on small data sets is unsatisfactory, while on large networks such as Live Journal the efficiency is much higher (85%).

VI. CONCLUSION AND FUTURE WORK

In order to improve the performance we need to dive deeper into the implementation. We will add instrumentation designed to study specific elements of the system, including the I/O and network bandwidth of the Hadoop Distributed File System (HDFS) and the computing performance of the CPUs in different nodes. We will use the data to create a breakdown of time spent and find bottlenecks.
Once the bottlenecks are found we will determine possible methods for reducing their effect, consequently improving performance. To evaluate our hypothesis that I/O and communication among distributed computing nodes is the main bottleneck, we will reduce the CPU availability and measure the effect on total run time.

Though we have been running jobs on DISC day and night for more than one month and have collected a considerable amount of data, we do not have statistically significant data for the Live Journal data set at most configurations. Performing link prediction on big data is not easy by nature, especially for the Live Journal data set, which is larger than 1 GB. We will keep running jobs on DISC continuously and collect as much data as possible.

REFERENCES

[1] A.-L. Barabási and R. Albert, "Emergence of scaling in random networks," Science.
[2] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan, "Group formation in large social networks: membership, growth, and evolution," in KDD '06, 2006.
[3] J. Leskovec, L. Backstrom, R. Kumar, and A. Tomkins, "Microscopic evolution of social networks," in KDD '08, 2008.
[4] J. E. Hopcroft, T. Lou, and J. Tang, "Who will follow you back? Reciprocal relationship prediction," in CIKM '11.
[5] M. Al Hasan, V. Chaoji, S. Salem, and M. Zaki, "Link prediction using supervised learning," in Workshop on LACS of SDM '06, 2006.
[6] C. Wang, V. Satuluri, and S. Parthasarathy, "Local probabilistic models for link prediction," in ICDM '07, 2007.
[7] R. N. Lichtenwalter, J. T. Lussier, and N. V. Chawla, "New perspectives and methods in link prediction," in KDD '10. ACM.
[8] J. Lin and M. Schatz, "Design patterns for efficient graph algorithms in MapReduce," in MLG '10.
[9] U. Kang, C. E. Tsourakakis, and C. Faloutsos, "PEGASUS: A peta-scale graph mining system implementation and observations," in ICDM '09.
[10] D. Liben-Nowell and J. Kleinberg, "The link prediction problem for social networks," in CIKM '03. ACM.
[11] R. N. Lichtenwalter and N. V. Chawla, "Vertex collocation profiles: subgraph counting for link analysis and prediction," in WWW '12.
[12] S. Scellato, A. Noulas, and C. Mascolo, "Exploiting place features in link prediction on location-based social networks," in KDD '11. ACM.
[13] L. Backstrom and J. Leskovec, "Supervised random walks: predicting and recommending links in social networks," in WSDM '11, 2011.
[14] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," in OSDI '04, 2004.
[15] U. Kang, H. Tong, J. Sun, C.-Y. Lin, and C. Faloutsos, "GBASE: a scalable and general graph management system," in KDD '11.
[16] U. Kang, B. Meeder, and C. Faloutsos, "Spectral analysis for billion-scale graphs: discoveries and implementation," in PAKDD '11.
[17] S. Yang, X. Yan, B. Zong, and A. Khan, "Towards effective partition management for large graphs," in SIGMOD '12.

APPENDIX

The data used for evaluation are publicly available from the Stanford Network Analysis Project (SNAP).

In the ND Web data set, nodes represent pages from the University of Notre Dame (domain nd.edu) and directed edges represent hyperlinks between them. The data was collected in 1999 by Albert, Jeong, and Barabási.

Live Journal is a free on-line community with almost 10 million members; a significant fraction of these members are highly active. (For example, roughly 300,000 update their content in any given 24-hour period.) Live Journal allows members to maintain journals and individual and group blogs, and it allows people to declare which other members are their friends.

The arXiv HEP-PH (High Energy Physics - Phenomenology) collaboration network is from the e-print arXiv and covers scientific collaborations between authors of papers submitted to the High Energy Physics - Phenomenology category. If an author i co-authored a paper with author j, the graph contains an undirected edge from i to j. If the paper is co-authored by k authors, this generates a completely connected (sub)graph on k nodes.
The data covers papers in the period from January 1993 to April 2003 (124 months). It begins within a few months of the inception of the arXiv, and thus represents essentially the complete history of its HEP-PH section.

The arXiv HEP-PH (High Energy Physics - Phenomenology) citation graph is from the e-print arXiv and covers all the citations within a data set of 34,546 papers with 421,578 edges. If a paper i cites paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the data set, the graph does not contain any information about this. The data covers papers in the period from January 1993 to April 2003 (124 months). It begins within a few months of the inception of the arXiv, and thus represents essentially the complete history of its HEP-PH section. The data was originally released as a part of the 2003 KDD Cup.

The Epinions social network is a who-trust-whom online social network of the general consumer review site Epinions.com. Members of the site can decide whether to trust each other. All the trust relationships interact and form the Web of Trust, which is then combined with review ratings to determine which reviews are shown to the user.

Fig. 5: Running time with 25 Reducers. (a) HepPh Collaboration data set. (b) ND Web data set. (c) Live Journal data set.
More informationHiTune. Dataflow-Based Performance Analysis for Big Data Cloud
HiTune Dataflow-Based Performance Analysis for Big Data Cloud Jinquan (Jason) Dai, Jie Huang, Shengsheng Huang, Bo Huang, Yan Liu Intel Asia-Pacific Research and Development Ltd Shanghai, China, 200241
More informationA Comparative study of Clustering Algorithms using MapReduce in Hadoop
A Comparative study of Clustering Algorithms using MapReduce in Hadoop Dweepna Garg 1, Khushboo Trivedi 2, B.B.Panchal 3 1 Department of Computer Science and Engineering, Parul Institute of Engineering
More informationHigh Performance Computing on MapReduce Programming Framework
International Journal of Private Cloud Computing Environment and Management Vol. 2, No. 1, (2015), pp. 27-32 http://dx.doi.org/10.21742/ijpccem.2015.2.1.04 High Performance Computing on MapReduce Programming
More informationCS224W Final Report Emergence of Global Status Hierarchy in Social Networks
CS224W Final Report Emergence of Global Status Hierarchy in Social Networks Group 0: Yue Chen, Jia Ji, Yizheng Liao December 0, 202 Introduction Social network analysis provides insights into a wide range
More informationDistance Estimation for Very Large Networks using MapReduce and Network Structure Indices
Distance Estimation for Very Large Networks using MapReduce and Network Structure Indices ABSTRACT Hüseyin Oktay 1 University of Massachusetts hoktay@cs.umass.edu Ian Foster, University of Chicago foster@anl.gov
More informationSMCCSE: PaaS Platform for processing large amounts of social media
KSII The first International Conference on Internet (ICONI) 2011, December 2011 1 Copyright c 2011 KSII SMCCSE: PaaS Platform for processing large amounts of social media Myoungjin Kim 1, Hanku Lee 2 and
More informationUsing PageRank in Feature Selection
Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy {ienco,meo,botta}@di.unito.it Abstract. Feature selection is an important
More informationPredict Topic Trend in Blogosphere
Predict Topic Trend in Blogosphere Jack Guo 05596882 jackguo@stanford.edu Abstract Graphical relationship among web pages has been used to rank their relative importance. In this paper, we introduce a
More informationCATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING
CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING Amol Jagtap ME Computer Engineering, AISSMS COE Pune, India Email: 1 amol.jagtap55@gmail.com Abstract Machine learning is a scientific discipline
More informationLink Prediction and Anomoly Detection
Graphs and Networks Lecture 23 Link Prediction and Anomoly Detection Daniel A. Spielman November 19, 2013 23.1 Disclaimer These notes are not necessarily an accurate representation of what happened in
More informationUsing PageRank in Feature Selection
Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy fienco,meo,bottag@di.unito.it Abstract. Feature selection is an important
More informationMeasurements on (Complete) Graphs: The Power of Wedge and Diamond Sampling
Measurements on (Complete) Graphs: The Power of Wedge and Diamond Sampling Tamara G. Kolda plus Grey Ballard, Todd Plantenga, Ali Pinar, C. Seshadhri Workshop on Incomplete Network Data Sandia National
More informationCS 224W Final Report Group 37
1 Introduction CS 224W Final Report Group 37 Aaron B. Adcock Milinda Lakkam Justin Meyer Much of the current research is being done on social networks, where the cost of an edge is almost nothing; the
More informationA Parallel Algorithm for Finding Sub-graph Isomorphism
CS420: Parallel Programming, Fall 2008 Final Project A Parallel Algorithm for Finding Sub-graph Isomorphism Ashish Sharma, Santosh Bahir, Sushant Narsale, Unmil Tambe Department of Computer Science, Johns
More informationCharacteristics of Preferentially Attached Network Grown from. Small World
Characteristics of Preferentially Attached Network Grown from Small World Seungyoung Lee Graduate School of Innovation and Technology Management, Korea Advanced Institute of Science and Technology, Daejeon
More informationSupervised classification of law area in the legal domain
AFSTUDEERPROJECT BSC KI Supervised classification of law area in the legal domain Author: Mees FRÖBERG (10559949) Supervisors: Evangelos KANOULAS Tjerk DE GREEF June 24, 2016 Abstract Search algorithms
More informationChapter 1. Social Media and Social Computing. October 2012 Youn-Hee Han
Chapter 1. Social Media and Social Computing October 2012 Youn-Hee Han http://link.koreatech.ac.kr 1.1 Social Media A rapid development and change of the Web and the Internet Participatory web application
More informationCS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS
CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS Overview of Networks Instructor: Yizhou Sun yzsun@cs.ucla.edu January 10, 2017 Overview of Information Network Analysis Network Representation Network
More informationPredicting Messaging Response Time in a Long Distance Relationship
Predicting Messaging Response Time in a Long Distance Relationship Meng-Chen Shieh m3shieh@ucsd.edu I. Introduction The key to any successful relationship is communication, especially during times when
More informationEnhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques
24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE
More informationAn Investigation into the Free/Open Source Software Phenomenon using Data Mining, Social Network Theory, and Agent-Based
An Investigation into the Free/Open Source Software Phenomenon using Data Mining, Social Network Theory, and Agent-Based Greg Madey Computer Science & Engineering University of Notre Dame UIUC - NSF Workshop
More informationarxiv: v1 [cs.si] 8 Jun 2012
Multi-Scale Link Prediction Donghyuk Shin Si Si Inderjit S. Dhillon arxiv:1206.1891v1 [cs.si] 8 Jun 2012 Abstract The automated analysis of social networks has become an important problem due to the proliferation
More informationGraph Structure Over Time
Graph Structure Over Time Observing how time alters the structure of the IEEE data set Priti Kumar Computer Science Rensselaer Polytechnic Institute Troy, NY Kumarp3@rpi.edu Abstract This paper examines
More informationFast or furious? - User analysis of SF Express Inc
CS 229 PROJECT, DEC. 2017 1 Fast or furious? - User analysis of SF Express Inc Gege Wen@gegewen, Yiyuan Zhang@yiyuan12, Kezhen Zhao@zkz I. MOTIVATION The motivation of this project is to predict the likelihood
More informationOnline Social Networks and Media
Online Social Networks and Media Absorbing Random Walks Link Prediction Why does the Power Method work? If a matrix R is real and symmetric, it has real eigenvalues and eigenvectors: λ, w, λ 2, w 2,, (λ
More informationImplementation of Parallel CASINO Algorithm Based on MapReduce. Li Zhang a, Yijie Shi b
International Conference on Artificial Intelligence and Engineering Applications (AIEA 2016) Implementation of Parallel CASINO Algorithm Based on MapReduce Li Zhang a, Yijie Shi b State key laboratory
More informationLarge Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System
Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System Seunghwa Kang David A. Bader 1 A Challenge Problem Extracting a subgraph from
More informationHarp-DAAL for High Performance Big Data Computing
Harp-DAAL for High Performance Big Data Computing Large-scale data analytics is revolutionizing many business and scientific domains. Easy-touse scalable parallel techniques are necessary to process big
More informationA priority based dynamic bandwidth scheduling in SDN networks 1
Acta Technica 62 No. 2A/2017, 445 454 c 2017 Institute of Thermomechanics CAS, v.v.i. A priority based dynamic bandwidth scheduling in SDN networks 1 Zun Wang 2 Abstract. In order to solve the problems
More informationA Robust Cloud-based Service Architecture for Multimedia Streaming Using Hadoop
A Robust Cloud-based Service Architecture for Multimedia Streaming Using Hadoop Myoungjin Kim 1, Seungho Han 1, Jongjin Jung 3, Hanku Lee 1,2,*, Okkyung Choi 2 1 Department of Internet and Multimedia Engineering,
More informationScienceDirect A NOVEL APPROACH FOR ANALYZING THE SOCIAL NETWORK
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 48 (2015 ) 686 691 International Conference on Intelligent Computing, Communication & Convergence (ICCC-2015) (ICCC-2014)
More informationCLASSIFICATION FOR SCALING METHODS IN DATA MINING
CLASSIFICATION FOR SCALING METHODS IN DATA MINING Eric Kyper, College of Business Administration, University of Rhode Island, Kingston, RI 02881 (401) 874-7563, ekyper@mail.uri.edu Lutz Hamel, Department
More informationA Parallel Community Detection Algorithm for Big Social Networks
A Parallel Community Detection Algorithm for Big Social Networks Yathrib AlQahtani College of Computer and Information Sciences King Saud University Collage of Computing and Informatics Saudi Electronic
More informationCS 345A Data Mining. MapReduce
CS 345A Data Mining MapReduce Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very large Tens to hundreds of terabytes
More informationarxiv: v1 [cs.si] 8 Apr 2016
Leveraging Network Dynamics for Improved Link Prediction Alireza Hajibagheri 1, Gita Sukthankar 1, and Kiran Lakkaraju 2 1 University of Central Florida, Orlando, Florida 2 Sandia National Labs, Albuquerque,
More informationOptimizing the use of the Hard Disk in MapReduce Frameworks for Multi-core Architectures*
Optimizing the use of the Hard Disk in MapReduce Frameworks for Multi-core Architectures* Tharso Ferreira 1, Antonio Espinosa 1, Juan Carlos Moure 2 and Porfidio Hernández 2 Computer Architecture and Operating
More informationPLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS
PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad
More informationCS / Cloud Computing. Recitation 3 September 9 th & 11 th, 2014
CS15-319 / 15-619 Cloud Computing Recitation 3 September 9 th & 11 th, 2014 Overview Last Week s Reflection --Project 1.1, Quiz 1, Unit 1 This Week s Schedule --Unit2 (module 3 & 4), Project 1.2 Questions
More informationTopic mash II: assortativity, resilience, link prediction CS224W
Topic mash II: assortativity, resilience, link prediction CS224W Outline Node vs. edge percolation Resilience of randomly vs. preferentially grown networks Resilience in real-world networks network resilience
More informationAutomatic Scaling Iterative Computations. Aug. 7 th, 2012
Automatic Scaling Iterative Computations Guozhang Wang Cornell University Aug. 7 th, 2012 1 What are Non-Iterative Computations? Non-iterative computation flow Directed Acyclic Examples Batch style analytics
More informationParallel computation performances of Serpent and Serpent 2 on KTH Parallel Dator Centrum
KTH ROYAL INSTITUTE OF TECHNOLOGY, SH2704, 9 MAY 2018 1 Parallel computation performances of Serpent and Serpent 2 on KTH Parallel Dator Centrum Belle Andrea, Pourcelot Gregoire Abstract The aim of this
More informationSocial & Information Network Analysis CS 224W
Social & Information Network Analysis CS 224W Final Report Alexandre Becker Jordane Giuly Sébastien Robaszkiewicz Stanford University December 2011 1 Introduction The microblogging service Twitter today
More informationOpen Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments
Send Orders for Reprints to reprints@benthamscience.ae 368 The Open Automation and Control Systems Journal, 2014, 6, 368-373 Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing
More informationEpilog: Further Topics
Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Epilog: Further Topics Lecture: Prof. Dr. Thomas
More informationParallel Performance Studies for a Clustering Algorithm
Parallel Performance Studies for a Clustering Algorithm Robin V. Blasberg and Matthias K. Gobbert Naval Research Laboratory, Washington, D.C. Department of Mathematics and Statistics, University of Maryland,
More informationLink prediction in multiplex bibliographical networks
Int. J. Complex Systems in Science vol. 3(1) (2013), pp. 77 82 Link prediction in multiplex bibliographical networks Manisha Pujari 1, and Rushed Kanawati 1 1 Laboratoire d Informatique de Paris Nord (LIPN),
More informationNetwork embedding. Cheng Zheng
Network embedding Cheng Zheng Outline Problem definition Factorization based algorithms --- Laplacian Eigenmaps(NIPS, 2001) Random walk based algorithms ---DeepWalk(KDD, 2014), node2vec(kdd, 2016) Deep
More informationEfficient Top-k Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling with Application to Network Structure Prediction
Efficient Top-k Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling with Application to Network Structure Prediction Takuya Akiba U Tokyo Takanori Hayashi U Tokyo Nozomi Nori Kyoto
More informationExtracting Information from Complex Networks
Extracting Information from Complex Networks 1 Complex Networks Networks that arise from modeling complex systems: relationships Social networks Biological networks Distinguish from random networks uniform
More informationApache Giraph: Facebook-scale graph processing infrastructure. 3/31/2014 Avery Ching, Facebook GDM
Apache Giraph: Facebook-scale graph processing infrastructure 3/31/2014 Avery Ching, Facebook GDM Motivation Apache Giraph Inspired by Google s Pregel but runs on Hadoop Think like a vertex Maximum value
More informationA Parallel Evolutionary Algorithm for Discovery of Decision Rules
A Parallel Evolutionary Algorithm for Discovery of Decision Rules Wojciech Kwedlo Faculty of Computer Science Technical University of Bia lystok Wiejska 45a, 15-351 Bia lystok, Poland wkwedlo@ii.pb.bialystok.pl
More informationJeffrey D. Ullman Stanford University
Jeffrey D. Ullman Stanford University for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must
More informationSurvey on MapReduce Scheduling Algorithms
Survey on MapReduce Scheduling Algorithms Liya Thomas, Mtech Student, Department of CSE, SCTCE,TVM Syama R, Assistant Professor Department of CSE, SCTCE,TVM ABSTRACT MapReduce is a programming model used
More informationGraph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References q Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
More informationOutlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data
Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University
More informationGraph Exploitation Testbed
Graph Exploitation Testbed Peter Jones and Eric Robinson Graph Exploitation Symposium April 18, 2012 This work was sponsored by the Office of Naval Research under Air Force Contract FA8721-05-C-0002. Opinions,
More informationRecommendation System for Location-based Social Network CS224W Project Report
Recommendation System for Location-based Social Network CS224W Project Report Group 42, Yiying Cheng, Yangru Fang, Yongqing Yuan 1 Introduction With the rapid development of mobile devices and wireless
More informationParallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem
I J C T A, 9(41) 2016, pp. 1235-1239 International Science Press Parallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem Hema Dubey *, Nilay Khare *, Alind Khare **
More informationMitigating Data Skew Using Map Reduce Application
Ms. Archana P.M Mitigating Data Skew Using Map Reduce Application Mr. Malathesh S.H 4 th sem, M.Tech (C.S.E) Associate Professor C.S.E Dept. M.S.E.C, V.T.U Bangalore, India archanaanil062@gmail.com M.S.E.C,
More informationThe Barnes-Hut Algorithm in MapReduce
The Barnes-Hut Algorithm in MapReduce Ross Adelman radelman@gmail.com 1. INTRODUCTION For my end-of-semester project, I implemented an N-body solver in MapReduce using Hadoop. The N-body problem is a classical
More informationDetecting and Analyzing Communities in Social Network Graphs for Targeted Marketing
Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing Gautam Bhat, Rajeev Kumar Singh Department of Computer Science and Engineering Shiv Nadar University Gautam Buddh Nagar,
More informationFault Identification from Web Log Files by Pattern Discovery
ABSTRACT International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 2 ISSN : 2456-3307 Fault Identification from Web Log Files
More informationHybrid Feature Selection for Modeling Intrusion Detection Systems
Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,
More informationCOMPARATIVE ANALYSIS OF POWER METHOD AND GAUSS-SEIDEL METHOD IN PAGERANK COMPUTATION
International Journal of Computer Engineering and Applications, Volume IX, Issue VIII, Sep. 15 www.ijcea.com ISSN 2321-3469 COMPARATIVE ANALYSIS OF POWER METHOD AND GAUSS-SEIDEL METHOD IN PAGERANK COMPUTATION
More informationMMap: Fast Billion-Scale Graph Computation on a PC via Memory Mapping
2014 IEEE International Conference on Big Data : Fast Billion-Scale Graph Computation on a PC via Memory Mapping Zhiyuan Lin, Minsuk Kahng, Kaeser Md. Sabrin, Duen Horng (Polo) Chau Georgia Tech Atlanta,
More informationDelegated Access for Hadoop Clusters in the Cloud
Delegated Access for Hadoop Clusters in the Cloud David Nuñez, Isaac Agudo, and Javier Lopez Network, Information and Computer Security Laboratory (NICS Lab) Universidad de Málaga, Spain Email: dnunez@lcc.uma.es
More informationContent-based Modeling and Prediction of Information Dissemination
Content-based Modeling and Prediction of Information Dissemination Kathy Macropol Department of Computer Science University of California Santa Barbara, CA 9316 USA kpm@cs.ucsb.edu Ambuj Singh Department
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu HITS (Hypertext Induced Topic Selection) Is a measure of importance of pages or documents, similar to PageRank
More informationThe Oracle Database Appliance I/O and Performance Architecture
Simple Reliable Affordable The Oracle Database Appliance I/O and Performance Architecture Tammy Bednar, Sr. Principal Product Manager, ODA 1 Copyright 2012, Oracle and/or its affiliates. All rights reserved.
More informationSCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS Ren Wang, Andong Wang, Talat Iqbal Syed and Osmar R. Zaïane Department of Computing Science, University of Alberta, Canada ABSTRACT
More informationDynamic Clustering of Data with Modified K-Means Algorithm
2012 International Conference on Information and Computer Networks (ICICN 2012) IPCSIT vol. 27 (2012) (2012) IACSIT Press, Singapore Dynamic Clustering of Data with Modified K-Means Algorithm Ahamed Shafeeq
More informationCHAPTER 4 ROUND ROBIN PARTITIONING
79 CHAPTER 4 ROUND ROBIN PARTITIONING 4.1 INTRODUCTION The Hadoop Distributed File System (HDFS) is constructed to store immensely colossal data sets accurately and to send those data sets at huge bandwidth
More informationCloud Computing CS
Cloud Computing CS 15-319 Programming Models- Part III Lecture 6, Feb 1, 2012 Majd F. Sakr and Mohammad Hammoud 1 Today Last session Programming Models- Part II Today s session Programming Models Part
More informationAnalyzing Dshield Logs Using Fully Automatic Cross-Associations
Analyzing Dshield Logs Using Fully Automatic Cross-Associations Anh Le 1 1 Donald Bren School of Information and Computer Sciences University of California, Irvine Irvine, CA, 92697, USA anh.le@uci.edu
More information