Algorithms for Frequent Pattern Mining of Big Data

Syed Zubair Ahmad Shah 1, Mohammad Amjad 1, Ahmad Ali Habeeb 2, Mohd Huzaifa Faruqui 1 and Mudasir Shafi 3

1 Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India.
2 Department of Information Technology & Cloud, Ericsson India Global Services, Gurgaon, Haryana, India.
3 Department of Electronics and Communication Engineering, Maharshi Dayanand University, Rohtak, Haryana, India.
1 ORCID: 0000-0003-1847-8216

Abstract

Frequent pattern mining is a field of data mining aimed at extracting frequent patterns from data in order to derive knowledge that may help in decision making. Numerous algorithms for frequent pattern mining have been developed during the last two decades, most of which have been found to be non-scalable for Big Data. For Big Data, some scalable distributed algorithms have also been proposed. In this paper, we analyze such algorithms in detail, starting with older algorithms like Count Distribution, moving through other algorithms one by one, and finally covering algorithms based on the MapReduce programming model.

Keywords: Frequent Pattern Mining, Distributed Systems, Big Data Analytics

INTRODUCTION

Data mining is the task of scanning large datasets with the aim of generating new information or discovering knowledge. Broadly speaking, data mining includes fields like clustering, frequent pattern mining (FPM), classification and outlier detection [1, 2, 3]. FPM is a popular data mining approach [4] used to find attributes that frequently occur together. Some of its important applications are market basket analysis, share market analysis, drug discovery, etc. [5]. Several FPM algorithms have been proposed in the last few decades, such as Apriori [6], FP-Growth [7], Eclat [8], etc. [9, 10]. Identifying all frequent patterns is a complex task and is computationally costly due to the large number of candidate patterns, so most algorithms become inefficient on the kind of data we deal with in this age: Big Data. Social networking sites like Facebook produce over 3 TB of data every single day [11]. Amazon, Walmart and other giants register billions of transactions every year. Gene sequencing platforms can generate terabytes of data. To cope with such huge volumes of data, researchers have proposed several parallel algorithms for the FPM problem. This paper reviews and analyzes classical as well as recent (state of the art) parallel FPM algorithms.

ALGORITHMS FOR PERFORMING FREQUENT PATTERN MINING IN A DISTRIBUTED ENVIRONMENT

A number of algorithms have been proposed for efficient frequent pattern mining on a distributed system, including the Count Distribution, Data Distribution and Candidate Distribution algorithms by Agrawal and Shafer [12], Parallel Eclat by Zaki et al. [13], the Parallel FP-Growth algorithm by Li et al. [14], the Single Pass Counting, Fixed Passes Combined-counting and Dynamic Passes Combined-counting algorithms by Lin et al. [15], and the Distributed Eclat algorithm by Moens et al. [16]. We discuss a few of these herein.

Count Distribution Algorithm

Notations:
k-itemset: itemset with k items
Freq_k: set of frequent k-itemsets
Cand_k: candidate itemsets of size k
Proc^i: processor i
DS^i: dataset local to Proc^i
DS^Ri: dataset local to Proc^i after re-partitioning
Cand_k^i: candidate set of Proc^i during pass k

The Count Distribution (CD) algorithm tries to minimize interaction between processors. It partitions the dataset horizontally into equal parts across the processors, and it permits superfluous (redundant) calculations in parallel in order to avoid interaction. Each processor autonomously runs the Apriori algorithm and builds the hash tree on its locally stored transactions. After each iteration, the global count of every candidate is calculated by adding the local counts from all processors. In the first pass, every processor Proc^i produces its local candidate itemset Cand_1^i. For the remaining passes, the algorithm is given below:

1. Every processor Proc^i produces the candidate set Cand_k using the frequent itemset Freq_{k-1} produced in the previous iteration.
2. Proc^i performs a pass over its dataset DS^i to find local support counts for the candidates in Cand_k.
3. Proc^i synchronizes with all other processors and broadcasts its local Cand_k counts to develop the global Cand_k counts.
4. Every processor Proc^i calculates Freq_k from Cand_k using the global support counts.
5. If Freq_k is empty, the algorithm terminates; otherwise it continues. Every processor Proc^i makes this decision independently.

A limitation of this algorithm is that every processor has to keep the whole hash tree in its memory; the Data Distribution algorithm has to be used if the whole hash tree cannot be stored in memory. Figure 1 shows the speedup of the CD algorithm (on datasets D1456K.T15.14 and D114K.T2.16) as the number of processors is increased.

Figure 1: Speedup of the Count Distribution algorithm with an increase in the number of processors (x-axis: number of processors; y-axis: response time in seconds).
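
To make the counting pattern concrete, the following minimal Python sketch (not the authors' implementation) simulates one CD pass: each "processor" counts candidates over its own horizontal partition, and the global counts are obtained by summing the per-partition counts, which in a real deployment would be an all-reduce style exchange. The partitions, candidate sets and minimum support below are hypothetical.

from collections import Counter
from itertools import combinations

# Hypothetical horizontal partitions (one per processor) and minimum support count.
partitions = [
    [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}],       # DS^1
    [{"a", "b"}, {"a", "b", "c"}, {"a", "c", "d"}],  # DS^2
]
MIN_SUP = 3

def local_counts(partition, candidates):
    """One processor's pass: count each candidate over its local transactions."""
    counts = Counter()
    for transaction in partition:
        for cand in candidates:
            if cand <= transaction:              # candidate itemset contained in transaction
                counts[cand] += 1
    return counts

def count_distribution_pass(candidates):
    """Simulate one CD iteration: local counting, then a global sum of the counts."""
    global_counts = Counter()
    for part in partitions:                      # in CD, each partition is counted by its own processor
        global_counts.update(local_counts(part, candidates))  # stands in for the broadcast/sum step
    return {c: n for c, n in global_counts.items() if n >= MIN_SUP}

# Pass 1: frequent single items; pass 2: candidate pairs joined from Freq_1.
freq1 = count_distribution_pass([frozenset([i]) for i in "abcd"])
cand2 = [frozenset(p) for p in combinations(sorted({i for f in freq1 for i in f}), 2)]
print(count_distribution_pass(cand2))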

Data Distribution Algorithm

This algorithm uses the memory of the system more efficiently. The candidate itemsets are partitioned and shared among the processors, and each processor finds the counts of its local subset of candidate itemsets while broadcasting its local data to the remaining processors. The partitioning and distribution of itemsets can affect the performance of the algorithm, so it is desirable to have a balanced load.

Pass 1: This step is the same as the step for k=1 in the CD algorithm.

Pass k>1:
1. Processor Proc^i produces the candidate itemset Cand_k from Freq_{k-1} but keeps only a subset Cand_k^i containing 1/N of the itemsets of Cand_k (N being the number of processors). The union of the subsets generated by all processors is Cand_k and their pairwise intersections are empty (see the sketch after this section).
2. Proc^i uses both its local data and the data received from the remaining processors to develop support counts for the itemsets in its Cand_k^i.
3. After obtaining the complete support count data, processor Proc^i calculates Freq_k^i from its local candidate set Cand_k^i. Again, the intersection of the Freq_k^i sets is empty and their union is Freq_k.
4. All processors synchronize and exchange their local Freq_k^i with the other processors. Every processor then holds the complete Freq_k and can decide independently whether to continue the algorithm or to terminate it.

Processors find the support counts for their local itemsets in step 2 while broadcasting and receiving local data. Network congestion may become a problem in this setup, so steps must be taken to prevent it.
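
To make the 1/N split in step 1 concrete, here is a small hypothetical Python sketch that deals Cand_k round-robin into disjoint per-processor subsets whose union is the full candidate set; the candidate list and processor count are invented for illustration.

from itertools import combinations

N_PROC = 4  # hypothetical number of processors

def partition_candidates(cand_k, n_proc=N_PROC):
    """Deal candidates round-robin: disjoint subsets Cand_k^i whose union is Cand_k."""
    subsets = [[] for _ in range(n_proc)]
    for idx, cand in enumerate(sorted(cand_k)):
        subsets[idx % n_proc].append(cand)
    return subsets

# Example: all 2-item candidates over a toy item alphabet.
cand_2 = [tuple(c) for c in combinations("abcde", 2)]
for i, sub in enumerate(partition_candidates(cand_2)):
    print(f"Cand_2^{i}: {sub}")
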
Parallel Eclat

In Parallel Eclat (Par-Eclat), the data is kept in a vertical layout, obtained by transforming the horizontal data into vertical itemset (tid) lists. The two phases of Par-Eclat are given below.

Initialization Phase: There are three sub-steps in the initialization phase:
1. The support counts for 2-itemsets are read after the preprocessing step has been done, and all the itemsets having count >= minimum support are inserted into Freq_2 (Freq_k is the set of frequent k-itemsets).
2. One of two clustering schemes - equivalence class clustering or maximal hypergraph clique clustering - is applied to Freq_2 to produce the set of potential maximal frequent itemsets, which are then partitioned among the processors to achieve load balancing.
3. The database is repartitioned so that every processor obtains the tid-lists of all the 1-itemsets in the clusters assigned to it.

Asynchronous Phase: After the initialization phase, the tid-lists are accessible in local storage on every processor, so each processor can process its assigned maximal clusters and generate frequent itemsets without synchronizing with any other processor. A cluster is processed completely before the processor moves on to the subsequent cluster, and the local database is examined only once to complete this step, so there is little I/O overhead. Since each cluster induces a sub-lattice, either bottom-up traversal is used to produce all frequent itemsets or hybrid traversal is used to produce the maximal frequent itemsets, depending on the algorithm. Initially only the tid-lists for 1-itemsets are available locally; from these, the tid-lists for the 2-itemsets of a cluster can be generated. These clusters are generally not big, so keeping the emerging tid-lists in memory creates no problem. In the bottom-up approach, 3-itemsets are produced by intersecting the tid-lists of 2-itemsets; if the cardinality of the resulting tid-list is >= minimum support, the new itemset is added to Freq_3. The resulting frequent 3-itemsets, Freq_3, are then divided into equivalence classes on the basis of common prefixes of size 2, and intersections of all pairs of 3-itemsets within an equivalence class are used to determine Freq_4. The process is repeated until all frequent itemsets are found. After finding Freq_k, Freq_{k-1} is deleted, so memory is needed only for the itemsets of Freq_{k-1} within one maximal cluster; the algorithm is therefore main-memory efficient. For the top-down stage of the hybrid traversal, memory only needs to keep track of the maximal elements seen so far, along with the itemsets not yet seen.

Par-Eclat Algorithm:
1. Create Freq_2 using 2-itemset support counts
2. Produce clusters from Freq_2 using equivalence classes
3. Assign clusters to processors
4. Scan the local dataset partition
5. Share the appropriate tid-lists with the remaining processors
6. Procure tid-lists from the remaining processors
7. For each assigned cluster C, find frequent itemsets using bottom-up or hybrid traversal

Figure 2 compares the execution time of the Par-Eclat algorithm with the CD algorithm [12] on the T1.I4.D284K dataset; Par-Eclat clearly outperforms CD.

Figure 2: Comparison of execution time of Par-Eclat and Count Distribution (x-axis: configuration Xa.Yb.Zc, where X = no. of hosts, Y = no. of processors and Z = H*P; y-axis: execution time in seconds; series: CD, Par-Eclat).
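
The core operation of the asynchronous phase is the intersection of tid-lists within an equivalence class. The minimal Python sketch below (sequential only, over a made-up vertical database and minimum support) shows that bottom-up step: a prefix is extended with later items of the same class, and an extension is kept when the intersected tid-list meets the minimum support.

# Hypothetical vertical database: item -> tid-list (set of transaction ids).
vertical_db = {
    "a": {1, 2, 3, 5}, "b": {1, 2, 4, 5}, "c": {1, 3, 4, 5}, "d": {2, 5},
}
MIN_SUP = 2

def eclat(prefix, items, min_sup, results):
    """Bottom-up Eclat over one equivalence class; items is a list of (item, tidlist)."""
    for i, (item, tids) in enumerate(items):
        new_prefix = prefix + (item,)
        results[new_prefix] = len(tids)          # record the frequent itemset and its support
        suffix = []
        for other, other_tids in items[i + 1:]:  # extend with later items sharing this prefix
            inter = tids & other_tids            # tid-list intersection
            if len(inter) >= min_sup:            # support check on the intersected tid-list
                suffix.append((other, inter))
        if suffix:
            eclat(new_prefix, suffix, min_sup, results)

frequent_items = [(i, t) for i, t in sorted(vertical_db.items()) if len(t) >= MIN_SUP]
results = {}
eclat((), frequent_items, MIN_SUP, results)
print(results)   # frequent itemsets (as tuples) with their support counts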

Single Pass Counting

Single Pass Counting (SPC) is a parallel implementation of the Apriori algorithm using MapReduce, which makes Apriori efficient for mining association rules from massive datasets. The main step in the Apriori algorithm is finding the frequency of candidate itemsets, and this counting process can be parallelized. In each step, the mapper emits <x, 1> for every candidate x present in a transaction. The reducer collects the key-value pairs emitted by the mappers and sums the values for each key, which gives the total frequency of that candidate in the database; it then outputs all candidates that have a sufficient support count. At the k-th pass of database scanning, SPC finds the frequent k-itemsets in one MapReduce phase.

Phase-1 of the SPC algorithm:
Map(key, value = itemset in transaction t_i):
    Input: dataset partition D_i
    foreach transaction tr in D_i do
        foreach item j in tr do
            emit <j, 1>
Reduce(key = item, value = counts):
    foreach key k do
        foreach value val of k do
            k.count += val
        if k.count >= minimum_support_count then
            emit <k, k.count>

In phase 2, the algorithm generates the frequent 2-itemsets using the frequent itemsets Freq_1, which are placed in the Distributed Cache in Hadoop.

Phase-2 of the SPC algorithm:
Map(key, value = itemset in transaction t_i):
    Input: dataset partition D_i and Freq_{k-1} (k >= 2)
    Fetch Freq_{k-1} from the DistributedCache
    Build a hash tree for Cand_k = gen-apriori(Freq_{k-1})
    foreach transaction tr in D_i do
        Cand_t = subset(Cand_k, tr)
        foreach candidate c in Cand_t do
            emit <c, 1>
Reduce(key = candidate, value = counts):
    same as the Reduce function of Phase-1

Algorithm SPC:
1. Run Phase-1 to find Freq_1
2. Run Phase-2 to find Freq_2
3. For all k > 2, while Freq_{k-1} ≠ φ, run the Map and Reduce functions of Phase-2
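
As a hedged illustration of the Phase-1 counting (a sketch, not the authors' Hadoop code), the Python fragment below emulates the map and reduce steps over a toy list of transactions; the dataset and minimum support are invented, and the in-memory grouping stands in for Hadoop's shuffle.

from collections import defaultdict

# Hypothetical input split and minimum support count.
transactions = [["a", "b", "c"], ["a", "c"], ["b", "c", "d"], ["a", "b", "c"]]
MIN_SUP = 2

def spc_phase1_map(split):
    """Mapper: emit <item, 1> for every item of every transaction in the split."""
    for tr in split:
        for item in tr:
            yield item, 1

def spc_phase1_reduce(grouped):
    """Reducer: sum the counts per item and keep the items meeting minimum support."""
    for item, ones in grouped.items():
        count = sum(ones)
        if count >= MIN_SUP:
            yield item, count

grouped = defaultdict(list)                 # stand-in for Hadoop's shuffle/group-by-key
for key, val in spc_phase1_map(transactions):
    grouped[key].append(val)

freq1 = dict(spc_phase1_reduce(grouped))
print(freq1)                                # frequent 1-itemsets with their counts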

Distributed Eclat

The Distributed Eclat (Dist-Eclat) algorithm is based on the Eclat algorithm and mines frequent itemsets from large datasets in parallel using MapReduce. The algorithm focuses on speed: Dist-Eclat is quite fast, but it requires a large main memory because it stores the complete dataset in memory in vertical format.

Figure 3: Eclat on the MapReduce framework.

Dist-Eclat is a three-step method (Figure 3), and each step can be distributed among the mappers to maximise efficiency.
1. Finding the Frequent Items: In this step, the vertical database is partitioned into equal-sized sub-databases called shards and distributed among several mappers. The mappers extract the frequent singletons from their shards and send them to the reducer. All frequent items are gathered in the reduce phase.
2. k-Frequent Itemsets Generation: The set of frequent k-itemsets, Freq_k, is generated. First, each mapper is assigned a part of the combined set of frequent items. The mappers find the frequent supersets of size k of their items by running the Eclat algorithm up to level k. A reducer then assigns the resulting frequent itemsets to a new set of mappers; this distribution is performed using a round-robin scheme, as sketched after this list.
3. Subtree Mining: This step uses the Eclat algorithm to mine the prefix trees rooted at the assigned itemsets. The subtrees can be mined independently by the mappers, since a subtree does not need information from other subtrees.
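
To illustrate steps 1 and 2 (as a sketch under assumed inputs, not the reference implementation), the Python fragment below mines frequent singletons from hypothetical shards and then deals the resulting prefixes to a set of mappers round-robin, mirroring the assignment strategy described above.

from collections import Counter

# Hypothetical shards (lists of transactions) and settings.
shards = [
    [["a", "b"], ["a", "c"], ["b", "c"]],
    [["a", "b", "c"], ["c", "d"], ["a", "c"]],
]
MIN_SUP = 3
N_MAPPERS = 2

# Step 1: each mapper counts singletons in its shard; the reduce side sums and filters.
total = Counter()
for shard in shards:                            # each shard would be handled by one mapper
    shard_counts = Counter(item for tr in shard for item in tr)
    total.update(shard_counts)                  # reduce side: global singleton counts
frequent_items = sorted(i for i, c in total.items() if c >= MIN_SUP)

# Step 2 (assignment part): deal the frequent prefixes to mappers round-robin.
assignment = {m: [] for m in range(N_MAPPERS)}
for idx, item in enumerate(frequent_items):
    assignment[idx % N_MAPPERS].append(item)
print(frequent_items, assignment)               # each mapper then mines its prefix subtree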

Figure 4: Comparison of computation time of Dist-Eclat and PFP (x-axis: minimum support; y-axis: time in seconds; series: PFP, Dist-Eclat).

Figure 4 compares the runtime of Dist-Eclat with Parallel FP-Growth (PFP) [14] on the del.icio.us dataset [17]. The x-axis denotes the minimum support threshold and the y-axis denotes the computation time in seconds. Dist-Eclat is the faster of the two algorithms.

CONCLUSION

In this paper, we reviewed a number of distributed frequent pattern mining algorithms in order to examine them and to present a picture of how these algorithms have evolved over time. We also presented comparisons of their runtimes on different datasets. Both types of algorithms were covered: those based on message passing and those based on the MapReduce approach.

REFERENCES

[1] Han, J., Pei, J., and Kamber, M., 2011, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, California, USA.
[2] Shah, S. Z. A., and Amjad, M., 2016, Classical and Relearning based Clustering Algorithms for Huge Data Warehouses, Proc. 2nd International Conference on Computational Intelligence & Communication Technology, Ghaziabad, pp. 209-213.
[3] Aggarwal, C. C., 2015, Data Mining, Springer International Publishing, Switzerland.
[4] Aggarwal, C. C., and Han, J., 2014, Frequent Pattern Mining, Springer International Publishing, Switzerland.
[5] Berry, M. J. A., and Linoff, G. S., 2004, Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, John Wiley & Sons Inc., Indiana, USA.
[6] Agrawal, R., and Srikant, R., 1994, Fast Algorithms for Mining Association Rules, Proc. 20th International Conference on Very Large Data Bases, Santiago, Chile, Vol. 1215, pp. 487-499.
[7] Han, J., Pei, J., and Yin, Y., 2000, Mining Frequent Patterns Without Candidate Generation, Proc. ACM SIGMOD International Conference on Management of Data, Dallas, Texas, pp. 1-12.
[8] Zaki, M. J., 2000, Scalable Algorithms for Association Mining, IEEE Transactions on Knowledge and Data Engineering, 12(3), pp. 372-390.
[9] Sucahyo, Y. G., and Gopalan, R. P., 2004, CT-PRO: A Bottom-Up Non-Recursive Frequent Itemset Mining Algorithm using Compressed FP-Tree Data Structures, Proc. ICDM Workshop on Frequent Itemset Mining Implementations, Brighton.
[10] Uno, T., Kiyomi, M., and Arimura, H., 2004, Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets, Proc. ICDM Workshop on Frequent Itemset Mining Implementations, Brighton.
[11] Lin, J., and Ryaboy, D., 2013, Scaling Big Data Mining Infrastructure: The Twitter Experience, ACM SIGKDD Explorations Newsletter, 14(2), pp. 6-19.
[12] Agrawal, R., and Shafer, J., 1996, Parallel Mining of Association Rules, IEEE Transactions on Knowledge and Data Engineering, 8(6), pp. 962-969.
[13] Zaki, M. J., Parthasarathy, S., Ogihara, M., and Li, W., 1997, Parallel Algorithms for Discovery of Association Rules, Data Mining and Knowledge Discovery, 1(4), pp. 343-373.
[14] Li, H., Wang, Y., Zhang, D., Zhang, M., and Chang, E. Y., 2008, PFP: Parallel FP-Growth for Query Recommendation, Proc. ACM Conference on Recommender Systems, Lausanne, pp. 107-114.
[15] Lin, M. Y., Lee, P. Y., and Hsueh, S. C., 2012, Apriori-based Frequent Itemset Mining Algorithms on MapReduce, Proc. 6th International Conference on Ubiquitous Information Management and Communication, Kuala Lumpur, Article no. 76.
[16] Moens, S., Aksehirli, E., and Goethals, B., 2013, Frequent Itemset Mining for Big Data, Proc. International Conference on Big Data, Santa Clara, California, pp. 111-118.
[17] Wetzker, R., Zimmermann, C., and Bauckhage, C., 2008, Analyzing Social Bookmarking Systems: A del.icio.us Cookbook, Proc. Mining Social Data Workshop, Patras, pp. 26-30.